[OEP] Adding OEP for Multi-instance serving. #584
shenoyvvarun wants to merge 1 commit into ome-projects:main from
Conversation
YouNeedCryDear left a comment
High-level question: how is MIG going to fit inside the current AcceleratorClass? Are we going to have MIG defined as another set of accelerators?
> The main components are:
>
> - `pkg/controller/v1beta1/inferenceservice/mig_profile.go`: derives MIG demand,
Curious: if we introduce `mig_profile.go`, does it mean the OME operator is NVIDIA-dependent?
OME works mostly with NVIDIA accelerators but has no lock-in to a particular accelerator. I think for the first iteration we can rely on the GPU Operator, but we can make it independent of gpu-operator later.
> 4. Detect whether a node needs reconfiguration.
> 5. Serialize changes with a local lock file so only one change runs at a time.
> 6. Optionally inspect configured GPU clients before applying changes.
> 7. Execute `nvidia-mig-parted apply -f <config> -c <desired>` with the host root
Is it possible to re-use the NVIDIA GPU Operator's native label-based partitioning instead of doing everything on our own? We should probably make it platform-agnostic.
Ack. For the first iteration we will simply use the nvidia-mig-manager. I will make that clear in the doc.
> resource such as `nvidia.com/mig-2g.20gb`, the controller derives the
> corresponding MIG config name, selects an eligible node, persists the
> assignment in `InferenceService` annotations, and updates node labels so the
> node-local manager can converge the physical device state. The resulting design
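The derivation quoted above could be sketched as follows. This assumes the `all-<profile>` config naming used in nvidia-mig-parted's sample configs; the `deriveMigConfigName` helper is hypothetical, and a real controller would likely use a configurable resource-to-config map instead of string manipulation.

```go
package main

import (
	"fmt"
	"strings"
)

// deriveMigConfigName maps a requested MIG extended-resource name
// (e.g. "nvidia.com/mig-2g.20gb") to a mig-parted config name.
// Returns false for resources that are not MIG profiles.
func deriveMigConfigName(resource string) (string, bool) {
	const prefix = "nvidia.com/mig-"
	if !strings.HasPrefix(resource, prefix) {
		return "", false
	}
	profile := strings.TrimPrefix(resource, prefix) // e.g. "2g.20gb"
	return "all-" + profile, true
}

func main() {
	name, ok := deriveMigConfigName("nvidia.com/mig-2g.20gb")
	fmt.Println(name, ok) // all-2g.20gb true
}
```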
Once the inference service is gone, do we want to persist the node labels or remove them?
No, it is dynamically created and de-allocated.
If multiple MIG inference services are scheduled on the same node, the label should stay until the very last isvc is terminated, correct?
> resources after MIG reconfiguration.
> - When dynamic allocation is disabled, OME does not perform node relabeling for
>   MIG reconfiguration and relies on preconfigured MIG nodes.
> - When dynamic allocation is enabled, OME selects only nodes explicitly marked
Is there a possibility that OME selects a node with the MIG label but the node fails other resource requirements such as CPU and memory?
I mean the node-selection logic needs to ensure that all resource requirements are met for a node, not just the MIG label.
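A sketch of the filtering being asked for: candidate nodes must pass the MIG opt-in check *and* ordinary CPU/memory fit. All type and field names here (`node`, `demand`, `selectNode`) are illustrative; a real controller would compute fit from `corev1.Node` allocatable capacity minus existing pod requests.

```go
package main

import "fmt"

// node is a simplified view of a candidate node for MIG scheduling.
type node struct {
	migEnabled bool  // node explicitly opted in for MIG reconfiguration
	freeCPUm   int64 // free CPU in millicores
	freeMemB   int64 // free memory in bytes
}

// demand is the pod's aggregate resource request.
type demand struct {
	cpuM int64
	memB int64
}

// nodeFits requires the MIG label AND sufficient CPU and memory.
func nodeFits(n node, d demand) bool {
	return n.migEnabled && n.freeCPUm >= d.cpuM && n.freeMemB >= d.memB
}

// selectNode returns the index of the first node satisfying all
// requirements, or -1 if no node fits.
func selectNode(nodes []node, d demand) int {
	for i, n := range nodes {
		if nodeFits(n, d) {
			return i
		}
	}
	return -1
}

func main() {
	nodes := []node{
		{migEnabled: false, freeCPUm: 8000, freeMemB: 16 << 30}, // no MIG label
		{migEnabled: true, freeCPUm: 4000, freeMemB: 8 << 30},
	}
	fmt.Println(selectNode(nodes, demand{cpuM: 2000, memB: 4 << 30})) // 1
}
```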
> from requested resource name to node configuration name.
> 6. Keep the design compatible with NVIDIA GPU Operator and device-plugin-based
>    clusters.
> 7. Provide an option to disable OME-managed dynamic allocation entirely and use
What is the difference between OME-managed dynamic allocation and NVIDIA DRA on Kubernetes?
There was little development on NVIDIA DRA for MIG for a year or so, and last month MIG allocation was supported as alpha. But I believe they will not support A100 and older GPUs (I believe it makes business sense for them to support only newer GPUs). I am not sure whether this is an acceptable drawback. Rethinking it, though, I am leaning toward simply leveraging the DRA driver from NVIDIA.
I think so; MIG devices would be another accelerator profile that would allow a runtime to customize the serving of a model. Do you have any concerns?
Only concern is there will be a combination explosion 🤣 Maybe we should do something on the accelerator-class side as well? Also, MIG will only be available for models that take less than 1 GPU, correct? Is it possible for a model to take 4 MIGs on 4 different GPUs and call it TP4?
What this PR does
- Support for Multi-instance GPU serving in OME.
Why we need it
Fixes #
How to test
Checklist
`make test` passes locally