
[OEP] Adding OEP for Multi-instance serving.#584

Open
shenoyvvarun wants to merge 1 commit intoome-projects:mainfrom
shenoyvvarun:vasheno/mig-support-oep

Conversation

@shenoyvvarun
Contributor

What this PR does

- Support for Multi-Instance GPU (MIG) serving in OME.

  • NOTE: This OEP covers only hardware-isolated Multi-Instance GPU serving. Another way to support multiple inference services is via KVCached, which doesn't provide isolation but addresses the case where a customer wants to serve multiple InferenceServices on the same DAC.

Why we need it

  • Great opportunity to sell GPUs to customers who want the predictable performance of a DAC but don't want to commit to a full GPU.

Fixes #

How to test

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

@shenoyvvarun shenoyvvarun requested a review from slin1237 as a code owner April 23, 2026 19:42
@github-actions github-actions Bot added documentation Documentation changes oep OME Enhancement Proposal labels Apr 23, 2026
@YouNeedCryDear YouNeedCryDear self-requested a review April 24, 2026 18:39
Collaborator

@YouNeedCryDear YouNeedCryDear left a comment

High-level question: how is MIG going to fit inside the current AcceleratorClass? Are we going to have MIG defined as another set of accelerators?


The main components are:

- `pkg/controller/v1beta1/inferenceservice/mig_profile.go`: derives MIG demand,
Collaborator

Curious: if we introduce mig_profile.go, does it mean the OME operator is NVIDIA-dependent?

Contributor Author

OME mostly runs on NVIDIA accelerators but has no lock-in to a particular accelerator. I think for the first iteration we can rely on the GPU Operator, but we can make it independent of gpu-operator later.

4. Detect whether a node needs reconfiguration.
5. Serialize changes with a local lock file so only one change runs at a time.
6. Optionally inspect configured GPU clients before applying changes.
7. Execute `nvidia-mig-parted apply -f <config> -c <desired>` with the host root
Collaborator

Is it possible to re-use the NVIDIA GPU Operator's native label-based partitioning instead of doing everything on our own? We should probably make it platform-agnostic.

Contributor Author

Ack, for the first iteration we will simply use the nvidia-mig-manager. I will make that clear in the doc.
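The serialization described in the quoted steps (a local lock file so only one reconfiguration runs at a time, then `nvidia-mig-parted apply -f <config> -c <desired>`) could be sketched roughly as follows. The lock path and config name are assumptions for illustration, not OME's actual implementation:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// migPartedArgs builds the argument list for nvidia-mig-parted, matching
// step 7 above: `nvidia-mig-parted apply -f <config> -c <desired>`.
func migPartedArgs(configFile, desired string) []string {
	return []string{"apply", "-f", configFile, "-c", desired}
}

// applyMIGConfig serializes reconfiguration with an exclusive lock file
// (step 5) so only one change runs at a time, then shells out to
// nvidia-mig-parted. The lock path is a hypothetical choice; this also
// assumes a Linux host, where flock(2) is available.
func applyMIGConfig(lockPath, configFile, desired string) error {
	lock, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return err
	}
	defer lock.Close()
	// Blocks until the exclusive lock is held; a concurrent run waits here.
	if err := syscall.Flock(int(lock.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(lock.Fd()), syscall.LOCK_UN)

	cmd := exec.Command("nvidia-mig-parted", migPartedArgs(configFile, desired)...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	fmt.Println(migPartedArgs("/etc/mig-parted/config.yaml", "all-2g.20gb"))
}
```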

resource such as `nvidia.com/mig-2g.20gb`, the controller derives the
corresponding MIG config name, selects an eligible node, persists the
assignment in `InferenceService` annotations, and updates node labels so the
node-local manager can converge the physical device state. The resulting design
Collaborator

Once the inference service is gone, do we want to persist the node labels or remove them?

Contributor Author

No, it's dynamically created and de-allocated.
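A minimal sketch of the dynamic lifecycle described here: the MIG config label (the GPU Operator's `nvidia.com/mig.config`) stays set while MIG-backed InferenceServices remain on the node and is dropped only when the last one terminates. The reconcile function and its inputs are hypothetical:

```go
package main

import "fmt"

// reconcileMIGLabel decides whether a node should carry the MIG config
// label, given how many MIG-backed InferenceServices are still scheduled
// on it. The label is set with the first isvc and removed only when the
// last one is terminated; labels are returned as a fresh map.
func reconcileMIGLabel(labels map[string]string, migISVCsOnNode int, configName string) map[string]string {
	const key = "nvidia.com/mig.config" // label consumed by nvidia-mig-manager
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		out[k] = v
	}
	if migISVCsOnNode > 0 {
		out[key] = configName // keep/apply the desired MIG config
	} else {
		delete(out, key) // last isvc gone: drop the label
	}
	return out
}

func main() {
	labels := map[string]string{"nvidia.com/mig.config": "all-2g.20gb"}
	fmt.Println(reconcileMIGLabel(labels, 0, "all-2g.20gb"))
}
```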

Collaborator

If multiple MIG inference services are scheduled on the same node, the label should stay until the very last isvc is terminated, correct?

resources after MIG reconfiguration.
- When dynamic allocation is disabled, OME does not perform node relabeling for
MIG reconfiguration and relies on preconfigured MIG nodes.
- When dynamic allocation is enabled, OME selects only nodes explicitly marked
Collaborator

Is there a possibility that OME selects a node with the MIG label but the node fails other resource requirements, such as CPU and memory?

Contributor Author

I mean, the node selection logic needs to ensure all the resource requirements are met for a node.
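A toy sketch of that filtering, assuming simplified integer quantities: a node must satisfy every request (CPU, memory, and the MIG extended resource), not just carry the MIG label:

```go
package main

import "fmt"

// nodeFits checks that a candidate node satisfies every resource request,
// not just the MIG device: CPU, memory, and the MIG extended resource must
// all fit within the node's allocatable capacity. Quantities are
// simplified to int64 for illustration.
func nodeFits(allocatable, requests map[string]int64) bool {
	for res, want := range requests {
		if allocatable[res] < want {
			return false // any single unmet requirement disqualifies the node
		}
	}
	return true
}

func main() {
	alloc := map[string]int64{"cpu": 16, "memory": 64 << 30, "nvidia.com/mig-2g.20gb": 3}
	req := map[string]int64{"cpu": 32, "memory": 8 << 30, "nvidia.com/mig-2g.20gb": 1}
	// The MIG resource fits, but the CPU request does not.
	fmt.Println(nodeFits(alloc, req))
}
```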

from requested resource name to node configuration name.
6. Keep the design compatible with NVIDIA GPU Operator and device plugin based
clusters.
7. Provide an option to disable OME-managed dynamic allocation entirely and use
Collaborator

What is the difference between OME-managed dynamic allocation and NVIDIA DRA on k8s?

Contributor Author

There was no development on NVIDIA DRA for MIG for a year or so, and only last month was allocation supported as alpha. I believe they will not support the A100 and older GPUs (it makes business sense for them to support only newer GPUs). I am not sure whether this is an acceptable drawback. But, rethinking it, I am leaning more toward simply leveraging the DRA driver from NVIDIA.

@shenoyvvarun
Contributor Author

shenoyvvarun commented Apr 24, 2026

High-level question: how is MIG going to fit inside the current AcceleratorClass? Are we going to have MIG defined as another set of accelerators?

I think so; MIG devices would be another accelerator profile that would allow a runtime to customize the serving of a model. Do you have any concerns?
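For illustration, deriving the profile from the extended resource name quoted earlier (`nvidia.com/mig-2g.20gb`) might look like this. The `all-<profile>` config-name mapping mirrors nvidia-mig-parted's example configs but is an assumption here, not OME's actual mapping:

```go
package main

import (
	"fmt"
	"strings"
)

// migProfileFromResource extracts the MIG profile (e.g. "2g.20gb") from an
// extended resource name such as "nvidia.com/mig-2g.20gb", and derives a
// hypothetical mig-parted config name from it.
func migProfileFromResource(resource string) (profile, configName string, ok bool) {
	const prefix = "nvidia.com/mig-"
	if !strings.HasPrefix(resource, prefix) {
		return "", "", false // not a MIG extended resource
	}
	profile = strings.TrimPrefix(resource, prefix)
	// Assumed convention: partition every GPU on the node uniformly.
	return profile, "all-" + profile, true
}

func main() {
	p, c, ok := migProfileFromResource("nvidia.com/mig-2g.20gb")
	fmt.Println(p, c, ok)
}
```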

@YouNeedCryDear
Collaborator

High-level question: how is MIG going to fit inside the current AcceleratorClass? Are we going to have MIG defined as another set of accelerators?

I think so; MIG devices would be another accelerator profile that would allow a runtime to customize the serving of a model. Do you have any concerns?

The only concern is that there will be a combinatorial explosion 🤣

Maybe we should do something on the accelerator class side as well?

Also, MIG will only be available for models that take less than one GPU, correct? Is it possible for a model to take 4 MIGs on 4 different GPUs and call it TP4?
