
[OEP] Adding OEP for Multi-instance serving.#584

Open
shenoyvvarun wants to merge 1 commit intoome-projects:mainfrom
shenoyvvarun:vasheno/mig-support-oep

Conversation

@shenoyvvarun
Contributor

What this PR does

- Support for Multi-Instance GPU (MIG) serving in OME.

  • NOTE: This OEP covers only hardware-isolated Multi-Instance GPU serving. Another way to support multiple inference services is via KVCached, which doesn't provide isolation but addresses the case where a customer wants to serve multiple InferenceServices on the same DAC.

Why we need it

  • Great opportunity to sell GPUs to customers who want the predictable performance of a DAC but don't want to commit to a full GPU.

Fixes #

How to test

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

@shenoyvvarun shenoyvvarun requested a review from slin1237 as a code owner April 23, 2026 19:42
@github-actions github-actions Bot added documentation Documentation changes oep OME Enhancement Proposal labels Apr 23, 2026
@YouNeedCryDear YouNeedCryDear self-requested a review April 24, 2026 18:39
Collaborator

@YouNeedCryDear YouNeedCryDear left a comment

High-level question: how is MIG going to fit inside the current AcceleratorClass? Are we going to have MIG defined as another set of accelerators?


The main components are:

- `pkg/controller/v1beta1/inferenceservice/mig_profile.go`: derives MIG demand,
Collaborator

Curious: if we introduce mig_profile.go, does it mean the OME operator is NVIDIA-dependent?

Contributor Author

OME mostly runs on NVIDIA accelerators but has no lock-in to a particular accelerator. I think for the first iteration we can rely on the GPU Operator, but we can make it independent of gpu-operator later.

4. Detect whether a node needs reconfiguration.
5. Serialize changes with a local lock file so only one change runs at a time.
6. Optionally inspect configured GPU clients before applying changes.
7. Execute `nvidia-mig-parted apply -f <config> -c <desired>` with the host root
Collaborator

Is it possible to re-use the NVIDIA GPU Operator's native label-based partitioning instead of doing everything on our own? We should probably make it platform-agnostic.

Contributor Author

Ack, for the first iteration we will simply use the nvidia-mig-manager. I will make that clear in the doc.
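The serialization described in the quoted steps (a local lock file so only one reconfiguration runs at a time, then `nvidia-mig-parted apply -f <config> -c <desired>`) could be sketched roughly as follows. The lock path and config name are assumptions for illustration, not OME's actual implementation:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// migPartedArgs builds the argument list for nvidia-mig-parted, matching
// step 7 above: `nvidia-mig-parted apply -f <config> -c <desired>`.
func migPartedArgs(configFile, desired string) []string {
	return []string{"apply", "-f", configFile, "-c", desired}
}

// applyMIGConfig serializes reconfiguration with an exclusive lock file
// (step 5) so only one change runs at a time, then shells out to
// nvidia-mig-parted. The lock path is a hypothetical choice; this also
// assumes a Linux host, where flock(2) is available.
func applyMIGConfig(lockPath, configFile, desired string) error {
	lock, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return err
	}
	defer lock.Close()
	// Blocks until the exclusive lock is held; a concurrent run waits here.
	if err := syscall.Flock(int(lock.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(lock.Fd()), syscall.LOCK_UN)

	cmd := exec.Command("nvidia-mig-parted", migPartedArgs(configFile, desired)...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	fmt.Println(migPartedArgs("/etc/mig-parted/config.yaml", "all-2g.20gb"))
}
```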

resource such as `nvidia.com/mig-2g.20gb`, the controller derives the
corresponding MIG config name, selects an eligible node, persists the
assignment in `InferenceService` annotations, and updates node labels so the
node-local manager can converge the physical device state. The resulting design
Collaborator

Once the inference service is gone, do we want to persist the node labels or remove them?

Contributor Author

No, it's dynamically created and de-allocated.
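A minimal sketch of the dynamic lifecycle described here: the MIG config label (the GPU Operator's `nvidia.com/mig.config`) stays set while MIG-backed InferenceServices remain on the node and is dropped only when the last one terminates. The reconcile function and its inputs are hypothetical:

```go
package main

import "fmt"

// reconcileMIGLabel decides whether a node should carry the MIG config
// label, given how many MIG-backed InferenceServices are still scheduled
// on it. The label is set with the first isvc and removed only when the
// last one is terminated; labels are returned as a fresh map.
func reconcileMIGLabel(labels map[string]string, migISVCsOnNode int, configName string) map[string]string {
	const key = "nvidia.com/mig.config" // label consumed by nvidia-mig-manager
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		out[k] = v
	}
	if migISVCsOnNode > 0 {
		out[key] = configName // keep/apply the desired MIG config
	} else {
		delete(out, key) // last isvc gone: drop the label
	}
	return out
}

func main() {
	labels := map[string]string{"nvidia.com/mig.config": "all-2g.20gb"}
	fmt.Println(reconcileMIGLabel(labels, 0, "all-2g.20gb"))
}
```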

Collaborator

If multiple MIG inference services are scheduled on the same node, the label should stay until the very last isvc is terminated, correct?

resources after MIG reconfiguration.
- When dynamic allocation is disabled, OME does not perform node relabeling for
MIG reconfiguration and relies on preconfigured MIG nodes.
- When dynamic allocation is enabled, OME selects only nodes explicitly marked
Collaborator

Is there a possibility that OME selects a node with the MIG label but the node fails other resource requirements, such as CPU and memory?

Contributor Author

I mean, the node selection logic needs to ensure all the resource requirements are met for a node.
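A toy sketch of that filtering, assuming simplified integer quantities: a node must satisfy every request (CPU, memory, and the MIG extended resource), not just carry the MIG label:

```go
package main

import "fmt"

// nodeFits checks that a candidate node satisfies every resource request,
// not just the MIG device: CPU, memory, and the MIG extended resource must
// all fit within the node's allocatable capacity. Quantities are
// simplified to int64 for illustration.
func nodeFits(allocatable, requests map[string]int64) bool {
	for res, want := range requests {
		if allocatable[res] < want {
			return false // any single unmet requirement disqualifies the node
		}
	}
	return true
}

func main() {
	alloc := map[string]int64{"cpu": 16, "memory": 64 << 30, "nvidia.com/mig-2g.20gb": 3}
	req := map[string]int64{"cpu": 32, "memory": 8 << 30, "nvidia.com/mig-2g.20gb": 1}
	// The MIG resource fits, but the CPU request does not.
	fmt.Println(nodeFits(alloc, req))
}
```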

from requested resource name to node configuration name.
6. Keep the design compatible with NVIDIA GPU Operator and device plugin based
clusters.
7. Provide an option to disable OME-managed dynamic allocation entirely and use
Collaborator

What is the difference between OME-managed dynamic allocation and NVIDIA DRA on k8s?

Contributor Author

There was no development on NVIDIA DRA for MIG for a year or so, and only last month was allocation supported as alpha. I believe they will not support the A100 and older GPUs (it makes business sense for them to support only newer GPUs). I am not sure whether this is an acceptable drawback. But, rethinking it, I am leaning more toward simply leveraging the DRA driver from NVIDIA.

@shenoyvvarun
Contributor Author

shenoyvvarun commented Apr 24, 2026

High-level question: how is MIG going to fit inside the current AcceleratorClass? Are we going to have MIG defined as another set of accelerators?

I think so; MIG devices would be another accelerator profile that would allow a runtime to customize the serving of a model. Do you have any concerns?
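For illustration, deriving the profile from the extended resource name quoted earlier (`nvidia.com/mig-2g.20gb`) might look like this. The `all-<profile>` config-name mapping mirrors nvidia-mig-parted's example configs but is an assumption here, not OME's actual mapping:

```go
package main

import (
	"fmt"
	"strings"
)

// migProfileFromResource extracts the MIG profile (e.g. "2g.20gb") from an
// extended resource name such as "nvidia.com/mig-2g.20gb", and derives a
// hypothetical mig-parted config name from it.
func migProfileFromResource(resource string) (profile, configName string, ok bool) {
	const prefix = "nvidia.com/mig-"
	if !strings.HasPrefix(resource, prefix) {
		return "", "", false // not a MIG extended resource
	}
	profile = strings.TrimPrefix(resource, prefix)
	// Assumed convention: partition every GPU on the node uniformly.
	return profile, "all-" + profile, true
}

func main() {
	p, c, ok := migProfileFromResource("nvidia.com/mig-2g.20gb")
	fmt.Println(p, c, ok)
}
```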

@YouNeedCryDear
Collaborator

High-level question: how is MIG going to fit inside the current AcceleratorClass? Are we going to have MIG defined as another set of accelerators?

I think so; MIG devices would be another accelerator profile that would allow a runtime to customize the serving of a model. Do you have any concerns?

The only concern is that there will be a combinatorial explosion 🤣

Maybe we should do something on the accelerator class side as well?

Also, MIG will only be available for models that take less than one GPU, correct? Is it possible for a model to take 4 MIGs on 4 different GPUs and call it TP4?
