Added new model registry #531

Open
kapiljain1989 wants to merge 1 commit into llm-d:main from kapiljain1989:modelConfig

@kapiljain1989

Summary

  • Adds ModelConfig and ModelRegistry to support model-aware KV-cache indexing, so that HMA (Hybrid Multi-head Attention) models can be distinguished from simple models
  • Introduces AttentionGroupConfig with per-group block size and sliding window support for HMA models like DeepSeek-V3
  • Integrates model registry into Indexer and Pool so event processing can use model-specific configuration
  • Adds ModelAttentionInfo with precomputed attention metadata for efficient scoring lookups

Example
    {
      "modelConfigs": [
        {
          "name": "DeepSeek-V3",
          "isHMA": true,
          "attentionGroups": [
            { "groupId": 0, "attentionType": "full", "blockSize": 64 },
            { "groupId": 1, "attentionType": "sliding_window", "blockSize": 64, "slidingWindowSize": 4096 }
          ]
        },
        { "name": "Qwen/Qwen3-8B", "isHMA": false }
      ]
    }
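
For readers who want to see how a configuration of this shape could be consumed, here is a minimal Go sketch. The JSON keys come from the example above; the Go field names, the `registryFile` wrapper, and the `parseModelConfigs` helper are assumptions made for illustration and may not match the types in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AttentionGroupConfig mirrors one entry of "attentionGroups".
// Field names are assumed from the JSON keys, not taken from the PR's code.
type AttentionGroupConfig struct {
	GroupID           int    `json:"groupId"`
	AttentionType     string `json:"attentionType"`
	BlockSize         int    `json:"blockSize"`
	SlidingWindowSize int    `json:"slidingWindowSize,omitempty"`
}

// ModelConfig mirrors one entry of "modelConfigs" (assumed layout).
type ModelConfig struct {
	Name            string                 `json:"name"`
	IsHMA           bool                   `json:"isHMA"`
	AttentionGroups []AttentionGroupConfig `json:"attentionGroups,omitempty"`
}

// registryFile is a hypothetical wrapper for the top-level JSON object.
type registryFile struct {
	ModelConfigs []ModelConfig `json:"modelConfigs"`
}

// parseModelConfigs is a hypothetical helper that decodes the JSON document.
func parseModelConfigs(data []byte) ([]ModelConfig, error) {
	var f registryFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, fmt.Errorf("parse model configs: %w", err)
	}
	return f.ModelConfigs, nil
}

// exampleJSON repeats the example from the PR description.
const exampleJSON = `{
  "modelConfigs": [
    {
      "name": "DeepSeek-V3",
      "isHMA": true,
      "attentionGroups": [
        { "groupId": 0, "attentionType": "full", "blockSize": 64 },
        { "groupId": 1, "attentionType": "sliding_window", "blockSize": 64, "slidingWindowSize": 4096 }
      ]
    },
    { "name": "Qwen/Qwen3-8B", "isHMA": false }
  ]
}`

func main() {
	configs, err := parseModelConfigs([]byte(exampleJSON))
	if err != nil {
		panic(err)
	}
	for _, c := range configs {
		fmt.Printf("%s: isHMA=%t, groups=%d\n", c.Name, c.IsHMA, len(c.AttentionGroups))
	}
}
```

Using `omitempty` on `slidingWindowSize` and `attentionGroups` is a design choice of this sketch: it keeps re-serialized output close to the example, where full-attention groups and non-HMA models omit those keys.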

Default Behavior

When no modelConfigs is provided (or the list is empty), the registry treats all models as non-HMA for memory efficiency:
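
A minimal Go sketch of this fallback, for illustration only. The `ModelRegistry` and `ModelConfig` names come from the summary above, but the fields, the `NewModelRegistry` constructor, and the `IsHMA` method are assumptions rather than the PR's actual API. The sketch also assumes that a model missing from a non-empty registry is treated as non-HMA, which the description does not state explicitly.

```go
package main

import "fmt"

// ModelConfig is a trimmed-down, assumed version of the PR's type.
type ModelConfig struct {
	Name  string
	IsHMA bool
}

// ModelRegistry maps model names to their configuration (assumed layout).
type ModelRegistry struct {
	configs map[string]ModelConfig
}

// NewModelRegistry is a hypothetical constructor. A nil or empty slice
// yields a registry with no entries.
func NewModelRegistry(configs []ModelConfig) *ModelRegistry {
	r := &ModelRegistry{configs: make(map[string]ModelConfig, len(configs))}
	for _, c := range configs {
		r.configs[c.Name] = c
	}
	return r
}

// IsHMA reports whether the model is registered as HMA. Models that are
// not registered fall back to non-HMA (assumption for unknown models in a
// non-empty registry; stated behavior for an empty registry).
func (r *ModelRegistry) IsHMA(model string) bool {
	c, ok := r.configs[model]
	return ok && c.IsHMA
}

func main() {
	empty := NewModelRegistry(nil)
	fmt.Println(empty.IsHMA("DeepSeek-V3")) // false: no configs provided

	reg := NewModelRegistry([]ModelConfig{{Name: "DeepSeek-V3", IsHMA: true}})
	fmt.Println(reg.IsHMA("DeepSeek-V3"))   // true: explicitly registered
	fmt.Println(reg.IsHMA("Qwen/Qwen3-8B")) // false: not registered
}
```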

@github-actions github-actions bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) on Apr 19, 2026
Signed-off-by: Kapil Jain <kapiljain1989@gmail.com>
@vMaroon vMaroon changed the title from "Added new mode registry" to "Added new model registry" on Apr 19, 2026
guygir pushed a commit to guygir/llm-d-kv-cache-manager that referenced this pull request Apr 20, 2026
…quired (llm-d#531)

* Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE + fix lint

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens)

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* code review

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* update version of GIE, update prefix_disagr_decider accordingly

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix typo

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix PD for short inputs

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update docs/architecture.md

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/always_disaggr_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update pkg/plugins/profile/prefix_disagg_decider.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* updates according the PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e tests

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* changes according the pr comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix e2e test

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* add explanation about pd deciders to disagg_pd doc

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* rename always_disaggr_decider to always_disagg_decider

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Co-authored-by: Shmuel Kallner <kallner@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>