bugfix: fix mtp prefix cache prefill starvation. by phantomlei3 · Pull Request #1264 · jd-opensource/xllm

phantomlei3 · 2026-04-13T02:44:19Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces an explicit reference counting mechanism for physical blocks to improve the accuracy of usage tracking, particularly when prefix caching is enabled. It also updates the prefill scheduler to bypass memory threshold checks when using the prefix cache. The review feedback identifies critical thread-safety risks due to non-atomic updates of the new reference counters and notes a style guide violation regarding the use of auto for primitive types.

yq33victor · 2026-04-14T03:15:06Z


  // const auto block_ids = allocate(num_blocks);
-  num_used_blocks_.fetch_add(num_blocks, std::memory_order_relaxed);
+  add_seq_refs(blocks);


How does this differ from the current implementation?
Currently, when a block is assigned, the class's copy-assignment operator automatically increments the reference count (ref_count++). In theory, there's no difference compared to the current approach.

Block& Block::operator=(const Block& other) {
  if (this != &other) {
    dec_ref_count();
    id_ = other.id_;
    size_ = other.size_;
    manager_ = other.manager_;
    ref_count_ = other.ref_count_;
    memcpy(hash_value_, other.hash_value_, XXH3_128BITS_HASH_VALUE_LEN);
    inc_ref_count();  // <--------------------- here
  }
  return *this;
}

Under what circumstances would the current implementation cause a problem?

phantomlei3 · Apr 14, 2026 (edited)

ref_count and active_seq_refs are counting different things.

| Step | Holder of physical block B | ref_count(B) | Active sequences using B |
| --- | --- | --- | --- |
| 1 | Prefix cache only | 1 | 0 |
| 2 | PrefixCache::match() returns a local shared_blocks vector | 2 | 0 |
| 3 | sequence->add_kv_blocks(shared_blocks) copies B into the sequence state | 3 | 1 |
| 4 | Later allocation fails, so we call deallocate(shared_blocks) first | 2 | 1 |
| 5 | Then sequence->reset() releases the sequence-owned copy | 1 | 0 |
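The five steps above can be replayed with a toy model. This is only a sketch under stated assumptions: `std::shared_ptr::use_count()` stands in for `ref_count(B)`, and `PhysicalBlock`, `Handle`, `Snapshot`, and `trace_block_b` are hypothetical names that do not exist in xllm. The real `Block` class manages its count through `inc_ref_count()` / `dec_ref_count()` instead.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for xllm's Block handle (illustration only).
struct PhysicalBlock {
  int id;
};
using Handle = std::shared_ptr<PhysicalBlock>;

// One row of the table: handle copies vs. sequences actually using B.
struct Snapshot {
  long ref_count;
  int active_seqs;
};

std::vector<Snapshot> trace_block_b() {
  std::vector<Snapshot> steps;
  int active_seqs = 0;

  // Step 1: only the prefix-cache node holds B.
  Handle cache_node = std::make_shared<PhysicalBlock>(PhysicalBlock{7});
  steps.push_back({cache_node.use_count(), active_seqs});

  // Step 2: a match returns a temporary vector holding a copy of the handle.
  std::vector<Handle> shared_blocks;
  shared_blocks.push_back(cache_node);
  steps.push_back({cache_node.use_count(), active_seqs});

  // Step 3: the sequence copies the handle into its own state.
  std::vector<Handle> seq_blocks = shared_blocks;
  ++active_seqs;
  steps.push_back({cache_node.use_count(), active_seqs});

  // Step 4: the temporary copy is released, but the sequence still owns B.
  shared_blocks.clear();
  steps.push_back({cache_node.use_count(), active_seqs});

  // Step 5: the sequence is reset and gives up its copy.
  seq_blocks.clear();
  --active_seqs;
  steps.push_back({cache_node.use_count(), active_seqs});

  return steps;
}
```

In this model the handle count equals 2 at both step 2 and step 4, while the number of active sequences is 0 in one case and 1 in the other, so a threshold on the handle count alone cannot tell the two states apart.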

The problem is that ref_count includes every Block handle copy:

- the prefix-cache node
- the temporary shared_blocks vector
- the sequence-owned vector

But num_used_blocks_ is supposed to mean:
"how many physical blocks are still occupied by live sequences"

So in prefix-cache paths, ref_count is not a reliable proxy for sequence ownership. A temporary copy can increase ref_count even though no extra sequence is using the block. That is why the old ref_count() <= 2 heuristic can drift, while active_seq_refs_ tracks the intended quantity directly.
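One way to picture "tracking the intended quantity directly" is a per-block counter of live sequences, with the used-block total changing only on 0 -> 1 and 1 -> 0 transitions. The sketch below is an assumption about the general technique, not the PR's actual code: `SeqRefTracker` and `remove_seq_refs` are hypothetical names, and only `add_seq_refs`, `active_seq_refs_`, and `num_used_blocks_` are taken from the discussion. Atomics are used because the review flagged non-atomic counter updates as a thread-safety risk.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracker (illustration only, not xllm's BlockManagerImpl).
class SeqRefTracker {
 public:
  explicit SeqRefTracker(std::size_t num_blocks)
      : active_seq_refs_(num_blocks) {}

  // A sequence starts using these blocks.
  void add_seq_refs(const std::vector<int32_t>& block_ids) {
    for (int32_t id : block_ids) {
      auto& refs = active_seq_refs_[static_cast<std::size_t>(id)];
      // fetch_add returns the previous value: 0 means the block was idle.
      if (refs.fetch_add(1, std::memory_order_relaxed) == 0) {
        num_used_blocks_.fetch_add(1, std::memory_order_relaxed);
      }
    }
  }

  // A sequence stops using these blocks (hypothetical counterpart).
  void remove_seq_refs(const std::vector<int32_t>& block_ids) {
    for (int32_t id : block_ids) {
      auto& refs = active_seq_refs_[static_cast<std::size_t>(id)];
      // Previous value 1 means this was the last sequence on the block.
      if (refs.fetch_sub(1, std::memory_order_relaxed) == 1) {
        num_used_blocks_.fetch_sub(1, std::memory_order_relaxed);
      }
    }
  }

  std::size_t num_used_blocks() const {
    return num_used_blocks_.load(std::memory_order_relaxed);
  }

 private:
  // Zero-initialized: no sequence uses any block at construction.
  std::vector<std::atomic<int32_t>> active_seq_refs_;
  std::atomic<std::size_t> num_used_blocks_{0};
};
```

Temporary handle copies never touch these counters, so they cannot inflate the total. Note that the per-block counter and the total are updated in two separate atomic operations, so a concurrent reader may briefly observe them out of step; whether that is acceptable depends on how the scheduler consumes the value.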

phantomlei3 requested review from DongheJin, JimHsiung, RobbieLeung, XuZhang99, liutongxuan, walsonyang and yq33victor as code owners April 13, 2026 02:44

phantomlei3 mentioned this pull request Apr 13, 2026

[Bug]: MTP with enable_schedule_overlap=true causes overflows #1126

Open

gemini-code-assist bot reviewed Apr 13, 2026

Comment thread xllm/core/framework/block/block_manager_impl.cpp Outdated

Comment thread xllm/core/framework/block/block_manager_impl.cpp Outdated

Comment thread xllm/core/framework/block/block_manager_impl.cpp Outdated

RobbieLeung reviewed Apr 13, 2026

Comment thread xllm/core/scheduler/prefill_only_scheduler.cpp Outdated

yq33victor reviewed Apr 14, 2026

phantomlei3 force-pushed the bugfix/mtp-prefix-cache branch from 5f34a91 to 83b7791 Compare April 16, 2026 13:10

phantomlei3 closed this Apr 18, 2026

phantomlei3 force-pushed the bugfix/mtp-prefix-cache branch from 83b7791 to 801f372 Compare April 18, 2026 02:18

bugfix: fix mtp prefix cache prefill starvation.

d741666

phantomlei3 reopened this Apr 18, 2026

phantomlei3 wants to merge 1 commit into jd-opensource:main from phantomlei3:bugfix/mtp-prefix-cache

3 participants
