Skip to content

bugfix: fix mtp prefix cache prefill starvation.#1264

Open
phantomlei3 wants to merge 1 commit intojd-opensource:mainfrom
phantomlei3:bugfix/mtp-prefix-cache
Open

bugfix: fix mtp prefix cache prefill starvation.#1264
phantomlei3 wants to merge 1 commit intojd-opensource:mainfrom
phantomlei3:bugfix/mtp-prefix-cache

Conversation

@phantomlei3
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an explicit reference counting mechanism for physical blocks to improve the accuracy of usage tracking, particularly when prefix caching is enabled. It also updates the prefill scheduler to bypass memory threshold checks when using the prefix cache. The review feedback identifies critical thread-safety risks due to non-atomic updates of the new reference counters and notes a style guide violation regarding the use of auto for primitive types.

Comment thread xllm/core/framework/block/block_manager_impl.cpp Outdated
Comment thread xllm/core/framework/block/block_manager_impl.cpp Outdated
Comment thread xllm/core/framework/block/block_manager_impl.cpp Outdated
Comment thread xllm/core/scheduler/prefill_only_scheduler.cpp Outdated

// const auto block_ids = allocate(num_blocks);
num_used_blocks_.fetch_add(num_blocks, std::memory_order_relaxed);
add_seq_refs(blocks);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this differs from the current implementation.
Currently, when a block is assigned, the class's assignment operator automatically increments the reference count (ref count++). In theory, there's no difference compared to the current approach.

Block& Block::operator=(const Block& other) {
  if (this != &other) {
    dec_ref_count();

    id_ = other.id_;
    size_ = other.size_;
    manager_ = other.manager_;
    ref_count_ = other.ref_count_;

    memcpy(hash_value_, other.hash_value_, XXH3_128BITS_HASH_VALUE_LEN);

    inc_ref_count(); // <--------------------- here
  }
  return *this;
}

Under what circumstances would current implementation cause a problem?

Copy link
Copy Markdown
Collaborator Author

@phantomlei3 phantomlei3 Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref_count and active_seq_refs are counting different things.

Step Holder of physical block B ref_count(B) Active sequences using B
1 Prefix cache only 1 0
2 PrefixCache::match() returns a local shared_blocks vector 2 0
3 sequence->add_kv_blocks(shared_blocks) copies B into the sequence state 3 1
4 Later allocation fails, so we call deallocate(shared_blocks) first 2 1
5 Then sequence->reset() releases the sequence-owned copy 1 0

The problem is that ref_count includes every Block handle copy:

  • the prefix-cache node
  • the temporary shared_blocks vector
  • the sequence-owned vector

But num_used_blocks_ is supposed to mean:
"how many physical blocks are still occupied by live sequences"

So in prefix-cache paths, ref_count is not a reliable proxy for sequence ownership. A temporary copy can increaseref_count even though no extra sequence is using the block. That is why the old ref_count() <= 2 heuristic can drift, while active_seq_refs_ tracks the intended quantity directly.

@phantomlei3 phantomlei3 force-pushed the bugfix/mtp-prefix-cache branch from 5f34a91 to 83b7791 Compare April 16, 2026 13:10
@phantomlei3 phantomlei3 force-pushed the bugfix/mtp-prefix-cache branch from 83b7791 to 801f372 Compare April 18, 2026 02:18
@phantomlei3 phantomlei3 reopened this Apr 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants