Conversation

@windreamer
Collaborator

@windreamer windreamer commented Nov 28, 2025

Motivation

This PR resolves three critical guided decoding bugs:

  1. Segmentation faults when serving with tensor parallelism (--tp >= 2)
  2. Cross-request contamination when mixing guided and non-guided requests in the same batch
  3. Progressive state corruption after serving many structured output requests, causing old schemas to be reused indefinitely for all subsequent requests

Root Cause Analysis

1. TP Rank Race Conditions (Segmentation Fault)

When TP >= 2, GuidedDecodeMaskLayer and GuidedDecodeUpdateLayer on all ranks shared the same decoding state (a single GrammarMatcher per request), so each rank independently mutated it. This led to race conditions and memory corruption.
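To make the failure mode concrete, here is a minimal, self-contained sketch of the race; FakeMatcher is a hypothetical stand-in for xgrammar::GrammarMatcher, whose AcceptToken / FillNextTokenBitmask calls mutate internal parser state:

```cpp
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for xgrammar::GrammarMatcher: the real class keeps
// an internal parsing stack that AcceptToken / FillNextTokenBitmask mutate,
// and it is not thread-safe.
struct FakeMatcher {
    std::vector<int> stack;  // internal grammar state
    void AcceptToken(int tok) { stack.push_back(tok); }
    void FillNextTokenBitmask() { if (!stack.empty()) stack.pop_back(); }
};

int main() {
    // Before this PR, every TP rank held the SAME shared_ptr, so each
    // rank's sampling thread mutated one matcher concurrently -- a data
    // race that can corrupt the state and, in the real engine, segfault.
    auto shared = std::make_shared<FakeMatcher>();
    std::vector<std::thread> ranks;
    for (int rank = 0; rank < 2; ++rank) {
        ranks.emplace_back([shared] {
            for (int step = 0; step < 100000; ++step) {
                shared->AcceptToken(step);       // unsynchronized write
                shared->FillNextTokenBitmask();  // unsynchronized write
            }
        });
    }
    for (auto& t : ranks) t.join();
}
```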

2. Uninitialized Bitmask Buffers (Mixed Request Contamination)

The bitmask generation logic failed to properly initialize buffers for non-guided requests in mixed batches. While pure guided or pure non-guided batches worked correctly, mixed scenarios left buffers uninitialized, causing undefined behavior and schema leakage between requests.

3. Grammar State Reuse (Progressive Schema Corruption)

Model request instances are pooled in a free list for reuse. The grammar matcher state was never cleared when returning instances to this pool. After sustained serving, requests would inherit schemas from previously processed requests, causing "stuck" structured output behavior.

Modifications

Fix 1: TP-Aware State Management

  • Eliminated cross-rank shared state by allocating an independent GrammarMatcher instance per TP rank
  • Added rank awareness to the decoding pipeline so each rank's thread operates on its own isolated matcher (see the sketch below)
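
A minimal sketch of the per-rank allocation, following the shape @irexyc suggests later in this thread; GrammarMatcher, ModelRequest, and the helper names are simplified stand-ins for the actual turbomind types:

```cpp
#include <memory>
#include <vector>

struct GrammarMatcher { /* per-request grammar state */ };

struct ModelRequest {
    // One independent matcher per TP rank instead of one shared matcher.
    std::vector<std::shared_ptr<GrammarMatcher>> matchers;
};

// At request creation: duplicate the matcher once per rank so every rank
// can drive the identical grammar through its own private copy.
void InitMatchers(ModelRequest& r, int tp_size) {
    r.matchers.clear();
    for (int rank = 0; rank < tp_size; ++rank) {
        r.matchers.push_back(std::make_shared<GrammarMatcher>());
    }
}

// In GuidedDecodeMaskLayer / GuidedDecodeUpdateLayer setup: each rank
// selects only its own matcher, so no cross-rank mutation can occur.
std::shared_ptr<GrammarMatcher> PickMatcher(const ModelRequest& r, int rank) {
    return r.matchers.at(rank);
}
```

Since sampled tokens are not broadcast from rank 0, every rank runs the same sampling and the same matcher updates; the per-rank copies therefore stay in lockstep without any synchronization.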

Fix 2: Universal Buffer Initialization

  • Added explicit bitmask buffer initialization for all requests, both guided and non-guided (see the sketch below)
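
A sketch of the idea, assuming the usual bitmask layout where each 32-bit word packs 32 token-allow bits; the names and exact buffer layout here are illustrative, not the actual turbomind code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// For a non-guided request every token must stay allowed, so its slice of
// the batch bitmask is filled with all-ones instead of being left with
// whatever a previous (possibly guided) batch wrote into the reused buffer.
void InitBitmask(std::vector<int32_t>& bitmask,
                 const std::vector<bool>& is_guided,
                 std::size_t words_per_req) {
    for (std::size_t i = 0; i < is_guided.size(); ++i) {
        int32_t* slot = bitmask.data() + i * words_per_req;
        // Initialize EVERY slot up front; guided slots are then overwritten
        // by the grammar matcher (FillNextTokenBitmask in xgrammar).
        std::fill(slot, slot + words_per_req, ~int32_t{0});
    }
}
```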

Fix 3: Request Lifecycle Cleanup

  • Implemented grammar state reset when requests complete and return to the free list
  • Added a cleanup hook for the GrammarMatcher to ensure no state leakage between unrelated requests (see the sketch below)
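
A sketch of the lifecycle fix under the same simplified types as above; ResetGrammar and FreeList are hypothetical stand-ins for the actual pooling code:

```cpp
#include <memory>
#include <utility>
#include <vector>

struct GrammarMatcher { /* grammar / schema state */ };

struct ModelInstance {
    std::vector<std::shared_ptr<GrammarMatcher>> matchers;
    // Hypothetical cleanup hook: drop all grammar state before the
    // instance goes back to the pool, so the next request that picks it
    // up cannot inherit the previous request's schema.
    void ResetGrammar() { matchers.clear(); }
};

struct FreeList {
    std::vector<std::unique_ptr<ModelInstance>> pool;
    void Return(std::unique_ptr<ModelInstance> inst) {
        inst->ResetGrammar();  // the fix: clear state on every return
        pool.push_back(std::move(inst));
    }
};
```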

Fixes: #4152 ([Bug] Request cause core dump with --tp >= 2)

@windreamer windreamer self-assigned this Nov 28, 2025
@windreamer windreamer marked this pull request as ready for review November 28, 2025 07:37
@irexyc
Collaborator

irexyc commented Nov 28, 2025

The sampled tokens are not broadcast from tp rank 0, so all ranks should run the same sampling process to make sure the next token is the same on all ranks.

I think the problem may be that the state of GrammarMatcher cannot be shared by all ranks. Currently, all ranks share the same std::shared_ptr<xgrammar::GrammarMatcher> in the request.

@windreamer
Collaborator Author

GrammarMatcher

OK, so we need to copy the GrammarMatcher instead?

@windreamer windreamer marked this pull request as draft November 28, 2025 08:00
@windreamer
Collaborator Author

The sampled tokens are not broadcast from tp rank 0, so all ranks should run the same sampling process to make sure the next token is the same on all ranks.

I think the problem may be that the state of GrammarMatcher cannot be shared by all ranks. Currently, all ranks share the same std::shared_ptr<xgrammar::GrammarMatcher> in the request.

I believe a quick fix would be to execute GuidedDecodeUpdateLayer only on rank 0, since we only modify the GrammarMatcher there. But I have no idea how to ensure we call GuidedDecodeUpdateLayer::Forward only after all ranks finish their sampling.

@irexyc
Collaborator

irexyc commented Nov 28, 2025

I believe a quick fix would be to execute GuidedDecodeUpdateLayer only on rank 0.

All ranks should have the same next token, and the next token is computed by dynamic decoding, so we should make sure the dynamic decoding process is the same on all ranks.

I think we can construct tp_size matchers here, like r->matchers = ...
https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/engine/model_request.cc#L131C9-L131C19

and choose the corresponding matcher here, like matchers_.push_back(r->matchers[rank_]);
https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/layers/sampling_layers/GuidedDecodeMaskLayer.cc#L36
https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/layers/sampling_layers/GuidedDecodeUpdateLayer.cc#L32

@windreamer windreamer force-pushed the fix_guided_decoding_tp branch from 8ada1ea to e7a7055 Compare November 28, 2025 10:05
@windreamer windreamer marked this pull request as ready for review November 28, 2025 10:11
@windreamer windreamer force-pushed the fix_guided_decoding_tp branch from e7a7055 to adb0148 Compare November 28, 2025 10:27
@windreamer windreamer requested a review from lvhan028 November 29, 2025 03:22
Collaborator

  1. Pass h_tp_group to GuidedDecodeUpdateLayer
  2. Call h_tp_group->Sync() here so that all ranks have completed filling their host mask buffers before any rank tries to update matcher state.
  3. Call AcceptToken only when h_tp_group->rank() == 0

In addition, a stream sync is required after the copy. A need_apply flag, as in GuidedDecodeMaskLayer, is needed here to avoid the copy/sync cost when guided decoding is not in use.
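
For illustration, a minimal sketch of this rank-0-only proposal, with h_tp_group stubbed as a simple reusable barrier; the real turbomind types differ, and as the reply below explains, this approach was ultimately not taken:

```cpp
#include <condition_variable>
#include <mutex>

// Stand-in for the h_tp_group named above: a reusable barrier across ranks.
class TpGroup {
public:
    explicit TpGroup(int size) : size_(size) {}
    void Sync() {
        std::unique_lock<std::mutex> lk(m_);
        if (++arrived_ == size_) {
            arrived_ = 0;
            ++gen_;
            cv_.notify_all();
        } else {
            int g = gen_;
            cv_.wait(lk, [&] { return gen_ != g; });
        }
    }
private:
    int size_, arrived_ = 0, gen_ = 0;
    std::mutex m_;
    std::condition_variable cv_;
};

struct GrammarMatcher { void AcceptToken(int /*tok*/) {} };

// All ranks hit the barrier after sampling; only rank 0 then mutates the
// shared matcher, so the update happens exactly once and never races.
void GuidedDecodeUpdate(TpGroup& group, int rank, GrammarMatcher& shared,
                        int sampled_token) {
    group.Sync();  // every rank has finished filling its host mask buffer
    if (rank == 0) {
        shared.AcceptToken(sampled_token);
    }
}
```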

Collaborator Author

Sadly, not only AcceptToken but also FillNextTokenBitmask modifies the shared matcher state. So if we need multiple rounds of sync plus a shared bitmask, I believe that will kill performance.

So I took @irexyc's advice and simply duplicated the state for each rank.

@lvhan028
Collaborator

lvhan028 commented Dec 1, 2025

I vote for executing GuidedDecodeUpdateLayer only on rank 0 and calling GuidedDecodeUpdateLayer::Forward only after all ranks finish sampling.
It doesn't make sense to me that ModelRequest holds a tp_size_ field.

@tuilakhanh
Contributor

If a structured output request is being processed and another request comes in, that request will return strange output.

Also, after serving a large number of requests (some with structured output), inference reuses the structure of a previously processed request for all subsequent requests (only one structure) from that point on.

@windreamer
Collaborator Author

If a structured output request is being processed and another request comes in, that request will return strange output.

Also, after serving a large number of requests (some with structured output), inference reuses the structure of a previously processed request for all subsequent requests (only one structure) from that point on.

Could you elaborate on this a bit more? Did the server crash again? What do you mean by the strange output or reused structure?

@tuilakhanh
Contributor

tuilakhanh commented Dec 4, 2025

Could you elaborate on this a bit more? Did the server crash again? What do you mean by the strange output or reused structure?

The server did not crash.

Strange output like this:

{"id":"1","object":"chat.completion","created":1764066445,"model":"Qwen3-4B-Instruct-2507","choices":[{"index":0,"message":{"role":"assistant","content":"{\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\": {\"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!","gen_tokens":null,"reasoning_content":null,"tool_calls":[]},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":130,"total_tokens":10000,"completion_tokens":9870}}

It happens when a request comes in while the server is processing another request with structured output. After the structured output request is done, the server goes back to normal.

The other issue is structure reuse: it occurs once the server has processed a certain number of requests; after that it starts using the structure from a previous request for all subsequent requests, both structured and unstructured.

@windreamer
Collaborator Author

@tuilakhanh

I think there might be a hidden bug here: matcher states and requests are not perfectly matched. {\": is often the first token chosen for json_object, while ! indicates a failed state update. I need more time to investigate this issue.

@windreamer windreamer force-pushed the fix_guided_decoding_tp branch 2 times, most recently from dd5c8d0 to e8c600e Compare December 4, 2025 11:25
@windreamer windreamer force-pushed the fix_guided_decoding_tp branch from e8c600e to 98a9117 Compare December 4, 2025 11:27
@windreamer
Collaborator Author

@tuilakhanh we have fixed an issue by adding a synchronization. Could you test the latest commit to verify whether it resolves the problem?

@tuilakhanh
Contributor

screenrecord.mp4

Issue still here.

@windreamer
Collaborator Author

screenrecord.mp4
Issue still here.

@tuilakhanh Thank you for the valuable screen recording. After analyzing it, I discovered a hidden bug in our guided decoding bitmask generation: the bitmask buffer may be left uninitialized when mixing guided and non-guided requests. While the current implementation works correctly when all requests use guided decoding (or none do), it fails to properly initialize the buffer in mixed scenarios.

I apologize for this oversight. The latest commit should resolve the issue.

@tuilakhanh
Contributor

@tuilakhanh Thank you for the valuable screen recording. After analyzing it, I discovered a hidden bug in our guided decoding bitmask generation: the bitmask buffer may be left uninitialized when mixing guided and non-guided requests. While the current implementation works correctly when all requests use guided decoding (or none do), it fails to properly initialize the buffer in mixed scenarios.

I apologize for this oversight. The latest commit should resolve the issue.

Can confirm the issue from my screen recording is gone with your latest commit.

@windreamer
Collaborator Author

@tuilakhanh Thank you for the valuable screen recording. After analyzing it, I discovered a hidden bug in our guided decoding bitmask generation: the bitmask buffer may be left uninitialized when mixing guided and non-guided requests. While the current implementation works correctly when all requests use guided decoding (or none do), it fails to properly initialize the buffer in mixed scenarios.
I apologize for this oversight. The latest commit should resolve the issue.

Can confirm the issue from my screen recording is gone with your latest commit.

Thank you for your feedback. It was a silly bug, and I'm really sorry for the unnecessary inconvenience it caused. I am working on tests to ensure it doesn't regress.

@tuilakhanh
Contributor

tuilakhanh commented Dec 10, 2025

[screenshot]

The second issue is still there. It uses the structure of an old request for all requests (or maybe the input prompt). It happens once I have used structured output up to a certain point.
Maybe the input prompt is overflowing?

@windreamer windreamer force-pushed the fix_guided_decoding_tp branch from aa84323 to 0e26d72 Compare December 10, 2025 13:04
@windreamer windreamer force-pushed the fix_guided_decoding_tp branch from 0e26d72 to 22f4ea3 Compare December 10, 2025 13:07
@windreamer
Collaborator Author

[screenshot] The second issue is still there. It uses the structure of an old request for all requests (or maybe the input prompt). It happens once I have used structured output up to a certain point. Maybe the input prompt is overflowing?

No, after investigation, it's caused by the model instance being kept for later reuse instead of being released. I didn't realize the model request/instance is added to a free list and reused. To fix this, the latest commit adds a cleanup method that resets the grammar state when the request completes.

@tuilakhanh
Contributor

All my issues are resolved with the latest commit. It should be ready for review.

@windreamer windreamer force-pushed the fix_guided_decoding_tp branch from 240f8bd to c413dbe Compare December 11, 2025 04:34
@windreamer windreamer changed the title from "fix: fix guided decoding state corruption in turbomind when tp>1" to "fix: Fix Guided Decoding Crashes and State Corruption Issues" Dec 11, 2025