Conversation

Contributor

@a4zhangfei a4zhangfei commented Sep 21, 2025

fix #10660

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist

Summary of Changes

Hello @a4zhangfei, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug preventing the correct generation and return of output token log probabilities during lookahead speculative decoding. The changes ensure that all necessary log probability metrics are accurately calculated and appended to request objects, providing complete information for users who require logprob outputs. It also adds flexibility for developers to configure the logprob calculation method via an environment variable, specifically concerning the application of temperature scaling.

Highlights

  • Log Probability Fix: Resolved an issue where output_token_logprobs were missing when using lookahead speculative decoding, ensuring accurate log probability reporting.
  • Configurable Logprob Calculation: Introduced a new environment variable, RETURN_ORIGINAL_LOGPROB, which controls whether log probabilities are computed with or without temperature scaling.
  • Enhanced Logprob Detail: Added a dedicated internal method that computes and stores comprehensive log probability values for accepted tokens during decoding, including next-token logprobs, top-k logprobs, and logprobs for specific token IDs.
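The temperature-scaling choice behind RETURN_ORIGINAL_LOGPROB can be sketched as follows. This is a minimal illustration, not the actual SGLang implementation: the function name, the boolean parsing of the variable, and the pure-Python log-softmax are all assumptions made for the example.

```python
import math
import os


def compute_logprobs(logits, temperature):
    """Return log-probabilities for one row of logits.

    Sketch only: if RETURN_ORIGINAL_LOGPROB is set, logprobs come from the
    raw logits (no temperature scaling); otherwise the logits are divided
    by the sampling temperature first, matching the distribution that was
    actually sampled from.
    """
    if os.environ.get("RETURN_ORIGINAL_LOGPROB", "").lower() in ("1", "true"):
        scaled = list(logits)
    else:
        t = max(temperature, 1e-6)  # guard against temperature == 0
        scaled = [x / t for x in logits]
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]
```

With the variable set, the returned logprobs reflect the raw model distribution; unset, they reflect the temperature-adjusted distribution the sampler actually drew from.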

@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request fixes an issue where output_token_logprobs were missing when using lookahead speculative decoding. The changes introduce a new method _add_logprob_values to correctly calculate and append log probabilities for the generated tokens. The fix is well-targeted and the logic appears correct. I have one suggestion to improve the code's efficiency and readability.
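For readers following along, the general shape of such a helper can be sketched as below. This is an illustrative sketch only, not the PR's code: the real _add_logprob_values operates on torch tensors and the scheduler's request objects, whereas here a plain dict and list-based logits stand in for them.

```python
import math


def add_logprob_values(req, accepted_token_ids, logits_rows, top_k=5):
    """Sketch: for each accepted token, compute its logprob and the
    top-k logprobs from the corresponding row of logits, then append
    them to the request's output lists (hypothetical field names)."""
    for token_id, row in zip(accepted_token_ids, logits_rows):
        # Stable log-softmax over the vocabulary row.
        m = max(row)
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        logprobs = [x - lse for x in row]
        # Logprob of the token that was actually accepted.
        req["output_token_logprobs"].append((logprobs[token_id], token_id))
        # Top-k (logprob, token_id) pairs, highest probability first.
        topk = sorted(enumerate(logprobs), key=lambda p: p[1], reverse=True)[:top_k]
        req["output_top_logprobs"].append([(lp, tid) for tid, lp in topk])
```

The point of factoring this into one method is that every accepted token in a speculative batch gets the same logprob treatment, instead of the logic being skipped on the lookahead path.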

Collaborator

@Qiaolin-Yu Qiaolin-Yu left a comment

Why not put this logic into lookahead_worker.py, as eagle_worker.py does?

@a4zhangfei a4zhangfei changed the title fix missing output_token_logprobs when using lookahead speculative decoding fix missing output_token_logprobs when using ngram speculative decoding Sep 29, 2025
@a4zhangfei
Contributor Author

@Qiaolin-Yu When can we merge the code?

@merrymercy merrymercy merged commit 6f08488 into sgl-project:main Nov 10, 2025
13 of 40 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug] missing output_token_logprobs when using lookahead speculative decoding

4 participants