Skip to content

[Feat][Eagle]refactor verify phase rebase#616

Merged
pengchengneo merged 10 commits intosgl-project:mainfrom
primatrix:feat/refactor-verify-phase-rebase
Dec 30, 2025
Merged

[Feat][Eagle]refactor verify phase rebase#616
pengchengneo merged 10 commits intosgl-project:mainfrom
primatrix:feat/refactor-verify-phase-rebase

Conversation

@pengchengneo
Copy link
Collaborator

@pengchengneo pengchengneo commented Dec 29, 2025

In this PR, we have restructured some of the logic in Verfiy and DraftExtend to ensure that the shape of the batch remains the same at all stages (especially after Verify) throughout the Eagle Forward process, reducing the number of cases that require precompile, and fixing some bugs in the forward to improve the overall acceptance rate.

Furthermore, We are doing more fine-tuning to reduce data handling for TPU-> CPUs, and optimize kernel performance, and we will add e2e CI testing in the next PR and compare the performance without MTP.

launch server

uv run python3 -u -m sgl_jax.launch_server   --model-path /models/Qwen/Qwen3-32B   --trust-remote-code   --device=tpu  --mem-fraction-static=0.8   --max-prefill-tokens=4096   --max-running-requests=16   --log-requests   --log-requests-level=2 --log-level-http=debug   --show-time-cost --decode-log-interval=1   --enable-request-time-stats-logging   --attention-backend=fa   --dtype=bfloat16  --port 30000 --host 0.0.0.0 --disable-overlap-schedule --tp-size 4 --speculative-algorithm  EAGLE3 --speculative-draft-model-path  AngelSlim/Qwen3-32B_eagle3   --skip-server-warmup --page-size 16 --speculative-eagle-topk 1 --speculative-num-steps 3 --speculative-num-draft-tokens 4 --speculative-draft-model-revision 67caf31f9062d7ab64872e0a111d499bc16cd205

test command

uv run evalscope eval   --model Qwen/Qwen3-32B   --api-url http://127.0.0.1:30000/v1/chat/completions   --datasets gsm8k   --eval-type service    --eval-batch-size 128

Accuracy Test:
image

@gemini-code-assist
Copy link

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@pengchengneo pengchengneo requested review from Iamleos and aolemila and removed request for aolemila December 29, 2025 07:05
@pengchengneo pengchengneo linked an issue Dec 29, 2025 that may be closed by this pull request
2 tasks
@pengchengneo pengchengneo merged commit 5fcd005 into sgl-project:main Dec 30, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Eagle Performance Optimazation

2 participants