[DSv32] Move deep_gemm.get_paged_mqa_logits_metadata to init time as metadata #15040
Conversation
|
@YAMY1234 Can you please take a look |
|
@qianlihuang Could you add GPQA and GSM8K 20-shot results to the PR description? Thanks |
|
@YAMY1234 I have finished the tests and the accuracy looks fine. I've updated the PR description with the results. |
|
Thanks! But the GPQA results look higher than expected, normally the avg should be around 79.9 as reported. Do you have a clue? cc @Fridge003 |
|
@YAMY1234 The GPQA score listed in the doc likely refers to DeepSeek V3.2 exp. |
|
Thanks, LGTM overall. |
|
@YAMY1234 The GPQA for the new DeepSeek v3.2 checkpoint is ~85%, so it's expected |
|
/tag-and-rerun-ci |
|
@qianlihuang Please fix lint |
|
fixed by #15424 |
Motivation
#15025
Changes
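The caching pattern listed in this section can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: `compute_schedule_metadata`, `BatchMetadata`, and `indexer_forward` are hypothetical stand-ins, plain Python lists replace GPU tensors, and the `init_forward_metadata` here only borrows the name. The point is that the schedule is computed once per batch at init time, and each layer reuses the cached value, recomputing only when it is missing.

```python
from dataclasses import dataclass
from typing import List, Optional

CALLS = {"count": 0}  # counts how often the (hypothetical) scheduler runs


def compute_schedule_metadata(seq_lens: List[int], block_size: int) -> List[int]:
    """Hypothetical stand-in for the metadata kernel: KV blocks needed per request."""
    CALLS["count"] += 1
    return [(n + block_size - 1) // block_size for n in seq_lens]


@dataclass
class BatchMetadata:
    """Hypothetical per-batch metadata holder (assumed shape, not the real class)."""
    seq_lens: List[int]
    paged_mqa_schedule_metadata: Optional[List[int]] = None


def init_forward_metadata(seq_lens: List[int], block_size: int = 64) -> BatchMetadata:
    # Compute the schedule once per batch, at init time, and cache it.
    schedule = compute_schedule_metadata(seq_lens, block_size)
    return BatchMetadata(seq_lens, schedule)


def indexer_forward(meta: BatchMetadata, block_size: int = 64) -> List[int]:
    # Reuse the cached schedule; fall back to recomputing if it is absent.
    schedule = meta.paged_mqa_schedule_metadata
    if schedule is None:
        schedule = compute_schedule_metadata(meta.seq_lens, block_size)
    return schedule


meta = init_forward_metadata([100, 64, 1])
for _ in range(4):  # e.g. four layers sharing one batch
    indexer_forward(meta)
```

In this sketch the scheduler runs once for the whole batch instead of once per layer; a `BatchMetadata` built without a cached schedule still works through the fallback path.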
- paged_mqa_schedule_metadata to NSAMetadata (batch-level caching).
- init_forward_metadata() / init_forward_metadata_capture_cuda_graph().
- init_forward_metadata_replay_cuda_graph().
- get_indexer_metadata() forwards cached tensor; indexer reuses it with fallback.
Accuracy Tests
Benchmarking and Profiling
Benchmark
Before
After
Profile
Before

1765676758.7169704-TP-0.trace.json.gz
After

1765678229.1364799-TP-0.trace.json.gz
Checklist