[Bug] Fixes accuracy issues caused by incorrect use of rope #13439
Paiiiiiiiiiiiiii wants to merge 1 commit into sgl-project:main
Summary of Changes
This pull request addresses a critical bug that was causing significant accuracy degradation.
Code Review
This pull request addresses an accuracy issue in the DeepSeek-V3.2-Exp model by correcting the configuration of its rotary positional embeddings (RoPE). The change to set is_neox_style=True is appropriate for this model architecture. My review includes a suggestion to parameterize this setting to enhance the reusability and maintainability of the Indexer class for other models.
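As a rough illustration of the parameterization suggested above, the rope layout could be carried as a configuration field rather than hard-coded. This is only a sketch: `IndexerRopeConfig` and `rope_pairs` are hypothetical names made up for this example and are not part of sglang; only the `is_neox_style` flag comes from the PR.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IndexerRopeConfig:
    """Hypothetical config object; not an sglang class."""
    rope_head_dim: int
    # Value this PR sets for DeepSeek-V3.2-Exp; other models could override it.
    is_neox_style: bool = True


def rope_pairs(cfg: IndexerRopeConfig) -> List[Tuple[int, int]]:
    """Return the index pairs that are rotated together under each layout."""
    half = cfg.rope_head_dim // 2
    if cfg.is_neox_style:
        # NeoX style: element i is paired with element i + d/2.
        return [(i, i + half) for i in range(half)]
    # GPT-J (interleaved) style: adjacent elements 2i and 2i + 1 are paired.
    return [(2 * i, 2 * i + 1) for i in range(half)]


print(rope_pairs(IndexerRopeConfig(rope_head_dim=4)))
print(rope_pairs(IndexerRopeConfig(rope_head_dim=4, is_neox_style=False)))
```

Keeping the flag in one place would let another model that uses the interleaved layout reuse the same code path without editing the class body.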
May I ask whether the above results are the
Motivation
During our RULER testing, we noticed a significant drop in the score of dsv3.2_exp compared to dsv3.1-terminus. Targeted testing on niah_multikey_3 showed that these cases were almost 100% correct on the official DeepSeek API but scored low with sglang + dsv3.2.
To investigate, we followed DeepSeek's advice and ran the inference demo released with the model, but the problem persisted.
After DeepSeek modified the inference demo, the bad cases passed: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/commit/1938e9df3dea7218cb36c21fe8287384b99acd96
Following their modifications, we changed the rope-related code in sglang's nsa_indexer.py, and the bad cases passed there as well.
before
after
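To make the rope issue concrete, here is a minimal pure-Python sketch of why the layout flag matters. It is not the sglang kernel: the `rope` function, the base of 10000, and the 4-element vector are assumptions chosen for illustration. It applies the same rotation angles under both layouts and shows that the outputs differ for any non-zero position, which is the kind of mismatch that would degrade long-context retrieval.

```python
import math


def rope(x, pos, is_neox_style, base=10000.0):
    """Apply rotary position embedding to one head vector (illustrative only)."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        # NeoX style pairs element i with element i + d/2;
        # GPT-J (interleaved) style pairs elements 2i and 2i + 1.
        a, b = (i, i + half) if is_neox_style else (2 * i, 2 * i + 1)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out


x = [1.0, 2.0, 3.0, 4.0]
neox = rope(x, pos=1, is_neox_style=True)
gptj = rope(x, pos=1, is_neox_style=False)
print("neox:", neox)
print("gptj:", gptj)
```

Both variants are valid rotations and preserve the vector norm, so a mismatch produces no error or obviously broken output; it only shows up as lower accuracy, consistent with the symptoms described above.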
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist