[1/2] deepseek deterministic: support deterministic inference for deepseek arch models on a single GPU #12000
Can you add a server log to demonstrate that the radix cache is being used?
Sure, I added server log for |
I made a minor change to the deterministic unit test: #12022. This change allows subclasses to easily override the test model. After this PR is merged and you have validated that dsv3 works with #12000, please add dsv3-test to the unit tests.
@zminglei Can we handle the default attention backend for dpsk here?
Motivation
Part of this Issue: #10278
As part of deepseek deterministic inference support, this change ensures deterministic inference results for deepseek arch models on a single GPU.
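For background, one general reason inference results can differ between runs is that floating-point addition is not associative: if a kernel splits a reduction into differently sized chunks (for example, when the batch composition changes), the rounded result can change. The sketch below is a plain-Python illustration of that effect only; it does not describe the kernels changed in this PR, and the helper names are hypothetical.

```python
def sequential_sum(values):
    """Add values left to right in float64, one rounding per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Reduce fixed-size chunks first, then add the partial sums."""
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# Same inputs, different reduction order, different rounded result.
values = [1e16, 1.0, -1e16, 1.0]
print(chunked_sum(values, 2))  # 0.0
print(chunked_sum(values, 4))  # 1.0
```

Neither result equals the exact sum (2.0); the point is only that the grouping of additions decides which rounding errors occur, so a fixed reduction order is needed for bit-identical outputs.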
Modifications
Currently only the fa3 and triton backends are supported; support for the flashinfer backend will follow later.
Accuracy Tests
sglang-ci-dsv3-test (deepseek_v3 fp8) model on an H100
FA3 Backend
Without deterministic
Enable deterministic
Triton Backend
Without deterministic
Enable deterministic
DeepSeek-Coder-V2-Lite-Instruct (deepseek_v2 arch) model on an H100
FA3 Backend
Without deterministic
Enable deterministic
Triton Backend
Without deterministic
Enable deterministic
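A with/without comparison like the ones listed above can be scripted by sending the same prompt many times and counting the distinct completions. The following is a minimal sketch, not the test harness used in this PR: `generate` is a hypothetical callable standing in for whatever client sends a prompt to the server, and the two fake generators exist only to make the example runnable offline.

```python
import itertools
from collections import Counter
from typing import Callable


def count_unique_outputs(
    generate: Callable[[str], str], prompt: str, n_trials: int = 50
) -> int:
    """Call generate(prompt) n_trials times and return the number of distinct outputs."""
    outputs = Counter(generate(prompt) for _ in range(n_trials))
    return len(outputs)


# Stand-ins for a real client (assumptions for illustration only).
def fake_deterministic(prompt: str) -> str:
    return prompt.upper()


_cycle = itertools.cycle(["a", "b", "c"])


def fake_nondeterministic(prompt: str) -> str:
    return next(_cycle)


print(count_unique_outputs(fake_deterministic, "hello", 10))     # 1
print(count_unique_outputs(fake_nondeterministic, "hello", 10))  # 3
```

With deterministic inference working as intended, the count for a fixed prompt should be 1 regardless of what other requests share the batch.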
Benchmarking and Profiling
Checklist