sync attention, deepseek doc #14335
Pull request overview
This PR synchronizes and updates documentation for attention backends and DeepSeek model support. The changes focus on improving clarity, adding new deployment guides, and updating technical specifications for various hardware architectures.
Key changes:
- Updated attention backend documentation with refined FA4 specifications and removed outdated warnings
- Enhanced DeepSeek V3/R1 documentation with expanded hardware configurations, new deployment guides, and improved formatting using structured callout blocks
- Updated expert parallelism backend descriptions to use "Blackwell" instead of "SM100+" for better clarity
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/index.rst | Added new documentation entries for multi-modal encoder DP and classify models; reordered references section |
| docs/basic_usage/deepseek_v3.md | Expanded hardware configurations, added deployment guides/blog links, improved documentation structure with callout blocks, and clarified MTP usage |
| docs/advanced_features/expert_parallelism.md | Updated backend descriptions to use "Blackwell" architecture name instead of "SM100+" |
| docs/advanced_features/attention_backend.md | Updated FA4 page size specifications, removed outdated FP8 KV cache warning, and streamlined speculative decoding constraints |
| **FA3 (FlashAttention 3)** | n/a | ❌ | ✅ | ✅ | ⚠️ (page_size=1 only) |
| **Triton** | n/a | ❌ | ❌ | ✅ | ⚠️ (page_size=1 only) |
| **FA4** | 128 | ❌ | ❌ | ❌ | ❌ |
| **FA4** | 1 | ❌ | ❌ | ❌ | ❌ |
There's an inconsistency in FA4's page size specification between the MHA and MLA tables. The MHA table (line 20) shows FA4 with page size "128", but the MLA table (line 41) shows FA4 with page size "1". Please verify which is correct and ensure consistency across both tables.
| **FA4** | 1 | ❌ | ❌ | ❌ | ❌ |
| **FA4** | 128 | ❌ | ❌ | ❌ | ❌ |
(It's actually like this; a launch example is sketched below.)
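Not part of the PR itself, but for readers of this thread, here is a minimal sketch of what the page size column means in practice. It assumes sglang's offline `Engine` accepts the same arguments as the `--attention-backend` and `--page-size` server flags, and the model path is only a placeholder:

```python
# Hypothetical sketch, not from this PR: assumes sglang's offline Engine
# accepts the same keyword arguments as the server CLI flags
# (--attention-backend, --page-size).
import sglang as sgl

# Per the table above, FA3 supports speculative decoding only with page_size=1,
# so a spec-decode setup on FA3 would pin page_size to 1 explicitly.
engine = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model, not from the PR
    attention_backend="fa3",
    page_size=1,
)

print(engine.generate("The capital of France is", {"max_new_tokens": 8}))
engine.shutdown()
```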
| **Quantized weights ([W4A8](https://huggingface.co/novita/Deepseek-R1-0528-W4AFP8))** | 8 x H20/100, 4 x H200 |
| **Quantized weights ([AWQ](https://huggingface.co/QuixiAI/DeepSeek-R1-0528-AWQ))** | 8 x H100/800/20 |
| | 8 x A100/A800 |
| **Quantized weights ([MXFP4](https://huggingface.co/amd/DeepSeek-R1-MXFP4-Preview))** | 8, 4 x MI355X/350X |
I have personally tried the W4A8 and MXFP4 configurations, so they work fine; a launch sketch follows below.
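For reference (not something added by the PR), a minimal sketch of what running the W4A8 checkpoint from the table looks like. It assumes the `Engine` keyword arguments `tp_size` and `trust_remote_code` mirror the usual launch flags:

```python
# Hypothetical sketch, not part of this PR: serving the W4A8 checkpoint listed
# in the table above on 8 GPUs. The checkpoint path comes from the table; the
# keyword arguments are assumed to mirror the usual server CLI flags.
import sglang as sgl

engine = sgl.Engine(
    model_path="novita/Deepseek-R1-0528-W4AFP8",  # W4A8 checkpoint from the table
    tp_size=8,                                    # e.g. 8 x H20/H100 per the table
    trust_remote_code=True,
)

print(engine.generate("Explain multi-token prediction in one sentence.", {"max_new_tokens": 64}))
engine.shutdown()
```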
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
@Fridge003