[Roadmap] Intel CPU Roadmap (2025Q4) #12802

### Checklist

- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 2. Please use English, otherwise it will be closed.

### Motivation

Previously, we added an optimized CPU backend to SGLang for Xeon processors with [AMX](https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html) support, enabled Graph Mode with `torch.compile`, and extended model coverage.

In 2025Q4, we will continue to optimize CPU backend performance, focusing primarily on production deployment:
* Small to medium-sized LLM deployment, in particular MoE models with fewer than 5B activated parameters, e.g. **Qwen3-Next-80B-A3B**.
* Deployment of OCR models (**DeepSeek-OCR**), ASR models (**Whisper**), and multimodal models.

### General Optimizations
* **Graph Mode Improvement**: combine pre-compiled batch sizes with explicit user input configuration so that the graph runner can be used more flexibly and overall throughput improves. @CaoE
* **Causal Conv1d Support**: add optimized kernels for Mamba attention, causal conv1d, and flash linear attention to support Qwen3-Next. @mingfeima
* **MXFP4 Support**: out-of-the-box support for MXFP4 with weight-only quantization: dequantize MXFP4 -> BF16 and compute with AMX-BF16 or AVX512-BF16, to support GPT-OSS 20B and 120B. @mingfeima
* **INT4 Support**: 4-bit mode matters more on compute-constrained hardware such as CPUs. Enable AWQ INT4 (w4a8); see https://github.com/sgl-project/sglang/pull/8225 and https://github.com/sgl-project/sglang/pull/8226. @jianan-gu
* **FP8 KV Cache Support**: enable the FP8 KV cache, falling back to BF16 for compute. @blzheng
* **Data Parallel Attention**: enable DP MLA. @chunyuan-w 
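
To make the MXFP4 item concrete, here is a minimal pure-Python sketch of the dequantization step, not the planned kernel. It assumes the OCP Microscaling layout (32 E2M1 elements sharing one E8M0 scale). The function names and the nibble order (low nibble first) are assumptions for illustration only, and the output is plain Python floats rather than BF16.

```python
import math

# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # elements that share one E8M0 scale in MXFP4


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 0-2 pick the magnitude."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def dequant_mxfp4_block(packed: bytes, scale_e8m0: int) -> list[float]:
    """Dequantize one 32-element block stored as 16 packed bytes.

    Assumption (hypothetical packing): the low nibble of each byte holds
    the even-indexed element and the high nibble the odd-indexed one.
    """
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError("expected 16 packed bytes per MXFP4 block")
    if scale_e8m0 == 0xFF:
        # E8M0 reserves 0xFF for NaN, so the whole block is NaN.
        return [math.nan] * BLOCK_SIZE
    scale = 2.0 ** (scale_e8m0 - 127)  # E8M0 is a biased power-of-two exponent
    out = []
    for byte in packed:
        out.append(decode_e2m1(byte & 0xF) * scale)
        out.append(decode_e2m1(byte >> 4) * scale)
    return out
```

In a real kernel the decoded values would be written as BF16 and fed to AMX-BF16 or AVX512-BF16 GEMM; the sketch only shows the arithmetic of the decode.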

### Innovation
* **Software pipelining for AMX / AVX512**: double-buffer the dequantization and dot-product stages on AMX / AVX512 to increase achieved FLOPS for FP8, MXFP4, and INT4 GEMM and MoE.
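
The double-buffering idea can be sketched as a loop structure in plain Python. This is an illustration under stated assumptions, not the planned implementation: `pipelined_dot`, the tile layout, and the `dequant` callback are hypothetical names. Python runs the two stages sequentially; the point is that within one iteration the dequantization of tile `k + 1` and the dot product on tile `k` touch different buffers, which is what would let a native kernel overlap them.

```python
from typing import Callable, Sequence


def pipelined_dot(
    packed_tiles: Sequence[Sequence[float]],
    activation_tiles: Sequence[Sequence[float]],
    dequant: Callable[[Sequence[float]], list[float]],
) -> float:
    """Two-stage software pipeline with two buffers (ping-pong)."""
    n = len(packed_tiles)
    if n == 0:
        return 0.0
    buffers: list[list[float]] = [[], []]
    # Prologue: fill the first buffer before the steady-state loop starts.
    buffers[0] = dequant(packed_tiles[0])
    acc = 0.0
    for k in range(n):
        cur, nxt = k % 2, (k + 1) % 2
        if k + 1 < n:
            # Stage 1 for the next tile writes the buffer that is not being read.
            buffers[nxt] = dequant(packed_tiles[k + 1])
        # Stage 2 for the current tile reads the already dequantized buffer.
        acc += sum(w * x for w, x in zip(buffers[cur], activation_tiles[k]))
    return acc
```

For example, `pipelined_dot([[2, 4], [6, 8]], [[1, 1], [1, 1]], lambda t: [v * 0.5 for v in t])` accumulates the dot product tile by tile while only ever holding two dequantized tiles.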

### User Experience and Testing Enhancement
* **Documentation**: complete the documentation and provide BKMs (best known methods) for optimal configurations of prioritized models. @ZailiWang
* **Bug Tracking**: track bugs and enable more proxy models in test cases. @1pikachu
* **Xeon CI**: maintain CI stability and enhance unit tests by increasing test case coverage and pass rate. @1pikachu
