[Roadmap] Intel CPU Roadmap (2025Q4) #12802

@mingfeima

Motivation

Previously, we added an optimized CPU backend to SGLang for Xeon processors with AMX support, enabled Graph Mode with torch.compile, and extended model coverage.

In 2025Q4, we will continue to optimize CPU backend performance, focusing primarily on production deployment:

  • Deployment of small to medium-sized LLMs, such as MoE models with fewer than 5B activated parameters, e.g. Qwen3-Next-80B-A3B.
  • Deployment of OCR models (DeepSeek-OCR), ASR models (Whisper), and multimodal models.

General Optimizations

  • Graph Mode Improvement: combine the pre-compiled batch sizes with explicit user-provided configuration so the graph runner can be used more flexibly, improving overall throughput. @CaoE
  • Causal Conv1d Support: add optimized kernels for Mamba attention, causal conv1d, and flash linear attention to support Qwen3-Next. @mingfeima
  • MXFP4 Support: out-of-the-box support for MXFP4 with weight-only quantization: dequantize MXFP4 -> BF16 and compute with AMX-BF16 or AVX512-BF16, to support GPT-OSS 20B and 120B. @mingfeima
  • INT4 Support: 4-bit mode matters more on compute-constrained hardware such as CPUs. Enable AWQ INT4 (w4a8); see [CPU][INT4] Add AWQ frontend support for CPU #8225 and [CPU][INT4] Add INT4 kernels for CPU #8226. @jianan-gu
  • FP8 KV Cache Support: enable the FP8 KV cache, falling back to BF16 for compute. @blzheng
  • Data Parallel Attention: enable DP MLA. @chunyuan-w
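
To make the MXFP4 item concrete, here is a minimal reference sketch of the MXFP4 -> floating-point dequantization step. It assumes the OCP Microscaling layout (blocks of 32 E2M1 elements sharing one E8M0 scale) and, as a further assumption, that each byte stores two elements with the low nibble first. The function names are hypothetical and this is not the SGLang kernel; it only shows the arithmetic. Rounding to BF16 is not modeled, since Python floats are used for the result.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # number of elements that share one E8M0 scale


def decode_e2m1(code):
    """Decode one 4-bit E2M1 code (1 sign bit, 2 exponent bits, 1 mantissa bit)."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e8m0(scale_byte):
    """Decode an E8M0 shared scale: a pure power of two with bias 127."""
    if scale_byte == 0xFF:  # reserved NaN encoding
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def dequant_mxfp4_block(packed, scale_byte):
    """Dequantize one block: 16 packed bytes -> 32 floats.

    Assumption: the low nibble of each byte holds the earlier element.
    """
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError("expected 16 packed bytes per MXFP4 block")
    scale = decode_e8m0(scale_byte)
    out = []
    for byte in packed:
        out.append(decode_e2m1(byte & 0x0F) * scale)
        out.append(decode_e2m1(byte >> 4) * scale)
    return out
```

For example, `dequant_mxfp4_block(bytes([0x21] * 16), 128)` decodes the nibbles 0x1 and 0x2 (0.5 and 1.0) and multiplies them by the scale 2^(128-127) = 2. An optimized kernel would typically perform the same mapping with vectorized table lookups before feeding the BF16 tiles to AMX or AVX512.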

Innovation

  • Software pipelining for AMX / AVX512: double-buffer the dequantization and dot-product stages on AMX / AVX512 to increase achieved FLOPS for FP8, MXFP4, and INT4 GEMM and MoE.
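
The double-buffering idea can be sketched as a two-slot schedule: while the dot product consumes the tile in one buffer, the next tile is dequantized into the other. The sketch below is illustrative only; `pipelined_dot` and its `dequant` callback are hypothetical names, and Python runs the two stages sequentially, so it shows the ordering of the stages rather than any real overlap. In a native kernel the overlap would come from interleaving the conversion instructions with the AMX / AVX512 compute instructions.

```python
def pipelined_dot(weight_tiles, activation_tiles, dequant):
    """Accumulate a tiled dot product using a two-slot (double) buffer.

    weight_tiles: quantized weight tiles, one list per tile
    activation_tiles: activation tiles with matching shapes
    dequant: callable that converts one quantized tile to floats
    """
    n = len(weight_tiles)
    if n == 0:
        return 0.0

    buffers = [None, None]
    # Prologue: fill the first buffer before entering the steady state.
    buffers[0] = dequant(weight_tiles[0])

    acc = 0.0
    for i in range(n):
        cur = i % 2
        if i + 1 < n:
            # Stage A: dequantize the NEXT tile into the idle buffer.
            buffers[1 - cur] = dequant(weight_tiles[i + 1])
        # Stage B: dot product on the CURRENT tile, already dequantized.
        acc += sum(w * a for w, a in zip(buffers[cur], activation_tiles[i]))
    return acc
```

The prologue plus the alternating buffer index is the whole trick: stage B never waits on a conversion for the tile it is about to consume, because that conversion was issued one iteration earlier.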

User Experience and Testing Enhancement

  • Documentation: complete the documentation and provide BKMs (best known methods) for optimal configurations of prioritized models. @ZailiWang
  • Bug Tracking: track bugs and enable more proxy models in test cases. @1pikachu
  • Xeon CI: maintain CI stability and enhance unit tests by increasing test case coverage and pass rate. @1pikachu
