[Roadmap] Quantization Support #8180

@AniZpZ

1. Decouple Quantization Implementation from vLLM

Objective: Refactor the quantization code so that it no longer depends on vLLM, making the quantization module easier to maintain and extend.

2. Quantization on Various Hardware Platforms (Beyond GPUs)

Objective: Extend SGLang's efficient inference capabilities to a broader range of hardware.

3. Non-Linear Module & Communication Quantization

Objective: Optimize components beyond standard linear layers to further improve performance.

  • Attention
    • MLA Quantization
    • GQA/MHA Quantization
  • Improved KV Cache Quantization @Wilbolu
  • Communication Quantization
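For readers unfamiliar with the idea behind the KV cache and communication items, here is a minimal sketch of symmetric per-row int8 quantization. It is a generic illustration only, not SGLang's implementation: the function names `quantize_int8` and `dequantize_int8` are hypothetical, and the roadmap does not state which scheme or data type will actually be used.

```python
def quantize_int8(row):
    """Quantize one row (e.g. one token's K or V vector) to int8 codes.

    Hypothetical helper for illustration: uses a single symmetric scale
    per row, chosen so the largest magnitude maps to 127.
    """
    amax = max(abs(x) for x in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(x / scale))) for x in row]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from int8 codes and their scale."""
    return [c * scale for c in codes]


row = [1.0, -2.0, 0.25, 0.0]
codes, scale = quantize_int8(row)
restored = dequantize_int8(codes, scale)
```

The same round trip applies whether the row is stored in a cache or sent between devices: only the int8 codes and one scale per row need to be kept or transmitted, at the cost of a per-element error of at most half a quantization step.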

4. Support for More Features & Novel Formats

Objective: Stay current with cutting-edge quantization techniques and data formats.

  • MXFP4 Quantization
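As background on the format, MXFP4 (from the OCP Microscaling specification) stores blocks of 32 elements as 4-bit E2M1 values that share one power-of-two scale. The sketch below shows that block structure in plain Python. It is an illustration under stated assumptions, not SGLang code: the function names are hypothetical, values are kept as floats rather than packed into 4-bit codes, and ties are broken toward the smaller magnitude instead of round-to-nearest-even.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is stored separately).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32   # elements per shared scale in MXFP4
ELEM_EMAX = 2     # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2


def quantize_mxfp4_block(block):
    """Return (shared_exp, values), where each value is a signed E2M1 magnitude.

    Hypothetical helper: real kernels pack the values into 4-bit codes.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two exponent, clamped to the E8M0 scale range.
    shared_exp = math.floor(math.log2(amax)) - ELEM_EMAX
    shared_exp = max(-127, min(127, shared_exp))
    scale = 2.0 ** shared_exp
    values = []
    for x in block:
        mag = min(abs(x) / scale, FP4_E2M1_VALUES[-1])  # saturate at 6.0
        nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(v - mag))
        values.append(math.copysign(nearest, x))
    return shared_exp, values


def dequantize_mxfp4_block(shared_exp, values):
    """Multiply each element by the block's shared power-of-two scale."""
    scale = 2.0 ** shared_exp
    return [v * scale for v in values]


block = [6.0, 3.0, -1.5, 0.5] + [0.0] * (BLOCK_SIZE - 4)
shared_exp, values = quantize_mxfp4_block(block)
restored = dequantize_mxfp4_block(shared_exp, values)
```

Because the scale is a power of two, dequantization is exact for any input that already sits on the scaled E2M1 grid; other inputs are rounded to the nearest grid point or saturated at the largest magnitude.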
