1. Decouple Quantization Implementation from vLLM
Objective: Refactor the code so the quantization module is easier to maintain and extend without depending on vLLM internals (a rough interface sketch follows this list).
- Weight Only Methods
  - GPTQ:
    - [1/n] chore: decouple quantization implementation from vLLM dependency #7992
    - [2/n] decouple quantization implementation from vLLM dependency #8112
    - [4/n] decouple quantization implementation from vLLM dependency #9191
  - AWQ: [3/n] chore: decouple AWQ implementation from vLLM dependency #8113
  - Compressed Tensors: [6/n] decouple quantization implementation from vLLM dependency #10750
- MoE Quantization Optimization
- Kernel Optimization
  - fbgemm fp8: [5/n] decouple quantization implementation from vLLM dependency #9454
  - gguf: [7/n] decouple quantization impl from vllm dependency - gguf kernel #11019
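To make the decoupling concrete, here is a minimal sketch of the kind of layer-agnostic interface such a refactor tends to converge on: a config object that parses checkpoint quantization metadata, and a per-layer method object that creates and applies quantized weights. The class and method names (`QuantizationConfig`, `QuantizeMethodBase`, `create_weights`, `apply`) are illustrative assumptions, not the exact sglang API.

```python
# Illustrative sketch only; names do not necessarily match sglang's code.
from abc import ABC, abstractmethod

import torch


class QuantizationConfig(ABC):
    """Parses checkpoint quantization metadata into a method-specific config."""

    @classmethod
    @abstractmethod
    def from_config(cls, hf_quant_config: dict) -> "QuantizationConfig":
        ...

    @abstractmethod
    def get_quant_method(self, layer: torch.nn.Module) -> "QuantizeMethodBase":
        ...


class QuantizeMethodBase(ABC):
    """Creates quantized weights for one layer and runs its quantized matmul."""

    @abstractmethod
    def create_weights(self, layer: torch.nn.Module, input_size: int,
                       output_size: int, params_dtype: torch.dtype) -> None:
        ...

    @abstractmethod
    def apply(self, layer: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
        ...
```

Keeping methods behind an interface like this lets individual kernels be swapped (e.g. the fbgemm fp8 and gguf items above) without touching model code.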
2. Quantization on Various Hardware Platforms (Other than GPUs)
Objective: Extend sglang's efficient inference capabilities to a broader range of hardware.
- Ascend NPUs
- Intel Xeon CPUs
  - W8A8
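As a rough illustration of the W8A8 item above, here is a minimal sketch of symmetric per-tensor int8 weight-and-activation quantization. The function names are hypothetical, and the float matmul stands in for the int8 GEMM with int32 accumulation that a real CPU kernel would use.

```python
import torch


def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor int8 quantization: returns int8 values and a scale."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale


def w8a8_linear(x: torch.Tensor, w_q: torch.Tensor, w_scale: torch.Tensor):
    """Quantize activations on the fly, multiply, and rescale the result."""
    x_q, x_scale = quantize_int8(x)
    # A real kernel would run an int8 GEMM with int32 accumulation; a float
    # matmul keeps this demo hardware-agnostic with comparable numerics.
    acc = torch.matmul(x_q.to(torch.float32), w_q.to(torch.float32).t())
    return acc * (x_scale * w_scale)


w = torch.randn(256, 128)                              # [out_features, in_features]
w_q, w_scale = quantize_int8(w)                        # weights quantized once, offline
y = w8a8_linear(torch.randn(4, 128), w_q, w_scale)     # [4, 256] output
```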
3. Non-Linear Module & Communication Quantization
Objective: Optimize components beyond standard linear layers to further improve performance.
- Attention
  - MLA Quantization
  - GQA/MHA Quantization
  - Improved KV Cache Quantization @Wilbolu (see the sketch after this list)
- Communication Quantization
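For the KV cache quantization item above, the following is a minimal sketch of per-head int8 quantization of cached keys/values with one scale per head; the tensor layout and function names are illustrative and do not mirror sglang's actual KV cache.

```python
import torch


def quantize_kv(kv: torch.Tensor):
    """kv: [num_tokens, num_heads, head_dim] -> int8 values plus per-head scales."""
    scale = kv.abs().amax(dim=(0, 2), keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(kv / scale), -128, 127).to(torch.int8)
    return q, scale


def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Restore float values when a cache entry is read back for attention."""
    return q.to(torch.float32) * scale


k = torch.randn(16, 8, 64)            # 16 cached tokens, 8 heads, head_dim 64
k_q, k_scale = quantize_kv(k)         # stored at 1 byte per element plus scales
k_deq = dequantize_kv(k_q, k_scale)   # dequantized (or fused) at attention time
```

Storing the cache at one byte per element halves KV memory relative to FP16, at the cost of a dequantization step (or a fused quantized-attention kernel) on read.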
4. Support for More Features & Novel Formats
Objective: Stay current with cutting-edge quantization techniques and data formats.
- MXFP4 Quantization
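To sketch what the MXFP4 item refers to: in the OCP microscaling format, values are grouped into blocks of 32 that share one power-of-two (E8M0) scale, and each element is stored as a 4-bit E2M1 float. The snippet below is a fake-quantization simulation of that idea, assuming a simplified scale-selection rule; it is not a packed-tensor or kernel implementation.

```python
import torch

# Representable non-negative magnitudes of the E2M1 (FP4) element format.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def mxfp4_fake_quant(x: torch.Tensor, block_size: int = 32) -> torch.Tensor:
    """Round x to values representable by MXFP4 blocks (simulation only)."""
    orig_shape = x.shape
    x = x.reshape(-1, block_size)
    # One shared power-of-two scale per block (E8M0 in the MX spec), chosen so
    # the block maximum lands near the top of the FP4 range (6.0).
    amax = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    scale = torch.exp2(torch.floor(torch.log2(amax / 6.0)))
    scaled = x / scale
    # Snap each scaled magnitude to the nearest FP4 grid point, keep the sign.
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return (torch.sign(scaled) * FP4_GRID[idx] * scale).reshape(orig_shape)


y = mxfp4_fake_quant(torch.randn(64))   # 64 values -> two blocks of 32
```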