I don't think it is currently possible to use quantized checkpoints (i.e. checkpoints containing Megatron fakequant layers inserted via modelopt) to learn parameters in the quantized space during GRPO.
This could be done by consuming a PTQ checkpoint exported from modelopt or similar, a quantized HF model (e.g. GPT-OSS), or even a modelopt model that is then exported to the HF format. The key part is post-training in the quantized space.
I imagine loading the Megatron workers is not particularly hard; the more likely challenge is any refit required to pass parameters to vLLM for rollout, since vLLM presumably does not expect the fakequant layers.
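To make the refit concern concrete, here is a minimal sketch of what such a step might look like. This is not modelopt's or vLLM's actual API; `fake_quantize` and `materialize_for_rollout` are hypothetical names, and the quantization scheme (symmetric per-tensor, simulated with plain Python lists) is an illustrative assumption. The idea is to bake the quantize-dequantize round trip into ordinary float tensors so the rollout engine never sees a fakequant wrapper:

```python
def fake_quantize(weights, num_bits=8):
    """Simulated symmetric per-tensor fakequant (illustrative, not modelopt's
    implementation): values stay floats but are snapped onto the quantized
    grid, matching what the fakequant layer computes in the forward pass."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    # No explicit clamp needed: scale is chosen so the max maps to qmax.
    return [round(w / scale) * scale for w in weights]


def materialize_for_rollout(state_dict, num_bits=8):
    """Hypothetical refit step: replace each fakequant parameter with its
    effective float tensor before handing weights to the rollout engine."""
    return {name: fake_quantize(w, num_bits) for name, w in state_dict.items()}
```

The training side would keep the fakequant layers (so gradients flow through the straight-through estimator as usual), while the refit path hands vLLM only the collapsed tensors.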
The key motivation here would be to evaluate the effectiveness of learning the quantized weights while training on the actual task, rather than relying on a separate PTQ, QAT SFT, or similar step.