Training with quantized checkpoints (i.e. QAT GRPO) #1099

@nathan-az

Description

Currently I don't think it is possible to use quantized checkpoints (i.e. checkpoints containing Megatron fakequant layers inserted via modelopt) and learn parameters in the quantized space during GRPO.
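For context, "fakequant" here refers to a quantize-dequantize op applied to weights during training, so downstream layers see quantized values while a latent float copy is updated. A minimal sketch (illustrative only, not modelopt's actual API):

```python
# Minimal sketch of fake quantization (quantize-dequantize), the op a
# fakequant layer applies to a weight during QAT. In practice gradients
# flow through it via a straight-through estimator; names and the
# per-tensor scale here are illustrative assumptions.

def fake_quantize(w: float, scale: float, num_bits: int = 8) -> float:
    """Round w to the nearest representable int grid point, clamp to the
    int range, then map back to float. This is the value the rest of the
    network sees during quantization-aware training."""
    qmax = 2 ** (num_bits - 1) - 1   # e.g. 127 for int8
    qmin = -(2 ** (num_bits - 1))    # e.g. -128
    q = round(w / scale)
    q = max(qmin, min(qmax, q))      # clamp to the representable range
    return q * scale                 # dequantize back to float

# The latent float weight 0.1234 lands on the grid point 12 * 0.01:
print(fake_quantize(0.1234, scale=0.01))  # 0.12
```

Training "in the quantized space" then means the GRPO updates are applied to the latent weights while the forward pass always goes through this op.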

This could be done by consuming a PTQ checkpoint exported from modelopt or similar, a quantized HF model (e.g. GPT-OSS), or even a modelopt model that is then exported to the HF format. The key part is post-training in the quantized space.

I imagine loading the Megatron workers is not particularly hard; the more likely challenge is the refit step that passes parameters to vLLM for rollout, which I imagine does not expect the fakequant layers.
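One plausible shape for that refit step is collapsing each fakequant wrapper into the plain tensor the rollout engine expects before weights are shipped over. A hedged sketch, where `FakeQuantLinear` and the export helper are hypothetical stand-ins rather than real modelopt or vLLM APIs:

```python
# Hypothetical sketch of collapsing fakequant wrappers before a vLLM
# refit. The class and function names are assumptions for illustration;
# real fakequant modules hold tensors, not scalars.

class FakeQuantLinear:
    def __init__(self, weight: float, scale: float, num_bits: int = 8):
        self.weight = weight        # latent float weight being trained
        self.scale = scale
        self.num_bits = num_bits

    def effective_weight(self) -> float:
        """The weight the rollout engine should see: the latent weight
        with quantize-dequantize applied once."""
        qmax = 2 ** (self.num_bits - 1) - 1
        qmin = -(2 ** (self.num_bits - 1))
        q = max(qmin, min(qmax, round(self.weight / self.scale)))
        return q * self.scale


def export_for_rollout(layers: dict) -> dict:
    """Map fakequant layers to plain weights, so the rollout side never
    has to know the fakequant wrappers exist."""
    return {name: layer.effective_weight() for name, layer in layers.items()}


layers = {"mlp.down_proj": FakeQuantLinear(weight=0.0567, scale=0.004)}
exported = export_for_rollout(layers)  # plain floats, on the 0.004 grid
```

The design point is that only the export boundary changes: the trainer keeps its fakequant modules, and vLLM receives ordinary (or natively quantized) weights.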

The key motivation here would be to evaluate the effectiveness of learning the quantized weights while training on the actual task, rather than appending a separate PTQ, QAT SFT, or similar step.
