If our project helps you, please give us a star ⭐ and cite our paper!
- 08.26.2025: The code is released.
- Release the model checkpoints
- Release the inference and evaluation code
- Release the training data
- Release the training code
In this project, we
- Investigate Guided GRPO and provide a comprehensive study of various guidance configurations.
- Introduce G2RPO-A, an adaptive algorithm that automatically adjusts guidance length in response to the evolving training state.
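The two contributions above combine GRPO's group-relative advantage with guidance whose length changes during training. The Python sketch that follows illustrates both ideas under stated assumptions. `group_relative_advantages` follows the standard GRPO normalization: each reward minus its group mean, divided by the group standard deviation. The population std is used here; some implementations use the sample std. `guided_prompt` and `update_guidance_len` are hypothetical helpers. They assume that "guidance" means prepending the first k steps of a reference solution. The threshold rule in `update_guidance_len` is an illustrative stand-in, not the G2RPO-A update rule from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


def guided_prompt(prompt, reference_steps, guidance_len):
    """Hypothetical guidance format: append the first `guidance_len`
    reference reasoning steps to the prompt before sampling."""
    return prompt + "".join(reference_steps[:guidance_len])


def update_guidance_len(guidance_len, recent_rewards, target=0.5, step=1, max_len=8):
    """Hypothetical adaptive rule (NOT the paper's): give more guidance when
    recent rewards fall below `target`, less when they exceed it."""
    avg = mean(recent_rewards)
    if avg < target:
        return min(guidance_len + step, max_len)
    if avg > target:
        return max(guidance_len - step, 0)
    return guidance_len


if __name__ == "__main__":
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    print(guided_prompt("Q: ", ["step1. ", "step2. ", "step3. "], 2))
    print(update_guidance_len(2, [0.0, 0.0, 1.0]))  # struggling -> more guidance
```

The adaptive rule is deliberately simple. A schedule that reacts to the current training state, rather than fixing the guidance length up front, is the property the G2RPO-A bullet describes. The concrete signal and update used in the paper may differ.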
G2RPO-A training
ACCELERATE_LOG_LEVEL=info \
accelerate launch \
--config_file recipes/accelerate_configs/zero2.yaml \
--num_processes=7 \
src/open_r1/grpo_code_adagui.py \
--config recipes/Qwen3-1.7B/grpo/qwen38code.yaml
We are grateful for the following awesome projects:
If you find this repository helpful for your project, please consider citing:
@article{guo2025g,
title={G$^2$RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance},
author={Guo, Yongxin and Deng, Wenbo and Cheng, Zhenglin and Tang, Xiaoying},
journal={arXiv preprint arXiv:2508.13023},
year={2025}
}
