G2RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance

Yongxin Guo*, Wenbo Deng*, Zhenglin Cheng, Xiaoying Tang

If our project helps you, please give us a star ⭐ and cite our paper!

News

  • 08.26.2025: Code released.

TODO

  • Release the model checkpoints
  • Release the inference and evaluation code
  • Release the training data
  • Release the training code

Overview

In this project, we

  • Investigate Guided GRPO and provide a comprehensive study of various guidance configurations.
  • Introduce G2RPO-A, an adaptive algorithm that automatically adjusts guidance length in response to the evolving training state.
Figure: an example of Guided GRPO.
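To make the two ideas above concrete, here is a minimal Python sketch of (a) GRPO's group-relative advantage and (b) an adaptive update of the guidance length driven by the group's mean reward. The function names, thresholds, and step size are our own illustrative choices, not the released implementation.

```python
def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each rollout's reward against
    the mean and std of its own group of sampled completions."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = max(var ** 0.5, 1e-6)  # avoid division by zero for uniform groups
    return [(r - mean) / std for r in rewards]


def adapt_guidance_length(current_len, mean_reward, lo=0.3, hi=0.7, step=64):
    """Hypothetical adaptive rule: when the policy solves the task often,
    shorten the injected guidance; when it struggles, lengthen it.
    Thresholds (lo, hi) and step are illustrative, not from the paper."""
    if mean_reward > hi:
        return max(0, current_len - step)
    if mean_reward < lo:
        return current_len + step
    return current_len


# Toy group of 4 rollouts with binary correctness rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))   # centered around zero
print(adapt_guidance_length(256, 0.25))     # low reward -> more guidance
```

The point of the adaptive rule is that a fixed guidance length either over-helps a policy that has already learned the task or under-helps one that has not; tying the length to the evolving mean reward tracks the training state automatically.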

Environments

Model Zoo

Data

Training

G2RPO-A training

ACCELERATE_LOG_LEVEL=info \
accelerate launch \
    --config_file recipes/accelerate_configs/zero2.yaml \
    --num_processes=7 \
    src/open_r1/grpo_code_adagui.py \
    --config recipes/Qwen3-1.7B/grpo/qwen38code.yaml
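A note on `--num_processes=7` (our reading, not stated in this repo): in open-r1-style GRPO training, vLLM rollout generation typically occupies one GPU, so an 8-GPU node trains with the remaining seven processes. On a differently sized machine you would adjust the count accordingly, e.g. for a hypothetical 4-GPU node:

```shell
# Hypothetical 4-GPU node: one GPU reserved for vLLM rollouts,
# training runs on the remaining three processes.
ACCELERATE_LOG_LEVEL=info \
accelerate launch \
    --config_file recipes/accelerate_configs/zero2.yaml \
    --num_processes=3 \
    src/open_r1/grpo_code_adagui.py \
    --config recipes/Qwen3-1.7B/grpo/qwen38code.yaml
```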

Inference and Evaluation

Acknowledgement

We are grateful for the following awesome projects:

Bibliography

If you find this repository helpful for your project, please consider citing:

@article{guo2025g,
  title={G$^2$RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance},
  author={Guo, Yongxin and Deng, Wenbo and Cheng, Zhenglin and Tang, Xiaoying},
  journal={arXiv preprint arXiv:2508.13023},
  year={2025}
}
