G2RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance

Yongxin Guo*, Wenbo Deng*, Zhenglin Cheng, Xiaoying Tang

If our project helps you, please give us a star ⭐ and cite our paper!

News

  • 08.26.2025: Code released.

TODO

  • Release the model checkpoints
  • Release the inference and evaluation code
  • Release the training data
  • Release the training code

Overview

In this project, we

  • Investigate Guided GRPO and provide a comprehensive study of various guidance configurations.
  • Introduce G2RPO-A, an adaptive algorithm that automatically adjusts guidance length in response to the evolving training state.
Figure: an example of Guided GRPO.
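To make the two ideas above concrete, here is a minimal Python sketch of (a) GRPO's group-relative advantage and (b) an adaptive update of the guidance length driven by the group's mean reward. The function names, thresholds, and step size are our own illustrative choices, not the released implementation.

```python
def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each rollout's reward against
    the mean and std of its own group of sampled completions."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = max(var ** 0.5, 1e-6)  # avoid division by zero for uniform groups
    return [(r - mean) / std for r in rewards]


def adapt_guidance_length(current_len, mean_reward, lo=0.3, hi=0.7, step=64):
    """Hypothetical adaptive rule: when the policy solves the task often,
    shorten the injected guidance; when it struggles, lengthen it.
    Thresholds (lo, hi) and step are illustrative, not from the paper."""
    if mean_reward > hi:
        return max(0, current_len - step)
    if mean_reward < lo:
        return current_len + step
    return current_len


# Toy group of 4 rollouts with binary correctness rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))   # centered around zero
print(adapt_guidance_length(256, 0.25))     # low reward -> more guidance
```

The point of the adaptive rule is that a fixed guidance length either over-helps a policy that has already learned the task or under-helps one that has not; tying the length to the evolving mean reward tracks the training state automatically.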

Environments

Model Zoo

Data

Training

G2RPO-A training

ACCELERATE_LOG_LEVEL=info \
accelerate launch \
    --config_file recipes/accelerate_configs/zero2.yaml \
    --num_processes=7 \
    src/open_r1/grpo_code_adagui.py \
    --config recipes/Qwen3-1.7B/grpo/qwen38code.yaml
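A note on `--num_processes=7` (our reading, not stated in this repo): in open-r1-style GRPO training, vLLM rollout generation typically occupies one GPU, so an 8-GPU node trains with the remaining seven processes. On a differently sized machine you would adjust the count accordingly, e.g. for a hypothetical 4-GPU node:

```shell
# Hypothetical 4-GPU node: one GPU reserved for vLLM rollouts,
# training runs on the remaining three processes.
ACCELERATE_LOG_LEVEL=info \
accelerate launch \
    --config_file recipes/accelerate_configs/zero2.yaml \
    --num_processes=3 \
    src/open_r1/grpo_code_adagui.py \
    --config recipes/Qwen3-1.7B/grpo/qwen38code.yaml
```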

Inference and Evaluation

Acknowledgement

We are grateful for the following awesome projects:

Bibliography

If you find this repository helpful for your project, please consider citing:

@article{guo2025g,
  title={G$^2$RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance},
  author={Guo, Yongxin and Deng, Wenbo and Cheng, Zhenglin and Tang, Xiaoying},
  journal={arXiv preprint arXiv:2508.13023},
  year={2025}
}
