This is our attempt to implement SCoRe according to google's SCoRe paper.
pip install -r requirements.txt
Use this command
python run.py --task MATH --model_variant meta-llama/Llama-3.2-3B-Instruct --data_path ./data --output_dir ./outputs --mixed_precision --no_bleu --no_rouge
https://github.com/sanowl/Self-Correcting-LLM--Reinforcement-Learning-