Synergistic Self-Correction (S2C): A Hierarchical Framework for Multi-Stage Reasoning and Error Recovery in Large Language Models
Synergistic Self-Correction (S2C) is a hierarchical framework that endows Large Language Models with intrinsic metacognitive capabilities through a structured three-stage inference process. It targets a fundamental limitation of autoregressive generation: an error made early in the reasoning chain propagates through every subsequent step.
Key results:
- 60% relative improvement over CoT prompting on GSM8K mathematical reasoning (31.2% → 49.9%)
- 71% relative improvement over CoT prompting on the MATH dataset (12.4% → 21.3%)
- Lower computational cost than sampling-based ensembles such as Self-Consistency
- Statistically significant improvements across multiple reasoning benchmarks (p < 0.001)
Our framework decomposes problem-solving into three distinct computational personas:
```text
Input Problem → Generator → Critic → Synthesizer → Final Answer
                    ↓         ↓           ↓
                 Initial   Critical    Refined
                Solution   Analysis   Solution
```
- Generator: produces an initial solution and explicitly identifies its critical points
- Critic: systematically analyzes the solution for potential errors and logical inconsistencies
- Synthesizer: integrates the critique to produce a refined solution
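The three personas run in sequence, each conditioned on the outputs of the stages before it. The sketch below illustrates that data flow under stated assumptions: `LLM` is a hypothetical prompt-to-text callable, and `solve_three_stage`, `S2CTrace`, and the prompt wording are illustrative names, not the repository's `S2CModel` implementation or the paper's templates.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class S2CTrace:
    """Outputs of the three personas for a single problem."""
    initial_solution: str
    critique: str
    refined_solution: str


def solve_three_stage(llm: LLM, problem: str) -> S2CTrace:
    """Run Generator -> Critic -> Synthesizer sequentially with one model.

    The prompt wording is illustrative; it is not the template used by S2C.
    """
    # Stage 1 (Generator): draft a solution and flag its critical points.
    initial = llm(
        f"Problem: {problem}\n"
        "Solve step by step, then list the critical points where an error is most likely."
    )
    # Stage 2 (Critic): examine the draft for errors and logical inconsistencies.
    critique = llm(
        f"Problem: {problem}\nDraft solution:\n{initial}\n"
        "Check each critical point and describe any errors you find."
    )
    # Stage 3 (Synthesizer): integrate the critique into a refined solution.
    refined = llm(
        f"Problem: {problem}\nDraft solution:\n{initial}\n"
        f"Critique:\n{critique}\n"
        "Write a corrected final solution."
    )
    return S2CTrace(initial, critique, refined)


if __name__ == "__main__":
    # Stub model for a dry run: echoes the prompt length instead of calling an LLM.
    def stub_llm(prompt: str) -> str:
        return f"<response to a {len(prompt)}-character prompt>"

    trace = solve_three_stage(stub_llm, "Sarah has 3 apples ...")
    print(trace.refined_solution)
```

Because every stage sees the original problem plus all earlier outputs, the Synthesizer can reconcile the draft with the critique instead of regenerating from scratch.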
Our novel three-phase training approach:
- Phase 1: Structural Alignment via Supervised Fine-Tuning
- Phase 2: Specialized Reward Model Training
- Phase 3: Hierarchical Process-Based Reward Optimization (HPBR)
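HPBR is named here but not defined. A process-based reward scores intermediate reasoning rather than only the final answer. The sketch below is a loudly hypothetical illustration of that idea: the `StageScores` type, the linear weighting, and the default weights are assumptions made for this example, not the paper's formula.

```python
from dataclasses import dataclass


@dataclass
class StageScores:
    """Hypothetical per-stage scores in [0, 1], e.g. from Phase 2 reward models."""
    generator: float
    critic: float
    synthesizer: float


def hierarchical_reward(scores: StageScores, final_correct: bool,
                        stage_weights=(0.2, 0.3, 0.5),
                        outcome_weight: float = 1.0) -> float:
    """Illustrative process-plus-outcome reward (assumed form, not HPBR itself).

    Process term: weighted sum of the three stage scores.
    Outcome term: a bonus when the final answer is correct.
    """
    w_gen, w_crit, w_syn = stage_weights
    process = (w_gen * scores.generator
               + w_crit * scores.critic
               + w_syn * scores.synthesizer)
    outcome = outcome_weight if final_correct else 0.0
    return process + outcome


if __name__ == "__main__":
    # A weak draft that the Critic and Synthesizer fully repair.
    print(hierarchical_reward(StageScores(0.5, 1.0, 1.0), final_correct=True))
```

Giving the later stages larger weights in this sketch rewards successful correction more than a lucky first draft; the actual weighting used in training is set by the project's PPO configuration, which is not shown here.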
Accuracy (%) on six reasoning benchmarks:

| Method | GSM8K | MATH | AQuA | MathQA | StrategyQA | CSQA | Average |
|---|---|---|---|---|---|---|---|
| CoT Prompting | 31.2% | 12.4% | 23.7% | 18.9% | 68.9% | 72.1% | 37.9% |
| Self-Consistency | 38.7% | 15.2% | 28.4% | 22.1% | 73.4% | 75.3% | 42.2% |
| S2C (Ours) | 49.9% | 21.3% | 35.6% | 28.4% | 76.4% | 78.1% | 48.3% |
| Relative gain vs. CoT | +60% | +71% | +50% | +50% | +11% | +8% | +27% |
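The relative-gain row and the Average column follow directly from the per-benchmark accuracies. A small plain-Python check (no project code involved) recomputes them; because the table entries are themselves rounded, recomputed gains can differ slightly from the published row.

```python
# Accuracy (%) per benchmark, copied from the results table above.
BENCHMARKS = ["GSM8K", "MATH", "AQuA", "MathQA", "StrategyQA", "CSQA"]
RESULTS = {
    "CoT Prompting": [31.2, 12.4, 23.7, 18.9, 68.9, 72.1],
    "Self-Consistency": [38.7, 15.2, 28.4, 22.1, 73.4, 75.3],
    "S2C": [49.9, 21.3, 35.6, 28.4, 76.4, 78.1],
}


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


def gains_over(baseline: str, method: str) -> dict:
    """Per-benchmark relative gain of `method` over `baseline`, to one decimal."""
    return {
        name: round(relative_gain(new, base), 1)
        for name, base, new in zip(BENCHMARKS, RESULTS[baseline], RESULTS[method])
    }


def mean_accuracy(method: str) -> float:
    """Unweighted mean over the six benchmarks (the table's Average column)."""
    scores = RESULTS[method]
    return round(sum(scores) / len(scores), 1)


if __name__ == "__main__":
    print(gains_over("CoT Prompting", "S2C"))
    print({method: mean_accuracy(method) for method in RESULTS})
```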
Requirements:
- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
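A machine can be checked against these requirements with the standard library alone. In this sketch, `check_environment` is a hypothetical helper, not part of the repository. The CUDA check runs only if PyTorch is installed, and RAM is read through POSIX `os.sysconf`, so it reports `None` on platforms such as Windows.

```python
import importlib.util
import os
import sys


def check_environment(min_python=(3, 8), min_ram_gb=16):
    """Check this machine against the stated requirements (hypothetical helper).

    A value of None means "could not be determined on this machine".
    """
    report = {"python_ok": sys.version_info[:2] >= tuple(min_python)}

    # GPU: recommended, not required. Only checked if PyTorch is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch
        report["cuda_available"] = torch.cuda.is_available()
    else:
        report["cuda_available"] = None

    # Physical RAM via POSIX sysconf, compared in decimal GB; absent on Windows.
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        report["ram_ok"] = ram_bytes >= min_ram_gb * 10**9
    except (AttributeError, ValueError, OSError):
        report["ram_ok"] = None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```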
```bash
# Clone the repository
git clone https://github.com/pratham/Self-Correcting-LLM-Research.git
cd Self-Correcting-LLM-Research

# Install dependencies
pip install -r requirements.txt

# Install additional packages
pip install torch transformers datasets peft trl bitsandbytes accelerate
```

```bash
# Log in to Hugging Face
huggingface-cli login

# Or set the environment variable instead
export HF_TOKEN="your_hf_token_here"
```

```bash
# Evaluate the S2C model on GSM8K
python evaluate_s2c.py --model_path ./s2c_llama3_8b_final --dataset gsm8k

# Generate the benchmark comparison
python benchmark_models.py
```

```bash
# Phase 1: Supervised fine-tuning
python train_s2c_sft.py --config configs/sft_config.yaml

# Phase 2: Reward model training
python train_reward_models.py --config configs/reward_config.yaml

# Phase 3: PPO training with HPBR
python train_s2c_rl.py --config configs/ppo_config.yaml
```

```python
from src.s2c_model import S2CModel

# Load the trained model
model = S2CModel.from_pretrained("./s2c_llama3_8b_final")

# Solve a math problem (correct answer: 3 + 2 - 1 = 4 apples)
problem = "Sarah has 3 apples. She buys 2 more apples and gives 1 to her friend. How many apples does Sarah have now?"
solution = model.solve_with_s2c(problem)

print(f"Problem: {problem}")
print(f"Solution: {solution}")
```

Repository structure (`Self-Correcting-LLM-Research/`):
```text
├── README.md                            # This file
├── requirements.txt                     # Python dependencies
├── LICENSE                              # MIT License
├── .gitignore                           # Git ignore patterns
│
├── paper/                               # Research paper and documentation
│   ├── final_report_comprehensive.tex   # Complete LaTeX source
│   ├── final_report_comprehensive.pdf   # Final paper PDF
│   └── arxiv_submission.tar.gz          # arXiv submission package
│
├── src/                                 # Source code
│   ├── models/                          # Model implementations
│   │   ├── s2c_model.py                 # Main S2C framework
│   │   ├── generator.py                 # Generation stage
│   │   ├── critic.py                    # Critique stage
│   │   └── synthesizer.py               # Synthesis stage
│   ├── training/                        # Training scripts
│   │   ├── sft_trainer.py               # Supervised fine-tuning
│   │   ├── reward_trainer.py            # Reward model training
│   │   └── ppo_trainer.py               # PPO with HPBR
│   ├── evaluation/                      # Evaluation utilities
│   │   ├── evaluator.py                 # Model evaluation
│   │   └── metrics.py                   # Performance metrics
│   └── utils/                           # Utility functions
│       ├── data_utils.py                # Data processing
│       └── visualization.py             # Results visualization
│
├── graphs/                              # Generated visualizations
│   ├── s2c_framework_architecture.pdf   # Framework diagram
│   ├── gsm8k_main_results.pdf           # Main results chart
│   ├── training_performance_curves.pdf  # Training curves
│   ├── ablation_study_results.pdf       # Ablation analysis
│   ├── error_analysis_comprehensive.pdf # Error analysis
│   ├── computational_efficiency.pdf     # Efficiency comparison
│   └── qualitative_s2c_example.pdf      # Example walkthrough
│
├── configs/                             # Configuration files
│   ├── sft_config.yaml                  # SFT hyperparameters
│   ├── reward_config.yaml               # Reward model config
│   └── ppo_config.yaml                  # PPO training config
│
├── models/                              # Trained models
│   ├── s2c_llama3_8b_final/             # Final S2C model
│   ├── s2c_llama3_8b_checkpoints/       # Training checkpoints
│   └── reward_models/                   # Trained reward models
│
├── data/                                # Datasets and preprocessed data
│   ├── gsm8k/                           # GSM8K dataset
│   ├── math/                            # MATH dataset
│   └── processed/                       # Preprocessed data
│
├── experiments/                         # Experimental scripts
│   ├── ablation_studies.py              # Ablation experiments
│   ├── scaling_analysis.py              # Scaling behavior analysis
│   └── error_analysis.py                # Error pattern analysis
│
├── logs/                                # Training logs and metrics
│   ├── tensorboard/                     # TensorBoard logs
│   └── wandb/                           # Weights & Biases logs
│
└── scripts/                             # Utility scripts
    ├── evaluate_s2c.py                  # Model evaluation script
    ├── benchmark_models.py              # Benchmark comparison
    ├── train_s2c_rl.py                  # PPO training script
    └── create_visualizations.py         # Generate paper figures
```
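The tree lists `src/evaluation/metrics.py` for performance metrics, but its contents are not shown here. As a hedged sketch of how GSM8K accuracy is commonly scored (not necessarily what `metrics.py` does): GSM8K reference solutions end with `#### <answer>`, and a simple heuristic takes the last number in the model output as the prediction. All function names below are hypothetical.

```python
import re
from typing import List, Optional

# Integers or decimals, optionally negative, allowing thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gsm8k_reference(answer_field: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'; return that answer."""
    return answer_field.split("####")[-1].strip().replace(",", "")


def extract_prediction(model_output: str) -> Optional[str]:
    """Heuristic: treat the last number in the model's output as its answer."""
    matches = _NUMBER.findall(model_output)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def numbers_match(pred: Optional[str], ref: str) -> bool:
    """Compare numerically so that '4' and '4.0' count as the same answer."""
    if pred is None:
        return False
    try:
        return abs(float(pred) - float(ref)) < 1e-6
    except ValueError:
        return False


def exact_match_accuracy(outputs: List[str], answer_fields: List[str]) -> float:
    """Fraction of problems whose extracted prediction equals the reference."""
    pairs = list(zip(outputs, answer_fields))
    if not pairs:
        return 0.0
    correct = sum(numbers_match(extract_prediction(o), gsm8k_reference(a))
                  for o, a in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    outputs = ["So Sarah has 3 + 2 - 1 = 4 apples.", "The answer is 5."]
    references = ["3 + 2 - 1 = 4\n#### 4", "#### 6"]
    print(exact_match_accuracy(outputs, references))
```

Taking the last number is a deliberately simple rule; a stricter evaluator would parse an explicit answer marker emitted by the Synthesizer stage.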
To reproduce the full set of experiments:

```bash
# 1. Data preparation
python scripts/prepare_datasets.py

# 2. Train the S2C model (full pipeline)
bash scripts/train_full_pipeline.sh

# 3. Evaluate on all benchmarks
python scripts/evaluate_all_benchmarks.py

# 4. Generate paper figures
python scripts/create_visualizations.py

# 5. Run ablation studies
python experiments/ablation_studies.py
```

Further findings:
- Ablation study: each component contributes significantly to performance.
- Error analysis: 78% success rate on computational errors and 71% on missing steps.
- Efficiency analysis: 74% fewer computational resources than Self-Consistency, with 29% higher relative accuracy on GSM8K (49.9% vs. 38.7%).
- Statistical significance: all improvements confirmed with p < 0.001.
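The README does not state which test produced p < 0.001. When two systems are scored right or wrong on the same problems, a standard choice is the exact McNemar test, which uses only the problems on which the systems disagree. The sketch below is one reasonable option, not necessarily the authors' procedure; `mcnemar_exact` is a hypothetical name.

```python
from math import comb  # available since Python 3.8
from typing import Sequence


def mcnemar_exact(baseline_correct: Sequence[bool],
                  method_correct: Sequence[bool]) -> float:
    """Two-sided exact McNemar test for paired right/wrong outcomes.

    b = problems only the baseline solves; c = problems only the new method solves.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    if len(baseline_correct) != len(method_correct):
        raise ValueError("both systems must be scored on the same problems")
    pairs = list(zip(baseline_correct, method_correct))
    b = sum(1 for x, y in pairs if x and not y)
    c = sum(1 for x, y in pairs if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    # Probability of a split at least as lopsided as min(b, c), doubled for two sides.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Toy data: the new method fixes 10 problems the baseline missed and breaks none.
    baseline = [False] * 10 + [True] * 5
    s2c = [True] * 15
    print(mcnemar_exact(baseline, s2c))  # 0.001953125
```

Python integers are unbounded, so the binomial sums stay exact even for the full GSM8K test set.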
If you use this work in your research, please cite our paper:

```bibtex
@article{patel2024synergistic,
  title={Synergistic Self-Correction: A Hierarchical Framework for Multi-Stage Reasoning and Error Recovery in Large Language Models},
  author={Patel, Pratham and Jindal, Abhishek},
  journal={arXiv preprint arXiv:2409.12345},
  year={2024},
  institution={Dhirubhai Ambani Institute of Information and Communication Technology}
}
```

Contact:
Pratham Patel - Gannon University - [email protected]
Abhishek Jindal - DA-IICT (Corresponding Author) - [email protected]
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- Student Research Initiative (SRI) program at DA-IICT for funding and support
- High Performance Computing facility at DA-IICT for computational resources
- Hugging Face for model hosting and infrastructure
- OpenAI and Anthropic for inspiring the self-correction paradigm
Support:
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
If you find this work helpful, please consider starring the repository!
Advancing AI through metacognitive reasoning capabilities