CodeWithInferno/Self-Correcting-LLM-Research

Synergistic Self-Correction (S2C): A Hierarchical Framework for Multi-Stage Reasoning and Error Recovery in Large Language Models

arXiv | Python 3.8+ | License: MIT | Code style: black

Empowering LLMs with metacognitive reasoning capabilities through structured self-correction


🚀 Overview

Synergistic Self-Correction (S2C) is a hierarchical framework that gives Large Language Models intrinsic metacognitive capabilities through a structured three-stage inference process. It addresses a fundamental limitation of autoregressive generation: an error made early in the reasoning propagates through every subsequent step.

🎯 Key Achievements

  • 60% relative improvement over CoT prompting on GSM8K mathematical reasoning (31.2% → 49.9%)
  • 71% relative improvement over CoT prompting on the MATH dataset (12.4% → 21.3%)
  • Lower computational cost than ensemble methods (74% fewer resources than Self-Consistency)
  • Statistically significant improvements across multiple reasoning benchmarks (p < 0.001)

📊 The S2C Framework

Our framework decomposes problem-solving into three distinct computational personas:

Input Problem → Generator → Critic → Synthesizer → Final Answer
                    ↓          ↓          ↓
                 Initial    Critical   Refined
                 Solution   Analysis   Solution

🧠 Three-Stage Process

  1. 🔧 Generator: Produces initial solutions with explicit critical point identification
  2. 🔍 Critic: Systematically analyzes potential errors and logical inconsistencies
  3. ⚡ Synthesizer: Integrates feedback to produce refined solutions
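
The three stages can be chained as in the minimal sketch below. The prompt templates, the `run_three_stages` function, and the `llm` callable are illustrative assumptions, not the repository's actual implementation; any function that maps a prompt string to a completion string can be passed in.

```python
from typing import Callable, Dict

# Hypothetical prompt templates; the real S2C prompts are not given in this README.
GENERATOR_PROMPT = (
    "Solve the problem step by step and list the critical points.\n"
    "Problem: {problem}"
)
CRITIC_PROMPT = (
    "Check the solution for calculation errors and logical inconsistencies.\n"
    "Problem: {problem}\nSolution: {draft}"
)
SYNTHESIZER_PROMPT = (
    "Write a corrected final solution that takes the critique into account.\n"
    "Problem: {problem}\nSolution: {draft}\nCritique: {critique}"
)


def run_three_stages(problem: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Run Generator -> Critic -> Synthesizer, feeding each stage's output forward."""
    # Stage 1: the Generator produces an initial solution.
    draft = llm(GENERATOR_PROMPT.format(problem=problem))
    # Stage 2: the Critic sees the problem and the draft.
    critique = llm(CRITIC_PROMPT.format(problem=problem, draft=draft))
    # Stage 3: the Synthesizer sees the problem, the draft, and the critique.
    final = llm(
        SYNTHESIZER_PROMPT.format(problem=problem, draft=draft, critique=critique)
    )
    return {"draft": draft, "critique": critique, "final": final}
```

Returning all three intermediate texts, rather than only the final answer, keeps the draft and critique available for inspection or error analysis.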

πŸ‹οΈ Training Methodology: Cognitive Dissonance Training (CDT)

Our novel three-phase training approach:

  1. Phase 1: Structural Alignment via Supervised Fine-Tuning
  2. Phase 2: Specialized Reward Model Training
  3. Phase 3: Hierarchical Process-Based Reward Optimization (HPBR)

📈 Results

Mathematical Reasoning Performance

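| Method | GSM8K | MATH | AQuA | MathQA | StrategyQA | CSQA | Average |
|---|---|---|---|---|---|---|---|
| CoT Prompting | 31.2% | 12.4% | 23.7% | 18.9% | 68.9% | 72.1% | 37.9% |
| Self-Consistency | 38.7% | 15.2% | 28.4% | 22.1% | 73.4% | 75.3% | 42.2% |
| S2C (Ours) | 49.9% | 21.3% | 35.6% | 28.4% | 76.4% | 78.1% | 48.3% |
| Relative improvement (S2C vs. CoT) | +60% | +71% | +50% | +50% | +11% | +8% | +27% |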

📊 Visualizations

[Figures: GSM8K Results; Training Curves]

🛠️ Installation & Setup

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended)
  • 16GB+ RAM

Quick Start

# Clone the repository
git clone https://github.com/CodeWithInferno/Self-Correcting-LLM-Research.git
cd Self-Correcting-LLM-Research

# Install dependencies
pip install -r requirements.txt

# Install additional packages
pip install torch transformers datasets peft trl bitsandbytes accelerate

Hugging Face Authentication

# Login to Hugging Face
huggingface-cli login

# Or set environment variable
export HF_TOKEN="your_hf_token_here"

🚀 Usage

Quick Evaluation

# Evaluate S2C model on GSM8K
python evaluate_s2c.py --model_path ./s2c_llama3_8b_final --dataset gsm8k

# Generate benchmark comparison
python benchmark_models.py
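
For orientation, GSM8K accuracy is usually scored by exact match on the final number: reference answers in the dataset end with `#### <number>`. The sketch below shows one way to do that scoring. It is an assumption about how such an evaluation can work, not a description of `evaluate_s2c.py`; the helper names and the "last number in the output" heuristic are illustrative.

```python
import re
from typing import List, Optional

# Integers or decimals, with optional thousands separators (e.g. 1,000 or -3.5).
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def _normalize(number: str) -> str:
    """Canonical form so that '4', '4.0' and '1,000' vs '1000' compare equal."""
    value = float(number.replace(",", ""))
    return str(int(value)) if value.is_integer() else str(value)


def extract_reference(answer_field: str) -> str:
    """GSM8K reference answers end with '#### <number>'."""
    return _normalize(answer_field.split("####")[-1].strip())


def extract_prediction(model_output: str) -> Optional[str]:
    """Heuristic: treat the last number in the model output as its answer."""
    matches = _NUMBER.findall(model_output)
    return _normalize(matches[-1]) if matches else None


def exact_match_accuracy(outputs: List[str], references: List[str]) -> float:
    """Fraction of outputs whose extracted answer equals the reference."""
    correct = sum(
        extract_prediction(out) == extract_reference(ref)
        for out, ref in zip(outputs, references)
    )
    return correct / len(references)
```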

Training from Scratch

# Phase 1: Supervised Fine-Tuning
python train_s2c_sft.py --config configs/sft_config.yaml

# Phase 2: Reward Model Training
python train_reward_models.py --config configs/reward_config.yaml

# Phase 3: PPO Training with HPBR
python train_s2c_rl.py --config configs/ppo_config.yaml
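
Each phase consumes the output of the previous one, so the three commands need to run in order and stop at the first failure. A small sketch of a runner that does this; the `PHASES` list, `build_commands`, and `run_pipeline` are illustrative names, and the commands are copied from the block above:

```python
import subprocess
import sys
from typing import List, Tuple

# Script names and config paths as listed in the training commands above.
PHASES: List[Tuple[str, List[str]]] = [
    ("Phase 1: supervised fine-tuning",
     ["train_s2c_sft.py", "--config", "configs/sft_config.yaml"]),
    ("Phase 2: reward model training",
     ["train_reward_models.py", "--config", "configs/reward_config.yaml"]),
    ("Phase 3: PPO training with HPBR",
     ["train_s2c_rl.py", "--config", "configs/ppo_config.yaml"]),
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Return one argv list per phase, in execution order."""
    return [[python, *args] for _, args in PHASES]


def run_pipeline(dry_run: bool = True) -> None:
    """Print each phase and, unless dry_run is set, execute it."""
    for (name, _), cmd in zip(PHASES, build_commands()):
        print(f"{name}: {' '.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError and stops the pipeline
            # as soon as one phase exits with a non-zero status.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` with the default `dry_run=True` only prints the commands, which is a cheap way to confirm paths before starting a long training run.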

Inference Example

from src.models.s2c_model import S2CModel

# Load trained model
model = S2CModel.from_pretrained("./s2c_llama3_8b_final")

# Solve a math problem
problem = "Sarah has 3 apples. She buys 2 more apples and gives 1 to her friend. How many apples does Sarah have now?"
solution = model.solve_with_s2c(problem)

print(f"Problem: {problem}")
print(f"Solution: {solution}")

📁 Repository Structure

Self-Correcting-LLM-Research/
├── 📄 README.md                           # This file
├── 📋 requirements.txt                    # Python dependencies
├── 📜 LICENSE                             # MIT License
├── 🎯 .gitignore                          # Git ignore patterns
│
├── 📊 paper/                              # Research paper and documentation
│   ├── final_report_comprehensive.tex    # Complete LaTeX source
│   ├── final_report_comprehensive.pdf    # Final paper PDF
│   └── arxiv_submission.tar.gz           # arXiv submission package
│
├── 🧠 src/                                # Source code
│   ├── models/                           # Model implementations
│   │   ├── s2c_model.py                  # Main S2C framework
│   │   ├── generator.py                  # Generation stage
│   │   ├── critic.py                     # Critique stage
│   │   └── synthesizer.py                # Synthesis stage
│   ├── training/                         # Training scripts
│   │   ├── sft_trainer.py                # Supervised fine-tuning
│   │   ├── reward_trainer.py             # Reward model training
│   │   └── ppo_trainer.py                # PPO with HPBR
│   ├── evaluation/                       # Evaluation utilities
│   │   ├── evaluator.py                  # Model evaluation
│   │   └── metrics.py                    # Performance metrics
│   └── utils/                            # Utility functions
│       ├── data_utils.py                 # Data processing
│       └── visualization.py              # Results visualization
│
├── 📊 graphs/                             # Generated visualizations
│   ├── s2c_framework_architecture.pdf    # Framework diagram
│   ├── gsm8k_main_results.pdf            # Main results chart
│   ├── training_performance_curves.pdf   # Training curves
│   ├── ablation_study_results.pdf        # Ablation analysis
│   ├── error_analysis_comprehensive.pdf  # Error analysis
│   ├── computational_efficiency.pdf      # Efficiency comparison
│   └── qualitative_s2c_example.pdf       # Example walkthrough
│
├── 📁 configs/                            # Configuration files
│   ├── sft_config.yaml                   # SFT hyperparameters
│   ├── reward_config.yaml                # Reward model config
│   └── ppo_config.yaml                   # PPO training config
│
├── 💾 models/                             # Trained models
│   ├── s2c_llama3_8b_final/              # Final S2C model
│   ├── s2c_llama3_8b_checkpoints/        # Training checkpoints
│   └── reward_models/                    # Trained reward models
│
├── 📊 data/                               # Datasets and preprocessed data
│   ├── gsm8k/                            # GSM8K dataset
│   ├── math/                             # MATH dataset
│   └── processed/                        # Preprocessed data
│
├── 🧪 experiments/                        # Experimental scripts
│   ├── ablation_studies.py               # Ablation experiments
│   ├── scaling_analysis.py               # Scaling behavior analysis
│   └── error_analysis.py                 # Error pattern analysis
│
├── 📊 logs/                               # Training logs and metrics
│   ├── tensorboard/                      # TensorBoard logs
│   └── wandb/                            # Weights & Biases logs
│
└── 🧪 scripts/                            # Utility scripts
    ├── evaluate_s2c.py                   # Model evaluation script
    ├── benchmark_models.py               # Benchmark comparison
    ├── train_s2c_rl.py                   # PPO training script
    └── create_visualizations.py          # Generate paper figures

🔬 Reproducing Results

Complete Reproduction Pipeline

# 1. Data Preparation
python scripts/prepare_datasets.py

# 2. Train S2C Model (Full Pipeline)
bash scripts/train_full_pipeline.sh

# 3. Evaluate on All Benchmarks
python scripts/evaluate_all_benchmarks.py

# 4. Generate Paper Figures
python scripts/create_visualizations.py

# 5. Run Ablation Studies
python experiments/ablation_studies.py

Key Experimental Results

  • Ablation Study: Each component contributes significantly to performance
  • Error Analysis: S2C corrects 78% of computational errors and 71% of missing-step errors
  • Efficiency Analysis: 74% fewer resources than Self-Consistency, with 29% higher relative accuracy on GSM8K (38.7% → 49.9%)
  • Statistical Significance: All improvements confirmed with p < 0.001
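
This README does not say which significance test produced the p-values. For paired per-example results (the same test problems scored under a baseline and under S2C), one common choice is an exact McNemar test on the discordant pairs. The sketch below illustrates that option only; the function names are hypothetical and it should not be read as the authors' procedure.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(
    baseline_correct: Sequence[bool], s2c_correct: Sequence[bool]
) -> Tuple[int, int]:
    """Count problems where exactly one of the two systems is correct.

    Returns (b, c): b = baseline right and S2C wrong, c = baseline wrong and S2C right.
    """
    pairs = list(zip(baseline_correct, s2c_correct))
    b = sum(1 for base, new in pairs if base and not new)
    c = sum(1 for base, new in pairs if new and not base)
    return b, c


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis each discordant pair is equally likely to
    favor either system, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

A typical call is `mcnemar_exact_p(*discordant_counts(baseline_flags, s2c_flags))`, where both arguments are per-problem correctness flags in the same order.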

📚 Citation

If you use this work in your research, please cite our paper:

@article{patel2024synergistic,
  title={Synergistic Self-Correction: A Hierarchical Framework for Multi-Stage Reasoning and Error Recovery in Large Language Models},
  author={Patel, Pratham and Jindal, Abhishek},
  journal={arXiv preprint arXiv:2409.12345},
  year={2024},
  institution={Dhirubhai Ambani Institute of Information and Communication Technology}
}

👥 Authors

Pratham Patel - Gannon University

Abhishek Jindal - DA-IICT (Corresponding Author)


🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Student Research Initiative (SRI) program at DA-IICT for funding and support
  • High Performance Computing facility at DA-IICT for computational resources
  • Hugging Face for model hosting and infrastructure
  • OpenAI and Anthropic for inspiring the self-correction paradigm

📞 Contact & Support


⭐ If you find this work helpful, please consider starring the repository! ⭐

Advancing AI through metacognitive reasoning capabilities
