
Neural Foam

Dynamic neuron growth for transformers. Grow new neurons during training instead of using fixed architectures. Preserves existing knowledge while adding new capabilities.

Results

Trained on Qwen2.5-1.5B-Instruct, adding tool use + identity + autonomous reasoning while preserving science reasoning:

| Capability | Neural Foam | Raw Qwen 1.5B | Delta |
|---|---|---|---|
| ARC-Easy (log-likelihood) | 74.0% | 77.5% | -3.5 |
| ARC-Challenge | 65.0% | 70.0% | -5.0 |
| Tool Use | 10/10 | 0/10 | +10 |
| Custom Identity | 5/5 | 0/5 | +5 |
| Autonomous Reasoning | 5/5 | 0/5 | +5 |

Only a 3.5-point drop on ARC-Easy while adding four new capability dimensions. Standard fine-tuning on the same data drops ARC-Easy to 29.5%.

Trained model: spartan8806/chimera-v3-qwen-1.5b

How It Works

Neural Foam replaces standard nn.Linear layers with GrowableLinear layers that can dynamically add neurons during training:

  1. Percentile-based triggers — Only the top 10% of gradient pressure triggers growth (not every layer, not every step)
  2. GradMax initialization — New neurons are initialized from high-gradient sources, not random
  3. Contrastive growth — New neurons are orthogonalized against existing ones to ensure diversity
  4. Neuron maturation — Young neurons get higher learning rates and can't be pruned until mature
  5. Growth cooldown — Minimum 200 steps between growth events (growth should be rare and deliberate)
  6. Memory replay — Buffer of old examples mixed into training to prevent catastrophic forgetting

The core insight: growth should be like neurogenesis, not cancer. Rare, deliberate, and targeted.
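Composed, the mechanisms above come together in an ordinary training loop. The sketch below uses only the API documented in Quick Start (GrowableQwen, MemoryReplayBuffer, update_loss, check_and_grow, get_replay_batch); new_data, compute_loss, and optimizer are illustrative placeholders, not part of the library:

from neural_foam import GrowableQwen, MemoryReplayBuffer

model = GrowableQwen("Qwen/Qwen2.5-1.5B-Instruct", enable_growth=True)
buffer = MemoryReplayBuffer(max_size=10000)          # seeded with examples to preserve

for step, new_examples in enumerate(new_data):       # new_data: your fine-tuning examples
    # 6. Memory replay: mix ~20% old examples into every batch
    batch = buffer.get_replay_batch(new_examples, replay_ratio=0.2)

    loss = compute_loss(model, batch)                 # illustrative forward + loss helper
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    model.update_loss(loss.item())                    # feeds the optional plateau check

    # 1 & 5. Growth is gated by the percentile trigger and the 200-step cooldown,
    # so most of these calls do nothing
    if step % 100 == 0:
        model.check_and_grow()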

Install

pip install neural-foam

# With training dependencies
pip install neural-foam[training]

Or from source:

git clone https://github.com/spartan8806/neural-foam.git
cd neural-foam
pip install -e .

Reproducing Results

Run the ablation study to verify growth vs non-growth on your hardware:

cd examples
python ablation_study.py

This trains two models on identical data (tool use, identity, autonomy) with ARC memory replay:

  • Baseline: Standard fine-tuning (growth OFF)
  • Neural Foam: Growth enabled

Results saved to ablation_results/ablation_comparison.json with side-by-side metrics.

Expected runtime: ~30-40 min on RTX 3060 12GB.

Quick Start

Replace any nn.Linear with a growable version

from neural_foam import GrowableLinear

# Drop-in replacement for nn.Linear
layer = GrowableLinear(768, 3072, enable_replacement=True)

# During training, periodically check for growth
should_grow, source_indices = layer.check_growth()
if should_grow:
    result = layer.perform_growth(source_indices)
    print(f"Grew {result['born']} neurons, replaced {result['replaced']}")

Wrap a full Qwen model

from neural_foam import GrowableQwen

model = GrowableQwen(
    "Qwen/Qwen2.5-1.5B-Instruct",
    enable_growth=True,
    enable_replacement=True,   # V3: recycle dead neurons
    freeze_attention=True,     # Only train FFN (where growth happens)
)

# In training loop:
model.update_loss(loss.item())

if step % 100 == 0:
    result = model.check_and_grow()
    # result = {'born': 5, 'replaced': 2, 'died': 0}

Memory replay to prevent forgetting

from neural_foam import MemoryReplayBuffer

buffer = MemoryReplayBuffer(max_size=10000)

# Add examples you want to preserve
buffer.add("What is photosynthesis?", "Photosynthesis is...", loss=0.3)

# Mix old and new training data
mixed_batch = buffer.get_replay_batch(new_examples, replay_ratio=0.2)
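For the ARC-preservation setup used in the results above, the buffer would be seeded with the examples you want to protect before fine-tuning starts. A small sketch, where arc_examples and its per-example losses are illustrative placeholders:

# arc_examples: hypothetical iterable of (question, answer, loss) tuples
for question, answer, loss in arc_examples:
    buffer.add(question, answer, loss=loss)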

Curriculum learning (easy to hard)

from neural_foam import CurriculumLearning

curriculum = CurriculumLearning()
sorted_data = curriculum.sort_by_difficulty(training_examples)
easy, medium, hard = curriculum.get_phases(training_examples)

V3 vs Grow-Only Mode

V3 (replacement ON): Best for single-domain training or when starting from a fine-tuned base. Recycles dead neurons. First version to beat standard fine-tuning.

model = GrowableQwen(enable_replacement=True)  # V3

Grow-only (replacement OFF, default): Better for multi-domain continual learning. Never replaces existing neurons, only adds new ones.

model = GrowableQwen(enable_replacement=False)  # Grow-only

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| gradient_percentile | 90.0 | Only top 10% gradients trigger growth |
| max_growth_per_step | 10 | Neurons added per growth event |
| growth_cooldown | 200 | Min steps between growth events |
| maturation_age | 500 | Steps before a neuron is "mature" |
| enable_replacement | False | Recycle dead neurons (V3 mode) |
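These defaults can presumably be overridden at construction time. The sketch below assumes GrowableQwen accepts the table's names as keyword arguments, which the snippets above do not show explicitly:

from neural_foam import GrowableQwen

model = GrowableQwen(
    "Qwen/Qwen2.5-1.5B-Instruct",
    enable_growth=True,
    enable_replacement=False,   # grow-only mode (default)
    gradient_percentile=90.0,   # assumed kwarg: only top-10% gradient pressure triggers growth
    max_growth_per_step=10,     # assumed kwarg
    growth_cooldown=200,        # assumed kwarg
    maturation_age=500,         # assumed kwarg
)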

How Growth Works

Growth is triggered when all of these conditions are met:

  1. Cooldown passed: steps_since_last_growth ≥ growth_cooldown (default: 200 steps)
  2. High gradient pressure: gradient_ema[i] > quantile(gradient_ema, 0.90) — only top 10% of neurons by gradient
  3. Optional plateau: Loss change over last 100 steps < threshold (disabled by default)

The gradient EMA is updated per-step:

gradient_ema[i] = 0.9 × gradient_ema[i] + 0.1 × |∇w[i]|
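In code, the per-step statistic and the trigger check amount to something like the following. This is a minimal NumPy sketch of the logic described above, not the library's internal implementation; gradient_ema and grad_magnitude hold one scalar per output neuron (e.g., the norm of each neuron's weight gradient):

import numpy as np

def update_gradient_ema(gradient_ema, grad_magnitude, decay=0.9):
    # gradient_ema[i] = 0.9 * gradient_ema[i] + 0.1 * |grad[i]|
    return decay * gradient_ema + (1.0 - decay) * np.abs(grad_magnitude)

def should_grow(gradient_ema, steps_since_last_growth,
                growth_cooldown=200, gradient_percentile=90.0):
    # Condition 1: the growth cooldown has passed
    if steps_since_last_growth < growth_cooldown:
        return False, None
    # Condition 2: only neurons whose gradient pressure sits above the
    # 90th percentile qualify as growth sources
    threshold = np.quantile(gradient_ema, gradient_percentile / 100.0)
    source_indices = np.where(gradient_ema > threshold)[0]
    return len(source_indices) > 0, source_indices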

When triggered, new neurons are initialized via GradMax:

w_new = w_source + noise
w_new = orthogonalize(w_new, existing_neurons)  # contrastive growth

Young neurons (age < 500 steps) get 2× learning rate and cannot be pruned/replaced.
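A sketch of that initialization step, with the orthogonalization done by projecting out the span of the existing neurons. This is illustrative only, not the library's actual GradMax / contrastive-growth code:

import torch

def init_new_neuron(w_source, existing, noise_scale=0.01):
    # GradMax-style start: copy a high-gradient source neuron and add small noise
    w_new = w_source + noise_scale * torch.randn_like(w_source)
    # Contrastive growth: project out the span of the existing neurons so the
    # new direction is orthogonal to all of them (QR gives an orthonormal basis)
    Q, _ = torch.linalg.qr(existing.T)     # existing: (n_neurons, in_features)
    w_new = w_new - Q @ (Q.T @ w_new)
    return w_new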

This ensures growth is:

  • Rare (cooldown + percentile threshold)
  • Targeted (high-gradient sources only)
  • Diverse (orthogonalization prevents duplicates)

Research References

  • RigL (Google, 2020) — gradient-based regrowth
  • GradMax (2022) — SVD initialization for new neurons
  • NICE (CVPR 2024) — neuron maturation
  • Wanda (2024) — pruning criterion
  • NeurRev (ICLR 2024) — dormant neuron prevention

Hardware

The Chimera V3 model (1.5B params) trains in ~13 minutes on a single RTX 3060 12GB using bfloat16 + 8-bit Adam.

License

Apache 2.0
