Dynamic neuron growth for transformers. Grow new neurons during training instead of using fixed architectures. Preserves existing knowledge while adding new capabilities.
Trained from Qwen2.5-1.5B-Instruct, adding tool use, identity, and autonomous reasoning while preserving science reasoning:
| Capability | Neural Foam | Raw Qwen 1.5B | Delta |
|---|---|---|---|
| ARC-Easy (log-likelihood) | 74.0% | 77.5% | -3.5 |
| ARC-Challenge | 65.0% | 70.0% | -5.0 |
| Tool Use | 10/10 | 0/10 | +10 |
| Custom Identity | 5/5 | 0/5 | +5 |
| Autonomous Reasoning | 5/5 | 0/5 | +5 |
Only a 3.5-point drop in reasoning while adding three new capability dimensions. Standard fine-tuning on the same data drops ARC-Easy to 29.5%.
Trained model: spartan8806/chimera-v3-qwen-1.5b
Neural Foam replaces standard `nn.Linear` layers with `GrowableLinear` layers that can dynamically add neurons during training (see the sketch after this list):
- Percentile-based triggers — Only the top 10% of gradient pressure triggers growth (not every layer, not every step)
- GradMax initialization — New neurons are initialized from high-gradient sources, not random
- Contrastive growth — New neurons are orthogonalized against existing ones to ensure diversity
- Neuron maturation — Young neurons get higher learning rates, can't be pruned until mature
- Growth cooldown — Minimum 200 steps between growth events (growth should be rare and deliberate)
- Memory replay — Buffer of old examples mixed into training to prevent catastrophic forgetting
The core insight: growth should be like neurogenesis, not cancer. Rare, deliberate, and targeted.
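As a concrete picture of that swap, a hypothetical integration pass might walk a model and substitute layers like the sketch below. Note that `growify` and the `load_state_dict` compatibility are assumptions for illustration, not library code:

```python
import torch.nn as nn
from neural_foam import GrowableLinear

def growify(module: nn.Module) -> None:
    """Recursively replace nn.Linear submodules with GrowableLinear (sketch)."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            grown = GrowableLinear(child.in_features, child.out_features)
            # Assumption: GrowableLinear keeps nn.Linear's weight/bias names.
            grown.load_state_dict(child.state_dict(), strict=False)
            setattr(module, name, grown)
        else:
            growify(child)
```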
```bash
pip install neural-foam

# With training dependencies
pip install neural-foam[training]
```

Or from source:

```bash
git clone https://github.com/spartan8806/neural-foam.git
cd neural-foam
pip install -e .
```

Run the ablation study to verify growth vs. non-growth on your hardware:
```bash
cd examples
python ablation_study.py
```

This trains two models on identical data (tool use, identity, autonomy) with ARC memory replay:
- Baseline: Standard fine-tuning (growth OFF)
- Neural Foam: Growth enabled
Results are saved to `ablation_results/ablation_comparison.json` with side-by-side metrics.
Expected runtime: ~30-40 min on RTX 3060 12GB.
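To eyeball the comparison afterwards, a pretty-print of that file is enough (the path comes from the run above; no assumptions about its schema):

```python
import json

# Pretty-print the side-by-side ablation metrics.
with open("ablation_results/ablation_comparison.json") as f:
    print(json.dumps(json.load(f), indent=2))
```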
Basic usage with a single layer:

```python
from neural_foam import GrowableLinear

# Drop-in replacement for nn.Linear
layer = GrowableLinear(768, 3072, enable_replacement=True)

# During training, periodically check for growth
should_grow, source_indices = layer.check_growth()
if should_grow:
    result = layer.perform_growth(source_indices)
    print(f"Grew {result['born']} neurons, replaced {result['replaced']}")
```
Wrapping a full model:

```python
from neural_foam import GrowableQwen

model = GrowableQwen(
    "Qwen/Qwen2.5-1.5B-Instruct",
    enable_growth=True,
    enable_replacement=True,  # V3: recycle dead neurons
    freeze_attention=True,    # only train FFN (where growth happens)
)

# In the training loop:
model.update_loss(loss.item())
if step % 100 == 0:
    result = model.check_and_grow()
    # result = {'born': 5, 'replaced': 2, 'died': 0}
```
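A note on `freeze_attention=True`: growth expands the FFN hidden dimension, so confining training to the FFN blocks means new capacity is added without ever disturbing the attention head shapes.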
Preserve old knowledge with memory replay:

```python
from neural_foam import MemoryReplayBuffer

buffer = MemoryReplayBuffer(max_size=10000)

# Add examples you want to preserve
buffer.add("What is photosynthesis?", "Photosynthesis is...", loss=0.3)

# Mix old and new training data
mixed_batch = buffer.get_replay_batch(new_examples, replay_ratio=0.2)
```
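With `replay_ratio=0.2`, roughly one in five examples in each mixed batch comes from the preserved buffer, so old knowledge keeps receiving gradient signal while new capabilities are trained.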
Order training data from easy to hard:

```python
from neural_foam import CurriculumLearning

curriculum = CurriculumLearning()
sorted_data = curriculum.sort_by_difficulty(training_examples)
easy, medium, hard = curriculum.get_phases(training_examples)
```

V3 (replacement ON): best for single-domain training or when starting from a fine-tuned base. Recycles dead neurons. The first version to beat standard fine-tuning.
```python
model = GrowableQwen(enable_replacement=True)  # V3
```

Grow-only (replacement OFF, default): better for multi-domain continual learning. Never replaces existing neurons, only adds new ones.
```python
model = GrowableQwen(enable_replacement=False)  # Grow-only
```

| Parameter | Default | Description |
|---|---|---|
| `gradient_percentile` | 90.0 | Only top 10% of gradients trigger growth |
| `max_growth_per_step` | 10 | Neurons added per growth event |
| `growth_cooldown` | 200 | Min steps between growth events |
| `maturation_age` | 500 | Steps before a neuron is "mature" |
| `enable_replacement` | False | Recycle dead neurons (V3 mode) |
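Assuming these map one-to-one onto constructor keyword arguments (an assumption based on the table; check the actual signatures), a more conservative configuration might look like:

```python
from neural_foam import GrowableQwen

# Hypothetical: rarer, smaller growth events than the defaults.
model = GrowableQwen(
    "Qwen/Qwen2.5-1.5B-Instruct",
    enable_growth=True,
    gradient_percentile=95.0,  # only the top 5% of gradients trigger growth
    max_growth_per_step=5,
    growth_cooldown=400,
    maturation_age=1000,
)
```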
Growth is triggered when all of these conditions are met:
- Cooldown passed: `steps_since_last_growth ≥ growth_cooldown` (default: 200 steps)
- High gradient pressure: `gradient_ema[i] > quantile(gradient_ema, 0.90)` — only the top 10% of neurons by gradient
- Optional plateau: loss change over the last 100 steps < threshold (disabled by default)
The gradient EMA is updated each step:

```
gradient_ema[i] = 0.9 × gradient_ema[i] + 0.1 × |∇w[i]|
```

When triggered, new neurons are initialized via GradMax:

```
w_new = w_source + noise
w_new = orthogonalize(w_new, existing_neurons)  # contrastive growth
```
Young neurons (age < 500 steps) get 2× learning rate and cannot be pruned/replaced.
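Putting the trigger, the EMA, and the initialization together, a minimal sketch of the logic might look like this (an illustrative pseudo-implementation, not the library's internals):

```python
import torch

def update_gradient_ema(ema: torch.Tensor, grad: torch.Tensor,
                        decay: float = 0.9) -> torch.Tensor:
    # gradient_ema[i] = 0.9 * gradient_ema[i] + 0.1 * |grad[i]|
    return decay * ema + (1.0 - decay) * grad.abs()

def should_trigger_growth(ema: torch.Tensor, steps_since_growth: int,
                          cooldown: int = 200, percentile: float = 0.90):
    """Cooldown + top-percentile gradient pressure (both must hold)."""
    if steps_since_growth < cooldown:
        return False, None
    threshold = torch.quantile(ema, percentile)
    sources = (ema > threshold).nonzero(as_tuple=True)[0]
    return sources.numel() > 0, sources

def init_new_neuron(w_source: torch.Tensor, existing: torch.Tensor,
                    noise_scale: float = 0.01) -> torch.Tensor:
    """GradMax-style copy + noise, then orthogonalize against existing neurons."""
    w_new = w_source + noise_scale * torch.randn_like(w_source)
    for w in existing:  # Gram-Schmidt: keep only the novel direction
        w_new = w_new - (w_new @ w) / (w @ w) * w
    return w_new
```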
This ensures growth is:
- Rare (cooldown + percentile threshold)
- Targeted (high-gradient sources only)
- Diverse (orthogonalization prevents duplicates)
Neural Foam draws on ideas from prior work:
- RigL (Google, 2020) — gradient-based regrowth
- GradMax (2022) — SVD initialization for new neurons
- NICE (CVPR 2024) — neuron maturation
- Wanda (2024) — pruning criterion
- NeurRev (ICLR 2024) — dormant neuron prevention
The Chimera V3 model (1.5B params) trains in ~13 minutes on a single RTX 3060 12GB using bfloat16 + 8-bit Adam.
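For reference, a bfloat16 + 8-bit Adam setup along those lines might look like the following. This assumes the `bitsandbytes` optimizer and that `GrowableQwen` behaves like a standard `nn.Module`; it is a sketch, not the project's actual training script:

```python
import torch
import bitsandbytes as bnb
from neural_foam import GrowableQwen

model = GrowableQwen(
    "Qwen/Qwen2.5-1.5B-Instruct",
    enable_growth=True,
    enable_replacement=True,  # V3 / Chimera configuration
    freeze_attention=True,
)
model.to(device="cuda", dtype=torch.bfloat16)  # assumption: standard nn.Module .to()
optimizer = bnb.optim.Adam8bit(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```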
Apache 2.0