"What if a Transformer had a metabolism?"
This is a research fork of Nanochat that replaces standard static weights with computational analogs of synaptic proteins, implementing biologically-grounded mechanisms for working memory, attention modulation, and neural architecture search.
Standard LLMs are "frozen crystals": static matrices of float16 numbers that never change once training is done. Bio-Inspired Nanochat is a "living fluid": its connections grow, shrink, fatigue, recover, and even reproduce during inference, mimicking the energy-constrained efficiency of the biological brain.
This is an active research project implementing 11+ bio-inspired mechanisms with systematic evaluation and optimization. See our comprehensive planning documents:
- Full Roadmap - 69 tasks across 7 epics (Beads tracker)
- CMA-ES Optimization Plan - Systematic hyperparameter tuning for 48 parameters
- Feature Predictions - Evidence-based analysis of which mechanisms will work
- New Features Roadmap - Detailed specs for upcoming mechanisms
Implementation Status:
- ✅ Core Synaptic Mechanisms (Presynaptic, Postsynaptic, Structural) - Fully implemented
- ✅ Triton GPU Kernels - 375-line fused presynaptic kernel
- ✅ Rust CPU Kernels - PyO3-based native implementation (50-90% complete)
- 🚧 Extended Bio Features - Stochastic release, BDNF, dual weights (in progress)
- 🚧 Systematic Optimization - CMA-ES framework for 48 hyperparameters (planned)
- 🚧 Rigorous Evaluation - Bio vs vanilla benchmarks with statistical testing (planned)
| Feature | Standard Transformer | Bio-Inspired Nanochat |
|---|---|---|
| Weights | Static: fixed after training. | Fluid: evolve in real time during inference. |
| Memory | Context window: limited by `seq_len`. | Associative: fast weights "remember" patterns locally. |
| Diversity | Randomness: temperature sampling. | Metabolism: synapses "tire out", forcing new paths. |
| Capacity | Fixed: pre-allocated size (e.g., 32 layers). | Elastic: experts multiply/die based on demand. |
| Learning | Offline: only learns during backprop. | Online: "learns" context via Hebbian consolidation. |
| Optimization | Grid search: manual hyperparameter tuning. | Evolution: CMA-ES optimizes 48 parameters systematically. |
| Kernels | Python/CUDA: single backend. | Multi-backend: Triton GPU + Rust CPU + Python reference. |
We map specific cellular mechanisms from the Synaptic Cleft directly to tensor operations. This architecture is grounded in neuroscience literature and the blueprints found in prompts/.
The mechanism of "Fatigue" and "Boredom"
The Biology: Neurons run on batteries (ATP). If a neuron shouts too much (fires continuously), it runs out of neurotransmitter vesicles (chemical ammo). It must rest to reload.
The Math: We track a fluid reservoir RRP (Readily Releasable Pool) for every attention head. High attention scores drain the pool.
The Effect: A physically-grounded frequency penalty. The model literally cannot attend to the same token endlessly. It gets "bored" (depleted) and naturally shifts focus to novel information.
Implementation: Three backends for production use:
- Triton GPU Kernel (`bio_inspired_nanochat/kernels/presyn_fused.py`): 375-line fused kernel, 3 passes over attention
- Rust CPU Kernel (`rust_src/src/presyn.rs`): PyO3-native implementation for CPU inference
- Python Reference (`tests/test_rust_kernels.py`): 130-line pure Python for validation
```mermaid
graph LR
    A[Logits] -->|Drive| B(Calcium Influx)
    B -->|Activates| C{Synaptotagmin Sensor}
    D[Vesicle Pool] -->|Limits| E(Release Probability)
    C -->|Gates| E
    E -->|Attenuates| A
    E -->|Consumes| D
    style D fill:#ff9999,stroke:#333,stroke-width:2px
```
The mechanism of "Working Memory"
The Biology: "Neurons that fire together, wire together." A transient thought becomes a memory only if it is important (high activity) and the brain has energy to "write" it down (Consolidation).
The Math: Weights are split into a slow component (learned offline via backprop) and a fast component: a low-rank Hebbian cache that is written during inference and decays over time.
The Effect: Infinite local context. The model can define a variable at the start of a sentence and "remember" it at the end via the fast weights, without needing to attend back to it.
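A minimal sketch of the fast-weight mechanism in NumPy (illustrative, not the repo's implementation; in the real model the `gate` would be driven by the CaMKII/PP1 signals described below):

```python
import numpy as np

def fast_weight_step(W_fast, key, value, rho=0.95, gate=1.0):
    """Gated Hebbian update: decay the fast weights, then write the
    outer product of the current (key, value) pair when the gate is open."""
    return rho * W_fast + gate * np.outer(value, key)

d = 4
W_slow = np.zeros((d, d))          # frozen, learned offline
W_fast = np.zeros((d, d))          # volatile, written during inference
key = np.array([1.0, 0.0, 0.0, 0.0])
value = np.array([0.0, 2.0, 0.0, 0.0])
W_fast = fast_weight_step(W_fast, key, value)

# Querying with the stored key retrieves the associated value,
# without attending back to the position where it was defined.
out = (W_slow + W_fast) @ key
```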
New Mechanisms (in progress):
- BDNF Metaplasticity: Activity-dependent learning rate modulation (90% implemented!)
- CaMKII/PP1 Bistable Latch: Hysteretic consolidation gate with self-excitation
- Dual-Weight Differentiation: Separate timescales for fast cache vs slow storage
The mechanism of "Economy & Efficiency"
The Biology: The brain is a ruthlessly efficient economy. It doesn't keep billions of idle neurons on payroll. Useful regions get more resources (Neurogenesis); idle regions are demolished (Pruning).
The Math: A Synaptic Mixture-of-Experts (MoE) where experts have a "Bank Account" (Energy).
- Taxation: Every forward pass costs Energy.
- Income: Being routed to earns Energy.
- Bankruptcy: Experts with $E \approx 0$ are killed (Merged).
- IPO: Wealthy, overworked experts clone themselves (Split).
The Effect: Neural Architecture Search. The model starts small and grows capacity exactly where the data complexity demands it.
```mermaid
graph TD
    Start((Birth)) --> Healthy[🟢 Healthy Expert]
    Healthy -->|High Usage + Energy| Split{⚡ Split?}
    Split -->|Yes| Clones[Clone into 2 Experts]
    Healthy -->|Low Usage| Starving[🔴 Starving Expert]
    Starving -->|Energy < 0| Merge{Merge?}
    Merge -->|Yes| Absorb[Absorbed by Stronger Neighbor]
    Clones --> Healthy
    Absorb --> Healthy
```
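The lifecycle above can be sketched as a simple energy ledger (toy Python; the thresholds and the income/cost model are illustrative assumptions, not the repo's actual rules):

```python
def lifecycle_step(energy, usage, cost=0.015, income=0.1,
                   split_thresh=1.5, merge_thresh=0.0):
    """One 'metabolic' step per expert: pay the firing tax, earn routing
    income, then decide who splits (clones) and who merges (dies)."""
    decisions = {}
    for name in list(energy):
        energy[name] += income * usage[name] - cost
        if energy[name] <= merge_thresh:
            decisions[name] = "merge"     # bankruptcy: absorbed by a neighbor
        elif energy[name] >= split_thresh:
            decisions[name] = "split"     # IPO: clone into two experts
        else:
            decisions[name] = "keep"
    return decisions

energy = {"e0": 1.45, "e1": 0.01}
usage = {"e0": 1.0, "e1": 0.0}    # e0 is busy, e1 is never routed to
decisions = lifecycle_step(energy, usage)
```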
Beyond the core mechanisms, we're systematically implementing 11 additional biologically-grounded features:
- Stochastic Vesicle Release - Binomial/Gumbel-Sigmoid stochastic path with STE for training
- Vesicle Endocytosis Ring Buffer - Delayed refill with optional Rab5/7 staging
- Septin-Style Lateral Inhibition - Windowed inhibition on logits/router for sharpening
- Rab/SNARE Code-Based Routing - Token cargo codes vs expert t-SNARE compatibility
- Doc2 Dual Sync/Async Channels - Parallel Syt1 (fast) and Doc2 (slow) release paths
- Synaptic Genome Embedding - Low-dim Xi per expert decoded to kinetic parameters
- CaMKII/PP1 Bistable Latch - Hill-term ODE with hysteresis for consolidation
- Cellular Automata Initialization - Rule 30/116 variance-corrected weight init
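To illustrate the Gumbel-Sigmoid stochastic path, here is the forward pass in NumPy. In training, the straight-through estimator (STE) would backpropagate through the soft relaxation while emitting the hard sample; this sketch shows only the sampling, and the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_sigmoid_hard(logits, tau=1.0):
    """Gumbel-Sigmoid release gate: add logistic noise, relax through a
    temperature-scaled sigmoid, then harden to {0, 1}. (An STE would pass
    gradients through `soft` while using `hard` in the forward pass.)"""
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(logits))
    noise = np.log(u) - np.log1p(-u)            # Logistic(0, 1) sample
    soft = 1.0 / (1.0 + np.exp(-(logits + noise) / tau))
    return np.where(soft > 0.5, 1.0, 0.0), soft

hard, soft = gumbel_sigmoid_hard(np.array([2.0, -2.0, 0.0]))
```

Lowering `tau` sharpens `soft` toward a step function, which is why the docs describe `stochastic_tau` as "lower = harder".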
Synaptic Genome Embedding (Xi): Each MoE expert owns a compact genome vector Xi (size SynapticConfig.xi_dim). A decoder maps Xi → phenotype scalars that control expert-specific kinetics (e.g., metabolism EMA rates and CaMKII/PP1 plasticity gains). This keeps per-expert learnable parameters at O(num_experts · xi_dim) rather than O(num_experts · num_kinetics) if every expert had its own full kinetic parameter set.
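A sketch of the genome decoding idea in NumPy. The decoder shape and the softplus choice are illustrative assumptions; the real decoder lives behind `SynapticConfig.xi_dim`.

```python
import numpy as np

def decode_genome(xi, W, b):
    """Map a low-dim genome vector xi to positive kinetic scalars via a
    decoder shared across experts, so learnable parameters scale as
    O(num_experts * xi_dim) rather than O(num_experts * num_kinetics)."""
    z = W @ xi + b
    return np.log1p(np.exp(z))        # softplus keeps rates/gains positive

xi_dim, n_kinetics = 4, 6
rng = np.random.default_rng(1)
W = rng.normal(size=(n_kinetics, xi_dim)) * 0.1   # shared decoder weights
b = np.zeros(n_kinetics)
xi_expert0 = rng.normal(size=xi_dim)              # per-expert genome
phenotype = decode_genome(xi_expert0, W, b)
```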
- Cross-Pollination with Gauge-Reversible Networks - Integration of measure-preserving ideas
- Simplicial/Higher-Order Attention - k-body interactions beyond pairwise
- Ultrametric Routing - Hierarchical expert organization
Each feature is:
- Documented with biological rationale, implementation plan, and success criteria
- Testable via ablation studies and statistical validation
- Toggleable via `SynapticConfig` flags for clean experiments
- Benchmarked against vanilla transformers with rigorous metrics
See NEW_RADICALLY_NEW_BIO_INSPIRED_FEATURES_TO_ADD_IN_MODULAR_WAY.md for detailed specifications.
For the researchers, here are the governing equations implemented in synaptic.py and neuroscore.py.
The presynaptic state is governed by three coupled quantities. Calcium: attention logits drive calcium influx (gain `alpha_ca`), and the trace decays with time constant `tau_c`. The release probability is gated by a synaptotagmin-style sensor (affinity `syt_fast_kd`) acting on the calcium trace. The actual synaptic weight is the raw attention score clamped by the remaining vesicle pool (RRP), which is consumed by release and refills with timescale `tau_rrp`. This non-linear clamping is what physically enforces the frequency penalty.
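For concreteness, a Tsodyks-Markram-style sketch of these dynamics, written with the `SynapticConfig` parameter names. This is an illustrative reconstruction; the exact functional forms in `synaptic.py` may differ.

```latex
% Calcium trace driven by attention scores a_t
c_t = \left(1 - \tfrac{1}{\tau_c}\right) c_{t-1} + \alpha_{ca}\, a_t
% Hill-type synaptotagmin sensor gating release probability
p_t = \frac{c_t^{\,n}}{c_t^{\,n} + K_d^{\,n}}, \qquad K_d = \texttt{syt\_fast\_kd}
% RRP depleted by release, recovering with timescale tau_rrp
R_t = R_{t-1} - p_t\, R_{t-1} + \frac{1 - R_{t-1}}{\tau_{rrp}}
% Effective attention weight clamped by available vesicles
w_t = a_t \cdot p_t \cdot R_{t-1}
```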
The postsynaptic weight update follows a gated Hebbian rule. We maintain low-rank eligibility traces of recent co-activations (rank `rank_eligibility`, decay `rho_elig`) and consolidate them into the fast weights only when the gate opens, i.e., when CaMKII (the "write" signal) exceeds PP1 (the "erase" signal).
In neuroscore.py, we calculate the evolutionary fitness of each expert using three metrics:
- Efficiency: Performance per unit of metabolic cost. $$ \text{Eff}_i = \frac{\text{Contribution}_i}{\text{Energy}_i + \epsilon} $$
- Specialization: How unique is the expert's input distribution compared to the global average? $$ \text{Spec}_i = 1 - \cos(\mu_{\text{expert}}, \mu_{\text{global}}) $$
- Resilience: Stability of the expert's contribution over time (inverse variance). $$ \text{Res}_i = \frac{1}{\text{Var}(\text{Contribution}_i) + \epsilon} $$
Experts with high NeuroScores are cloned (Split); those with low scores are cannibalized (Merge).
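The three NeuroScore metrics above can be computed directly from an expert's history; a NumPy sketch (function and variable names are hypothetical, not the `neuroscore.py` API):

```python
import numpy as np

def neuroscore(contribs, energy, mu_expert, mu_global, eps=1e-6):
    """Per-expert fitness: efficiency (contribution per unit energy),
    specialization (1 - cosine vs. the global mean input), and
    resilience (inverse variance of contribution over time)."""
    eff = contribs.mean() / (energy + eps)
    cos = (mu_expert @ mu_global) / (
        np.linalg.norm(mu_expert) * np.linalg.norm(mu_global) + eps)
    spec = 1.0 - cos
    res = 1.0 / (contribs.var() + eps)
    return eff, spec, res

contribs = np.array([0.9, 1.0, 1.1])        # stable contribution history
eff, spec, res = neuroscore(contribs, energy=0.5,
                            mu_expert=np.array([1.0, 0.0]),
                            mu_global=np.array([0.0, 1.0]))
```

Here the expert is cheap (high efficiency), maximally specialized (orthogonal to the global mean), and stable (low variance), so it would score well on all three axes.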
Manually tuning 48 interacting biological hyperparameters (time constants, enzyme affinities, energy costs) is intractable for humans. We employ CMA-ES (Covariance Matrix Adaptation Evolution Strategy) for systematic, derivative-free optimization.
Our parameter space includes:
- 10 Calcium Dynamics Parameters (tau_c, alpha_ca, buffering rates, etc.)
- 12 Vesicle Trafficking Parameters (RRP refill, priming, endocytosis rates)
- 8 Postsynaptic Plasticity Parameters (Hebbian gains, CaMKII/PP1, BDNF)
- 6 Structural Plasticity Parameters (energy costs, split/merge thresholds)
- 12 Rust Kernel Compatibility Parameters (tau_buf, tau_prime, etc.)
These parameters interact non-linearly across:
- Multiple timescales (ms to seconds)
- Competing objectives (quality vs performance)
- Stochastic dynamics (vesicle release noise)
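One reason CMA-ES copes well here is that the positive, scale-free parameters can all be searched in an unconstrained log-space with a single step size. A sketch of the encoding (the bounds are hypothetical; the real inventory lives in the CMA-ES plan):

```python
import numpy as np

# Hypothetical bounds for two of the 48 parameters.
BOUNDS = {"tau_rrp": (5.0, 200.0), "camkii_gain": (0.1, 10.0)}

def encode(params):
    """Map positive parameters to the unconstrained log-space vector
    that the evolution strategy actually searches."""
    return np.array([np.log(params[k]) for k in BOUNDS])

def decode(x):
    """Invert the encoding and clamp each value to its valid range."""
    out = {}
    for (k, (lo, hi)), v in zip(BOUNDS.items(), x):
        out[k] = float(np.clip(np.exp(v), lo, hi))
    return out

x = encode({"tau_rrp": 40.0, "camkii_gain": 1.5})
roundtrip = decode(x)
```

This matches the `_log` suffixes in the Phase 1 parameter names (`tau_rrp_log`, `camkii_up_log`, etc.).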
Phase 1: Critical Parameters (10D, ~$500). Focus on the top-10 most influential parameters identified via sensitivity analysis:
- `tau_rrp_log` - Vesicle refill timescale
- `lambda_loge` - Eligibility trace decay
- `camkii_up_log` - LTP strength
- `pp1_up_log` - LTD strength
- `energy_cost_rel_log` - Metabolic taxation
- (Plus 5 more... see full plan)
Phase 2: Subgroup Searches (38D staged, ~$2000) With Phase 1 winners fixed, optimize subgroups in parallel:
- Calcium Group (8 params): Buffering, sensor kinetics
- Vesicle Group (9 params): Priming, endocytosis, SNARE
- Postsynaptic Group (7 params): Hebbian, BDNF, CaMKII/PP1
- Structural Group (8 params): Energy, health, routing
- Kernel Compat Group (6 params): Rust-specific parameters
Objective Function: Multi-objective composite balancing:
- Quality (70%): Perplexity, NIAH accuracy, calibration (ECE)
- Performance (30%): Tokens/sec, memory efficiency
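A sketch of how such a composite could be scalarized for the optimizer. The 70/30 weights follow the split above, but the individual terms and their normalization are illustrative assumptions:

```python
def composite_objective(quality, performance, w_quality=0.7, w_perf=0.3):
    """Scalarize the multi-objective tradeoff (lower is better).
    Quality terms are losses; performance is a normalized throughput
    turned into a penalty."""
    q = (quality["perplexity"]
         + quality["ece"]
         + (1 - quality["niah_acc"]))           # NIAH miss rate as a loss
    p = 1.0 / max(performance["tokens_per_sec_norm"], 1e-6)
    return w_quality * q + w_perf * p

score = composite_objective(
    {"perplexity": 3.2, "ece": 0.05, "niah_acc": 0.8},
    {"tokens_per_sec_norm": 1.0},
)
```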
See PLAN_TO_USE_CMAES_FOR_HYPERPARAMETER_EXPLORATION_AND_OPTIMIZATION_ACROSS_ALL_BIO_INSPIRED_FEATURES.md for the complete 15,000-word plan including:
- Detailed parameter inventory with biological justification
- Search space design and encoding strategies
- Fast proxy objective with learning-curve extrapolation
- Distributed evaluation harness design
- Budget tracking and go/no-go checkpoints
- Risk mitigation and sensitivity analysis
```shell
# (Recommended) Sanity gate before expensive runs
uv run python -m scripts.tune_bio_params sanity --seed 1 --device cpu

# Phase 1: Optimize top-10 parameters (10D)
uv run python -m scripts.tune_bio_params optimize \
    --seed 1337 --device cuda --generations 50 --popsize 10 \
    --run-dir runs/cmaes/top10

# Resume from the latest checkpoint
uv run python -m scripts.tune_bio_params optimize --run-dir runs/cmaes/top10 --resume

# Stagnation / early-stop policy (defaults: 20 gens, <1% improvement, action=stop)
uv run python -m scripts.tune_bio_params optimize \
    --run-dir runs/cmaes/top10 --stagnation-action sigma_reset
```

This will:
- ✅ Support `torchrun --distributed` for multi-GPU population eval (rank0 controller)
- ✅ Save `progress.jsonl`, `best_params.json`, and `es_latest.pkl` (+ per-gen checkpoints) under `--run-dir`
- ✅ Log scalars/histograms/covariance heatmap to TensorBoard under `--run-dir/tb/`
Bio-Inspired Nanochat is optimized for dual RTX 4090 training/inference with three kernel backends:
1. Triton GPU Kernels (Production)
   - Location: `bio_inspired_nanochat/kernels/presyn_fused.py`
   - 375-line fused presynaptic dynamics kernel
   - 3 passes over attention (optimization opportunity identified)
   - FlexAttention compatibility for O(N) memory vs O(N²)

2. Rust CPU Kernels (Production)
   - Location: `rust_src/src/presyn.rs`, `rust_src/src/moe.rs`
   - PyO3-based native extensions
   - Type-safe with explicit dimensionality checks
   - Fallback for CPU-only deployment

3. Python Reference (Validation)
   - Location: `tests/test_rust_kernels.py::presyn_step_python_ref`
   - 130-line pure implementation
   - Used for kernel correctness testing
Our dual-4090 optimization roadmap includes:
- 🚧 FlexAttention/FlashAttention Evaluation - Compare SDPA vs FlexAttention vs FlashAttn2/3
- 🚧 NCCL/P2P Tuning - Optimize DDP for PCIe (no NVLink) with bucket sizes and grad overlap
- 🚧 Memory Optimizations - bf16, activation checkpointing, torch.compile modes
- 🚧 Triton Kernel Fusion - Reduce 3-pass to single-pass attention
- 🚧 Inference Fastpath - KV cache reuse + cudagraphs for steady-state decode
- 🚧 CI Performance Guardrails - Automated regression testing
Target: 90%+ GPU utilization on dual 4090s for both training and inference.
We're implementing systematic bio vs vanilla evaluation with statistical rigor:
- Benchmark matrix design: `docs/eval_benchmark_matrix.md`
- Standardized run harness: `python -m scripts.eval_matrix --help`
Quality Metrics:
- Perplexity - Validation loss on FineWeb-Edu
- Long-Context - Needle-in-a-Haystack (NIAH) accuracy at 4k/8k tokens
- Calibration - Expected Calibration Error (ECE)
- MoE Health - Expert specialization (Gini), dead expert fraction
- Memory - Associative recall on synthetic tasks
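Of these, Expected Calibration Error is the least standard to compute by hand. A reference sketch with equal-width confidence bins (illustrative, not tied to the repo's harness):

```python
import numpy as np

def expected_calibration_error(confs, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average the
    |accuracy - confidence| gap weighted by bin occupancy."""
    confs = np.asarray(confs)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confs > lo) & (confs <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confs[mask].mean())
            ece += mask.mean() * gap
    return ece

# Two bins, each with a 0.05 accuracy/confidence gap -> ECE = 0.05.
ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55],
                                 [1, 1, 1, 0])
```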
Performance Metrics:
- Training - Tokens/sec, GPU utilization, peak memory
- Inference - Latency (prompt + decode), throughput, KV cache efficiency
- Configs: Vanilla GPT, bio-all, per-feature toggles (11 ablations)
- Seeds: 2-3 seeds per config for statistical significance
- Tests: Paired t-tests, 95% confidence intervals
- Budget: Fixed token budget per run (~10B tokens for small-scale)
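The statistics here are simple enough to spell out. A dependency-free sketch of the paired test over per-seed metric pairs (the numbers are illustrative; a real analysis would use the t critical value for n-1 degrees of freedom rather than the normal-approximation 1.96):

```python
import math

def paired_t(xs, ys):
    """Paired t statistic over per-seed metric pairs (e.g. bio vs
    vanilla loss), plus a two-sided ~95% CI on the mean difference."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    se = math.sqrt(var / n)
    t = mean / se
    ci = (mean - 1.96 * se, mean + 1.96 * se)
    return t, ci

# Three seeds: bio validation loss vs vanilla validation loss.
t, ci = paired_t([3.10, 3.08, 3.12], [3.20, 3.19, 3.23])
```

Because the CI excludes zero, this (hypothetical) bio run would count as a significant improvement over vanilla.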
All benchmarks are:
- ✅ Deterministic - Fixed seeds, documented NCCL/CUDA flags
- ✅ Scripted - Single command to run full matrix
- ✅ Logged - JSONL/CSV output with run metadata
- ✅ Versioned - Checkpoint/config stored with results
Example:

```shell
# Run CORE benchmark evaluation
uv run scripts/base_eval.py
```

If the eval bundle download fails (e.g. HTTP 403), point the script at a local bundle or a mirror:

```shell
uv run python -m scripts.base_eval --eval-bundle-zip /path/to/eval_bundle.zip
# or
uv run python -m scripts.base_eval --eval-bundle-dir /path/to/eval_bundle/
```

See our evaluation roadmap in .beads/ (Epic: bio_inspired_nanochat-gzm).
Every aspect of the synapse can be tuned via SynapticConfig. These parameters act as the "genome" of the artificial brain.
| Parameter | Default | Bio-Analog | Effect on Model |
|---|---|---|---|
| `tau_c` | 4.0 | Calcium Decay | How long a neuron stays "excited" after firing. Higher = longer bursts. |
| `tau_rrp` | 40.0 | Vesicle Refill | Recovery time from fatigue. Higher = prone to "writer's block" if repetitive. |
| `alpha_ca` | 0.25 | Calcium Influx | Sensitivity to attention scores. Higher = easier to trigger release. |
| `syt_fast_kd` | 0.4 | Synaptotagmin | The threshold for rapid release. Lower = more trigger-happy. |
| `stochastic_train_frac` | 0.12 | Thermal Noise | Fraction of query positions that use stochastic vesicle release during training. |
| `stochastic_mode` | `normal_reparam` | Sampler | Fast stochastic sampling mode (`normal_reparam`, `gumbel_sigmoid_ste`, or `straight_through`). |
| `stochastic_tau` | 1.0 | Temperature | Relaxation temperature for `gumbel_sigmoid_ste` (lower = harder). |
| `stochastic_count_cap` | 8 | Count Cap | Max vesicles per edge for stochastic sampling (higher = more compute). |
| `tau_buf` | 4.0 | Calcium Buffer | Buffering timescale. Higher = slower calcium dynamics. |
| `tau_prime` | 5.0 | SNARE Priming | Vesicle priming timescale. Affects release readiness. |
| Parameter | Default | Bio-Analog | Effect on Model |
|---|---|---|---|
| `rank_eligibility` | 16 | PSD Complexity | Rank of the Hebbian update. Higher = more complex associative patterns. |
| `rho_elig` | 0.95 | Trace Decay | How long the "scratchpad" memory lasts. |
| `camkii_gain` | 1.5 | LTP Strength | "Write" speed for long-term memory. Higher = learns faster from context. |
| `pp1_gain` | 1.0 | LTD Strength | "Erase" speed. Higher = forgets useless context faster. |
| `bdnf_gamma` | 0.0 | Metaplasticity | BDNF-driven LR modulation. Higher = activity-dependent learning boost. |
| Parameter | Default | Bio-Analog | Effect on Model |
|---|---|---|---|
| `energy_cost_rel` | 0.015 | Metabolic Cost | The tax paid for firing. Higher = leaner, smaller networks. |
| `split_health_min` | 0.80 | Mitosis Threshold | How healthy an expert must be to clone. Lower = faster growth. |
| `router_contrastive_push` | 0.1 | Lateral Inhibition | Forces experts to specialize. Higher = sharper specialization. |
Total Parameters: 48 (see full inventory in CMA-ES plan)
Parameter Categories:
- ⚡ Critical (Top-10): Largest impact on quality/performance
- 🧪 Subgroup (38): Domain-specific tuning (Calcium, Vesicle, Post, Structural, Kernel)
You can tweak the personality of the brain by adjusting its chemical balance via CLI overrides.
| If the model is... | It means... | You should tweak... | Action |
|---|---|---|---|
| Repetitive / Stuck | Synapses aren't tiring fast enough. | `tau_rrp` (Refill Time) | ⬆️ Increase |
| Forgetful | Short-term memory is fading too fast. | `camkii_gain` (Write Strength) | ⬆️ Increase |
| Scatterbrained | Firing is too noisy/random. | `syt_fast_kd` (Sensor Sensitivity) | ⬇️ Decrease |
| Too Small / Dumb | Experts aren't reproducing. | `split_health_min` (Birth Bar) | ⬇️ Decrease |
| Bloated / Slow | Too many lazy experts. | `energy_cost_rel` (Metabolic Tax) | ⬆️ Increase |
Pro Tip: Try this "ADHD Mode" override to force high novelty seeking:

```shell
python -m scripts.base_train --syn_cfg.tau_rrp=100.0 --syn_cfg.energy_cost_rel=0.05
```

Requirements:
- Python: 3.14
- UV: Latest version for fast dependency resolution
- GPU: NVIDIA with CUDA 12.4+ (dual RTX 4090 recommended)
- RAM: 32GB+ for large models
```shell
# Clone the repository
git clone https://github.com/Dicklesworthstone/bio_inspired_nanochat.git
cd bio_inspired_nanochat

# Create environment with UV
uv venv .venv --python 3.14.2
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies (GPU)
uv sync --extra gpu
# OR for CPU-only
uv sync --extra cpu

# Build Rust kernels (optional, for CPU acceleration)
uv run maturin develop
```

Before pushing changes, run the fast quality gate on the files you touched:

```shell
# Staged changes (pre-commit style)
uv run python -m scripts.quality_gate --mode staged

# Branch diff vs main (pre-push style)
uv run python -m scripts.quality_gate --mode branch --base origin/main
```

What it enforces:
- `uv run ruff check --fix --unsafe-fixes` (and fails if it had to modify files)
- `uvx ty check` (type errors fail; warnings are allowed)
- UBS resource-lifecycle scan (runs via `ubs --category=resource-lifecycle --staged/--diff` where possible; branch/CI may scan the whole repo)

Exemptions: if a tool reports a false positive, prefer a narrow, documented suppression (`# noqa: ...`, `# type: ignore[...]`, or a scoped `ty.toml` exclusion) and create a Beads issue explaining why the exemption is correct.
Train a small bio-model (~4 hours on dual 4090s).
```shell
# --synapses=1         enable biology
# --depth / --width    layers / hidden size
# --splitmerge_every   run the "Life Cycle" every 1k steps
# --batch_size         adjust for your GPU memory
python -m scripts.base_train \
    --synapses=1 \
    --depth=12 \
    --width=768 \
    --splitmerge_every=1000 \
    --batch_size=32 \
    --max_steps=50000
```

Key Training Flags:
- `--synapses=1` - Enable all bio mechanisms (0 = vanilla transformer)
- `--syn_cfg.stochastic_train_frac=0.12` - Enable stochastic vesicle release
- `--syn_cfg.stochastic_mode=normal_reparam` - Fast stochastic release (Gaussian approximation)
- `--syn_cfg.stochastic_mode=gumbel_sigmoid_ste` - Discrete Binomial sampling via Gumbel-Sigmoid straight-through
- `--syn_cfg.stochastic_tau=1.0` - Stochastic relaxation temperature (lower = harder)
- `--syn_cfg.bdnf_gamma=0.1` - Enable BDNF metaplasticity
- `--splitmerge_every=N` - Expert lifecycle interval (0 = disable)
```shell
tensorboard --logdir runs/
```

Key Metrics to Watch:
- Heartbeat: `energy_mean` (should stay > 0.5)
- Map: `router_embedding` (should show distinct clusters of expertise)
- Family Tree: `lineage` (watch experts split and branch out)
- Calcium: `calcium_mean`, `rrp_mean` (presynaptic dynamics)
- Hebbian: `fast_weight_norm` (postsynaptic plasticity)
```shell
# Launch web chat interface
python -m scripts.chat_web --source sft --port 8000

# Run CORE benchmark evaluation
uv run scripts/base_eval.py
```

Core Biology:
- `bio_inspired_nanochat/synaptic.py` - The Physics Engine: 48-parameter `SynapticConfig` + core dynamics
- `bio_inspired_nanochat/gpt_synaptic.py` - The Body: transformer skeleton with synaptic organs
- `bio_inspired_nanochat/synaptic_splitmerge.py` - The God Hand: surgical controller for expert lifecycle
- `bio_inspired_nanochat/neuroscore.py` - The Credit Score: expert fitness metrics (Efficiency, Specialization, Resilience)

Kernels:
- `bio_inspired_nanochat/kernels/presyn_fused.py` - GPU Kernel: 375-line Triton implementation
- `rust_src/src/presyn.rs` - CPU Kernel: PyO3-native Rust implementation
- `rust_src/src/moe.rs` - MoE Kernel: expert routing and metabolism
- `tests/test_rust_kernels.py` - Reference: Python validation implementation

Visualization & Tooling:
- `bio_inspired_nanochat/neuroviz.py` - The MRI: visualizations of brain internal state
- `scripts/dashboard.py` - State Inspector: interactive exploration
- `scripts/tune_bio_params.py` - The Evolver: CMA-ES optimizer
- `scripts/base_eval.py` - Evaluation: CORE benchmark evaluation
- `scripts/enable_synapses.py` - The Injector: checkpoint conversion utility
- `scripts/base_train.py` - Training Loop: main training script
- `scripts/chat_web.py` - Chat UI: web-based inference interface

Docs & Planning:
- `prompts/` - The DNA: theoretical blueprints and research proposals
- `.beads/` - Project Management: 69 tasks across 7 epics
- Planning docs (root): CMA-ES plan, feature roadmap, predictions
1. Bio-Inspired Modular Features (11 tasks, P1)
   - Stochastic release, BDNF, dual weights, lifecycle, buffers, etc.
   - Goal: Modular, toggleable bio mechanisms for clean ablation studies
2. CMA-ES Hyperparameter Optimization (10 tasks, P1)
   - Systematic optimization of 48 parameters across 2 phases
   - Goal: Discover optimal bio configs for different model scales
3. Bio vs Vanilla Evaluation (5 tasks, P1)
   - Rigorous benchmarking with statistical significance
   - Goal: Quantify quality/performance tradeoffs of bio mechanisms
4. Dual-4090 Performance Optimization (7 tasks, P1)
   - FlexAttention, NCCL tuning, kernel fusion, cudagraphs
   - Goal: 90%+ GPU utilization on training and inference
5. Training Visualization & Insight (3 tasks, P1)
   - Rich dashboards, attention/energy maps, pedagogical notebooks
   - Goal: Understand and communicate bio mechanisms effectively
6. Cross-Pollination with Model Guided Research (4 tasks, P1)
   - Integration of gauge-reversible, simplicial, ultrametric ideas
   - Goal: Explore synergies between bio and mathematical constraints
7. Infrastructure & CI (29 tasks, P2-P3)
   - Metrics schema, budgeting, seeds, lint/type/UBS gates, perf guardrails
   - Goal: Research velocity and code health
Q1 2025:
- ✅ Complete Rust kernel implementation
- ✅ Document comprehensive roadmap (this README!)
- 🎯 Implement top-3 bio features (stochastic, BDNF, ring buffer)
- 🎯 Run Phase 1 CMA-ES optimization

Q2 2025:
- 🎯 Complete bio vs vanilla benchmark matrix
- 🎯 Publish initial research findings
- 🎯 Dual-4090 performance target (90% utilization)

Q3 2025:
- 🎯 Phase 2 CMA-ES (subgroup optimization)
- 🎯 Cross-pollination prototypes
- 🎯 Cellular automata initialization experiments
Use .beads/ (Beads tracker) to explore the full dependency graph and task details.
- Tsodyks, M., & Markram, H. (1997). "The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability." PNAS.
- Hebb, D. O. (1949). "The Organization of Behavior." Wiley.
- Takeuchi, T., et al. (2014). "The synaptic plasticity and memory hypothesis." Neuron.
- Vaswani, A., et al. (2017). "Attention is All You Need." NeurIPS.
- Schlag, I., et al. (2021). "Linear Transformers Are Secretly Fast Weight Programmers." ICML.
- Fedus, W., et al. (2022). "Switch Transformers." JMLR.
- Hansen, N. (2016). "The CMA Evolution Strategy: A Tutorial." arXiv:1604.00772.
- Nanochat - Original minimal GPT implementation
- FlashAttention - Fast attention kernels
- Model Guided Research - Mathematical geometry for LLMs
(Inherited from the base Nanochat repo)
This repo remains fully compatible with the original "silicon" workflows:
- `speedrun.sh`: Train a standard static GPT-2.
- `scripts/chat_web.py`: Chat UI.
- To disable biology, just run without the `--synapses` flag.
MIT License (with OpenAI/Anthropic Rider); see LICENSE for details.
- Andrej Karpathy - For the original Nanochat codebase
- Neuroscience Community - For decades of synaptic research
- PyTorch Team - For Triton and FlexAttention
- Anthropic - For Claude Sonnet 4.5 which assisted with planning and documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Twitter/X: @dicklesworthstone
Built with ❤️ and 🧠 at the intersection of neuroscience and machine learning