
MIDI-grep

Extract piano riffs from audio files or YouTube videos and generate Strudel code for live coding.

Audio/YouTube → Stem Separation → MIDI Transcription → Strudel Code

Features

  • YouTube Support: Paste a URL, get playable code
  • AI-Powered Separation: Demucs isolates melodic/bass/drums/vocals stems
  • Drum Pattern Detection: Extracts kick, snare, hi-hat patterns automatically
  • Accurate Transcription: Spotify's Basic Pitch for audio-to-MIDI
  • BPM & Key Detection: Automatic tempo and musical key analysis with confidence scores
  • Detection Candidates: Shows top 5 candidates for key, BPM, time signature, and style in output header
  • Smart Caching: Stems cached by URL/file hash; auto-invalidates when processing scripts change
  • Chord Mode: Alternative chord-based generation for electronic/funk music (--chords)
  • Loop Detection: Automatically identifies repeating patterns (1, 2, 4, or 8 bar loops) with confidence scoring
  • Genre Auto-Detection: Automatically detects genre and uses specialized generators:
    • Brazilian Funk: BPM 130-145 (136 typical), mid-heavy spectrum, vocal chops → uses tamborzão templates
    • Brazilian Phonk: BPM 80-100 or 145-180, darker sound → phonk-style drums
    • Retro Wave/Synthwave: BPM 130-170, longer synth notes → synthwave style
    • Style Detection: Detects style based on BPM, key (minor/major), and note density
  • Deep Learning Genre Detection: CLAP (Contrastive Language-Audio Pretraining) model for zero-shot audio classification (enabled by default)
  • Manual Genre Override: --genre flag to force specific genre when auto-detection fails
  • Genre-Aware Sound RAG: LLM prompts include only ~15 genre-relevant sounds (not all 196), reducing hallucinated sound names and saving ~760 tokens per call
  • AI-Driven Audio Rendering: Synthesize WAV previews with AI-suggested mix parameters
    • Spectral/dynamics/timbre analysis of original audio
    • Automatic effect parameter optimization
    • Rendered vs original comparison with similarity scoring
  • Self-Contained HTML Report: Single-file report with everything embedded
    • Audio Studio Player: Two-section stem mixer (Original + Rendered)
    • Solo/Mute controls per stem with waveform visualization
    • A/B comparison mode (toggle between original and rendered)
    • Per-stem comparison charts (bass, drums, melodic)
    • Visual comparison charts (spectrograms, chromagrams, frequency bands)
    • Copyable analysis data tables
    • Strudel code with copy button
  • Dynamic Strudel Output: Rich patterns with per-voice effects
    • .velocity() patterns with dynamic range expansion for expressive dynamics
    • Style-specific accent patterns (downbeat, backbeat, offbeat)
    • Compressor effect for dynamics control (electronic)
    • Per-voice filtering (HPF/LPF by register)
    • Stereo panning with style-appropriate LFO shapes (sine, perlin, saw)
    • ADSR envelopes for synth/orchestral styles
    • Voice-appropriate reverb, delay, phaser, vibrato
    • Style-specific FX: bitcrush/coarse for lofi, distort for electronic
    • FM synthesis (.fm, .fmh, .fmdecay) for synth/electronic richness
    • Tremolo/amplitude modulation for synth/orchestral movement
    • Filter envelope (.lpenv, .lpattack, .lpdecay) for dynamic sweeps
    • Sidechain/ducking (.duck, .duckattack, .duckdepth) for electronic pumping
    • Ring modulation (.ring, .ringfreq) for metallic timbres
    • Chorus (.chorus, .chorusDepth) for warm width (jazz/soul)
    • Leslie effect (.leslie, .leslieSpeed) for organ-style modulation
    • Shape/saturation (.shape) for harmonic warmth
    • Pitch envelope (.pitchenv, .pitchattack) for synth bass punch
    • Pattern transforms: swing for jazz, degradeBy for lofi, iter for electronic/lofi variation
    • .clip() for note duration control (staccato/legato/sustained)
    • .echo() for rhythmic repeats
    • .superimpose() for detuned layering (synth richness)
    • .off() for harmonic layering with time offset
    • .layer() for parallel transformations (orchestral octave doubling)
    • .echoWith() for sophisticated pitch-shifted echoes (electronic)
    • .scale() for key-aware quantization
    • Section detection with time markers
  • Sound Style Presets: 20+ styles with auto-detection
    • GM Soundfont: piano, synth, orchestral, electronic, jazz, lofi
    • Groove Styles: funk, soul, house, trance (auto-detected by BPM/key)
    • Raw Oscillators: raw, chiptune, ambient, drone, plus supersaw, ZZFX synths
    • Sample-based: mallets, plucked, keys, pad, percussive
    • Genre-specific: synthwave, darkwave, minimal, industrial, newage
  • Extended Sound Palette:
    • Supersaw: Fat detuned sawtooth for bass/leads
    • ZZFX Synths: 8-bit style (z_sawtooth, z_square, z_triangle, z_tan, z_noise)
    • Wavetables: Digital waveforms (wt_digital, wt_vgame, wt_piano)
    • Noise: Textural elements (white, pink, brown, crackle)
  • Web Interface: HTMX-powered UI, no JavaScript frameworks
  • CLI Tool: Full-featured command-line interface
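The smart-caching feature above (stems keyed by URL/file hash, auto-invalidated when processing scripts change) can be sketched as a content hash over the input plus the scripts themselves. This is an illustrative sketch, not the actual cache package's implementation; the script list and key length are invented:

```python
import hashlib
from pathlib import Path

# Scripts whose changes should invalidate cached stems (illustrative list).
PROCESSING_SCRIPTS = ["scripts/python/separate.py", "scripts/python/cleanup.py"]

def cache_key(source: str, script_paths=PROCESSING_SCRIPTS) -> str:
    """Derive a cache key from the input URL/file identity plus the
    processing scripts' contents, so editing any script yields a new
    key and stale stems are never reused."""
    h = hashlib.sha256(source.encode("utf-8"))
    for p in script_paths:
        path = Path(p)
        if path.exists():
            h.update(path.read_bytes())
    return h.hexdigest()[:16]
```

The same key then names the `.cache/stems/` directory, so a cache hit is just a directory-existence check.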

Architecture

                              MIDI-grep Pipeline

   INPUT            SEPARATE          ANALYZE          AI GENERATE        OUTPUT
     │                 │                 │                  │                │
     ▼                 ▼                 ▼                  ▼                ▼
┌─────────┐     ┌────────────┐     ┌───────────┐     ┌──────────────┐    ┌─────────┐
│ YouTube │     │  Demucs    │     │  librosa  │     │ Ollama LLM   │    │ Strudel │
│ yt-dlp  │────▶│   stem     │────▶│  BPM/Key  │────▶│ AI code gen  │───▶│  code   │
│ WAV/MP3 │     │ separation │     │  + CLAP   │     │ + iteration  │    │  + WAV  │
└─────────┘     └────────────┘     └───────────┘     └──────────────┘    └─────────┘
                    │                  │                    │
                    ▼                  ▼                    ▼
              ┌──────────┐      ┌───────────┐       ┌──────────────┐
              │ melodic  │      │  Genre    │       │  BlackHole   │
              │ bass     │      │ Detection │       │  Recorder    │
              │ drums    │      │           │       │ (real audio) │
              │ vocals   │      └───────────┘       └──────────────┘
              └──────────┘

────────────────────────────────────────────────────────────────────────────────
                              Tech Stack

  Go 1.21+                              Python 3.11+
  ├── CLI (Cobra)                       ├── demucs      - stem separation
  ├── HTTP (Chi)                        ├── basic-pitch - audio to MIDI
  └── Pipeline orchestration            ├── librosa     - audio analysis
                                        ├── ollama_agent - agentic LLM
                                        └── ai_improver - iterative improvement

  TypeScript/Node.js                    External
  ├── BlackHole recorder (primary)      ├── yt-dlp     - YouTube download
  └── Puppeteer + ffmpeg                ├── ffmpeg     - audio conversion
                                        ├── Ollama     - local LLM
  Frontend                              └── ClickHouse - learning DB
  ├── HTMX
  ├── PicoCSS
  └── SSE updates

Complete Orchestration Flow

This diagram shows the full pipeline including AI-driven iteration (worst case: all iterations run).

flowchart TB
    %% ===== INPUT & SEPARATION =====
    CLI["CLI: extract --url/--file"] --> YT["yt-dlp download"]
    YT --> CheckCache{"Cached stems?"}
    CheckCache -->|No| Demucs["Demucs: melodic + drums + bass + vocals"]
    CheckCache -->|Yes| LoadStems["Load cached stems"]
    Demucs --> Analysis
    LoadStems --> Analysis

    %% ===== ANALYSIS =====
    Analysis["librosa + CLAP"] --> Profile["AudioProfile extraction:<br/>BPM, key, genre, spectrum"]
    Profile --> Features["Spectrum, dynamics,<br/>rhythm, timbre, structure"]

    %% ===== TRANSCRIPTION =====
    Features --> BasicPitch["Basic Pitch → MIDI notes"]
    Features --> DrumDetect["Drum onset → bd/sd/hh"]
    BasicPitch --> Cleanup["Quantize + Simplify"]
    DrumDetect --> Cleanup

    %% ===== INITIAL GENERATION (LLM-first) =====
    Cleanup --> CheckBest{"Best previous run<br/>in ClickHouse?"}
    CheckBest --> QueryBest["Query best code + similarity<br/>for this track_hash"]
    QueryBest --> CH[("ClickHouse<br/>runs + knowledge")]

    CH -->|Found| StartFromBest["Start from best code"]
    CH -->|Not found| GenreRAG["Genre Sound RAG<br/>retrieve_genre_context()<br/>~15 sounds per genre"]
    GenreRAG --> FreshGen["Ollama generates fresh<br/>Strudel code<br/>(with genre palette)"]

    StartFromBest --> InitCode["output.strudel"]
    FreshGen --> InitCode

    %% ===== RENDER via BlackHole =====
    InitCode --> Render["BlackHole Recorder<br/>(Puppeteer + ffmpeg)<br/>100% real Strudel audio"]
    Render --> StemSep["Demucs: separate render<br/>into bass/drums/melodic"]
    StemSep --> Compare["compare_audio.py<br/>(MAE, not cosine)"]
    Compare --> Metrics["Per-stem diffs:<br/>bass, drums, melodic<br/>+ per-band analysis"]

    %% ===== TWO-PHASE IMPROVEMENT LOOP =====
    Metrics --> LoopCheck{"iteration < N AND<br/>similarity < target?"}

    LoopCheck -->|Yes| Phase1["Phase 1: Deterministic<br/>optimize_parameters()<br/>gain/lpf/hpf/room math"]
    Phase1 --> Phase1Check{"Changes made?"}
    Phase1Check -->|Yes| QuickRender["BlackHole render<br/>(full track)"]
    Phase1Check -->|No| Phase2["Phase 2: Constrained LLM<br/>+ Genre Sound RAG<br/>(~15 valid sounds per genre)"]
    Phase2 --> QuickRender

    QuickRender --> Regress{"new > best?"}
    Regress -->|No| Revert["REVERT to best code"]
    Regress -->|Yes| Accept["Accept + store to ClickHouse"]
    Revert --> LoopCheck
    Accept --> LoopCheck

    %% ===== VALIDATION =====
    Phase2 --> Validate{"Valid Strudel?<br/>No .peak/.volume/.eq<br/>Valid sound names?"}
    Validate -->|No| LoopCheck
    Validate -->|Yes| QuickRender

    %% ===== OUTPUT =====
    LoopCheck -->|Done| BatchStems["Batch Demucs: separate<br/>each iteration render<br/>into melodic/drums/bass"]
    BatchStems --> FinalRender["Full-length BlackHole render<br/>+ Demucs stem separation"]
    FinalRender --> Report["generate_report.py"]
    Report --> Output["vNNN/<br/>├── output.strudel<br/>├── render.wav + stems<br/>├── render_v*_{melodic,drums,bass}.mp3<br/>├── iterations.json (with stem paths)<br/>├── comparison.json<br/>├── stem_comparison.json<br/>└── report.html"]

    %% ===== PER-TRACK LEARNING =====
    CH -.->|"Best runs + learned<br/>parameter deltas"| Phase1

    %% ===== STYLING =====
    style CLI fill:#1565c0,color:#fff
    style Demucs fill:#2e7d32,color:#fff
    style Analysis fill:#7b1fa2,color:#fff
    style Profile fill:#7b1fa2,color:#fff
    style Render fill:#512da8,color:#fff
    style StemSep fill:#2e7d32,color:#fff
    style Compare fill:#00796b,color:#fff
    style Phase1 fill:#e65100,color:#fff
    style Phase2 fill:#c62828,color:#fff
    style CH fill:#303f9f,color:#fff
    style FinalRender fill:#512da8,color:#fff
    style Report fill:#6a1b9a,color:#fff
    style Output fill:#1b5e20,color:#fff
    style GenreRAG fill:#00897b,color:#fff
    style FreshGen fill:#f9a825,color:#000

External Dependencies

Component Technology Purpose
yt-dlp Python CLI YouTube audio download
ffmpeg/ffprobe C binary Audio codec, duration, BlackHole recording
Demucs PyTorch Stem separation (melodic/drums/bass/vocals)
Basic Pitch TensorFlow Audio → MIDI transcription
librosa Python BPM, key, onset detection
CLAP PyTorch Zero-shot genre classification
Ollama Go binary Local LLM (llama3:8b) for code gen + iteration
ClickHouse C++ binary Learning database (runs + knowledge)
Puppeteer Node.js Browser automation for BlackHole recording
BlackHole macOS driver Virtual audio device for recording
strudel.dygy.app Web Self-hosted Strudel embed (recording target)

Module Structure

midi-grep/
├── cmd/midi-grep/           # CLI entrypoint (Go)
│   └── main.go              # Cobra commands: extract, serve, generative
│
├── internal/                # Go packages
│   ├── audio/               # Input validation, YouTube download, stem separation
│   ├── analysis/            # BPM & key detection (calls Python)
│   ├── midi/                # Transcription & cleanup (calls Python)
│   ├── pipeline/            # Orchestrates full flow (calls Python AI for code gen)
│   ├── server/              # HTTP server, HTMX templates, SSE
│   ├── exec/                # Python subprocess runner
│   ├── cache/               # Stem + output caching with versioning
│   ├── report/              # Go HTML report generation
│   └── generative/          # RAVE neural synthesizer pipeline
│
├── scripts/
│   ├── midi-grep.sh         # Main CLI wrapper (Bash)
│   ├── node/                # TypeScript BlackHole recorder + rendering
│   │   ├── src/
│   │   │   └── record-strudel-blackhole.ts  # BlackHole recorder (primary)
│   │   ├── dist/            # Compiled JavaScript output
│   │   ├── package.json     # puppeteer, typescript
│   │   └── tsconfig.json    # TypeScript configuration
│   └── python/              # Python ML + AI scripts
│       ├── separate.py      # Demucs stem separation
│       ├── transcribe.py    # Basic Pitch audio → MIDI
│       ├── analyze.py       # librosa BPM/key detection
│       ├── cleanup.py       # MIDI quantization & filtering
│       ├── detect_drums.py  # Drum onset detection & classification
│       ├── detect_genre_dl.py    # CLAP deep learning genre detection
│       ├── ai_improver.py   # AI-driven iterative improvement (Ollama/Claude)
│       ├── ollama_codegen.py # Ollama LLM code gen with genre sound RAG
│       ├── ollama_agent.py  # Agentic LLM with ClickHouse memory + genre RAG
│       ├── sound_selector.py     # Sound catalog (67 drums, 128 GM) + genre RAG retrieval
│       ├── compare_audio.py      # Rendered vs original comparison (MAE)
│       ├── generate_report.py    # Self-contained HTML report
│       ├── analyze_synth_params.py  # AI audio analysis for synthesis
│       ├── render_audio.py       # WAV synthesis (fallback renderer)
│       ├── render_with_models.py # Render with granular models
│       └── thin_patterns.py      # Pattern density control
│
└── context/                 # AWOS documentation
    ├── product/             # Product definition, roadmap
    └── spec/                # Feature specifications

Data Flow

1. INPUT
   YouTube URL ──▶ yt-dlp ──▶ audio.wav
   Local file ─────────────▶ audio.wav

2. CACHE CHECK
   URL/file hash ──▶ Check .cache/stems/ ──▶ Use cached if valid

3. STEM SEPARATION
   audio.wav ──▶ Demucs ──▶ melodic.mp3 + drums.mp3 + bass.mp3 + vocals.mp3

4. ANALYSIS
   melodic.mp3 ──▶ librosa + CLAP ──▶ { bpm: 136, key: "C# minor", genre: "brazilian_funk" }

5. TRANSCRIPTION (parallel)
   melodic.mp3 ──▶ Basic Pitch ──▶ raw.mid ──▶ cleanup ──▶ notes.json
   drums.mp3 ──▶ onset detection ──▶ drum_hits.json

6. AI CODE GENERATION (LLM-first)
   notes + drums + analysis ──▶ Genre Sound RAG (~15 sounds) ──▶ Ollama ──▶ code.strudel
   ClickHouse best runs ──▶ start from best known code

7. RENDER & ITERATE
   code.strudel ──▶ BlackHole Recorder ──▶ render.wav
   render.wav ──▶ compare_audio.py ──▶ similarity score
   similarity < target? ──▶ AI improver ──▶ iterate
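At its core the flow above is a linear chain of stages with a render/compare loop bolted on at the end. A minimal sketch of such a stage runner, with every name a stand-in rather than the project's actual API:

```python
def run_pipeline(source, stages):
    """Thread an artifact through ordered stages (download -> separate ->
    analyze -> transcribe -> generate), logging each stage as it runs."""
    artifact, log = source, []
    for name, fn in stages:
        artifact = fn(artifact)
        log.append(name)
    return artifact, log
```

In the real Go orchestrator each stage shells out to a Python subprocess, but the data-flow shape is the same.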

Quick Start

# Install dependencies
./scripts/midi-grep.sh install

# Extract from YouTube
./scripts/midi-grep.sh extract --url "https://youtu.be/Q4801HzWZfg"

# Extract from local file
./scripts/midi-grep.sh extract --file track.wav --output riff.strudel

# Start web interface
./scripts/midi-grep.sh serve --port 8080

Installation

Prerequisites

  • Go 1.21+: brew install go
  • Python 3.11+: brew install python@3.11
  • yt-dlp (for YouTube): brew install yt-dlp
  • ffmpeg (for audio processing): brew install ffmpeg

Install

# Clone the repository
git clone https://github.com/dygy/midi-grep.git
cd midi-grep

# Install all dependencies (Go + Python)
./scripts/midi-grep.sh install

This installs:

  • Python packages: demucs, basic-pitch, librosa, pretty_midi
  • The Go binary (compiled via make build)

Usage

Command Line

# Basic extraction from YouTube
./scripts/midi-grep.sh extract --url "https://youtu.be/VIDEO_ID"

# With options
./scripts/midi-grep.sh extract \
  --url "https://youtu.be/VIDEO_ID" \
  --quantize 8 \
  --output riff.strudel \
  --midi piano.mid

# From local file
./scripts/midi-grep.sh extract --file track.wav

# Copy to clipboard (macOS)
./scripts/midi-grep.sh extract --url "..." --copy

CLI Options

Option Short Description
--url -u YouTube URL to extract from
--file -f Local audio file (WAV/MP3)
--output -o Output file for Strudel code
--quantize -q Quantization: 4, 8, or 16 (default: 16)
--midi -m Also save cleaned MIDI file
--copy -c Copy result to clipboard
--verbose -v Show verbose output
--render - Render audio to WAV: auto (default, saves to cache), a file path, or none to disable. Always outputs 3 stems
--quality - Stem separation quality: fast, normal (default), high, best
--chords - Use chord-based generation (better for electronic/funk)
--no-cache - Skip stem cache, force fresh extraction
--drums - Include drum patterns (default: on)
--drums-only - Extract only drums (skip melodic processing)
--drum-kit - Drum kit: tr808, tr909, linn, acoustic, lofi
--style - Sound style (auto, piano, synth, electronic, house, etc.)
--brazilian-funk - Force Brazilian funk mode (auto-detected normally)
--genre - Manual genre override: brazilian_funk, brazilian_phonk, retro_wave, synthwave, trance, house, lofi, jazz
--deep-genre - Use deep learning (CLAP) for genre detection (default: on; skipped when --genre is specified)
--iterate - Number of AI-driven improvement iterations (default: 5, always enabled)
--target-similarity - Target similarity score to stop iteration (default: 0.85)
--ollama - Use Ollama (free local LLM) for AI improvement (default: on)
--ollama-model - Ollama model to use (default: llama3:8b)

Default Analysis Features

All analysis features are enabled by default:

  • Stem Rendering: Outputs 3 separate stems (render_bass.wav, render_drums.wav, render_melodic.wav)
  • Per-Stem Comparison: Generates charts comparing each rendered stem vs original
  • Overall Comparison: Combined frequency/MFCC/chroma comparison chart
  • AI Improvement: 5 iterations targeting 85% similarity
  • Iteration Stem Separation: Batch Demucs on each iteration render → per-iteration melodic/drums/bass stems
  • HTML Report: Audio studio with Solo/Mute controls, per-iteration stem tracks, A/B comparison, shimmer loading, waveforms

AI-Driven Code Improvement

MIDI-grep can iteratively improve Strudel code using AI analysis:

# Run 5 iterations of AI improvement (uses Ollama by default - free & local)
./bin/midi-grep extract --url "..." --iterate 5

# Target 75% similarity, max 10 iterations
./bin/midi-grep extract --url "..." --iterate 10 --target-similarity 0.75

# Use a different Ollama model
./bin/midi-grep extract --url "..." --iterate 5 --ollama-model llama3:8b

How it works:

  1. Render initial Strudel code to WAV
  2. Compare against original audio (MFCC, chroma, frequency bands)
  3. Genre Sound RAG: Inject ~15 genre-appropriate sounds into LLM prompt (not the full 196-sound catalog)
  4. LLM analyzes gaps and suggests changes using only valid genre sounds
  5. Apply changes and repeat until target reached
  6. Batch stem separation: Demucs splits each iteration render into melodic/drums/bass
  7. Store all runs in ClickHouse for learning
  8. Generate report with per-iteration stem tracks (mute buttons, shimmer loading)
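The accept/revert logic implied by steps 4–5 (and the "new > best?" gate in the flowchart) can be sketched as a generic loop. Here `render`, `compare`, and `propose` are injected stand-ins for the real BlackHole renderer, compare_audio.py, and the LLM; none of this is the actual ai_improver.py code:

```python
def improve(code, render, compare, propose, max_iter=5, target=0.85):
    """Iterate: render -> score -> propose a change -> keep it only if
    the score improves, otherwise revert to the best code seen so far."""
    best_code, best_score = code, compare(render(code))
    for _ in range(max_iter):
        if best_score >= target:
            break
        candidate = propose(best_code)          # e.g. LLM edit or parameter math
        score = compare(render(candidate))
        if score > best_score:                  # accept only strict improvements
            best_code, best_score = candidate, score
        # else: implicit revert -- best_code is left unchanged
    return best_code, best_score
```

The key property is monotonicity: a bad LLM suggestion can never make the stored result worse, which is what makes storing every run in ClickHouse useful as a starting point for future tracks.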

Ollama Setup (one-time, free):

brew install ollama
ollama serve
ollama pull llama3:8b  # 3.8GB download

ClickHouse Learning Storage:

All improvement runs are stored in ClickHouse for incremental learning:

  • midi_grep.runs - Every render attempt with similarity scores
  • midi_grep.knowledge - Learned parameter improvements that transfer to future tracks

ClickHouse Local is auto-downloaded and requires no setup. Data stored in .clickhouse/db/.

# Query your improvement history
./bin/clickhouse local --path .clickhouse/db \
  --query "SELECT track_hash, version, similarity_overall FROM midi_grep.runs ORDER BY created_at DESC LIMIT 5"

Web Interface

./scripts/midi-grep.sh serve --port 8080

Open http://localhost:8080 in your browser:

  • Drag & drop audio files
  • Paste YouTube URLs
  • Real-time progress updates
  • Copy Strudel code with one click

Direct Binary Usage

# Build
make build

# CLI
./bin/midi-grep extract --input track.wav
./bin/midi-grep extract --url "https://youtu.be/..."
./bin/midi-grep serve --port 8080

# Help
./bin/midi-grep --help
./bin/midi-grep extract --help

Example Output

Default Format (Bar Arrays + Effect Functions)

The default output uses bar arrays for easy mixing and matching:

// MIDI-grep output
// BPM: 136, Key: C# minor
// Notes: 497 (bass: 17, mid: 371, high: 109)
// Drums: 287 hits (bd: 72, sd: 78, hh: 5)
// Kit: tr808
// Style: house

setcps(136/60/4)

// Bar arrays - mix & match freely
let bass = [
  "cs2 ~*7 cs2",
  "~*6 cs2",
  "~*7 cs2"
]

let mid = [
  "cs4 ~*3 e4 ~*2 cs4",
  "~*4 cs4 ~*2 ds4",
  "cs4 ~*3 e4"
]

let drums = [
  "bd ~ sd ~ bd sd ~ ~",
  "bd ~ sd ~ bd ~ sd bd",
  "~ oh ~ sd ~ oh"
]

// Effects (applied at playback)
let bassFx = p => p.sound("supersaw").lpf(800).room(0.12)
let midFx = p => p.sound("gm_pad_poly").lpf(4000).room(0.21)
let drumsFx = p => p.bank("RolandTR808").room(0.15)

// Play all
$: stack(
  bassFx(cat(...bass.map(b => note(b)))),
  midFx(cat(...mid.map(b => note(b)))),
  drumsFx(cat(...drums.map(b => s(b))))
)

// Mix & match:
// $: bassFx(note(bass[0]))
// $: cat(...bass.slice(0,4).map(b => note(b)))

Brazilian Funk/Phonk (auto-detected)

When the tool detects Brazilian funk characteristics (BPM 125-155, vocal chop transcription patterns, low bass content), it automatically switches to template-based generation:

// MIDI-grep output (Brazilian Funk mode)
// BPM: 136, Key: C# minor
// Genre: Brazilian Funk / Phonk
// Pattern: Tamborzão

setcps(136/60/4)

// Tamborzão drum pattern (2 bars)
let tamborzao = `
  bd ~ ~ bd ~ ~ bd ~ | ~ bd ~ ~ bd ~ ~ ~
`

let snare = `
  ~ ~ sd ~ ~ ~ sd ~ | ~ ~ sd ~ ~ ~ sd ~
`

let hats = `
  hh hh hh hh hh hh hh hh | hh hh oh hh hh hh oh hh
`

// 808 Bass (follows kick pattern)
let bassPattern = `
  c#1 ~ ~ c#1 ~ ~ c#1 ~ | ~ c#1 ~ ~ g#1 ~ ~ ~
`

// Synth stab (phonk style)
let stab = `
  [c#4,e4,g#4] ~ ~ ~ [c#4,e4,g#4] ~ ~ ~ | ~ ~ [c#4,e4,g#4] ~ ~ ~ ~ ~
`

// Effects
let drumFx = p => p.bank("RolandTR808").room(0.1).gain(1.0)
let bassFx = p => p.sound("sawtooth")
    .lpf(200).gain(1.2).distort(0.3)
    .attack(0.001).decay(0.3).sustain(0.2).release(0.1)
let stabFx = p => p.sound("square")
    .lpf(2000).gain(0.6).distort(0.2)
    .attack(0.01).decay(0.1).sustain(0.3).release(0.05)
    .room(0.2)

// Play all
$: stack(
  drumFx(s(tamborzao)),
  drumFx(s(snare)),
  drumFx(s(hats).gain(0.6)),
  bassFx(note(bassPattern)),
  stabFx(note(stab))
)

Jazz Style (with swing, perlin LFO, vibrato)

// MIDI-grep output
// BPM: 89, Key: E minor
// Notes: 48 (bass: 12, mid: 28, high: 8)
// Style: jazz
// Duration: 32.0 beats
// Sections: 0:00 intro | 0:08 main

setcps(89/60/4)

$: stack(
  // bass (12 notes)
  note("e2 ~ b2 ~ | e2 g2 ~ b2")
    .sound("gm_acoustic_bass")
    .velocity("0.75 ~ 0.68 ~ | 0.82 0.60 ~ 0.70")
    .gain(1.20)
    .pan(0.5).hpf(50).lpf(800).vib(3.0).vibmod(0.08).room(0.20).size(0.30)
    .swing(0.10),

  // mid (28 notes)
  note("[e4,g4] [b3,d4] fs4 [a3,cs4]")
    .sound("gm_epiano1")
    .velocity("0.65 0.72 0.80 0.58")
    .pan(perlin.range(0.44,0.56).slow(4)).hpf(200).lpf(4000).vib(3.0).vibmod(0.08).room(0.35).size(0.50)
    .swing(0.10),

  // high (8 notes)
  note("b5 ~ ~ e5 | fs5 ~ gs5 ~")
    .sound("gm_vibraphone")
    .velocity("0.55 ~ ~ 0.62 | 0.70 ~ 0.65 ~")
    .gain(0.80)
    .pan(perlin.range(0.38,0.62).slow(3)).hpf(400).lpf(10000).vib(3.0).vibmod(0.08).room(0.40).size(0.60).delay(0.15).delaytime(0.375).delayfeedback(0.30)
    .swing(0.10)
)

Lofi Style (with bitcrush, coarse, echo, superimpose, degradeBy)

// Style: lofi
$: stack(
  note("c3 ~ g2 ~ | c3 e3 ~ g2")
    .sound("gm_electric_bass_finger")
    .velocity("0.70 ~ 0.65 ~")
    .pan(0.5).hpf(50).lpf(800).clip(1.10).crush(10).coarse(4).room(0.18).size(0.27).echo(2,0.125,0.40)
    .superimpose(add(0.03))
    .swing(0.05).degradeBy(0.05),
  // ...
)

Synth Style (with FM synthesis, phaser, envelope, off, superimpose)

// Style: synth
$: stack(
  note("c3 ~ g3 ~ | c3 e3 ~ g3")
    .sound("gm_synth_bass_1")
    .velocity("0.75 ~ 0.70 ~")
    .gain(1.30)
    .pan(saw.range(0.43,0.57).slow(4)).hpf(50).lpf(800)
    .attack(0.003).decay(0.08).sustain(0.90).release(0.05)
    .phaser(0.50).phaserdepth(0.30).vib(4.0).vibmod(0.10)
    .fm(1.5).fmh(1.0).fmdecay(0.30).fmsustain(0.50)
    .room(0.16).size(0.24).echo(2,0.125,0.40)
    .superimpose(add(0.03))
    .off(0.125, add(12)),
  // ...
)

Raw Style (pure oscillators with FM and filter envelope)

// Style: raw - using Strudel's built-in oscillators
$: stack(
  note("c2 ~ g2 ~ | c2 e2 ~ g2")
    .sound("sawtooth")
    .velocity("0.80 ~ 0.75 ~")
    .gain(0.80)
    .pan(0.5)
    .hpf(50)
    .lpf(800)
    .lpattack(0.010)
    .lpdecay(0.20)
    .lpsustain(0.50)
    .lprelease(0.15)
    .lpenv(3000)
    .attack(0.003)
    .decay(0.05)
    .sustain(0.70)
    .release(0.10)
    .phaser(0.60)
    .phaserdepth(0.40)
    .fm(2.0)
    .fmh(1.5)
    .room(0.10)
    .size(0.15)
    .superimpose(add(0.03))
    .off(0.125, add(12)),

  note("c4 e4 g4 b4")
    .sound("square")
    .velocity("0.65 0.70 0.75 0.68")
    .gain(0.60)
    .pan(saw.range(0.40,0.60).slow(4))
    .lpf(4000)
    .attack(0.005)
    .decay(0.10)
    .sustain(0.70)
    .release(0.20)
    .room(0.25)
    .size(0.35),

  note("c5 ~ e5 g5")
    .sound("triangle")
    .velocity("0.55 ~ 0.60 0.58")
    .gain(0.50)
    .pan(saw.range(0.30,0.70).slow(3))
    .lpf(8000)
    .room(0.30)
    .size(0.40)
    .delay(0.20)
    .delaytime(0.375)
    .delayfeedback(0.35)
)

Chiptune Style (8-bit retro with crush and coarse)

// Style: chiptune - 8-bit video game aesthetic
$: stack(
  note("c3 c3 g2 g2 | a2 a2 e2 e2")
    .sound("square")
    .velocity("0.70 0.65 0.70 0.65")
    .gain(0.70)
    .pan(0.5)
    .attack(0.001)
    .decay(0.05)
    .sustain(0.60)
    .release(0.10)
    .clip(0.70)
    .crush(8)
    .coarse(8)
    .room(0.09)
    .size(0.14)
    .iter(2),

  note("e4 g4 c5 g4 | a4 c5 e5 c5")
    .sound("square")
    .velocity("0.50 0.55 0.60 0.55")
    .gain(0.50)
    .crush(8)
    .coarse(8)
    .room(0.15)
    .size(0.21)
    .echo(3, 0.125, 0.50)
)

Ambient Style (sine waves with heavy reverb)

// Style: ambient - atmospheric pads
$: stack(
  note("c2 ~ ~ ~ | e2 ~ ~ ~")
    .sound("sine")
    .velocity("0.60 ~ ~ ~")
    .gain(1.00)
    .pan(0.5)
    .attack(0.50)
    .decay(1.00)
    .sustain(0.80)
    .release(2.00)
    .vib(2.0)
    .vibmod(0.05)
    .clip(2.00)
    .tremolo(16.0)
    .tremolodepth(0.25)
    .room(0.60)
    .size(0.78)
    .superimpose(add(0.03))
    .off(0.125, add(12)),

  note("[c4,e4,g4] ~ ~ [d4,f4,a4]")
    .sound("triangle")
    .velocity("0.45 ~ ~ 0.50")
    .gain(0.70)
    .pan(sine.range(0.30,0.70).slow(8))
    .attack(0.50)
    .decay(1.00)
    .sustain(0.80)
    .release(2.00)
    .clip(2.00)
    .room(0.65)
    .size(0.84)
    .delay(0.30)
    .delaytime(0.500)
    .delayfeedback(0.45)
)

Paste this into Strudel and press Ctrl+Enter to play!

How It Works

  1. Input: Audio file or YouTube URL (downloaded via yt-dlp)
  2. Cache Check: Skip re-processing if stems already cached
  3. Stem Separation: Demucs AI model extracts melodic, bass, drums, vocals stems
  4. Analysis: librosa detects BPM and musical key
  5. Genre Detection: Auto-detect Brazilian funk/phonk based on audio characteristics
  6. Transcription: Basic Pitch converts melodic audio to MIDI
  7. Drum Detection: Onset detection extracts kick, snare, hi-hat patterns
  8. Cleanup: Quantization, velocity filtering, noise removal
  9. Loop Detection: Identifies repeating patterns with confidence scoring
  10. Generation: Notes + drums converted to Strudel bar arrays with effect functions
  11. Caching: Save versioned outputs for iteration
  12. Audio Rendering: Optionally synthesize WAV preview
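Step 7's drum classification can be approximated by banding each onset's spectral centroid: low-frequency energy reads as a kick, mid as a snare, high as a hi-hat. The thresholds below are illustrative, not the ones detect_drums.py actually uses:

```python
def classify_hit(centroid_hz: float) -> str:
    """Map an onset's spectral centroid to a drum voice:
    low -> kick (bd), mid -> snare (sd), high -> hi-hat (hh)."""
    if centroid_hz < 500:
        return "bd"
    if centroid_hz < 3000:
        return "sd"
    return "hh"

def to_pattern(centroids) -> str:
    """Render a bar of classified hits as a Strudel-style mini-notation string."""
    return " ".join(classify_hit(c) for c in centroids)
```

Feeding in one centroid per detected onset yields the `"bd ~ sd ~"`-style strings seen in the example output.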

Audio Rendering & AI Analysis

Generate a WAV preview of the Strudel patterns without opening a browser:

# Render to cache directory (recommended)
./bin/midi-grep extract --url "..." --render auto

# Render to specific file
./bin/midi-grep extract --url "..." --render output.wav

BlackHole Recording (RECOMMENDED - 100% Accurate)

For perfect audio reproduction, record REAL Strudel playback using BlackHole virtual audio device:

# One-time setup
brew install blackhole-2ch  # Requires reboot

# Record Strudel playback (runs in background, no UI disturbance)
node scripts/node/dist/record-strudel-blackhole.js input.strudel -o output.wav -d 30

This approach:

  • Uses the real Strudel engine (not emulation)
  • 100% accurate sound reproduction
  • Works with all Strudel features (samples, effects, etc.)
  • No endless gain/filter tuning needed
  • Runs invisibly - browser window hidden offscreen, doesn't steal focus
  • Automatic audio routing via Web Audio API setSinkId()

How it works:

  1. Starts ffmpeg recording from BlackHole device
  2. Opens strudel.dygy.app/embed in Puppeteer (window hidden at -32000,-32000)
  3. Grants audio permissions, finds BlackHole device ID
  4. Inserts code via CodeMirror dispatch API
  5. Clicks Play, routes audio to BlackHole via getAudioContext().setSinkId()
  6. Waits for samples to load, records for specified duration
  7. Stops playback, closes browser

Browser runs invisibly:

  • Position: -32000,-32000 (far offscreen)
  • Size: 1x1 pixels
  • AppleScript hides Chromium process
  • Background throttling disabled
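Step 1 of the recorder (starting an ffmpeg capture from the BlackHole device) corresponds roughly to the command built below. The avfoundation syntax is standard macOS ffmpeg usage, but the helper itself is a sketch, not code from the actual TypeScript recorder:

```python
def ffmpeg_record_args(device: str, duration_s: int, out_path: str):
    """Build an ffmpeg command that records from a macOS audio input
    device (e.g. "BlackHole 2ch") for a fixed duration."""
    return [
        "ffmpeg", "-y",
        "-f", "avfoundation",       # macOS capture backend
        "-i", f":{device}",         # leading ":" = audio-only input device
        "-t", str(duration_s),      # stop after duration_s seconds
        out_path,
    ]
```

The recorder launches this capture first, then starts Strudel playback, so the WAV contains the full render from its first sample.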

Node.js Synthesis Engine (render-strudel-node.ts)

This renderer synthesizes audio directly from Strudel patterns in TypeScript:

Pattern Parsing:

  • Uses @strudel/mini v1.1.0 for accurate mini-notation parsing
  • Handles rests (~*N), chords ([a,b,c]), and sequences

Synthesis:

  • Kick drums: 808-style with pitch envelope (150→40Hz), amp decay, click transient
  • Snare: Dual-sine body (180Hz + 330Hz) + high-passed noise
  • Hi-hats: Metallic multi-frequency noise (open/closed variants)
  • Bass: Sawtooth + sub-octave sine, low-pass filtered
  • Lead (mids): Detuned saws + triangle with filter envelope
  • High: Odd-harmonic square wave + saw for brightness

Mix Balance:

  • Tuned for melodic content (mids 3x, highs 2.5x, bass 0.08x, drums 0.15x)
  • 80Hz high-pass filter on master to reduce mud
  • Achieves ~79% similarity against melodic stems

Output: Mono 44.1kHz 16-bit WAV.
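The 808-style kick described above (pitch envelope 150→40 Hz plus an amplitude decay) reduces to a few lines of DSP. The envelope constants here are illustrative defaults, not the renderer's tuned values:

```python
import math

def kick(sr=44100, dur=0.4, f_start=150.0, f_end=40.0, amp_decay=8.0):
    """Sine oscillator whose frequency sweeps exponentially 150->40 Hz
    while the amplitude decays exponentially (808-style kick body)."""
    n = int(sr * dur)
    samples, phase = [], 0.0
    for i in range(n):
        t = i / n
        freq = f_start * (f_end / f_start) ** t   # exponential pitch drop
        phase += 2 * math.pi * freq / sr          # integrate frequency -> phase
        samples.append(math.exp(-amp_decay * t) * math.sin(phase))
    return samples
```

The click transient mentioned above would be layered on top as a short burst of high-passed noise in the first few milliseconds.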

Python Synthesis Engine (render_audio.py)

Fallback renderer for AI-driven iterative improvement:

  • Kick drums: Pitch envelope with distortion (808 style)
  • Snare: Body tone + high-passed noise
  • Hi-hats: Filtered noise with decay envelope
  • Bass: Sawtooth + sub-octave, low-pass filtered
  • Vocal chops: Square wave with fast attack
  • Chord stabs: Filtered sawtooth
  • Lead: Triangle wave with vibrato

Output: Stereo 44.1kHz 16-bit WAV.

AI-Driven Mix Parameters (audio_to_strudel_params.py)

Analyzes the original audio to suggest optimal Strudel effect parameters:

  • Spectral analysis: Determines filter cutoffs, brightness
  • Dynamics analysis: Suggests compression, gain staging
  • Timbre matching: Recommends distortion, FM synthesis amounts
  • Spatial analysis: Determines reverb size, delay times
  • Envelope detection: Suggests ADSR values

The analysis feeds back into the renderer for better mix balance.
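A minimal version of this feature-to-parameter mapping might turn spectral centroid and RMS into a filter cutoff and gain suggestion. The ranges and formulas below are invented for illustration and are not the tool's real heuristics:

```python
def suggest_params(centroid_hz: float, rms: float) -> dict:
    """Map simple audio features to Strudel effect parameters:
    brighter audio -> higher lpf cutoff, quieter audio -> more gain."""
    lpf = max(400.0, min(8000.0, centroid_hz * 2.0))    # clamp cutoff range
    gain = max(0.5, min(1.5, 0.1 / max(rms, 1e-6)))     # inverse of loudness
    return {"lpf": round(lpf), "gain": round(gain, 2)}
```

The real analyzer works the same way in spirit: measure a property of the original stem, then emit a bounded Strudel parameter that nudges the render toward it.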

Audio Comparison (compare_audio.py)

Compares rendered output against original stems:

[7/7] Comparing rendered audio with original...

Metrics computed:

  • Spectral similarity: Centroid, bandwidth, rolloff comparison
  • Rhythmic similarity: Onset alignment, tempo consistency
  • Timbral similarity: MFCC distance, spectral flatness
  • Overall score: Weighted combination (0-100%)

This feedback loop helps improve the synthesis quality.
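The orchestration flowchart notes that comparison uses MAE rather than cosine distance. The core of such a score is small: normalize each feature vector to its peak (so level differences don't dominate), take the mean absolute error, and invert it into a 0–1 similarity. A sketch, not compare_audio.py itself:

```python
def mae_similarity(a, b):
    """Similarity in [0, 1] from mean absolute error between two
    equal-length feature vectors (e.g. per-band magnitudes)."""
    norm = lambda v: [x / (max(v) or 1.0) for x in v]
    a, b = norm(a), norm(b)
    mae = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - mae
```

The overall score is then a weighted combination of per-feature similarities (spectral, rhythmic, timbral) computed this way.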

Output Caching

All outputs are cached for iteration and comparison:

.cache/stems/yt_VIDEO_ID/
├── piano.wav              # Separated piano/melodic stem
├── drums.wav              # Separated drums stem
├── output_v001.strudel    # Version 1 Strudel code
├── output_v001.json       # Version 1 metadata (BPM, key, style)
├── output_v002.strudel    # Version 2 (after re-running)
├── output_v002.json
├── output_latest.strudel  # Always points to latest
├── render_v001.wav        # Rendered audio for v1
└── render_v002.wav        # Rendered audio for v2

Each run creates a new version, allowing you to:

  • Compare different generations
  • Track improvements over time
  • Iterate on the output without losing previous work

Metadata stored (output_vXXX.json):

{
  "code": "// MIDI-grep output...",
  "bpm": 136,
  "key": "C# minor",
  "style": "brazilian_funk",
  "genre": "brazilian_funk",
  "notes": 497,
  "drum_hits": 287,
  "version": 1,
  "created_at": "2025-02-03T01:24:00Z"
}
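Selecting the latest version from filenames like output_v001.strudel is a simple pattern match; this sketch is illustrative, not the cache package's actual logic:

```python
import re

def latest_version(filenames):
    """Return the highest NNN among output_vNNN.strudel files, or None."""
    versions = [int(m.group(1))
                for f in filenames
                if (m := re.fullmatch(r"output_v(\d+)\.strudel", f))]
    return max(versions, default=None)
```

output_latest.strudel can then be refreshed to point at whatever this returns after each run.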

Project Structure

midi-grep/
├── cmd/midi-grep/          # CLI entrypoint
├── internal/
│   ├── audio/              # File validation, stems, YouTube
│   ├── analysis/           # BPM & key detection
│   ├── midi/               # Transcription & cleanup
│   ├── pipeline/           # Orchestration (calls Python AI for code gen)
│   ├── server/             # Web interface (HTMX)
│   ├── cache/              # Stem + output caching
│   └── report/             # Go HTML report generation
├── scripts/
│   ├── midi-grep.sh        # Main CLI wrapper
│   ├── node/               # BlackHole recorder (TypeScript)
│   └── python/             # AI + ML scripts (code gen, comparison, LLM)
├── context/                # AWOS product docs
├── Makefile
├── Dockerfile
└── README.md

Configuration

Quantization

Controls note timing precision:

  • 4 = Quarter notes (simplified, loose timing)
  • 8 = Eighth notes (moderate detail)
  • 16 = Sixteenth notes (full detail, default)
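The three levels map to grid steps of 1, 1/2, and 1/4 beat respectively (a quarter note is one beat), and quantization snaps each onset to the nearest grid line. A sketch of the snapping, illustrative rather than cleanup.py's exact code:

```python
def quantize(onset_beats: float, q: int = 16) -> float:
    """Snap an onset (in beats, quarter note = 1) to the nearest 1/q-note
    grid line: q=4 -> whole beats, q=8 -> half beats, q=16 -> quarter beats."""
    step = 4.0 / q                       # grid spacing in beats
    return round(onset_beats / step) * step
```

So an onset at beat 1.2 snaps to 1.25 with q=16, but all the way back to 1.0 with q=8, which is why lower values sound looser and more simplified.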

Audio Requirements

  • Formats: WAV, MP3
  • Max size: 100MB
  • Best results: Clear piano recordings, minimal background noise

Docker

# Build image
docker build -t midi-grep .

# Run extraction
docker run -v $(pwd):/data midi-grep extract --input /data/track.wav

# Run server
docker run -p 8080:8080 midi-grep serve

Development

# Build
make build

# Run tests
make test

# Install deps
make deps

# Start dev server
make serve

Tech Stack

  • Backend: Go 1.21+, Chi router
  • Frontend: HTMX, PicoCSS (no JavaScript frameworks)
  • Audio Processing:
    • Demucs (stem separation)
    • Basic Pitch (audio-to-MIDI)
    • librosa (analysis)
  • CLI: Cobra

Troubleshooting

"Demucs failed" or "Spleeter not installed"

./scripts/midi-grep.sh install

"yt-dlp not found"

brew install yt-dlp
# or
pip install yt-dlp

Python version issues

brew install python@3.11

Slow processing

  • Stem separation takes 1-2 minutes for a 3-minute track
  • First run downloads ML models (~1GB)

License

MIT

Credits
