
MIDI-grep

Extract piano riffs from audio files or YouTube videos and generate Strudel code for live coding.

Audio/YouTube → Stem Separation → MIDI Transcription → Strudel Code

Features

  • YouTube Support: Paste a URL, get playable code
  • AI-Powered Separation: Demucs isolates melodic/bass/drums/vocals stems
  • Drum Pattern Detection: Extracts kick, snare, hi-hat patterns automatically
  • Accurate Transcription: Spotify's Basic Pitch for audio-to-MIDI
  • BPM & Key Detection: Automatic tempo and musical key analysis with confidence scores
  • Detection Candidates: Shows top 5 candidates for key, BPM, time signature, and style in output header
  • Smart Caching: Stems cached by URL/file hash; auto-invalidates when processing scripts change
  • Chord Mode: Alternative chord-based generation for electronic/funk music (--chords)
  • Loop Detection: Automatically identifies repeating patterns (1, 2, 4, or 8 bar loops) with confidence scoring
  • Genre Auto-Detection: Automatically detects genre and uses specialized generators:
    • Brazilian Funk: BPM 130-145 (136 typical), mid-heavy spectrum, vocal chops → uses tamborzão templates
    • Brazilian Phonk: BPM 80-100 or 145-180, darker sound → phonk-style drums
    • Retro Wave/Synthwave: BPM 130-170, longer synth notes → synthwave style
    • Style Detection: Detects style based on BPM, key (minor/major), and note density
  • Deep Learning Genre Detection: CLAP (Contrastive Language-Audio Pretraining) model for zero-shot audio classification (enabled by default)
  • Manual Genre Override: --genre flag to force specific genre when auto-detection fails
  • Genre-Aware Sound RAG: LLM prompts include only ~15 genre-relevant sounds (not all 196), reducing hallucinated sound names and saving ~760 tokens per call
  • AI-Driven Audio Rendering: Synthesize WAV previews with AI-suggested mix parameters
    • Spectral/dynamics/timbre analysis of original audio
    • Automatic effect parameter optimization
    • Rendered vs original comparison with similarity scoring
  • Self-Contained HTML Report: Single-file report with everything embedded
    • Audio Studio Player: Two-section stem mixer (Original + Rendered)
    • Solo/Mute controls per stem with waveform visualization
    • A/B comparison mode (toggle between original and rendered)
    • Per-stem comparison charts (bass, drums, melodic)
    • Visual comparison charts (spectrograms, chromagrams, frequency bands)
    • Copyable analysis data tables
    • Strudel code with copy button
  • Dynamic Strudel Output: Rich patterns with per-voice effects
    • .velocity() patterns with dynamic range expansion for expressive dynamics
    • Style-specific accent patterns (downbeat, backbeat, offbeat)
    • Compressor effect for dynamics control (electronic)
    • Per-voice filtering (HPF/LPF by register)
    • Stereo panning with style-appropriate LFO shapes (sine, perlin, saw)
    • ADSR envelopes for synth/orchestral styles
    • Voice-appropriate reverb, delay, phaser, vibrato
    • Style-specific FX: bitcrush/coarse for lofi, distort for electronic
    • FM synthesis (.fm, .fmh, .fmdecay) for synth/electronic richness
    • Tremolo/amplitude modulation for synth/orchestral movement
    • Filter envelope (.lpenv, .lpattack, .lpdecay) for dynamic sweeps
    • Sidechain/ducking (.duck, .duckattack, .duckdepth) for electronic pumping
    • Ring modulation (.ring, .ringfreq) for metallic timbres
    • Chorus (.chorus, .chorusDepth) for warm width (jazz/soul)
    • Leslie effect (.leslie, .leslieSpeed) for organ-style modulation
    • Shape/saturation (.shape) for harmonic warmth
    • Pitch envelope (.pitchenv, .pitchattack) for synth bass punch
    • Pattern transforms: swing for jazz, degradeBy for lofi, iter for electronic/lofi variation
    • .clip() for note duration control (staccato/legato/sustained)
    • .echo() for rhythmic repeats
    • .superimpose() for detuned layering (synth richness)
    • .off() for harmonic layering with time offset
    • .layer() for parallel transformations (orchestral octave doubling)
    • .echoWith() for sophisticated pitch-shifted echoes (electronic)
    • .scale() for key-aware quantization
    • Section detection with time markers
  • Sound Style Presets: 20+ styles with auto-detection
    • GM Soundfont: piano, synth, orchestral, electronic, jazz, lofi
    • Groove Styles: funk, soul, house, trance (auto-detected by BPM/key)
    • Raw Oscillators: raw, chiptune, ambient, drone, plus supersaw, ZZFX synths
    • Sample-based: mallets, plucked, keys, pad, percussive
    • Genre-specific: synthwave, darkwave, minimal, industrial, newage
  • Extended Sound Palette:
    • Supersaw: Fat detuned sawtooth for bass/leads
    • ZZFX Synths: 8-bit style (z_sawtooth, z_square, z_triangle, z_tan, z_noise)
    • Wavetables: Digital waveforms (wt_digital, wt_vgame, wt_piano)
    • Noise: Textural elements (white, pink, brown, crackle)
  • Web Interface: HTMX-powered UI, no JavaScript frameworks
  • CLI Tool: Full-featured command-line interface
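The smart-caching feature above (stems keyed by URL/file hash, auto-invalidated when processing scripts change) can be sketched as a content hash over the input plus the scripts themselves. This is an illustrative sketch, not the actual cache package's implementation; the script list and key length are invented:

```python
import hashlib
from pathlib import Path

# Scripts whose changes should invalidate cached stems (illustrative list).
PROCESSING_SCRIPTS = ["scripts/python/separate.py", "scripts/python/cleanup.py"]

def cache_key(source: str, script_paths=PROCESSING_SCRIPTS) -> str:
    """Derive a cache key from the input URL/file identity plus the
    processing scripts' contents, so editing any script yields a new
    key and stale stems are never reused."""
    h = hashlib.sha256(source.encode("utf-8"))
    for p in script_paths:
        path = Path(p)
        if path.exists():
            h.update(path.read_bytes())
    return h.hexdigest()[:16]
```

The same key then names the `.cache/stems/` directory, so a cache hit is just a directory-existence check.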

Architecture

                              MIDI-grep Pipeline

   INPUT            SEPARATE          ANALYZE          AI GENERATE        OUTPUT
     │                 │                 │                  │                │
     ▼                 ▼                 ▼                  ▼                ▼
┌─────────┐     ┌────────────┐     ┌───────────┐     ┌──────────────┐    ┌─────────┐
│ YouTube │     │  Demucs    │     │  librosa  │     │ Ollama LLM   │    │ Strudel │
│ yt-dlp  │────▶│   stem     │────▶│  BPM/Key  │────▶│ AI code gen  │───▶│  code   │
│ WAV/MP3 │     │ separation │     │  + CLAP   │     │ + iteration  │    │  + WAV  │
└─────────┘     └────────────┘     └───────────┘     └──────────────┘    └─────────┘
                    │                  │                    │
                    ▼                  ▼                    ▼
              ┌──────────┐      ┌───────────┐       ┌──────────────┐
              │ melodic  │      │  Genre    │       │  BlackHole   │
              │ bass     │      │ Detection │       │  Recorder    │
              │ drums    │      │           │       │ (real audio) │
              │ vocals   │      └───────────┘       └──────────────┘
              └──────────┘

────────────────────────────────────────────────────────────────────────────────
                              Tech Stack

  Go 1.21+                              Python 3.11+
  ├── CLI (Cobra)                       ├── demucs      - stem separation
  ├── HTTP (Chi)                        ├── basic-pitch - audio to MIDI
  └── Pipeline orchestration            ├── librosa     - audio analysis
                                        ├── ollama_agent - agentic LLM
                                        └── ai_improver - iterative improvement

  TypeScript/Node.js                    External
  ├── BlackHole recorder (primary)      ├── yt-dlp     - YouTube download
  └── Puppeteer + ffmpeg                ├── ffmpeg     - audio conversion
                                        ├── Ollama     - local LLM
  Frontend                              └── ClickHouse - learning DB
  ├── HTMX
  ├── PicoCSS
  └── SSE updates

Complete Orchestration Flow

This diagram shows the full pipeline including AI-driven iteration (worst case: all iterations run).

flowchart TB
    %% ===== INPUT & SEPARATION =====
    CLI["CLI: extract --url/--file"] --> YT["yt-dlp download"]
    YT --> CheckCache{"Cached stems?"}
    CheckCache -->|No| Demucs["Demucs: melodic + drums + bass + vocals"]
    CheckCache -->|Yes| LoadStems["Load cached stems"]
    Demucs --> Analysis
    LoadStems --> Analysis

    %% ===== ANALYSIS =====
    Analysis["librosa + CLAP"] --> Profile["AudioProfile extraction:<br/>BPM, key, genre, spectrum"]
    Profile --> Features["Spectrum, dynamics,<br/>rhythm, timbre, structure"]

    %% ===== TRANSCRIPTION =====
    Features --> BasicPitch["Basic Pitch → MIDI notes"]
    Features --> DrumDetect["Drum onset → bd/sd/hh"]
    BasicPitch --> Cleanup["Quantize + Simplify"]
    DrumDetect --> Cleanup

    %% ===== INITIAL GENERATION (LLM-first) =====
    Cleanup --> CheckBest{"Best previous run<br/>in ClickHouse?"}
    CheckBest --> QueryBest["Query best code + similarity<br/>for this track_hash"]
    QueryBest --> CH[("ClickHouse<br/>runs + knowledge")]

    CH -->|Found| StartFromBest["Start from best code"]
    CH -->|Not found| GenreRAG["Genre Sound RAG<br/>retrieve_genre_context()<br/>~15 sounds per genre"]
    GenreRAG --> FreshGen["Ollama generates fresh<br/>Strudel code<br/>(with genre palette)"]

    StartFromBest --> InitCode["output.strudel"]
    FreshGen --> InitCode

    %% ===== RENDER via BlackHole =====
    InitCode --> Render["BlackHole Recorder<br/>(Puppeteer + ffmpeg)<br/>100% real Strudel audio"]
    Render --> StemSep["Demucs: separate render<br/>into bass/drums/melodic"]
    StemSep --> Compare["compare_audio.py<br/>(MAE, not cosine)"]
    Compare --> Metrics["Per-stem diffs:<br/>bass, drums, melodic<br/>+ per-band analysis"]

    %% ===== TWO-PHASE IMPROVEMENT LOOP =====
    Metrics --> LoopCheck{"iteration < N AND<br/>similarity < target?"}

    LoopCheck -->|Yes| Phase1["Phase 1: Deterministic<br/>optimize_parameters()<br/>gain/lpf/hpf/room math"]
    Phase1 --> Phase1Check{"Changes made?"}
    Phase1Check -->|Yes| QuickRender["BlackHole render<br/>(full track)"]
    Phase1Check -->|No| Phase2["Phase 2: Constrained LLM<br/>+ Genre Sound RAG<br/>(~15 valid sounds per genre)"]
    Phase2 --> QuickRender

    QuickRender --> Regress{"new > best?"}
    Regress -->|No| Revert["REVERT to best code"]
    Regress -->|Yes| Accept["Accept + store to ClickHouse"]
    Revert --> LoopCheck
    Accept --> LoopCheck

    %% ===== VALIDATION =====
    Phase2 --> Validate{"Valid Strudel?<br/>No .peak/.volume/.eq<br/>Valid sound names?"}
    Validate -->|No| LoopCheck
    Validate -->|Yes| QuickRender

    %% ===== OUTPUT =====
    LoopCheck -->|Done| BatchStems["Batch Demucs: separate<br/>each iteration render<br/>into melodic/drums/bass"]
    BatchStems --> FinalRender["Full-length BlackHole render<br/>+ Demucs stem separation"]
    FinalRender --> Report["generate_report.py"]
    Report --> Output["vNNN/<br/>├── output.strudel<br/>├── render.wav + stems<br/>├── render_v*_{melodic,drums,bass}.mp3<br/>├── iterations.json (with stem paths)<br/>├── comparison.json<br/>├── stem_comparison.json<br/>└── report.html"]

    %% ===== PER-TRACK LEARNING =====
    CH -.->|"Best runs + learned<br/>parameter deltas"| Phase1

    %% ===== STYLING =====
    style CLI fill:#1565c0,color:#fff
    style Demucs fill:#2e7d32,color:#fff
    style Analysis fill:#7b1fa2,color:#fff
    style Profile fill:#7b1fa2,color:#fff
    style Render fill:#512da8,color:#fff
    style StemSep fill:#2e7d32,color:#fff
    style Compare fill:#00796b,color:#fff
    style Phase1 fill:#e65100,color:#fff
    style Phase2 fill:#c62828,color:#fff
    style CH fill:#303f9f,color:#fff
    style FinalRender fill:#512da8,color:#fff
    style Report fill:#6a1b9a,color:#fff
    style Output fill:#1b5e20,color:#fff
    style GenreRAG fill:#00897b,color:#fff
    style FreshGen fill:#f9a825,color:#000

External Dependencies

Component Technology Purpose
yt-dlp Python CLI YouTube audio download
ffmpeg/ffprobe C binary Audio codec, duration, BlackHole recording
Demucs PyTorch Stem separation (melodic/drums/bass/vocals)
Basic Pitch TensorFlow Audio → MIDI transcription
librosa Python BPM, key, onset detection
CLAP PyTorch Zero-shot genre classification
Ollama Go binary Local LLM (llama3:8b) for code gen + iteration
ClickHouse C++ binary Learning database (runs + knowledge)
Puppeteer Node.js Browser automation for BlackHole recording
BlackHole macOS driver Virtual audio device for recording
strudel.dygy.app Web Self-hosted Strudel embed (recording target)

Module Structure

midi-grep/
├── cmd/midi-grep/           # CLI entrypoint (Go)
│   └── main.go              # Cobra commands: extract, serve, generative
│
├── internal/                # Go packages
│   ├── audio/               # Input validation, YouTube download, stem separation
│   ├── analysis/            # BPM & key detection (calls Python)
│   ├── midi/                # Transcription & cleanup (calls Python)
│   ├── pipeline/            # Orchestrates full flow (calls Python AI for code gen)
│   ├── server/              # HTTP server, HTMX templates, SSE
│   ├── exec/                # Python subprocess runner
│   ├── cache/               # Stem + output caching with versioning
│   ├── report/              # Go HTML report generation
│   └── generative/          # RAVE neural synthesizer pipeline
│
├── scripts/
│   ├── midi-grep.sh         # Main CLI wrapper (Bash)
│   ├── node/                # TypeScript BlackHole recorder + rendering
│   │   ├── src/
│   │   │   └── record-strudel-blackhole.ts  # BlackHole recorder (primary)
│   │   ├── dist/            # Compiled JavaScript output
│   │   ├── package.json     # puppeteer, typescript
│   │   └── tsconfig.json    # TypeScript configuration
│   └── python/              # Python ML + AI scripts
│       ├── separate.py      # Demucs stem separation
│       ├── transcribe.py    # Basic Pitch audio → MIDI
│       ├── analyze.py       # librosa BPM/key detection
│       ├── cleanup.py       # MIDI quantization & filtering
│       ├── detect_drums.py  # Drum onset detection & classification
│       ├── detect_genre_dl.py    # CLAP deep learning genre detection
│       ├── ai_improver.py   # AI-driven iterative improvement (Ollama/Claude)
│       ├── ollama_codegen.py # Ollama LLM code gen with genre sound RAG
│       ├── ollama_agent.py  # Agentic LLM with ClickHouse memory + genre RAG
│       ├── sound_selector.py     # Sound catalog (67 drums, 128 GM) + genre RAG retrieval
│       ├── compare_audio.py      # Rendered vs original comparison (MAE)
│       ├── generate_report.py    # Self-contained HTML report
│       ├── analyze_synth_params.py  # AI audio analysis for synthesis
│       ├── render_audio.py       # WAV synthesis (fallback renderer)
│       ├── render_with_models.py # Render with granular models
│       └── thin_patterns.py      # Pattern density control
│
└── context/                 # AWOS documentation
    ├── product/             # Product definition, roadmap
    └── spec/                # Feature specifications

Data Flow

1. INPUT
   YouTube URL ──▶ yt-dlp ──▶ audio.wav
   Local file ─────────────▶ audio.wav

2. CACHE CHECK
   URL/file hash ──▶ Check .cache/stems/ ──▶ Use cached if valid

3. STEM SEPARATION
   audio.wav ──▶ Demucs ──▶ melodic.mp3 + drums.mp3 + bass.mp3 + vocals.mp3

4. ANALYSIS
   melodic.mp3 ──▶ librosa + CLAP ──▶ { bpm: 136, key: "C# minor", genre: "brazilian_funk" }

5. TRANSCRIPTION (parallel)
   melodic.mp3 ──▶ Basic Pitch ──▶ raw.mid ──▶ cleanup ──▶ notes.json
   drums.mp3 ──▶ onset detection ──▶ drum_hits.json

6. AI CODE GENERATION (LLM-first)
   notes + drums + analysis ──▶ Genre Sound RAG (~15 sounds) ──▶ Ollama ──▶ code.strudel
   ClickHouse best runs ──▶ start from best known code

7. RENDER & ITERATE
   code.strudel ──▶ BlackHole Recorder ──▶ render.wav
   render.wav ──▶ compare_audio.py ──▶ similarity score
   similarity < target? ──▶ AI improver ──▶ iterate
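At its core the flow above is a linear chain of stages with a render/compare loop bolted on at the end. A minimal sketch of such a stage runner, with every name a stand-in rather than the project's actual API:

```python
def run_pipeline(source, stages):
    """Thread an artifact through ordered stages (download -> separate ->
    analyze -> transcribe -> generate), logging each stage as it runs."""
    artifact, log = source, []
    for name, fn in stages:
        artifact = fn(artifact)
        log.append(name)
    return artifact, log
```

In the real Go orchestrator each stage shells out to a Python subprocess, but the data-flow shape is the same.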

Quick Start

# Install dependencies
./scripts/midi-grep.sh install

# Extract from YouTube
./scripts/midi-grep.sh extract --url "https://youtu.be/Q4801HzWZfg"

# Extract from local file
./scripts/midi-grep.sh extract --file track.wav --output riff.strudel

# Start web interface
./scripts/midi-grep.sh serve --port 8080

Installation

Prerequisites

  • Go 1.21+: brew install go
  • Python 3.11+: brew install python@3.11
  • yt-dlp (for YouTube): brew install yt-dlp
  • ffmpeg (for audio processing): brew install ffmpeg

Install

# Clone the repository
git clone https://github.com/dygy/midi-grep.git
cd midi-grep

# Install all dependencies (Go + Python)
./scripts/midi-grep.sh install

This installs:

  • Python packages: demucs, basic-pitch, librosa, pretty_midi
  • The Go binary (compiled via make build)

Usage

Command Line

# Basic extraction from YouTube
./scripts/midi-grep.sh extract --url "https://youtu.be/VIDEO_ID"

# With options
./scripts/midi-grep.sh extract \
  --url "https://youtu.be/VIDEO_ID" \
  --quantize 8 \
  --output riff.strudel \
  --midi piano.mid

# From local file
./scripts/midi-grep.sh extract --file track.wav

# Copy to clipboard (macOS)
./scripts/midi-grep.sh extract --url "..." --copy

CLI Options

Option Short Description
--url -u YouTube URL to extract from
--file -f Local audio file (WAV/MP3)
--output -o Output file for Strudel code
--quantize -q Quantization: 4, 8, or 16 (default: 16)
--midi -m Also save cleaned MIDI file
--copy -c Copy result to clipboard
--verbose -v Show verbose output
--render - Render audio to WAV: auto (default, saves to cache), a file path, or none to disable. Always outputs 3 stems
--quality - Stem separation quality: fast, normal (default), high, best
--chords - Use chord-based generation (better for electronic/funk)
--no-cache - Skip stem cache, force fresh extraction
--drums - Include drum patterns (default: on)
--drums-only - Extract only drums (skip melodic processing)
--drum-kit - Drum kit: tr808, tr909, linn, acoustic, lofi
--style - Sound style (auto, piano, synth, electronic, house, etc.)
--brazilian-funk - Force Brazilian funk mode (auto-detected normally)
--genre - Manual genre override: brazilian_funk, brazilian_phonk, retro_wave, synthwave, trance, house, lofi, jazz
--deep-genre - Use deep learning (CLAP) for genre detection (default: on; skipped when --genre is specified)
--iterate - Number of AI-driven improvement iterations (default: 5, always enabled)
--target-similarity - Target similarity score to stop iteration (default: 0.85)
--ollama - Use Ollama (free local LLM) for AI improvement (default: on)
--ollama-model - Ollama model to use (default: llama3:8b)

Default Analysis Features

All analysis features are enabled by default:

  • Stem Rendering: Outputs 3 separate stems (render_bass.wav, render_drums.wav, render_melodic.wav)
  • Per-Stem Comparison: Generates charts comparing each rendered stem vs original
  • Overall Comparison: Combined frequency/MFCC/chroma comparison chart
  • AI Improvement: 5 iterations targeting 85% similarity
  • Iteration Stem Separation: Batch Demucs on each iteration render → per-iteration melodic/drums/bass stems
  • HTML Report: Audio studio with Solo/Mute controls, per-iteration stem tracks, A/B comparison, shimmer loading, waveforms

AI-Driven Code Improvement

MIDI-grep can iteratively improve Strudel code using AI analysis:

# Run 5 iterations of AI improvement (uses Ollama by default - free & local)
./bin/midi-grep extract --url "..." --iterate 5

# Target 75% similarity, max 10 iterations
./bin/midi-grep extract --url "..." --iterate 10 --target-similarity 0.75

# Use a different Ollama model
./bin/midi-grep extract --url "..." --iterate 5 --ollama-model llama3:8b

How it works:

  1. Render initial Strudel code to WAV
  2. Compare against original audio (MFCC, chroma, frequency bands)
  3. Genre Sound RAG: Inject ~15 genre-appropriate sounds into LLM prompt (not the full 196-sound catalog)
  4. LLM analyzes gaps and suggests changes using only valid genre sounds
  5. Apply changes and repeat until target reached
  6. Batch stem separation: Demucs splits each iteration render into melodic/drums/bass
  7. Store all runs in ClickHouse for learning
  8. Generate report with per-iteration stem tracks (mute buttons, shimmer loading)
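The accept/revert logic implied by steps 4–5 (and the "new > best?" gate in the flowchart) can be sketched as a generic loop. Here `render`, `compare`, and `propose` are injected stand-ins for the real BlackHole renderer, compare_audio.py, and the LLM; none of this is the actual ai_improver.py code:

```python
def improve(code, render, compare, propose, max_iter=5, target=0.85):
    """Iterate: render -> score -> propose a change -> keep it only if
    the score improves, otherwise revert to the best code seen so far."""
    best_code, best_score = code, compare(render(code))
    for _ in range(max_iter):
        if best_score >= target:
            break
        candidate = propose(best_code)          # e.g. LLM edit or parameter math
        score = compare(render(candidate))
        if score > best_score:                  # accept only strict improvements
            best_code, best_score = candidate, score
        # else: implicit revert -- best_code is left unchanged
    return best_code, best_score
```

The key property is monotonicity: a bad LLM suggestion can never make the stored result worse, which is what makes storing every run in ClickHouse useful as a starting point for future tracks.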

Ollama Setup (one-time, free):

brew install ollama
ollama serve
ollama pull llama3:8b  # 3.8GB download

ClickHouse Learning Storage:

All improvement runs are stored in ClickHouse for incremental learning:

  • midi_grep.runs - Every render attempt with similarity scores
  • midi_grep.knowledge - Learned parameter improvements that transfer to future tracks

ClickHouse Local is auto-downloaded and requires no setup. Data stored in .clickhouse/db/.

# Query your improvement history
./bin/clickhouse local --path .clickhouse/db \
  --query "SELECT track_hash, version, similarity_overall FROM midi_grep.runs ORDER BY created_at DESC LIMIT 5"

Web Interface

./scripts/midi-grep.sh serve --port 8080

Open http://localhost:8080 in your browser:

  • Drag & drop audio files
  • Paste YouTube URLs
  • Real-time progress updates
  • Copy Strudel code with one click

Direct Binary Usage

# Build
make build

# CLI
./bin/midi-grep extract --input track.wav
./bin/midi-grep extract --url "https://youtu.be/..."
./bin/midi-grep serve --port 8080

# Help
./bin/midi-grep --help
./bin/midi-grep extract --help

Example Output

Default Format (Bar Arrays + Effect Functions)

The default output uses bar arrays for easy mixing and matching:

// MIDI-grep output
// BPM: 136, Key: C# minor
// Notes: 497 (bass: 17, mid: 371, high: 109)
// Drums: 287 hits (bd: 72, sd: 78, hh: 5)
// Kit: tr808
// Style: house

setcps(136/60/4)

// Bar arrays - mix & match freely
let bass = [
  "cs2 ~*7 cs2",
  "~*6 cs2",
  "~*7 cs2"
]

let mid = [
  "cs4 ~*3 e4 ~*2 cs4",
  "~*4 cs4 ~*2 ds4",
  "cs4 ~*3 e4"
]

let drums = [
  "bd ~ sd ~ bd sd ~ ~",
  "bd ~ sd ~ bd ~ sd bd",
  "~ oh ~ sd ~ oh"
]

// Effects (applied at playback)
let bassFx = p => p.sound("supersaw").lpf(800).room(0.12)
let midFx = p => p.sound("gm_pad_poly").lpf(4000).room(0.21)
let drumsFx = p => p.bank("RolandTR808").room(0.15)

// Play all
$: stack(
  bassFx(cat(...bass.map(b => note(b)))),
  midFx(cat(...mid.map(b => note(b)))),
  drumsFx(cat(...drums.map(b => s(b))))
)

// Mix & match:
// $: bassFx(note(bass[0]))
// $: cat(...bass.slice(0,4).map(b => note(b)))

Brazilian Funk/Phonk (auto-detected)

When the tool detects Brazilian funk characteristics (BPM 125-155, vocal chop transcription patterns, low bass content), it automatically switches to template-based generation:

// MIDI-grep output (Brazilian Funk mode)
// BPM: 136, Key: C# minor
// Genre: Brazilian Funk / Phonk
// Pattern: Tamborzão

setcps(136/60/4)

// Tamborzão drum pattern (2 bars)
let tamborzao = `
  bd ~ ~ bd ~ ~ bd ~ | ~ bd ~ ~ bd ~ ~ ~
`

let snare = `
  ~ ~ sd ~ ~ ~ sd ~ | ~ ~ sd ~ ~ ~ sd ~
`

let hats = `
  hh hh hh hh hh hh hh hh | hh hh oh hh hh hh oh hh
`

// 808 Bass (follows kick pattern)
let bassPattern = `
  c#1 ~ ~ c#1 ~ ~ c#1 ~ | ~ c#1 ~ ~ g#1 ~ ~ ~
`

// Synth stab (phonk style)
let stab = `
  [c#4,e4,g#4] ~ ~ ~ [c#4,e4,g#4] ~ ~ ~ | ~ ~ [c#4,e4,g#4] ~ ~ ~ ~ ~
`

// Effects
let drumFx = p => p.bank("RolandTR808").room(0.1).gain(1.0)
let bassFx = p => p.sound("sawtooth")
    .lpf(200).gain(1.2).distort(0.3)
    .attack(0.001).decay(0.3).sustain(0.2).release(0.1)
let stabFx = p => p.sound("square")
    .lpf(2000).gain(0.6).distort(0.2)
    .attack(0.01).decay(0.1).sustain(0.3).release(0.05)
    .room(0.2)

// Play all
$: stack(
  drumFx(s(tamborzao)),
  drumFx(s(snare)),
  drumFx(s(hats).gain(0.6)),
  bassFx(note(bassPattern)),
  stabFx(note(stab))
)

Jazz Style (with swing, perlin LFO, vibrato)

// MIDI-grep output
// BPM: 89, Key: E minor
// Notes: 48 (bass: 12, mid: 28, high: 8)
// Style: jazz
// Duration: 32.0 beats
// Sections: 0:00 intro | 0:08 main

setcps(89/60/4)

$: stack(
  // bass (12 notes)
  note("e2 ~ b2 ~ | e2 g2 ~ b2")
    .sound("gm_acoustic_bass")
    .velocity("0.75 ~ 0.68 ~ | 0.82 0.60 ~ 0.70")
    .gain(1.20)
    .pan(0.5).hpf(50).lpf(800).vib(3.0).vibmod(0.08).room(0.20).size(0.30)
    .swing(0.10),

  // mid (28 notes)
  note("[e4,g4] [b3,d4] fs4 [a3,cs4]")
    .sound("gm_epiano1")
    .velocity("0.65 0.72 0.80 0.58")
    .pan(perlin.range(0.44,0.56).slow(4)).hpf(200).lpf(4000).vib(3.0).vibmod(0.08).room(0.35).size(0.50)
    .swing(0.10),

  // high (8 notes)
  note("b5 ~ ~ e5 | fs5 ~ gs5 ~")
    .sound("gm_vibraphone")
    .velocity("0.55 ~ ~ 0.62 | 0.70 ~ 0.65 ~")
    .gain(0.80)
    .pan(perlin.range(0.38,0.62).slow(3)).hpf(400).lpf(10000).vib(3.0).vibmod(0.08).room(0.40).size(0.60).delay(0.15).delaytime(0.375).delayfeedback(0.30)
    .swing(0.10)
)

Lofi Style (with bitcrush, coarse, echo, superimpose, degradeBy)

// Style: lofi
$: stack(
  note("c3 ~ g2 ~ | c3 e3 ~ g2")
    .sound("gm_electric_bass_finger")
    .velocity("0.70 ~ 0.65 ~")
    .pan(0.5).hpf(50).lpf(800).clip(1.10).crush(10).coarse(4).room(0.18).size(0.27).echo(2,0.125,0.40)
    .superimpose(add(0.03))
    .swing(0.05).degradeBy(0.05),
  // ...
)

Synth Style (with FM synthesis, phaser, envelope, off, superimpose)

// Style: synth
$: stack(
  note("c3 ~ g3 ~ | c3 e3 ~ g3")
    .sound("gm_synth_bass_1")
    .velocity("0.75 ~ 0.70 ~")
    .gain(1.30)
    .pan(saw.range(0.43,0.57).slow(4)).hpf(50).lpf(800)
    .attack(0.003).decay(0.08).sustain(0.90).release(0.05)
    .phaser(0.50).phaserdepth(0.30).vib(4.0).vibmod(0.10)
    .fm(1.5).fmh(1.0).fmdecay(0.30).fmsustain(0.50)
    .room(0.16).size(0.24).echo(2,0.125,0.40)
    .superimpose(add(0.03))
    .off(0.125, add(12)),
  // ...
)

Raw Style (pure oscillators with FM and filter envelope)

// Style: raw - using Strudel's built-in oscillators
$: stack(
  note("c2 ~ g2 ~ | c2 e2 ~ g2")
    .sound("sawtooth")
    .velocity("0.80 ~ 0.75 ~")
    .gain(0.80)
    .pan(0.5)
    .hpf(50)
    .lpf(800)
    .lpattack(0.010)
    .lpdecay(0.20)
    .lpsustain(0.50)
    .lprelease(0.15)
    .lpenv(3000)
    .attack(0.003)
    .decay(0.05)
    .sustain(0.70)
    .release(0.10)
    .phaser(0.60)
    .phaserdepth(0.40)
    .fm(2.0)
    .fmh(1.5)
    .room(0.10)
    .size(0.15)
    .superimpose(add(0.03))
    .off(0.125, add(12)),

  note("c4 e4 g4 b4")
    .sound("square")
    .velocity("0.65 0.70 0.75 0.68")
    .gain(0.60)
    .pan(saw.range(0.40,0.60).slow(4))
    .lpf(4000)
    .attack(0.005)
    .decay(0.10)
    .sustain(0.70)
    .release(0.20)
    .room(0.25)
    .size(0.35),

  note("c5 ~ e5 g5")
    .sound("triangle")
    .velocity("0.55 ~ 0.60 0.58")
    .gain(0.50)
    .pan(saw.range(0.30,0.70).slow(3))
    .lpf(8000)
    .room(0.30)
    .size(0.40)
    .delay(0.20)
    .delaytime(0.375)
    .delayfeedback(0.35)
)

Chiptune Style (8-bit retro with crush and coarse)

// Style: chiptune - 8-bit video game aesthetic
$: stack(
  note("c3 c3 g2 g2 | a2 a2 e2 e2")
    .sound("square")
    .velocity("0.70 0.65 0.70 0.65")
    .gain(0.70)
    .pan(0.5)
    .attack(0.001)
    .decay(0.05)
    .sustain(0.60)
    .release(0.10)
    .clip(0.70)
    .crush(8)
    .coarse(8)
    .room(0.09)
    .size(0.14)
    .iter(2),

  note("e4 g4 c5 g4 | a4 c5 e5 c5")
    .sound("square")
    .velocity("0.50 0.55 0.60 0.55")
    .gain(0.50)
    .crush(8)
    .coarse(8)
    .room(0.15)
    .size(0.21)
    .echo(3, 0.125, 0.50)
)

Ambient Style (sine waves with heavy reverb)

// Style: ambient - atmospheric pads
$: stack(
  note("c2 ~ ~ ~ | e2 ~ ~ ~")
    .sound("sine")
    .velocity("0.60 ~ ~ ~")
    .gain(1.00)
    .pan(0.5)
    .attack(0.50)
    .decay(1.00)
    .sustain(0.80)
    .release(2.00)
    .vib(2.0)
    .vibmod(0.05)
    .clip(2.00)
    .tremolo(16.0)
    .tremolodepth(0.25)
    .room(0.60)
    .size(0.78)
    .superimpose(add(0.03))
    .off(0.125, add(12)),

  note("[c4,e4,g4] ~ ~ [d4,f4,a4]")
    .sound("triangle")
    .velocity("0.45 ~ ~ 0.50")
    .gain(0.70)
    .pan(sine.range(0.30,0.70).slow(8))
    .attack(0.50)
    .decay(1.00)
    .sustain(0.80)
    .release(2.00)
    .clip(2.00)
    .room(0.65)
    .size(0.84)
    .delay(0.30)
    .delaytime(0.500)
    .delayfeedback(0.45)
)

Paste this into Strudel and press Ctrl+Enter to play!

How It Works

  1. Input: Audio file or YouTube URL (downloaded via yt-dlp)
  2. Cache Check: Skip re-processing if stems already cached
  3. Stem Separation: Demucs AI model extracts melodic, bass, drums, vocals stems
  4. Analysis: librosa detects BPM and musical key
  5. Genre Detection: Auto-detect Brazilian funk/phonk based on audio characteristics
  6. Transcription: Basic Pitch converts melodic audio to MIDI
  7. Drum Detection: Onset detection extracts kick, snare, hi-hat patterns
  8. Cleanup: Quantization, velocity filtering, noise removal
  9. Loop Detection: Identifies repeating patterns with confidence scoring
  10. Generation: Notes + drums converted to Strudel bar arrays with effect functions
  11. Caching: Save versioned outputs for iteration
  12. Audio Rendering: Optionally synthesize WAV preview
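Step 7's drum classification can be approximated by banding each onset's spectral centroid: low-frequency energy reads as a kick, mid as a snare, high as a hi-hat. The thresholds below are illustrative, not the ones detect_drums.py actually uses:

```python
def classify_hit(centroid_hz: float) -> str:
    """Map an onset's spectral centroid to a drum voice:
    low -> kick (bd), mid -> snare (sd), high -> hi-hat (hh)."""
    if centroid_hz < 500:
        return "bd"
    if centroid_hz < 3000:
        return "sd"
    return "hh"

def to_pattern(centroids) -> str:
    """Render a bar of classified hits as a Strudel-style mini-notation string."""
    return " ".join(classify_hit(c) for c in centroids)
```

Feeding in one centroid per detected onset yields the `"bd ~ sd ~"`-style strings seen in the example output.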

Audio Rendering & AI Analysis

Generate a WAV preview of the Strudel patterns without opening a browser:

# Render to cache directory (recommended)
./bin/midi-grep extract --url "..." --render auto

# Render to specific file
./bin/midi-grep extract --url "..." --render output.wav

BlackHole Recording (RECOMMENDED - 100% Accurate)

For perfect audio reproduction, record REAL Strudel playback using BlackHole virtual audio device:

# One-time setup
brew install blackhole-2ch  # Requires reboot

# Record Strudel playback (runs in background, no UI disturbance)
node scripts/node/dist/record-strudel-blackhole.js input.strudel -o output.wav -d 30

This approach:

  • Uses the real Strudel engine (not emulation)
  • 100% accurate sound reproduction
  • Works with all Strudel features (samples, effects, etc.)
  • No endless gain/filter tuning needed
  • Runs invisibly - browser window hidden offscreen, doesn't steal focus
  • Automatic audio routing via Web Audio API setSinkId()

How it works:

  1. Starts ffmpeg recording from BlackHole device
  2. Opens strudel.dygy.app/embed in Puppeteer (window hidden at -32000,-32000)
  3. Grants audio permissions, finds BlackHole device ID
  4. Inserts code via CodeMirror dispatch API
  5. Clicks Play, routes audio to BlackHole via getAudioContext().setSinkId()
  6. Waits for samples to load, records for specified duration
  7. Stops playback, closes browser

Browser runs invisibly:

  • Position: -32000,-32000 (far offscreen)
  • Size: 1x1 pixels
  • AppleScript hides Chromium process
  • Background throttling disabled
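Step 1 of the recorder (starting an ffmpeg capture from the BlackHole device) corresponds roughly to the command built below. The avfoundation syntax is standard macOS ffmpeg usage, but the helper itself is a sketch, not code from the actual TypeScript recorder:

```python
def ffmpeg_record_args(device: str, duration_s: int, out_path: str):
    """Build an ffmpeg command that records from a macOS audio input
    device (e.g. "BlackHole 2ch") for a fixed duration."""
    return [
        "ffmpeg", "-y",
        "-f", "avfoundation",       # macOS capture backend
        "-i", f":{device}",         # leading ":" = audio-only input device
        "-t", str(duration_s),      # stop after duration_s seconds
        out_path,
    ]
```

The recorder launches this capture first, then starts Strudel playback, so the WAV contains the full render from its first sample.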

Node.js Synthesis Engine (render-strudel-node.ts)

This renderer synthesizes audio directly from Strudel patterns in TypeScript:

Pattern Parsing:

  • Uses @strudel/mini v1.1.0 for accurate mini-notation parsing
  • Handles rests (~*N), chords ([a,b,c]), and sequences

Synthesis:

  • Kick drums: 808-style with pitch envelope (150→40Hz), amp decay, click transient
  • Snare: Dual-sine body (180Hz + 330Hz) + high-passed noise
  • Hi-hats: Metallic multi-frequency noise (open/closed variants)
  • Bass: Sawtooth + sub-octave sine, low-pass filtered
  • Lead (mids): Detuned saws + triangle with filter envelope
  • High: Odd-harmonic square wave + saw for brightness

Mix Balance:

  • Tuned for melodic content (mids 3x, highs 2.5x, bass 0.08x, drums 0.15x)
  • 80Hz high-pass filter on master to reduce mud
  • Achieves ~79% similarity against melodic stems

Output: Mono 44.1kHz 16-bit WAV.
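The 808-style kick described above (pitch envelope 150→40 Hz plus an amplitude decay) reduces to a few lines of DSP. The envelope constants here are illustrative defaults, not the renderer's tuned values:

```python
import math

def kick(sr=44100, dur=0.4, f_start=150.0, f_end=40.0, amp_decay=8.0):
    """Sine oscillator whose frequency sweeps exponentially 150->40 Hz
    while the amplitude decays exponentially (808-style kick body)."""
    n = int(sr * dur)
    samples, phase = [], 0.0
    for i in range(n):
        t = i / n
        freq = f_start * (f_end / f_start) ** t   # exponential pitch drop
        phase += 2 * math.pi * freq / sr          # integrate frequency -> phase
        samples.append(math.exp(-amp_decay * t) * math.sin(phase))
    return samples
```

The click transient mentioned above would be layered on top as a short burst of high-passed noise in the first few milliseconds.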

Python Synthesis Engine (render_audio.py)

Fallback renderer for AI-driven iterative improvement:

  • Kick drums: Pitch envelope with distortion (808 style)
  • Snare: Body tone + high-passed noise
  • Hi-hats: Filtered noise with decay envelope
  • Bass: Sawtooth + sub-octave, low-pass filtered
  • Vocal chops: Square wave with fast attack
  • Chord stabs: Filtered sawtooth
  • Lead: Triangle wave with vibrato

Output: Stereo 44.1kHz 16-bit WAV.

AI-Driven Mix Parameters (audio_to_strudel_params.py)

Analyzes the original audio to suggest optimal Strudel effect parameters:

  • Spectral analysis: Determines filter cutoffs, brightness
  • Dynamics analysis: Suggests compression, gain staging
  • Timbre matching: Recommends distortion, FM synthesis amounts
  • Spatial analysis: Determines reverb size, delay times
  • Envelope detection: Suggests ADSR values

The analysis feeds back into the renderer for better mix balance.
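A minimal version of this feature-to-parameter mapping might turn spectral centroid and RMS into a filter cutoff and gain suggestion. The ranges and formulas below are invented for illustration and are not the tool's real heuristics:

```python
def suggest_params(centroid_hz: float, rms: float) -> dict:
    """Map simple audio features to Strudel effect parameters:
    brighter audio -> higher lpf cutoff, quieter audio -> more gain."""
    lpf = max(400.0, min(8000.0, centroid_hz * 2.0))    # clamp cutoff range
    gain = max(0.5, min(1.5, 0.1 / max(rms, 1e-6)))     # inverse of loudness
    return {"lpf": round(lpf), "gain": round(gain, 2)}
```

The real analyzer works the same way in spirit: measure a property of the original stem, then emit a bounded Strudel parameter that nudges the render toward it.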

Audio Comparison (compare_audio.py)

Compares rendered output against original stems:

[7/7] Comparing rendered audio with original...

Metrics computed:

  • Spectral similarity: Centroid, bandwidth, rolloff comparison
  • Rhythmic similarity: Onset alignment, tempo consistency
  • Timbral similarity: MFCC distance, spectral flatness
  • Overall score: Weighted combination (0-100%)

This feedback loop helps improve the synthesis quality.
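The orchestration flowchart notes that comparison uses MAE rather than cosine distance. The core of such a score is small: normalize each feature vector to its peak (so level differences don't dominate), take the mean absolute error, and invert it into a 0–1 similarity. A sketch, not compare_audio.py itself:

```python
def mae_similarity(a, b):
    """Similarity in [0, 1] from mean absolute error between two
    equal-length feature vectors (e.g. per-band magnitudes)."""
    norm = lambda v: [x / (max(v) or 1.0) for x in v]
    a, b = norm(a), norm(b)
    mae = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - mae
```

The overall score is then a weighted combination of per-feature similarities (spectral, rhythmic, timbral) computed this way.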

Output Caching

All outputs are cached for iteration and comparison:

.cache/stems/yt_VIDEO_ID/
├── piano.wav              # Separated piano/melodic stem
├── drums.wav              # Separated drums stem
├── output_v001.strudel    # Version 1 Strudel code
├── output_v001.json       # Version 1 metadata (BPM, key, style)
├── output_v002.strudel    # Version 2 (after re-running)
├── output_v002.json
├── output_latest.strudel  # Always points to latest
├── render_v001.wav        # Rendered audio for v1
└── render_v002.wav        # Rendered audio for v2

Each run creates a new version, allowing you to:

  • Compare different generations
  • Track improvements over time
  • Iterate on the output without losing previous work

Metadata stored (output_vXXX.json):

{
  "code": "// MIDI-grep output...",
  "bpm": 136,
  "key": "C# minor",
  "style": "brazilian_funk",
  "genre": "brazilian_funk",
  "notes": 497,
  "drum_hits": 287,
  "version": 1,
  "created_at": "2025-02-03T01:24:00Z"
}
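Selecting the latest version from filenames like output_v001.strudel is a simple pattern match; this sketch is illustrative, not the cache package's actual logic:

```python
import re

def latest_version(filenames):
    """Return the highest NNN among output_vNNN.strudel files, or None."""
    versions = [int(m.group(1))
                for f in filenames
                if (m := re.fullmatch(r"output_v(\d+)\.strudel", f))]
    return max(versions, default=None)
```

output_latest.strudel can then be refreshed to point at whatever this returns after each run.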

Project Structure

midi-grep/
├── cmd/midi-grep/          # CLI entrypoint
├── internal/
│   ├── audio/              # File validation, stems, YouTube
│   ├── analysis/           # BPM & key detection
│   ├── midi/               # Transcription & cleanup
│   ├── pipeline/           # Orchestration (calls Python AI for code gen)
│   ├── server/             # Web interface (HTMX)
│   ├── cache/              # Stem + output caching
│   └── report/             # Go HTML report generation
├── scripts/
│   ├── midi-grep.sh        # Main CLI wrapper
│   ├── node/               # BlackHole recorder (TypeScript)
│   └── python/             # AI + ML scripts (code gen, comparison, LLM)
├── context/                # AWOS product docs
├── Makefile
├── Dockerfile
└── README.md

Configuration

Quantization

Controls note timing precision:

  • 4 = Quarter notes (simplified, loose timing)
  • 8 = Eighth notes (moderate detail)
  • 16 = Sixteenth notes (full detail, default)
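The three levels map to grid steps of 1, 1/2, and 1/4 beat respectively (a quarter note is one beat), and quantization snaps each onset to the nearest grid line. A sketch of the snapping, illustrative rather than cleanup.py's exact code:

```python
def quantize(onset_beats: float, q: int = 16) -> float:
    """Snap an onset (in beats, quarter note = 1) to the nearest 1/q-note
    grid line: q=4 -> whole beats, q=8 -> half beats, q=16 -> quarter beats."""
    step = 4.0 / q                       # grid spacing in beats
    return round(onset_beats / step) * step
```

So an onset at beat 1.2 snaps to 1.25 with q=16, but all the way back to 1.0 with q=8, which is why lower values sound looser and more simplified.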

Audio Requirements

  • Formats: WAV, MP3
  • Max size: 100MB
  • Best results: Clear piano recordings, minimal background noise

Docker

# Build image
docker build -t midi-grep .

# Run extraction
docker run -v $(pwd):/data midi-grep extract --input /data/track.wav

# Run server
docker run -p 8080:8080 midi-grep serve

Development

# Build
make build

# Run tests
make test

# Install deps
make deps

# Start dev server
make serve

Tech Stack

  • Backend: Go 1.21+, Chi router
  • Frontend: HTMX, PicoCSS (no JavaScript frameworks)
  • Audio Processing:
    • Demucs (stem separation)
    • Basic Pitch (audio-to-MIDI)
    • librosa (analysis)
  • CLI: Cobra

Troubleshooting

"Demucs failed" or "Spleeter not installed"

./scripts/midi-grep.sh install

"yt-dlp not found"

brew install yt-dlp
# or
pip install yt-dlp

Python version issues

brew install python@3.11

Slow processing

  • Stem separation takes 1-2 minutes for a 3-minute track
  • First run downloads ML models (~1GB)

License

MIT

Credits
