# Voice-controlled AI video generation for Daydream Scope
Speak into your microphone and watch AI-generated imagery transform in real time. Say "butterfly" and a butterfly appears. Say "ocean sunset" and the scene shifts. Your voice becomes the paintbrush.
Built for live performance and interactive installation. Based on The Mirror's Echo by Krista Faist.
```
Microphone   →   faster-whisper   →   spaCy NLP        →   StreamDiffusion
    ↓                  ↓                   ↓                     ↓
48kHz audio      "that's my          [Freddy,             AI-generated
capture           little guy          little guy]          imagery of
                  Freddy"            (nouns only)          Freddy
```
## How it works

The plugin runs as a preprocessor in front of StreamDiffusion:
- Captures microphone audio in 3-second chunks at 48kHz
- Resamples to 16kHz and transcribes with faster-whisper (CPU, int8 quantized)
- Extracts concrete nouns and noun phrases using spaCy NLP
- Injects extracted nouns as the generation prompt with cache reset
- Filters filler speech — "um okay so like" produces no prompt change
- Falls back to the UI prompt box when no voice nouns are detected
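The filler-speech gate can be sketched in a few lines. This is a simplified stand-in: the real plugin uses spaCy part-of-speech tags and noun chunks to keep only concrete nouns, while this sketch approximates the same behavior with a hypothetical stop-word list.

```python
import re
from typing import Optional

# Hypothetical filler vocabulary -- the actual plugin relies on spaCy POS
# tagging rather than a fixed stop list.
FILLER = {"um", "uh", "okay", "ok", "so", "like", "you", "know",
          "well", "yeah", "a", "an", "the", "and"}

def prompt_from_transcript(transcript: str) -> Optional[str]:
    """Return a candidate prompt, or None when the speech is all filler."""
    words = re.findall(r"[a-z']+", transcript.lower())
    content = [w for w in words if w not in FILLER]
    if not content:
        return None  # e.g. "um okay so like" produces no prompt change
    return ", ".join(content)
```

When this returns `None`, the pipeline leaves the current prompt untouched, so the UI text box remains the active fallback.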
## Features

- Real-time voice-to-image — speak and see results in ~2 seconds
- Noun extraction — only concrete nouns drive the image, not filler words
- UI prompt fallback — text box prompt stays active when you're not speaking
- Whisper on CPU — faster-whisper int8 keeps your GPU free for StreamDiffusion
- Cache reset on change — clean transitions between prompts, no ghosting
- LIFO audio queue — always transcribes the most recent speech, not a backlog
- Prompt monitor — included tkinter overlay shows what's driving the video
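The LIFO queue behavior can be modeled with a bounded deque: the capture callback appends 3-second chunks, and the transcriber always takes the newest one, so a slow transcription pass never builds a backlog. Names here are illustrative, not the plugin's actual identifiers.

```python
from collections import deque
from threading import Lock

# Bounded stack of audio chunks: append() silently evicts the oldest chunk
# when full, and pop() hands the transcriber the most recent speech.
audio_chunks: deque = deque(maxlen=4)
_lock = Lock()

def push_chunk(chunk) -> None:
    """Called from the audio callback; never blocks."""
    with _lock:
        audio_chunks.append(chunk)

def latest_chunk():
    """Called from the transcription loop; newest chunk first."""
    with _lock:
        return audio_chunks.pop() if audio_chunks else None
```

`deque(maxlen=...)` keeps eviction O(1) and lock-holding time negligible next to a 3-second capture window.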
## Requirements

- Daydream Scope installed
- Python 3.10+ (Scope's bundled Python works)
- A microphone
## Installation

```shell
# From Scope's virtual environment
pip install -e .
python -m spacy download en_core_web_sm
```

The plugin defaults to device 27 (Intel Smart Sound) at 48kHz. To find your mic device number:
```python
import sounddevice as sd
print(sd.query_devices())
```

Then edit the `mic_device = 27` line in `pipeline.py` to match your device index.
## Usage

1. Open Daydream Scope
2. Select Audio Transcription as the first pipeline (preprocessor)
3. Select StreamDiffusion as the second pipeline
4. Set Input Mode to Video (the plugin overrides this to text-only internally)
5. Type a base prompt in the text box (this is your fallback prompt)
6. Click Play, then speak into your mic and watch the imagery respond
Example console output:

```
AUDIO-PLUGIN: transcribing...
AUDIO-PLUGIN: audio amplitude=0.0352
AUDIO-PLUGIN: result='That's my little guy Freddy.'
AUDIO-PLUGIN: nouns extracted: ['my little guy', 'Freddy']
AUDIO-PLUGIN: >>> NEW PROMPT: 'my little guy, Freddy' (from: 'That's my little guy Freddy.')
```
## Prompt priority

| Source | Behavior |
|---|---|
| Voice nouns | Immediately override the active prompt with cache reset |
| UI text box | Accepted after the user types a new value; clears voice prompt |
| No speech | Voice prompt persists until UI text box changes |
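The table above amounts to a small state machine. A minimal sketch, assuming hypothetical names (the real logic lives in `pipeline.py`):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PromptState:
    ui_prompt: str = ""
    voice_prompt: Optional[str] = None

    def on_voice_nouns(self, nouns: list) -> bool:
        """Voice nouns immediately override; True signals a cache reset."""
        self.voice_prompt = ", ".join(nouns)
        return True

    def on_ui_change(self, new_text: str) -> None:
        """A newly typed UI prompt clears the active voice prompt."""
        if new_text != self.ui_prompt:
            self.ui_prompt = new_text
            self.voice_prompt = None

    def active(self) -> str:
        # During silence the voice prompt persists until the UI text changes.
        return self.voice_prompt if self.voice_prompt else self.ui_prompt
```

Note that re-submitting the same UI text does not clear the voice prompt; only a genuinely new value does, matching the "No speech" row above.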
## Prompt monitor

An always-on-top tkinter overlay that shows what's driving the video output in real time.
```shell
# Launch the monitor
python tools/scope-prompt-monitor.pyw
```

It shows:
- 🎤 VOICE (green) — voice noun prompt active
- 📝 UI PROMPT (yellow) — text box prompt active
- 🔶 FALLBACK (orange) — voice timed out, reverted to text box
- Amplitude bars, extracted nouns, raw transcription, skipped filler
## Project structure

```
scope_audio_transcription/
├── __init__.py                 # Package version
├── plugin.py                   # @hookimpl registration for Scope
└── pipelines/
    ├── __init__.py             # Pipeline exports
    ├── pipeline.py             # Main pipeline (voice capture + NLP + prompt injection)
    └── schema.py               # Scope UI configuration schema
tools/
└── scope-prompt-monitor.pyw    # Real-time prompt overlay (tkinter)
```
## Required Scope patches

The plugin requires three edits to Scope's pipeline_processor.py to ensure prompt overrides from the preprocessor always reach StreamDiffusion:
- Queue bypass — preprocessor parameters merge directly into the next processor's state instead of going through the parameter queue (which can fill up and drop overrides)
- Larger parameter queue —
maxsize=64instead of 8 - Larger output queue —
maxsize=64instead of 8
See the installation guide for exact edit locations.
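Why the queue bypass matters can be seen in a toy model: a bounded queue silently drops overrides once the consumer falls behind, while a direct merge into the next processor's state always lands. This is an assumption about the failure mode, not Scope's actual code.

```python
import queue

# Model of the stock parameter queue with its default maxsize of 8.
param_queue: queue.Queue = queue.Queue(maxsize=8)

def send_via_queue(params: dict) -> bool:
    """Queued delivery: fails silently once the queue fills."""
    try:
        param_queue.put_nowait(params)
        return True
    except queue.Full:
        return False  # the prompt override is lost

def send_via_merge(state: dict, params: dict) -> None:
    """Queue bypass: merge directly into the processor's state."""
    state.update(params)
```

Raising `maxsize` to 64 only shrinks the drop window; the direct merge removes it entirely, which is why the patch does both.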
## Choosing a Whisper model

| Model | Size | Speed | Accuracy | RAM |
|---|---|---|---|---|
| `tiny.en` | 39 MB | Fastest | Basic | ~0.5 GB |
| `base.en` | 74 MB | Fast | Good | ~0.5 GB |
| `small.en` | 244 MB | Default | Great | ~0.5 GB |
| `medium.en` | 769 MB | Slower | Best | ~1 GB |
All models run on CPU with int8 quantization via faster-whisper, keeping GPU memory free for StreamDiffusion.
## About The Mirror's Echo

This plugin is the technical core of The Mirror's Echo, an interactive AI projection installation by Krista Faist. The installation transforms spoken words into evolving visual landscapes using Whisper AI, spaCy NLP, TouchDesigner, and StreamDiffusion.
- Artist: Krista Faist
- Gallery: Chaos Contemporary Craft, Sarasota FL
- Fuse Factory Artist-in-Residence 2024, Columbus OH
## License

MIT — Copyright (c) 2025 Krista Faist