Fix voice transcription by dim · Pull Request #947 · sipeed/picoclaw

dim · 2026-03-01T08:39:21Z

📝 Description

Addresses #945

🗣️ Type of Change

🐞 Bug fix (non-breaking change which fixes an issue)
✨ New feature (non-breaking change which adds functionality)
📖 Documentation update
⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

🤖 Fully AI-generated (100% AI, 0% Human)
🛠️ Mostly AI-generated (AI draft, Human verified/modified)
👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Fixes #945

📚 Technical Context (Skip for Docs)

Reference URL:
Reasoning:

🧪 Test Environment

Hardware: Raspberry Pi 5
OS: Debian 13
Model/Provider: MiniMax M25 + Groq Whisper
Channels: Telegram

📸 Evidence (Optional)

2026/03/01 08:29:59 [2026-03-01T08:29:59Z] [INFO] agent: Processing message from telegram:telegram:REDACTED: [voice] {channel=telegram, chat_id=REDACTED, sender_id=telegram:REDACTED, session_key=}
2026/03/01 08:29:59 [2026-03-01T08:29:59Z] [INFO] voice: Starting transcription {audio_file=/tmp/picoclaw_media/452372f1_file_6.oga.ogg}
2026/03/01 08:29:59 [2026-03-01T08:29:59Z] [INFO] voice: Transcription completed successfully {text_length=45, language=, duration_seconds=0, transcription_preview= Let me check if voice recognition now works.}
2026/03/01 08:29:59 [2026-03-01T08:29:59Z] [INFO] agent: Routed message {agent_id=main, session_key=agent:main:telegram:direct:REDACTED, matched_by=default}
2026/03/01 08:30:05 [2026-03-01T08:30:05Z] [INFO] agent: LLM response without tool calls (direct answer) {agent_id=main, iteration=1, content_chars=294}
2026/03/01 08:30:05 [2026-03-01T08:30:05Z] [INFO] agent: Response: Hey, I can read that! 🎉

☑️ Checklist

My code/docs follow the style of this project.
I have performed a self-review of my own changes.
I have updated the documentation accordingly.

xiaket · 2026-03-01T10:58:26Z

 	agentLoop.SetMediaStore(mediaStore)

+	// Wire up voice transcription if Groq API key is available
+	groqAPIKey := cfg.Providers.Groq.APIKey


Longer term, we may want to support multiple transcription providers and decide how to prioritise them. For now this works and addresses an issue, so I'm happy with it.

@xiaket Let me address this slightly more elegantly, then.

xiaket

LGTM

nikolasdehor

LGTM. This is a well-structured fix that moves voice transcription from channel-specific (Telegram only) to agent-level, making it work across all channels.

Key improvements:

Transcriber interface with Name() and Transcribe() methods -- clean abstraction
DetectTranscriber auto-detects Groq from either direct provider config or model_list entries
Agent-level transcription via transcribeAudioInMessage replaces audio annotations [voice] with transcribed text
Falls back gracefully: if transcription fails, the annotation is left as-is (empty string appended)
README updates across all translations correctly reflect the change
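
A minimal sketch of what the abstraction described in the list above could look like. Only the names Transcriber, Name(), Transcribe(), and DetectTranscriber come from the review; the method signatures, the Config and ModelEntry types, the "groq/" model prefix, and the stub implementation are assumptions made for illustration and may not match the merged code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Transcriber mirrors the interface named in the review. The exact
// signatures here are assumed, not taken from the PR.
type Transcriber interface {
	Name() string
	Transcribe(ctx context.Context, audioPath string) (string, error)
}

// ModelEntry and Config are hypothetical stand-ins for the project's
// real configuration types.
type ModelEntry struct {
	Model  string // assumed to look like "groq/<model-name>"
	APIKey string
}

type Config struct {
	GroqAPIKey string       // direct provider configuration
	ModelList  []ModelEntry // model_list entries
}

// groqTranscriber is a stub; a real implementation would call the
// provider's speech-to-text endpoint.
type groqTranscriber struct{ apiKey string }

func (g *groqTranscriber) Name() string { return "groq" }

func (g *groqTranscriber) Transcribe(ctx context.Context, audioPath string) (string, error) {
	return "", errors.New("not implemented in this sketch")
}

// DetectTranscriber returns a transcriber when a Groq key is found in
// either the direct provider config or a model_list entry, else nil.
func DetectTranscriber(cfg Config) Transcriber {
	if cfg.GroqAPIKey != "" {
		return &groqTranscriber{apiKey: cfg.GroqAPIKey}
	}
	for _, m := range cfg.ModelList {
		if strings.HasPrefix(m.Model, "groq/") && m.APIKey != "" {
			return &groqTranscriber{apiKey: m.APIKey}
		}
	}
	return nil
}

func main() {
	cfg := Config{ModelList: []ModelEntry{{Model: "groq/example-model", APIKey: "example-key"}}}
	if tr := DetectTranscriber(cfg); tr != nil {
		fmt.Println("transcriber:", tr.Name())
	} else {
		fmt.Println("no transcriber configured")
	}
}
```

Returning a nil interface when nothing is configured lets the caller skip transcription with a single nil check, which would fit the graceful fallback mentioned in the list.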

The regex-based annotation replacement is clean: audioAnnotationRe matches [voice] and [audio:*] patterns, and transcriptions are applied in order. Remaining transcriptions (more audio than annotations) are appended with newlines.
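
The replacement logic in the paragraph above can be sketched roughly as follows. The name audioAnnotationRe is from the review; the pattern itself, the helper applyTranscriptions, the whitespace trimming, and the choice to treat an empty transcription as a failed one are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// audioAnnotationRe matches "[voice]" and "[audio:<anything>]".
// The exact pattern used in the PR is not shown; this one is assumed.
var audioAnnotationRe = regexp.MustCompile(`\[(?:voice|audio:[^\]]*)\]`)

// applyTranscriptions (hypothetical helper) replaces annotations with
// transcriptions in order. An empty transcription is treated as a
// failure and leaves its annotation untouched. Transcriptions left
// over after all annotations are consumed are appended on new lines.
func applyTranscriptions(content string, transcriptions []string) string {
	i := 0
	out := audioAnnotationRe.ReplaceAllStringFunc(content, func(match string) string {
		if i >= len(transcriptions) {
			return match // more annotations than transcriptions
		}
		text := strings.TrimSpace(transcriptions[i])
		i++
		if text == "" {
			return match // failed transcription: keep the annotation
		}
		return text
	})
	for ; i < len(transcriptions); i++ {
		text := strings.TrimSpace(transcriptions[i])
		if text == "" {
			continue
		}
		out += "\n" + text
	}
	return out
}

func main() {
	fmt.Println(applyTranscriptions("[voice]", []string{" Let me check if voice recognition now works."}))
}
```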

Test coverage is thorough: interface satisfaction, DetectTranscriber with various configs, actual transcription with mocked HTTP server, API errors, and missing files. Good work.
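
For readers unfamiliar with the mocked-HTTP-server approach mentioned above, here is a rough sketch of the pattern using Go's net/http/httptest. The helpers transcribeFile, newMockServer, and writeTempAudio are hypothetical, and the request and response shapes (a multipart upload with a "file" field, a JSON body with a "text" field) are assumptions, not the PR's test code or any provider's documented API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// transcribeFile uploads the file at path as multipart form data and
// returns the "text" field of the JSON response (assumed shape).
func transcribeFile(client *http.Client, endpoint, path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", fmt.Errorf("open audio: %w", err)
	}
	defer f.Close()

	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("file", filepath.Base(path))
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(part, f); err != nil {
		return "", err
	}
	if err := w.Close(); err != nil {
		return "", err
	}

	resp, err := client.Post(endpoint, w.FormDataContentType(), &body)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("transcription API returned %s", resp.Status)
	}
	var out struct {
		Text string `json:"text"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Text, nil
}

// newMockServer returns a local server that replies with the given
// status, and with {"text": text} when the status is 200.
func newMockServer(status int, text string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if status != http.StatusOK {
			http.Error(w, "mock failure", status)
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"text": text})
	}))
}

// writeTempAudio creates a placeholder file standing in for audio.
func writeTempAudio() (string, error) {
	f, err := os.CreateTemp("", "sample-*.ogg")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write([]byte("not real audio")); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeTempAudio()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.Remove(path)

	srv := newMockServer(http.StatusOK, "hello from the mock")
	defer srv.Close()

	text, err := transcribeFile(srv.Client(), srv.URL, path)
	fmt.Println(text, err)
}
```

The same mock covers the three cases the review lists: a 200 response for the happy path, a non-200 status for API errors, and a nonexistent path for missing files.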

afjcjsbx

LGTM! @dim could you fix the lint?

Orgmar · 2026-03-06T04:11:15Z

@dim Thanks for fixing the voice transcription issue! Audio handling bugs can be tough to track down, glad this one got sorted out.

We have a PicoClaw Dev Group on Discord where contributors connect and share ideas. If you'd like to join, send an email to support@sipeed.com with the subject [Join PicoClaw Dev Group] dim and we'll send the invite!

dim · 2026-03-16T12:35:45Z

@Orgmar thanks, that's kind. I am a bit busy at the moment but would be very happy to join in a few weeks' time. Will email support@sipeed.com as suggested. Thanks again!

Fix voice transcription

b1386ad

xiaket reviewed Mar 1, 2026

xiaket requested a review from lxowalle March 1, 2026 10:58

xiaket approved these changes Mar 1, 2026

xiaket added the type: enhancement and domain: agent labels Mar 1, 2026

github-actions bot mentioned this pull request Mar 1, 2026

🦞 OpenClaw Ecosystem Daily 2026-03-01 rollysys/agents-radar#13 (Open)

A more neutral and elegant voice.Transcriber interface

b74f92e

This was referenced Mar 2, 2026

🦞 OpenClaw Ecosystem Daily 2026-03-02 duanyytop/agents-radar#37 (Open)

🦞 OpenClaw Ecosystem Daily 2026-03-02 rollysys/agents-radar#23 (Open)

nikolasdehor approved these changes Mar 3, 2026

afjcjsbx self-requested a review March 3, 2026 22:04

afjcjsbx approved these changes Mar 3, 2026

lxowalle approved these changes Mar 4, 2026

Fix lint

494953f

afjcjsbx merged commit 3e5b849 into sipeed:main Mar 4, 2026
2 checks passed

github-actions bot mentioned this pull request Mar 5, 2026

🦞 OpenClaw Ecosystem Daily 2026-03-05 duanyytop/agents-radar#77 (Open)

hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026

Merge pull request sipeed#947 from dim/fix/transcription

3e4a649

afjcjsbx mentioned this pull request Mar 11, 2026

refactor(voice): introduce Transcriber interface for pluggable STT #353 (Closed)

fishtrees pushed a commit to fishtrees/picoclaw that referenced this pull request Mar 12, 2026

Merge pull request sipeed#947 from dim/fix/transcription

f1e248c

andressg79 pushed a commit to andressg79/picoclaw that referenced this pull request Mar 30, 2026

Merge pull request sipeed#947 from dim/fix/transcription

6ead228

ra1phdd pushed a commit to ra1phdd/picoclaw-pkg that referenced this pull request Apr 12, 2026

Merge pull request sipeed#947 from dim/fix/transcription

c5ae4d4

afjcjsbx merged 3 commits into sipeed:main from dim:fix/transcription

6 participants
