Fix voice transcription #947

Merged
afjcjsbx merged 3 commits into sipeed:main from dim:fix/transcription
Mar 4, 2026

@dim dim commented Mar 1, 2026

📝 Description

Addresses #945

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Fixes #945

📚 Technical Context (Skip for Docs)

  • Reference URL:
  • Reasoning:

🧪 Test Environment

  • Hardware: Raspberry Pi 5
  • OS: Debian 13
  • Model/Provider: MiniMax M25 + Groq Whisper
  • Channels: Telegram

📸 Evidence (Optional)

2026/03/01 08:29:59 [2026-03-01T08:29:59Z] [INFO] agent: Processing message from telegram:telegram:REDACTED: [voice] {channel=telegram, chat_id=REDACTED, sender_id=telegram:REDACTED, session_key=}
2026/03/01 08:29:59 [2026-03-01T08:29:59Z] [INFO] voice: Starting transcription {audio_file=/tmp/picoclaw_media/452372f1_file_6.oga.ogg}
2026/03/01 08:29:59 [2026-03-01T08:29:59Z] [INFO] voice: Transcription completed successfully {text_length=45, language=, duration_seconds=0, transcription_preview= Let me check if voice recognition now works.}
2026/03/01 08:29:59 [2026-03-01T08:29:59Z] [INFO] agent: Routed message {agent_id=main, session_key=agent:main:telegram:direct:REDACTED, matched_by=default}
2026/03/01 08:30:05 [2026-03-01T08:30:05Z] [INFO] agent: LLM response without tool calls (direct answer) {agent_id=main, iteration=1, content_chars=294}
2026/03/01 08:30:05 [2026-03-01T08:30:05Z] [INFO] agent: Response: Hey, I can read that! 🎉

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

agentLoop.SetMediaStore(mediaStore)

// Wire up voice transcription if Groq API key is available
groqAPIKey := cfg.Providers.Groq.APIKey

Longer term, we may want to support multiple transcription providers and decide how to prioritise them. For now this works and addresses the issue, so I'm happy with it.

@xiaket let me address this slightly more elegantly, then.

@xiaket xiaket requested a review from lxowalle March 1, 2026 10:58

@xiaket xiaket left a comment

LGTM

@nikolasdehor nikolasdehor left a comment

LGTM. This is a well-structured fix that moves voice transcription from channel-specific (Telegram only) to agent-level, making it work across all channels.

Key improvements:

  1. Transcriber interface with Name() and Transcribe() methods -- clean abstraction
  2. DetectTranscriber auto-detects Groq from either direct provider config or model_list entries
  3. Agent-level transcription via transcribeAudioInMessage replaces audio annotations [voice] with transcribed text
  4. Falls back gracefully: if transcription fails, the annotation is left as-is (empty string appended)
  5. README updates across all translations correctly reflect the change
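The abstraction described in points 1 and 2 might look roughly like the sketch below. Only the names `Transcriber`, `Name()`, `Transcribe()`, and `DetectTranscriber` come from the review. The method signatures, the simplified `Config` and `ModelEntry` types, the `groq/` model prefix, and the stub transcriber are assumptions for illustration, not the merged code.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Transcriber is the abstraction named in the review. The exact method
// signatures are assumed here; the PR may differ.
type Transcriber interface {
	Name() string
	Transcribe(ctx context.Context, audioPath string) (string, error)
}

// Config is a hypothetical, simplified stand-in for the real configuration.
type Config struct {
	GroqAPIKey string       // direct provider config
	ModelList  []ModelEntry // model_list entries
}

// ModelEntry is a hypothetical model_list entry.
type ModelEntry struct {
	Model  string // e.g. "groq/example" (prefix convention is assumed)
	APIKey string
}

// stubGroq is a placeholder; a real implementation would call the Groq API.
type stubGroq struct{ apiKey string }

func (s stubGroq) Name() string { return "groq" }

func (s stubGroq) Transcribe(ctx context.Context, audioPath string) (string, error) {
	return "", fmt.Errorf("stub: transcription of %s not implemented", audioPath)
}

// DetectTranscriber returns a transcriber when a Groq key is found in either
// the direct provider config or a model_list entry, and nil otherwise.
func DetectTranscriber(cfg Config) Transcriber {
	if cfg.GroqAPIKey != "" {
		return stubGroq{apiKey: cfg.GroqAPIKey}
	}
	for _, m := range cfg.ModelList {
		if strings.HasPrefix(m.Model, "groq/") && m.APIKey != "" {
			return stubGroq{apiKey: m.APIKey}
		}
	}
	return nil
}

func main() {
	cfg := Config{ModelList: []ModelEntry{{Model: "groq/example", APIKey: "k"}}}
	if t := DetectTranscriber(cfg); t != nil {
		fmt.Println("transcriber:", t.Name())
	}
}
```

Returning a nil interface when nothing is configured lets the caller skip transcription entirely, which matches the "if Groq API key is available" wiring shown in the diff excerpt.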

The regex-based annotation replacement is clean: audioAnnotationRe matches [voice] and [audio:*] patterns, and transcriptions are applied in order. Remaining transcriptions (more audio than annotations) are appended with newlines.
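The in-order replacement described above can be sketched as follows. The name `audioAnnotationRe` comes from the review; the exact pattern, the hypothetical helper `applyTranscriptions`, and the choice to leave an annotation untouched when its transcription is empty are assumptions based on the description, not the merged code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// audioAnnotationRe matches [voice] and [audio:...] annotations.
// The exact pattern used in the PR is assumed.
var audioAnnotationRe = regexp.MustCompile(`\[voice\]|\[audio:[^\]]*\]`)

// applyTranscriptions (hypothetical helper) replaces annotations with
// transcriptions in order of appearance. An empty transcription is treated
// as a failure and leaves its annotation as-is. Transcriptions left over
// once the annotations run out are appended on new lines.
func applyTranscriptions(content string, transcriptions []string) string {
	i := 0
	out := audioAnnotationRe.ReplaceAllStringFunc(content, func(match string) string {
		if i >= len(transcriptions) {
			return match // more annotations than transcriptions
		}
		text := strings.TrimSpace(transcriptions[i])
		i++
		if text == "" {
			return match // failed transcription: keep the annotation
		}
		return text
	})
	for ; i < len(transcriptions); i++ {
		if text := strings.TrimSpace(transcriptions[i]); text != "" {
			out += "\n" + text
		}
	}
	return out
}

func main() {
	fmt.Println(applyTranscriptions("[voice]", []string{" Let me check if voice recognition now works."}))
}
```

`ReplaceAllStringFunc` visits matches left to right, so a simple counter is enough to pair each annotation with its transcription.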

Test coverage is thorough: interface satisfaction, DetectTranscriber with various configs, actual transcription with mocked HTTP server, API errors, and missing files. Good work.

@afjcjsbx afjcjsbx self-requested a review March 3, 2026 22:04

@afjcjsbx afjcjsbx left a comment

LGTM! @dim could you fix the lint?

@afjcjsbx afjcjsbx merged commit 3e5b849 into sipeed:main Mar 4, 2026
2 checks passed
hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026

Orgmar commented Mar 6, 2026

@dim Thanks for fixing the voice transcription issue! Audio handling bugs can be tough to track down, glad this one got sorted out.

We have a PicoClaw Dev Group on Discord where contributors connect and share ideas. If you'd like to join, send an email to support@sipeed.com with the subject [Join PicoClaw Dev Group] dim and we'll send the invite!

fishtrees pushed a commit to fishtrees/picoclaw that referenced this pull request Mar 12, 2026

dim commented Mar 16, 2026

@Orgmar thanks, that's kind. I am a bit busy at the moment but would be very happy to join in a few weeks' time. Will email support@sipeed.com as suggested. Thanks again!

andressg79 pushed a commit to andressg79/picoclaw that referenced this pull request Mar 30, 2026
ra1phdd pushed a commit to ra1phdd/picoclaw-pkg that referenced this pull request Apr 12, 2026
Development

Successfully merging this pull request may close these issues.

[BUG] voice package is not being used - voice.GroqTranscriber is not part of the process any more

6 participants