feat: add ElevenLabs Scribe STT transcriber and Telegram SendVoice support by manaporkun · Pull Request #1905 · sipeed/picoclaw

manaporkun · 2026-03-22T23:03:18Z

Summary

Add ElevenLabsTranscriber as an alternative speech-to-text provider using the ElevenLabs Scribe API (scribe_v1). This enables voice message transcription for users who already have an ElevenLabs API key, without requiring a separate Groq account.
Add Telegram SendVoice support so voice messages are sent as proper voice bubbles instead of audio file attachments.

Changes

| File | Change |
|---|---|
| `pkg/voice/elevenlabs_transcriber.go` | New `ElevenLabsTranscriber` struct implementing the `Transcriber` interface. Uses `xi-api-key` header auth and the `scribe_v1` model. |
| `pkg/voice/transcriber.go` | Updated `DetectTranscriber` to check `voice.elevenlabs_api_key` first, falling back to Groq model-list entries for backward compatibility. |
| `pkg/config/config.go` | Added `ElevenLabsAPIKey` field to `VoiceConfig`. |
| `pkg/channels/telegram/telegram.go` | Telegram-specific voice-bubble detection inside the `"audio"` case: OGG files with "voice" in the filename use `SendVoice`, others use `SendAudio`. No changes to `inferMediaType` or other channels. |
| `pkg/voice/elevenlabs_transcriber_test.go` | `TestElevenLabsTranscribe` suite (success, API error, missing file). |
| `pkg/voice/transcriber_test.go` | `DetectTranscriber` priority tests for ElevenLabs, Groq, and voice model name. |

Configuration

{
  "voice": {
    "elevenlabs_api_key": "sk_your_key_here"
  }
}

Or via environment variable: PICOCLAW_VOICE_ELEVENLABS_API_KEY=sk_your_key_here

When configured, ElevenLabs takes priority over Groq for transcription. Existing Groq configurations (via model_list) continue to work unchanged.

Test plan

All existing tests pass
New ElevenLabs transcriber tests pass (success, API error, missing file)
DetectTranscriber correctly prioritizes: voice model name > ElevenLabs > Groq model-list
Interface compliance verified at compile time
Tested on Raspberry Pi Zero 2 W with real ElevenLabs API
Telegram voice bubbles render correctly on iOS/Android

CLAassistant · 2026-03-22T23:03:30Z

All committers have signed the CLA.

Copilot

Pull request overview

This PR adds an ElevenLabs Scribe-based speech-to-text transcriber option and improves Telegram outbound media handling by sending OGG “voice note” media as Telegram voice bubbles.

Changes:

Added ElevenLabsTranscriber (Scribe STT) and updated DetectTranscriber to prefer ElevenLabs when configured.
Added Telegram outbound "voice" handling via SendVoice.
Introduced filename-based "voice" media type inference for OGG/OGA files containing "voice" in the filename.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| `pkg/voice/transcriber.go` | Adds ElevenLabs STT transcriber and updates provider detection priority. |
| `pkg/voice/transcriber_test.go` | Adds interface compliance + ElevenLabs transcribe tests and provider priority tests. |
| `pkg/config/config.go` | Extends `ProvidersConfig` with `ElevenLabs` and updates `IsEmpty()`. |
| `pkg/agent/loop.go` | Adds `"voice"` inference in `inferMediaType` based on filename for OGG/OGA. |
| `pkg/channels/telegram/telegram.go` | Adds `"voice"` branch to send Telegram voice bubbles via `SendVoice`. |

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-22T23:07:57Z

+	// Detect voice messages: OGG files with "voice" in the filename.
+	// These are sent as Telegram voice bubbles rather than audio attachments.
+	if strings.Contains(fn, "voice") && (strings.HasSuffix(fn, ".ogg") || strings.HasSuffix(fn, ".oga")) {
+		return "voice"
+	}
+


inferMediaType now returns "voice" for files named like "voice.ogg". This value is propagated into bus.MediaPart.Type for all channels (see where tool media parts are built), but only Telegram has been updated to handle "voice". Channels that map types explicitly (e.g., OneBot maps only "audio" -> "record", WeCom maps only "audio" -> "voice") will now treat these as generic files, which is a functional regression for non-Telegram channels. Consider keeping the inferred type as "audio" and doing Telegram-specific voice-bubble detection inside the Telegram channel, or ensure every channel that handles audio also treats "voice" as audio-equivalent.

Suggested change

-	// Detect voice messages: OGG files with "voice" in the filename.
-	// These are sent as Telegram voice bubbles rather than audio attachments.
-	if strings.Contains(fn, "voice") && (strings.HasSuffix(fn, ".ogg") || strings.HasSuffix(fn, ".oga")) {
-		return "voice"
-	}

Good catch — fixed. Moved voice-bubble detection out of inferMediaType (which now always returns "audio" for OGG files) and into the Telegram channel's Send method. Other channels are unaffected.
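The split described in this reply can be sketched as follows: a channel-agnostic type inference that always yields "audio" for OGG files, plus a Telegram-only decision about which send method to use. Both function bodies are simplified assumptions rather than the project's code, and `telegramMethodFor` is a hypothetical name.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// inferMediaType is a simplified stand-in for the channel-agnostic inference.
// It never returns "voice", so channels that only map "audio" keep working.
func inferMediaType(filename string) string {
	switch strings.ToLower(filepath.Ext(filename)) {
	case ".ogg", ".oga", ".mp3", ".wav":
		return "audio"
	default:
		return "file"
	}
}

// telegramMethodFor (hypothetical) makes the voice-bubble decision inside the
// Telegram channel only, using the filename heuristic from the PR.
func telegramMethodFor(mediaType, filename string) string {
	if mediaType != "audio" {
		return "SendDocument"
	}
	fn := strings.ToLower(filename)
	isOgg := strings.HasSuffix(fn, ".ogg") || strings.HasSuffix(fn, ".oga")
	if isOgg && strings.Contains(fn, "voice") {
		return "SendVoice"
	}
	return "SendAudio"
}

func main() {
	for _, name := range []string{"voice.ogg", "song.ogg", "notes.pdf"} {
		t := inferMediaType(name)
		fmt.Println(name, t, telegramMethodFor(t, name))
	}
}
```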

afjcjsbx · 2026-03-22T23:26:15Z

A new branch was merged; could you please fix the conflicts? 🙏

manaporkun · 2026-03-22T23:35:44Z

Rebased on latest main and resolved conflicts. Adapted to the new transcriber architecture — ElevenLabs transcriber is now in its own file (elevenlabs_transcriber.go) matching the new pattern, and DetectTranscriber checks voice model name first, then ElevenLabs, then Groq. All 25 tests pass.
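The detection order described here (voice model name, then ElevenLabs, then Groq) amounts to a first-match chain. The sketch below shows that shape only; the `settings` struct and `detectProvider` function are hypothetical, and the single `GroqAPIKey` field stands in for a Groq entry in `model_list`.

```go
package main

import "fmt"

// settings is a hypothetical flattening of the relevant config values.
type settings struct {
	VoiceModelName   string
	ElevenLabsAPIKey string
	GroqAPIKey       string // stands in for a Groq model_list entry
}

// detectProvider returns the first configured provider in priority order,
// or "" when no transcriber is available.
func detectProvider(s settings) string {
	switch {
	case s.VoiceModelName != "":
		return "voice-model:" + s.VoiceModelName
	case s.ElevenLabsAPIKey != "":
		return "elevenlabs"
	case s.GroqAPIKey != "":
		return "groq"
	default:
		return ""
	}
}

func main() {
	fmt.Println(detectProvider(settings{ElevenLabsAPIKey: "sk_a", GroqAPIKey: "gsk_b"}))
}
```

Because existing Groq-only configurations leave the first two fields empty, they fall through to the last case unchanged, which is what keeps the change backward compatible.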

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Copilot · 2026-03-22T23:38:03Z

+			// Send OGG files with "voice" in the filename as Telegram voice
+			// bubbles (SendVoice) instead of audio attachments (SendAudio).
+			fn := strings.ToLower(part.Filename)
+			if strings.Contains(fn, "voice") && (strings.HasSuffix(fn, ".ogg") || strings.HasSuffix(fn, ".oga")) {
+				vparams := &telego.SendVoiceParams{


The voice-bubble detection uses strings.Contains(fn, "voice"), which will also match unrelated filenames like invoice.ogg and send them as Telegram voice messages incorrectly. Consider using a stricter check (e.g., HasPrefix/HasSuffix on the base name with a delimiter, or a regex for (^|[^a-z0-9])voice([^a-z0-9]|$)) so only intended voice clips are routed to SendVoice.
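A stricter matcher along the lines Copilot suggests could look like the sketch below. It reuses the regex from the comment and applies it to the lowercased base name without its extension; `isVoiceClip` is a hypothetical helper name, and this is not the check that was merged.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// voiceToken matches "voice" only as a standalone token, delimited by the
// start or end of the string or by any character outside [a-z0-9].
var voiceToken = regexp.MustCompile(`(^|[^a-z0-9])voice([^a-z0-9]|$)`)

// isVoiceClip (hypothetical) reports whether an OGG/OGA filename looks like
// an intended voice clip rather than a name that merely contains "voice".
func isVoiceClip(filename string) bool {
	base := strings.ToLower(filepath.Base(filename))
	ext := filepath.Ext(base)
	if ext != ".ogg" && ext != ".oga" {
		return false
	}
	return voiceToken.MatchString(strings.TrimSuffix(base, ext))
}

func main() {
	for _, name := range []string{"voice.ogg", "reply_voice_01.oga", "invoice.ogg", "voicemail.ogg", "voice.mp3"} {
		fmt.Printf("%-20s %v\n", name, isVoiceClip(name))
	}
}
```

With this rule, `voice.ogg` and `reply_voice_01.oga` qualify, while `invoice.ogg`, `voicemail.ogg`, and `voice.mp3` do not.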

Copilot · 2026-03-22T23:38:04Z

+		logger.ErrorCF("voice", "ElevenLabs API error", map[string]any{
+			"status_code": resp.StatusCode,
+			"response":    string(body),
+		})
+		return nil, fmt.Errorf("ElevenLabs API error (status %d): %s", resp.StatusCode, string(body))


On non-200 responses, the code logs and returns the full response body ("response": string(body) and includes it in the returned error). This can leak potentially sensitive information and can also produce very large logs/errors if the upstream returns HTML or verbose JSON. Consider truncating/sanitizing the body in logs/errors (and optionally parsing a structured error field) while still preserving enough detail for debugging.

Suggested change

-		logger.ErrorCF("voice", "ElevenLabs API error", map[string]any{
-			"status_code": resp.StatusCode,
-			"response":    string(body),
-		})
-		return nil, fmt.Errorf("ElevenLabs API error (status %d): %s", resp.StatusCode, string(body))
+		truncatedBody := utils.Truncate(string(body), 512)
+		logger.ErrorCF("voice", "ElevenLabs API error", map[string]any{
+			"status_code": resp.StatusCode,
+			"response":    truncatedBody,
+		})
+		return nil, fmt.Errorf("ElevenLabs API error (status %d): %s", resp.StatusCode, truncatedBody)

afjcjsbx

LGTM!

afjcjsbx · 2026-03-22T23:42:45Z

pls fix lint 🙏

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Copilot · 2026-03-23T11:34:14Z

 type VoiceConfig struct {
-	ModelName         string `json:"model_name,omitempty" env:"PICOCLAW_VOICE_MODEL_NAME"`
-	EchoTranscription bool   `json:"echo_transcription"   env:"PICOCLAW_VOICE_ECHO_TRANSCRIPTION"`
+	ModelName         string `json:"model_name,omitempty"      env:"PICOCLAW_VOICE_MODEL_NAME"`
+	EchoTranscription bool   `json:"echo_transcription"        env:"PICOCLAW_VOICE_ECHO_TRANSCRIPTION"`
+	ElevenLabsAPIKey  string `json:"elevenlabs_api_key,omitempty" env:"PICOCLAW_VOICE_ELEVENLABS_API_KEY"`
 }


The PR description/config snippet refer to providers.elevenlabs.api_key, but the implementation adds voice.elevenlabs_api_key (and DetectTranscriber reads cfg.Voice.ElevenLabsAPIKey). This mismatch will cause users following the documented JSON to get no ElevenLabs transcriber. Either wire ElevenLabs through providers.elevenlabs as described, or update the docs/PR description and any config examples to match the voice section.

Good catch! The original implementation used providers.elevenlabs, but during rebase onto latest main, the ProvidersConfig struct was removed (provider refactoring to model_list). I moved the ElevenLabs API key to voice.elevenlabs_api_key instead, which fits better since it's specifically for voice transcription. Updated the PR description and config examples to match the actual implementation.
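The failure mode Copilot describes is silent because Go's `encoding/json` ignores unknown keys by default. The sketch below, using simplified stand-in config types, shows the old `providers.elevenlabs.api_key` shape parsing without error yet leaving the key empty, and how `DisallowUnknownFields` would surface the mismatch. Whether the project uses strict decoding is not stated in the PR; `parseLenient` and `parseStrict` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Simplified stand-ins for the project's config types.
type VoiceConfig struct {
	ElevenLabsAPIKey string `json:"elevenlabs_api_key,omitempty"`
}

type Config struct {
	Voice VoiceConfig `json:"voice"`
}

const oldStyle = `{"providers": {"elevenlabs": {"api_key": "sk_your_key_here"}}}`
const newStyle = `{"voice": {"elevenlabs_api_key": "sk_your_key_here"}}`

// parseLenient uses the default behaviour: unknown keys are dropped silently.
func parseLenient(raw string) (Config, error) {
	var cfg Config
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

// parseStrict rejects keys that have no matching struct field.
func parseStrict(raw string) (Config, error) {
	var cfg Config
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	cfg, err := parseLenient(oldStyle)
	fmt.Println("lenient, old style: key set =", cfg.Voice.ElevenLabsAPIKey != "", "err =", err)

	_, err = parseStrict(oldStyle)
	fmt.Println("strict, old style: rejected =", err != nil)

	cfg, err = parseLenient(newStyle)
	fmt.Println("lenient, new style: key set =", cfg.Voice.ElevenLabsAPIKey != "", "err =", err)
}
```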

huaaudio

Hi @manaporkun, thanks for the PR! This config logic is pretty good. Can you run `make fmt && make lint` locally to fix the lint issue before we proceed and merge? Thanks

…pport

Add ElevenLabsTranscriber as an alternative speech-to-text provider using the ElevenLabs Scribe API (scribe_v1). This enables voice message transcription for users who already have an ElevenLabs API key, without requiring a separate Groq account.

Changes:
- Add ElevenLabsTranscriber implementing the Transcriber interface
- Update DetectTranscriber to check providers.elevenlabs.api_key first, falling back to Groq for backward compatibility
- Add ElevenLabs to ProvidersConfig
- Add "voice" media type for OGG files with "voice" in filename
- Add SendVoice support in Telegram channel for voice bubble messages
- Add comprehensive tests for ElevenLabs transcriber

Configuration: "providers": { "elevenlabs": { "api_key": "sk_your_key_here" } }

Closes sipeed#1503 (partial)

…ssion in other channels

Address review feedback: keep inferMediaType returning "audio" for all OGG files. Voice-bubble detection (SendVoice vs SendAudio) is now done inside the Telegram channel based on filename, so other channels that map "audio" explicitly are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Orgmar · 2026-03-24T03:05:44Z

@manaporkun Nice contribution! The ElevenLabs Scribe transcriber gives users another solid STT option, and the priority chain (voice model > ElevenLabs > Groq) keeps backward compatibility clean. The Telegram SendVoice bubble support is a great UX improvement too, way better than raw audio attachments. Thorough test coverage across the board.

We're running a PicoClaw Dev Group on Discord for contributors to chat and collaborate. If you're interested, email support@sipeed.com with subject [Join PicoClaw Dev Group] manaporkun and we'll get you the invite!

- Convert HEIC build photos to JPEG in docs/images/ with descriptive names
- Rewrite README with hero image, build story, and architecture overview
- Rename picoclaw/ to character/ (persona files, not the tool itself)
- Update hardware.md with full audio config: MAX98357A dtoverlay, ALSA dmix+softvol, Pi Zero 2W over-amplification fix, USB mic tuning
- Mention upstream ElevenLabs TTS contribution (sipeed/picoclaw#1905)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…pport (sipeed#1905)

* feat: add ElevenLabs Scribe STT transcriber and Telegram SendVoice support

  Add ElevenLabsTranscriber as an alternative speech-to-text provider using the ElevenLabs Scribe API (scribe_v1). This enables voice message transcription for users who already have an ElevenLabs API key, without requiring a separate Groq account.

  Changes:
  - Add ElevenLabsTranscriber implementing the Transcriber interface
  - Update DetectTranscriber to check providers.elevenlabs.api_key first, falling back to Groq for backward compatibility
  - Add ElevenLabs to ProvidersConfig
  - Add "voice" media type for OGG files with "voice" in filename
  - Add SendVoice support in Telegram channel for voice bubble messages
  - Add comprehensive tests for ElevenLabs transcriber

  Configuration: "providers": { "elevenlabs": { "api_key": "sk_your_key_here" } }

  Closes sipeed#1503 (partial)

* fix: move voice-bubble detection into Telegram channel to avoid regression in other channels

  Address review feedback: keep inferMediaType returning "audio" for all OGG files. Voice-bubble detection (SendVoice vs SendAudio) is now done inside the Telegram channel based on filename, so other channels that map "audio" explicitly are unaffected.

* fix: align VoiceConfig struct tags to pass golines formatter

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(agent): use ModelName in loop test added by upstream

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 22, 2026 23:03

Copilot started reviewing on behalf of manaporkun March 22, 2026 23:03

manaporkun force-pushed the feat/elevenlabs-transcriber-and-voice branch from 9841a94 to 3b2e19a March 22, 2026 23:07

Copilot AI reviewed Mar 22, 2026

manaporkun force-pushed the feat/elevenlabs-transcriber-and-voice branch from 9095a02 to 4b6cdd1 March 22, 2026 23:35

Copilot AI review requested due to automatic review settings March 22, 2026 23:35

Copilot started reviewing on behalf of manaporkun March 22, 2026 23:36

Copilot AI reviewed Mar 22, 2026

afjcjsbx approved these changes Mar 22, 2026

sipeed-bot added the labels type: enhancement, domain: provider, domain: channel, and go Mar 22, 2026

manaporkun force-pushed the feat/elevenlabs-transcriber-and-voice branch from 4b6cdd1 to 9cff4e4 March 22, 2026 23:48

Copilot AI review requested due to automatic review settings March 23, 2026 11:31

manaporkun force-pushed the feat/elevenlabs-transcriber-and-voice branch from 9cff4e4 to 1451bb1 March 23, 2026 11:31

Copilot started reviewing on behalf of manaporkun March 23, 2026 11:31

Copilot AI reviewed Mar 23, 2026

huaaudio approved these changes Mar 23, 2026

manaporkun and others added 4 commits March 23, 2026 21:55

fix: align VoiceConfig struct tags to pass golines formatter

a5238bd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(agent): use ModelName in loop test added by upstream

8ab96fd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 23, 2026 20:56

manaporkun force-pushed the feat/elevenlabs-transcriber-and-voice branch from 8fd098f to 8ab96fd March 23, 2026 20:56

Copilot started reviewing on behalf of manaporkun March 23, 2026 20:57

Copilot AI reviewed Mar 23, 2026

huaaudio merged commit dd9adf8 into sipeed:main Mar 23, 2026
7 of 8 checks passed

manaporkun deleted the feat/elevenlabs-transcriber-and-voice branch March 23, 2026 21:12

github-actions Bot mentioned this pull request Mar 24, 2026

🦞 OpenClaw Ecosystem Daily 2026-03-24 gsscsd/big_model_radar#87

Open

6 participants
