fix: support audio content using top-level data field by sidonsoft · Pull Request #1896 · agentscope-ai/QwenPaw

sidonsoft · 2026-03-20T00:50:32Z

Summary

Fix audio/voice message processing when an audio block uses a top-level data field instead of a source dict.

This affects the path used by Telegram voice/audio content in v0.1.0, where AudioContent is created with data=... but message_processing.py only looks for source.

What changed

Added compatibility handling in src/copaw/agents/utils/message_processing.py:
- for block_type == "audio", if source is missing but data is present, normalize it into a {"type": "url", "url": ...} source
Added targeted unit tests covering:
- _extract_source_and_filename() with Telegram-style audio blocks
- _process_single_block() using an audio block with data=file://...
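
As a rough illustration of the compatibility rule described above, the sketch below moves a top-level data field into a URL-style source. The helper name normalize_audio_block and the plain-dict block shape are assumptions made for this example; the actual code in message_processing.py may be structured differently.

```python
from typing import Any


def normalize_audio_block(block: dict[str, Any]) -> dict[str, Any]:
    """Return `block` with a top-level `data` field moved into a URL source.

    Hypothetical helper for illustration only; not the project's function.
    """
    if block.get("type") != "audio":
        return block
    if block.get("source") or not block.get("data"):
        # Either a source is already present, or there is nothing to normalize.
        return block
    # Copy every key except `data`, then attach the URL-style source.
    normalized = {k: v for k, v in block.items() if k != "data"}
    normalized["source"] = {"type": "url", "url": block["data"]}
    return normalized


telegram_style = {"type": "audio", "data": "file:///tmp/voice.oga"}
print(normalize_audio_block(telegram_style))
```

Blocks that already carry a source, and non-audio blocks, are returned unchanged, so the rest of the processing path sees a single shape.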

Why this approach

This is a smaller, safer compatibility fix than changing Telegram channel output. It also makes the processing layer more data-agnostic for any path that emits AudioContent(data=...).

Validation

Targeted tests:

.venv/bin/python -m pytest -q tests/unit/agents/utils/test_message_processing.py

Result: 2 passed

Additional sanity slice:

.venv/bin/python -m pytest -q tests/unit/workspace/test_prompt.py tests/unit/cli/test_cli_version.py

Result: 6 passed

Closes #1516

sidonsoft · 2026-03-20T01:01:02Z

Follow-up: I found and fixed the actual live Telegram voice-note path too.

In the running app, the failing block was arriving as:

{
  "type": "audio",
  "source": {
    "type": "base64",
    "media_type": "audio/None",
    "data": "file:///.../telegram/voice-....oga"
  }
}

So there were really two compatibility issues:

- audio blocks using top-level data
- audio blocks using source.type == "base64" where source.data is actually a file://... URI, not real base64

This PR now handles both cases by normalizing local paths / file URIs into URL-style sources before the base64 decoder is reached.

Also fixed .oga media type normalization so Telegram voice-note files map to audio/ogg instead of audio/octet-stream.
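
The .oga mapping can be sketched as an explicit override that is consulted before the standard-library lookup. The function name audio_media_type and the override table are assumptions for this example, and the audio/octet-stream fallback simply mirrors the value mentioned above.

```python
import mimetypes
from pathlib import PurePosixPath

# Explicit overrides take priority over the platform's mimetypes table,
# which is not guaranteed to know the .oga extension.
_AUDIO_OVERRIDES = {".oga": "audio/ogg"}


def audio_media_type(path_or_uri: str) -> str:
    """Guess an audio media type from a path or file:// URI (illustrative)."""
    suffix = PurePosixPath(path_or_uri).suffix.lower()
    if suffix in _AUDIO_OVERRIDES:
        return _AUDIO_OVERRIDES[suffix]
    guessed, _encoding = mimetypes.guess_type(path_or_uri)
    if guessed and guessed.startswith("audio/"):
        return guessed
    # Unknown extension: fall back to a generic audio type.
    return "audio/octet-stream"


print(audio_media_type("file:///tmp/voice-1.oga"))
```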

Updated validation:

.venv/bin/python -m pytest -q tests/unit/agents/utils/test_message_processing.py
# 3 passed

.venv/bin/python -m pytest -q tests/unit/workspace/test_prompt.py tests/unit/cli/test_cli_version.py
# 6 passed

sidonsoft · 2026-03-20T01:22:12Z

Quick reviewer note: the current PR now covers all of the failure shapes I saw locally, not just one Telegram variant.

The compatibility handling now covers:

- Audio blocks with top-level data
  - e.g. {"type": "audio", "data": ...}
- Audio blocks with source.type == "base64" where source.data is actually a local path / file://... URI
  - this was the real live Telegram voice-note failure path in my logs
- Pydantic content objects that need model_dump() before processing
  - process_file_and_media_blocks_in_message() now converts model-like blocks to dicts before the normal media path runs
- .oga media type normalization
  - mapped to audio/ogg so Telegram voice-note files don’t fall through as audio/octet-stream
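
The model-to-dict step in the list above might look roughly like the following. This is a sketch under stated assumptions: to_plain_block and FakeAudioContent are hypothetical names, and the stand-in class only imitates the model_dump() method of a Pydantic model so the example runs without extra dependencies.

```python
from typing import Any


def to_plain_block(block: Any) -> Any:
    """Convert a model-like content object to a dict; pass dicts through."""
    if isinstance(block, dict):
        return block
    model_dump = getattr(block, "model_dump", None)
    if callable(model_dump):
        # Pydantic v2 models expose model_dump(); duck-type on that method.
        return model_dump()
    return block


class FakeAudioContent:
    """Stand-in for a Pydantic content model, used only in this example."""

    def __init__(self, data: str) -> None:
        self.type = "audio"
        self.data = data

    def model_dump(self) -> dict[str, Any]:
        return {"type": self.type, "data": self.data}


print(to_plain_block(FakeAudioContent("file:///tmp/voice.oga")))
```

Duck-typing on model_dump keeps the processing layer free of a hard dependency on any particular content class.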

So the intent of the PR is:

- keep channel-side behavior unchanged
- make message processing more tolerant / data-agnostic
- normalize multiple equivalent audio-content shapes into one common handling path

Targeted tests were added for:

- top-level data
- base64 + file://... source payloads
- audio normalization through _process_single_block()

zhijianma · 2026-03-20T03:10:17Z

@sidonsoft

Please see my test.

sidonsoft · 2026-03-20T09:03:43Z

Thanks — yes, this aligns with the intended behavior.

The important point is that the processing layer should normalize audio inputs based on what the payload actually is, not just the nominal source.type.

So if an audio block comes through as:

- source.type == "base64"
- but source.data is actually a local file path (or file:// URI)

then converting it into a URL-style source before downstream processing is the correct compatibility behavior.
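
A minimal sketch of that payload-based check, assuming plain-dict sources: coerce_path_like_base64 is a hypothetical name, and the choice to drop the original media_type (which may be a bogus value such as audio/None) is an assumption of this example rather than something the PR states.

```python
import os
from pathlib import Path
from typing import Any
from urllib.parse import urlparse


def coerce_path_like_base64(source: dict[str, Any]) -> dict[str, Any]:
    """Re-label a 'base64' source whose data is really a file URI or path."""
    if source.get("type") != "base64":
        return source
    data = source.get("data")
    if not isinstance(data, str):
        return source
    if urlparse(data).scheme == "file":
        # Already a file:// URI; only the source type was wrong.
        return {"type": "url", "url": data}
    if os.path.isfile(data):
        # Existing local path: convert to an absolute file:// URI.
        return {"type": "url", "url": Path(data).resolve().as_uri()}
    # Looks like genuine base64 (or something unknown): leave untouched.
    return source


mislabeled = {
    "type": "base64",
    "media_type": "audio/None",
    "data": "file:///tmp/voice.oga",
}
print(coerce_path_like_base64(mislabeled))
```

Running this check before the base64 decoder means real base64 payloads still reach the decoder unchanged.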

That is also why I’ve been framing this as a message-processing normalization issue rather than a Telegram-only workaround. Your console-uploaded MP3 example shows the same mismatch can appear in other paths too, not just Telegram voice notes.

So from my side, your test is a good confirmation that the normalization approach is the right shape for the fix.

zhijianma · 2026-03-20T09:27:04Z

@sidonsoft

You may have overlooked a crucial processing step.

In the runner, there is a conversion from Runtime Message to AgentScope Message that transforms AudioContent into AudioBlock.

type_mapping = {
    "text": (TextBlock, "text"),
    "image": (ImageBlock, "image_url"),
    "audio": (AudioBlock, "data"),
    "data": (TextBlock, "data"),
    "video": (VideoBlock, "video_url"),
    "file": (FileBlock, "file_url"),
}
# ...
for cnt in message.content:  # AudioContent in AgentScope-Runtime
    cnt_type = cnt.type or "text"
    block_cls, attr_name = type_mapping[cnt_type]  # AudioBlock in AgentScope
    value = getattr(cnt, attr_name)
    # ... (preceding branches elided)
    elif cnt_type == "audio":
        if (
            value
            and isinstance(value, str)
            and value.startswith("data:")
        ):
            # data: URI -> split into media type and base64 payload
            mediatype_part = value.split(";")[0].replace("data:", "")
            base64_data = value.split(",")[1]
            base64_source = Base64Source(
                type="base64",
                media_type=mediatype_part,
                data=base64_data,
            )
            msg_content.append(
                block_cls(type=cnt_type, source=base64_source),
            )
        else:
            parsed_url = urlparse(value)
            if parsed_url.scheme and parsed_url.netloc:
                # Remote URL with both a scheme and a host
                url_source = URLSource(type="url", url=value)
                msg_content.append(
                    block_cls(type=cnt_type, source=url_source),
                )
            else:
                # Everything else, including local paths, is wrapped as base64
                audio_extension = getattr(cnt, "format")
                base64_source = Base64Source(
                    type="base64",
                    media_type=f"audio/{audio_extension}",
                    data=value,
                )
                msg_content.append(
                    block_cls(type=cnt_type, source=base64_source),
                )

Audio content with local file paths (e.g., /tmp/voice.ogg) was incorrectly wrapped as Base64Source instead of being converted to file:// URLs.

Changes:
- Add os.path.isfile() check before falling back to base64
- Convert local file paths to file:// URLs using Path.as_uri()
- Fix getattr(cnt, 'format') to use default None to prevent AttributeError
- Guard media_type construction to avoid 'audio/None' strings

Related: agentscope-ai/QwenPaw#1896
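
The decision order described in this commit can be sketched as a standalone classifier. This is not the runtime's code: audio_source_from_value is a hypothetical name, plain dicts stand in for Base64Source and URLSource, and the audio/octet-stream fallback for a missing format is an assumption. Note that urlparse() reports an empty netloc for file:///... URIs, so a check that requires both scheme and netloc does not treat them as URLs, which would be consistent with the mislabeled block reported earlier in this thread.

```python
import os
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def audio_source_from_value(value: str, audio_format: Optional[str] = None) -> dict:
    """Classify an audio `data` string and build a plain-dict source."""
    if value.startswith("data:"):
        # data:<media type>;base64,<payload>
        header, _, payload = value.partition(",")
        media_type = header[len("data:"):].split(";")[0]
        return {"type": "base64", "media_type": media_type, "data": payload}

    parsed = urlparse(value)
    if parsed.scheme == "file" or (parsed.scheme and parsed.netloc):
        # file:// URIs have an empty netloc, so they need their own check.
        return {"type": "url", "url": value}

    if os.path.isfile(value):
        # Bare local path: convert to a file:// URI before any base64 fallback.
        return {"type": "url", "url": Path(value).resolve().as_uri()}

    # Assumed fallback; avoids building the string "audio/None".
    media_type = f"audio/{audio_format}" if audio_format else "audio/octet-stream"
    return {"type": "base64", "media_type": media_type, "data": value}


print(audio_source_from_value("data:audio/ogg;base64,T2dnUw=="))
print(audio_source_from_value("file:///tmp/voice.oga"))
```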

sidonsoft · 2026-03-20T11:43:34Z

Upstream Fix

I've opened a matching PR in agentscope-runtime that fixes the root cause identified in #1896 (comment):

agentscope-ai/agentscope-runtime#466

Summary

The runner's message_to_agentscope_msg() now detects local file paths before falling back to base64:

if value and isinstance(value, str) and os.path.isfile(value):
    # Local file path → convert to file:// URL
    url_source = URLSource(
        type="url",
        url=Path(value).as_uri(),
        media_type=f"audio/{audio_extension}" if audio_extension else None,
    )

Coordination

PR	Location	Purpose
#466 (agentscope-runtime)	Root cause	Produces correct blocks from the start
#1896 (CoPaw)	Downstream	Defensive handling for legacy/edge cases

Both PRs should be merged. The downstream fix in this PR remains valuable for:

- Backwards compatibility with older runtime versions
- Defense-in-depth against similar issues in other code paths

…d_filename

The previous implementation returned bare paths (e.g., /tmp/voice.ogg) as URLs without the file:// scheme, causing downstream download_file_from_url() to fail when trying to fetch them as remote URLs.

Now properly normalizes:
- Full URLs (https://, http://) → pass through unchanged
- file:// URLs → pass through unchanged
- Bare local paths that exist → convert to file:// URL with media_type
- Unknown/invalid → pass through as-is (may be base64)

sidonsoft · 2026-03-20T11:49:55Z

Bug Found: Bare paths returned without file:// scheme

The _extract_source_and_filename function returns bare paths like /tmp/voice.ogg as {"type": "url", "url": data} without normalizing to file:// URLs. This causes downstream download_file_from_url() to fail.

The Problem

return {"type": "url", "url": data}, filename  # data = "/tmp/voice.ogg"

This gets passed to _process_single_file_block which sees parsed.scheme == "" and tries to fetch it as a remote URL.
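
To make the empty-scheme observation concrete, the short check below prints what urllib.parse.urlparse reports for a bare path, a file:// URI, and a remote URL. The example paths and example.com are placeholders, and the Path.as_uri() line assumes a POSIX system, where /tmp/voice.ogg is an absolute path.

```python
from pathlib import Path
from urllib.parse import urlparse

candidates = (
    "/tmp/voice.ogg",
    "file:///tmp/voice.ogg",
    "https://example.com/voice.ogg",
)

for candidate in candidates:
    parsed = urlparse(candidate)
    # A bare path has neither scheme nor netloc; a file:// URI has a scheme
    # but an empty netloc; only the remote URL has both.
    print(f"{candidate!r}: scheme={parsed.scheme!r} netloc={parsed.netloc!r}")

# Converting the bare path gives downstream code an explicit scheme to branch on.
print(Path("/tmp/voice.ogg").as_uri())
```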

Fix

Normalize bare local paths to file:// URLs:

if parsed.scheme and parsed.netloc:
    # Full URL (https://, http://, etc.)
    return {"type": "url", "url": data}, filename
elif parsed.scheme == "file":
    # Already a file:// URL
    return {"type": "url", "url": data}, filename
elif os.path.isfile(data):
    # Bare local path → convert to file:// URL
    return {
        "type": "url",
        "url": Path(data).as_uri(),
        "media_type": _media_type_from_path(data),
    }, filename
else:
    # Unknown - pass through as-is (may be base64 or invalid)
    return {"type": "url", "url": data}, filename

I've pushed this fix to a branch on my fork. The fix should be incorporated into this PR before merging.

zhijianma · 2026-03-20T12:00:27Z

@sidonsoft

So, do you think this PR still needs to be merged?


fix: support audio blocks using top-level data field

0d1a7eb

sidonsoft had a problem deploying to maintainer-approved March 20, 2026 00:50 — with GitHub Actions Failure

fix: normalize audio file URIs and oga media type

c1d0717

sidonsoft had a problem deploying to maintainer-approved March 20, 2026 01:00 — with GitHub Actions Failure

github-actions Bot mentioned this pull request Mar 20, 2026

🦞 OpenClaw Ecosystem Daily 2026-03-20 gsscsd/big_model_radar#66

Open

fix: normalize telegram voice file URIs

370e255

sidonsoft temporarily deployed to maintainer-approved March 20, 2026 01:11 — with GitHub Actions Inactive

github-actions Bot mentioned this pull request Mar 20, 2026

🦞 OpenClaw Ecosystem Daily Newsletter 2026-03-20 compasify/agents-radar#61

Open

sidonsoft mentioned this pull request Mar 20, 2026

[Bug]: AudioContent not supported in Telegram channel - Fix #1516

Open

sidonsoft mentioned this pull request Mar 20, 2026

Fix: Handle local file paths in audio content conversion agentscope-ai/agentscope-runtime#466

Closed

sidonsoft had a problem deploying to maintainer-approved March 20, 2026 11:49 — with GitHub Actions Failure

github-actions Bot mentioned this pull request Mar 21, 2026

🦞 OpenClaw Ecosystem Daily 2026-03-21 gsscsd/big_model_radar#71

Open

cuiyuebing added this to QwenPaw Mar 25, 2026

github-project-automation Bot moved this to Todo in QwenPaw Mar 25, 2026

sidonsoft wants to merge 4 commits into agentscope-ai:main from sidonsoft:fix/audio-data-source-compat
