
fix: support audio content using top-level data field #1896

Open
sidonsoft wants to merge 4 commits into agentscope-ai:main from sidonsoft:fix/audio-data-source-compat

@sidonsoft
Contributor

Summary

Fix audio/voice message processing when an audio block uses a top-level data field instead of a source dict.

This affects the path used by Telegram voice/audio content in v0.1.0, where AudioContent is created with data=... but message_processing.py only looks for source.

What changed

  • Added compatibility handling in src/copaw/agents/utils/message_processing.py:
    • for block_type == "audio", if source is missing but data is present, normalize it into a {"type": "url", "url": ...} source
  • Added targeted unit tests covering:
    • _extract_source_and_filename() with Telegram-style audio blocks
    • _process_single_block() using an audio block with data=file://...
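
The top-level data normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code in message_processing.py: the helper name normalize_audio_block is hypothetical, and dropping the original data key after the rewrite is an assumption.

```python
from typing import Any, Dict


def normalize_audio_block(block: Dict[str, Any]) -> Dict[str, Any]:
    """Return a block whose audio payload lives under `source`.

    Hypothetical helper: only the block shapes come from the PR description.
    """
    if block.get("type") != "audio":
        return block  # non-audio blocks are left untouched
    if block.get("source") or not block.get("data"):
        return block  # already has a source, or nothing to normalize
    # Assumption: the top-level `data` key is dropped once it has been moved.
    normalized = {k: v for k, v in block.items() if k != "data"}
    normalized["source"] = {"type": "url", "url": block["data"]}
    return normalized


print(normalize_audio_block({"type": "audio", "data": "file:///tmp/voice.oga"}))
```

Returning a new dict rather than mutating the input keeps the caller's block intact, which is convenient when the same message is processed more than once.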

Why this approach

This is a smaller, safer compatibility fix than changing Telegram channel output. It also makes the processing layer more data-agnostic for any path that emits AudioContent(data=...).

Validation

Targeted tests:

.venv/bin/python -m pytest -q tests/unit/agents/utils/test_message_processing.py

Result: 2 passed

Additional sanity slice:

.venv/bin/python -m pytest -q tests/unit/workspace/test_prompt.py tests/unit/cli/test_cli_version.py

Result: 6 passed

Closes #1516

@sidonsoft sidonsoft had a problem deploying to maintainer-approved March 20, 2026 00:50 — with GitHub Actions Failure
@sidonsoft sidonsoft had a problem deploying to maintainer-approved March 20, 2026 01:00 — with GitHub Actions Failure
@sidonsoft
Contributor Author

Follow-up: I found and fixed the actual live Telegram voice-note path too.

In the running app, the failing block was arriving as:

{
  "type": "audio",
  "source": {
    "type": "base64",
    "media_type": "audio/None",
    "data": "file:///.../telegram/voice-....oga"
  }
}

So there were really two compatibility issues:

  1. audio blocks using top-level data
  2. audio blocks using source.type == "base64" where source.data is actually a file://... URI, not real base64

This PR now handles both cases by normalizing local paths / file URIs into URL-style sources before the base64 decoder is reached.
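
A rough sketch of the second case, assuming the check is "the data string starts with file:// or names an existing file". The helper names is_local_reference and fix_mislabeled_base64 are hypothetical, and the actual detection logic in the PR may differ.

```python
import os
from pathlib import Path
from typing import Any, Dict


def is_local_reference(data: str) -> bool:
    """True if `data` looks like a file:// URI or an existing local file."""
    return data.startswith("file://") or os.path.isfile(data)


def fix_mislabeled_base64(source: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a 'base64' source whose data is really a local reference."""
    if source.get("type") != "base64":
        return source
    data = source.get("data")
    if not isinstance(data, str) or not is_local_reference(data):
        return source  # treat as genuine base64 and leave it alone
    # Bare paths are made absolute first, because as_uri() rejects relative paths.
    url = data if data.startswith("file://") else Path(data).resolve().as_uri()
    return {"type": "url", "url": url}
```

Because the rewrite happens before any decoding, a payload such as file:///tmp/voice.oga never reaches the base64 decoder.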

Also fixed .oga media type normalization so Telegram voice-note files map to audio/ogg instead of audio/octet-stream.
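
One way to express the .oga mapping is an explicit override consulted before the standard library's guess. This is a sketch under assumptions: the function name guess_audio_media_type and the override table are illustrative, and the PR may implement the mapping differently.

```python
import mimetypes
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions whose media type should not depend on the platform's MIME tables.
_AUDIO_OVERRIDES = {".oga": "audio/ogg"}


def guess_audio_media_type(path_or_url: str) -> str:
    """Guess an audio media type from a path or URL, with an .oga override."""
    path = urlparse(path_or_url).path or path_or_url
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in _AUDIO_OVERRIDES:
        return _AUDIO_OVERRIDES[suffix]
    guessed, _ = mimetypes.guess_type(path)
    # Same fallback value the comment above mentions for unknown audio files.
    return guessed or "audio/octet-stream"


print(guess_audio_media_type("file:///tmp/voice-1.oga"))  # audio/ogg
```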

Updated validation:

.venv/bin/python -m pytest -q tests/unit/agents/utils/test_message_processing.py
# 3 passed

.venv/bin/python -m pytest -q tests/unit/workspace/test_prompt.py tests/unit/cli/test_cli_version.py
# 6 passed

@sidonsoft sidonsoft temporarily deployed to maintainer-approved March 20, 2026 01:11 — with GitHub Actions Inactive
@sidonsoft
Contributor Author

Quick reviewer note: the current PR now covers all of the failure shapes I saw locally, not just one Telegram variant.

The compatibility handling now covers:

  1. Audio blocks with top-level data

    • e.g. {"type": "audio", "data": ...}
  2. Audio blocks with source.type == "base64" where source.data is actually a local path / file://... URI

    • this was the real live Telegram voice-note failure path in my logs
  3. Pydantic content objects that need model_dump() before processing

    • process_file_and_media_blocks_in_message() now converts model-like blocks to dicts before the normal media path runs
  4. .oga media type normalization

    • mapped to audio/ogg so Telegram voice-note files don’t fall through as audio/octet-stream
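
Item 3 can be illustrated with a small duck-typed conversion. This is a sketch only: to_plain_blocks and FakeAudioContent are hypothetical names, and FakeAudioContent merely stands in for a Pydantic v2 model, whose model_dump() method returns a dict of its fields.

```python
from typing import Any, Dict, Iterable, List


def to_plain_blocks(blocks: Iterable[Any]) -> List[Dict[str, Any]]:
    """Convert model-like content objects to dicts; pass dicts through."""
    plain: List[Dict[str, Any]] = []
    for block in blocks:
        if isinstance(block, dict):
            plain.append(block)
        elif callable(getattr(block, "model_dump", None)):
            plain.append(block.model_dump())
        else:
            raise TypeError(f"unsupported block type: {type(block).__name__}")
    return plain


class FakeAudioContent:
    """Stand-in for a Pydantic model; only model_dump() matters here."""

    def __init__(self, data: str) -> None:
        self.type = "audio"
        self.data = data

    def model_dump(self) -> Dict[str, Any]:
        return {"type": self.type, "data": self.data}


print(to_plain_blocks([FakeAudioContent("file:///tmp/voice.oga"), {"type": "text", "text": "hi"}]))
```

Checking for the method rather than importing a specific base class keeps the processing layer independent of whichever package defines the content models.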

So the intent of the PR is:

  • keep channel-side behavior unchanged
  • make message processing more tolerant / data-agnostic
  • normalize multiple equivalent audio-content shapes into one common handling path

Targeted tests were added for:

  • top-level data
  • base64 + file://... source payloads
  • audio normalization through _process_single_block()

@zhijianma
Member

@sidonsoft

Please see my test.

@sidonsoft
Contributor Author

Thanks — yes, this aligns with the intended behavior.

The important point is that the processing layer should normalize audio inputs based on what the payload actually is, not just the nominal source.type.

So if an audio block comes through as:

  • source.type == "base64"
  • but source.data is actually a local file path (or file:// URI)

then converting it into a URL-style source before downstream processing is the correct compatibility behavior.

That is also why I’ve been framing this as a message-processing normalization issue rather than a Telegram-only workaround. Your console-uploaded MP3 example shows the same mismatch can appear in other paths too, not just Telegram voice notes.

So from my side, your test is a good confirmation that the normalization approach is the right shape for the fix.

@zhijianma
Member

zhijianma commented Mar 20, 2026

@sidonsoft

You may have overlooked a crucial processing step.

In the runner, there is a conversion from Runtime Message to AgentScope Message that transforms AudioContent into AudioBlock.

type_mapping = {
    "text": (TextBlock, "text"),
    "image": (ImageBlock, "image_url"),
    "audio": (AudioBlock, "data"),
    "data": (TextBlock, "data"),
    "video": (VideoBlock, "video_url"),
    "file": (FileBlock, "file_url"),
}
# ....
# ....
for cnt in message.content:  # AudioContent in AgentScope-Runtime
    cnt_type = cnt.type or "text"
    block_cls, attr_name = type_mapping[cnt_type]  # AudioBlock in AgentScope
    value = getattr(cnt, attr_name)
    # In the runner this is one branch of a longer if/elif chain;
    # the other branches are elided here.
    if cnt_type == "audio":
        if (
            value
            and isinstance(value, str)
            and value.startswith("data:")
        ):
            # Data URI: split into media type and base64 payload
            mediatype_part = value.split(";")[0].replace("data:", "")
            base64_data = value.split(",")[1]
            base64_source = Base64Source(
                type="base64",
                media_type=mediatype_part,
                data=base64_data,
            )
            msg_content.append(
                block_cls(type=cnt_type, source=base64_source),
            )
        else:
            parsed_url = urlparse(value)
            if parsed_url.scheme and parsed_url.netloc:
                # Remote URL with both scheme and host
                url_source = URLSource(type="url", url=value)
                msg_content.append(
                    block_cls(type=cnt_type, source=url_source),
                )
            else:
                # Everything else (including local paths) is treated as base64
                audio_extension = getattr(cnt, "format")
                base64_source = Base64Source(
                    type="base64",
                    media_type=f"audio/{audio_extension}",
                    data=value,
                )
                msg_content.append(
                    block_cls(type=cnt_type, source=base64_source),
                )
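
The branch order in this excerpt suggests why a file:// value can end up inside a base64 source: urlparse() reports a scheme for file:///... but an empty netloc, so the URL branch is skipped. The sketch below mirrors only that branch order; classify_audio_value is a hypothetical name and none of the runtime's classes are used.

```python
from urllib.parse import urlparse


def classify_audio_value(value: str) -> str:
    """Report which branch of the quoted snippet a value would take."""
    if value.startswith("data:"):
        return "data-uri"
    parsed = urlparse(value)
    if parsed.scheme and parsed.netloc:
        return "url"
    return "base64-fallback"


for sample in (
    "data:audio/ogg;base64,T2dnUw==",
    "https://example.com/a.mp3",
    "file:///tmp/voice.oga",  # scheme "file", but netloc is empty
    "/tmp/voice.oga",         # no scheme at all
):
    print(f"{sample!r:40} -> {classify_audio_value(sample)}")

# A missing format attribute formats as the literal string "None":
print(f"audio/{None}")  # audio/None
```

Under this reading, both file:///tmp/voice.oga and /tmp/voice.oga fall through to the base64 branch, which matches the audio/None plus file:// block reported earlier in the thread.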

sidonsoft pushed a commit to sidonsoft/agentscope-runtime that referenced this pull request Mar 20, 2026
Audio content with local file paths (e.g., /tmp/voice.ogg) were
incorrectly wrapped as Base64Source instead of being converted to
file:// URLs.

Changes:
- Add os.path.isfile() check before falling back to base64
- Convert local file paths to file:// URLs using Path.as_uri()
- Fix getattr(cnt, 'format') to use default None to prevent AttributeError
- Guard media_type construction to avoid 'audio/None' strings

Related: agentscope-ai/QwenPaw#1896
@sidonsoft
Contributor Author

Upstream Fix

I've opened a matching PR in agentscope-runtime that fixes the root cause identified in #1896 (comment):

agentscope-ai/agentscope-runtime#466

Summary

The runner's message_to_agentscope_msg() now detects local file paths before falling back to base64:

if value and isinstance(value, str) and os.path.isfile(value):
    # Local file path → convert to file:// URL
    url_source = URLSource(
        type="url",
        url=Path(value).as_uri(),
        media_type=f"audio/{audio_extension}" if audio_extension else None,
    )
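
Two details of that snippet are worth isolating: Path.as_uri() raises ValueError for relative paths, and the media type should only be built when a format is known. The sketch below is standalone and does not use the runtime's URLSource; the helper names local_path_to_file_url and audio_media_type are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def local_path_to_file_url(value: str) -> Optional[str]:
    """Return a file:// URL for an existing local file, else None."""
    if not os.path.isfile(value):
        return None
    # resolve() makes the path absolute; as_uri() raises ValueError otherwise.
    return Path(value).resolve().as_uri()


def audio_media_type(audio_format: Optional[str]) -> Optional[str]:
    """Build 'audio/<format>' only when a format is known (avoids 'audio/None')."""
    return f"audio/{audio_format}" if audio_format else None


with tempfile.NamedTemporaryFile(suffix=".ogg") as tmp:
    print(local_path_to_file_url(tmp.name))  # a file:// URL ending in .ogg
print(local_path_to_file_url("/no/such/voice.ogg"))  # None
print(audio_media_type(None))  # None
```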

Coordination

| PR | Location | Purpose |
| --- | --- | --- |
| #466 (agentscope-runtime) | Root cause | Produces correct blocks from the start |
| #1896 (CoPaw) | Downstream | Defensive handling for legacy/edge cases |

Both PRs should be merged. The downstream fix in this PR remains valuable for:

  1. Backwards compatibility with older runtime versions
  2. Defense-in-depth against similar issues in other code paths

…d_filename

The previous implementation returned bare paths (e.g., /tmp/voice.ogg) as
URLs without the file:// scheme, causing downstream download_file_from_url()
to fail when trying to fetch them as remote URLs.

Now properly normalizes:
- Full URLs (https://, http://) → pass through unchanged
- file:// URLs → pass through unchanged
- Bare local paths that exist → convert to file:// URL with media_type
- Unknown/invalid → pass through as-is (may be base64)
@sidonsoft sidonsoft had a problem deploying to maintainer-approved March 20, 2026 11:49 — with GitHub Actions Failure
@sidonsoft
Contributor Author

Bug Found: Bare paths returned without file:// scheme

The _extract_source_and_filename function returns bare paths like /tmp/voice.ogg as {"type": "url", "url": data} without normalizing to file:// URLs. This causes downstream download_file_from_url() to fail.

The Problem

return {"type": "url", "url": data}, filename  # data = "/tmp/voice.ogg"

This gets passed to _process_single_file_block which sees parsed.scheme == "" and tries to fetch it as a remote URL.

Fix

Normalize bare local paths to file:// URLs:

if parsed.scheme and parsed.netloc:
    # Full URL (https://, http://, etc.)
    return {"type": "url", "url": data}, filename
elif parsed.scheme == "file":
    # Already a file:// URL
    return {"type": "url", "url": data}, filename
elif os.path.isfile(data):
    # Bare local path → convert to file:// URL
    return {
        "type": "url",
        "url": Path(data).as_uri(),
        "media_type": _media_type_from_path(data),
    }, filename
else:
    # Unknown - pass through as-is (may be base64 or invalid)
    return {"type": "url", "url": data}, filename
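
For reference, here is a self-contained version of the same four-way classification that can be run outside the codebase. It is a sketch under assumptions: to_url_source is a hypothetical name, it returns only the source dict (not the filename), and media_type_from_path is a stand-in for the PR's _media_type_from_path(), whose real behavior is not shown in this thread.

```python
import mimetypes
import os
from pathlib import Path
from typing import Any, Dict
from urllib.parse import urlparse


def media_type_from_path(path: str) -> str:
    """Hypothetical stand-in for the PR's _media_type_from_path()."""
    guessed, _ = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"


def to_url_source(data: str) -> Dict[str, Any]:
    """Normalize a URL, file:// URI, or bare local path into a URL source."""
    parsed = urlparse(data)
    if parsed.scheme and parsed.netloc:
        return {"type": "url", "url": data}  # full URL (https://, http://, ...)
    if parsed.scheme == "file":
        return {"type": "url", "url": data}  # already a file:// URL
    if os.path.isfile(data):
        # Bare local path that exists: convert to an absolute file:// URL
        return {
            "type": "url",
            "url": Path(data).resolve().as_uri(),
            "media_type": media_type_from_path(data),
        }
    return {"type": "url", "url": data}  # unknown: pass through as-is
```

The first two branches return the same shape and could be merged; they are kept separate here to mirror the snippet in the comment.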

I've pushed this fix to a branch on my fork. The fix should be incorporated into this PR before merging.

@zhijianma
Member

@sidonsoft

So, do you think this PR still needs to be merged?

@github-project-automation github-project-automation Bot moved this to Todo in QwenPaw Mar 25, 2026


Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

[Bug]: AudioContent not supported in Telegram channel - Fix

3 participants