fix: support audio content using top-level data field #1896
sidonsoft wants to merge 4 commits into agentscope-ai:main
|
Follow-up: I found and fixed the actual live Telegram voice-note path too. In the running app, the failing block was arriving as:
{
  "type": "audio",
  "source": {
    "type": "base64",
    "media_type": "audio/None",
    "data": "file:///.../telegram/voice-....oga"
  }
}
So there were really two compatibility issues:
This PR now handles both cases by normalizing local paths / file URIs into URL-style sources before the base64 decoder is reached. Also fixed
Updated validation:
.venv/bin/python -m pytest -q tests/unit/agents/utils/test_message_processing.py
# 3 passed
.venv/bin/python -m pytest -q tests/unit/workspace/test_prompt.py tests/unit/cli/test_cli_version.py
# 6 passed |
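The normalization described in this comment can be sketched roughly as follows. This is a minimal illustration on plain dicts, not the PR's actual code: the function name `normalize_audio_source` is hypothetical, and it assumes a mislabeled block can be recognised by a `data` value that starts with `file://` or names an existing local file.

```python
import os
from pathlib import Path


def normalize_audio_source(source: dict) -> dict:
    """Return a URL-style source when a 'base64' source really holds a path or file URI.

    Hypothetical helper for illustration only; not taken from the PR.
    """
    data = source.get("data")
    if source.get("type") != "base64" or not isinstance(data, str):
        return source  # nothing to normalize
    if data.startswith("file://"):
        # Already a file URI: re-label it as a URL source.
        return {"type": "url", "url": data}
    if os.path.isfile(data):
        # Bare local path: convert to an absolute file:// URI.
        return {"type": "url", "url": Path(data).resolve().as_uri()}
    return source  # assume the payload is genuine base64
```

With the failing block shown above, the `source` dict would come back as a `{"type": "url", ...}` entry, so the base64 decoder is never asked to decode a file URI.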
|
Quick reviewer note: the current PR now covers all of the failure shapes I saw locally, not just one Telegram variant. The compatibility handling now covers:
So the intent of the PR is:
Targeted tests were added for:
|
|
Please see my test |
|
Thanks, yes, this aligns with the intended behavior. The important point is that the processing layer should normalize audio inputs based on what the payload actually is, not just the nominal
So if an audio block comes through as:
then converting it into a URL-style source before downstream processing is the correct compatibility behavior. That is also why I’ve been framing this as a message-processing normalization issue rather than a Telegram-only workaround. Your console-uploaded MP3 example shows the same mismatch can appear in other paths too, not just Telegram voice notes. So from my side, your test is a good confirmation that the normalization approach is the right shape for the fix. |
|
You may have overlooked a crucial processing step. In the runner, there is a conversion from Runtime Message to AgentScope Message that transforms AudioContent into AudioBlock.
type_mapping = {
    "text": (TextBlock, "text"),
    "image": (ImageBlock, "image_url"),
    "audio": (AudioBlock, "data"),
    "data": (TextBlock, "data"),
    "video": (VideoBlock, "video_url"),
    "file": (FileBlock, "file_url"),
}
....
....
for cnt in message.content:  # AudioContent in AgentScope-Runtime
    cnt_type = cnt.type or "text"
    block_cls, attr_name = type_mapping[cnt_type]  # AudioBlock in AgentScope
    value = getattr(cnt, attr_name)
    elif cnt_type == "audio":
        if (
            value
            and isinstance(value, str)
            and value.startswith(
                "data:",
            )
        ):
            mediatype_part = value.split(";")[0].replace(
                "data:",
                "",
            )
            base64_data = value.split(",")[1]
            base64_source = Base64Source(
                type="base64",
                media_type=mediatype_part,
                data=base64_data,
            )
            msg_content.append(
                block_cls(type=cnt_type, source=base64_source),
            )
        else:
            parsed_url = urlparse(value)
            if parsed_url.scheme and parsed_url.netloc:
                url_source = URLSource(type="url", url=value)
                msg_content.append(
                    block_cls(type=cnt_type, source=url_source),
                )
            else:
                audio_extension = getattr(cnt, "format")
                base64_source = Base64Source(
                    type="base64",
                    media_type=f"audio/{audio_extension}",
                    data=value,
                )
                msg_content.append(
                    block_cls(type=cnt_type, source=base64_source),
                ) |
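To make the failure mode concrete, the fall-through branch of the excerpt can be reproduced with stand-ins. The sketch below is an illustration only: `FakeAudioContent` and `fallthrough_source` are hypothetical names, and plain dicts replace `URLSource` and `Base64Source`. It assumes the content object has `format` set to `None` and carries a file URI in `data`, as in the Telegram case reported earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class FakeAudioContent:
    """Hypothetical stand-in for the runtime's AudioContent."""

    data: str
    format: Optional[str] = None


def fallthrough_source(cnt: FakeAudioContent) -> dict:
    """Mirror the non-data-URI branches of the excerpt, using dicts."""
    value = cnt.data
    parsed = urlparse(value)
    if parsed.scheme and parsed.netloc:
        return {"type": "url", "url": value}
    # file:// URIs have an empty netloc, so they end up in the base64 branch.
    audio_extension = getattr(cnt, "format")
    return {
        "type": "base64",
        "media_type": f"audio/{audio_extension}",  # None is formatted as "None"
        "data": value,
    }


block = fallthrough_source(FakeAudioContent(data="file:///tmp/voice.oga"))
print(block["media_type"])  # audio/None
```

Because `urlparse` reports an empty `netloc` for `file:///...` values, the URL branch is skipped and the file URI is labeled as base64 with the media type `audio/None`, matching the block observed in the running app.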
Audio content with local file paths (e.g., /tmp/voice.ogg) was incorrectly wrapped as Base64Source instead of being converted to file:// URLs.
Changes:
- Add os.path.isfile() check before falling back to base64
- Convert local file paths to file:// URLs using Path.as_uri()
- Fix getattr(cnt, 'format') to use default None to prevent AttributeError
- Guard media_type construction to avoid 'audio/None' strings
Related: agentscope-ai/QwenPaw#1896
Upstream Fix
I've opened a matching PR in agentscope-runtime that fixes the root cause identified in #1896 (comment): agentscope-ai/agentscope-runtime#466
Summary
The runner's
if value and isinstance(value, str) and os.path.isfile(value):
    # Local file path → convert to file:// URL
    url_source = URLSource(
        type="url",
        url=Path(value).as_uri(),
        media_type=f"audio/{audio_extension}" if audio_extension else None,
    )
Coordination
Both PRs should be merged. The downstream fix in this PR remains valuable for:
|
…d_filename
The previous implementation returned bare paths (e.g., /tmp/voice.ogg) as URLs without the file:// scheme, causing downstream download_file_from_url() to fail when trying to fetch them as remote URLs.
Now properly normalizes:
- Full URLs (https://, http://) → pass through unchanged
- file:// URLs → pass through unchanged
- Bare local paths that exist → convert to file:// URL with media_type
- Unknown/invalid → pass through as-is (may be base64)
Bug Found: Bare paths returned without file:// scheme
The
The Problem
return {"type": "url", "url": data}, filename  # data = "/tmp/voice.ogg"
This gets passed to
Fix
Normalize bare local paths to
if parsed.scheme and parsed.netloc:
    # Full URL (https://, http://, etc.)
    return {"type": "url", "url": data}, filename
elif parsed.scheme == "file":
    # Already a file:// URL
    return {"type": "url", "url": data}, filename
elif os.path.isfile(data):
    # Bare local path → convert to file:// URL
    return {
        "type": "url",
        "url": Path(data).as_uri(),
        "media_type": _media_type_from_path(data),
    }, filename
else:
    # Unknown - pass through as-is (may be base64 or invalid)
    return {"type": "url", "url": data}, filename
I've pushed this fix to a branch on my fork. The fix should be incorporated into this PR before merging. |
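A self-contained version of the branch logic in this comment might look like the following sketch. The wrapper name `normalize_url_source` is hypothetical, the `filename` return value is omitted for brevity, and `_media_type_from_path` is reimplemented here with the standard `mimetypes` module as an assumption; the PR's own helper may behave differently.

```python
import mimetypes
import os
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def _media_type_from_path(path: str) -> Optional[str]:
    """Assumed implementation: guess the MIME type from the file extension."""
    media_type, _ = mimetypes.guess_type(path)
    return media_type


def normalize_url_source(data: str) -> dict:
    """Hypothetical wrapper around the four cases listed in the comment."""
    parsed = urlparse(data)
    if parsed.scheme and parsed.netloc:
        # Full URL (https://, http://, etc.): pass through unchanged.
        return {"type": "url", "url": data}
    if parsed.scheme == "file":
        # Already a file:// URL: pass through unchanged.
        return {"type": "url", "url": data}
    if os.path.isfile(data):
        # Bare local path: resolve to an absolute path, then build a file:// URL.
        return {
            "type": "url",
            "url": Path(data).resolve().as_uri(),
            "media_type": _media_type_from_path(data),
        }
    # Unknown: pass through as-is (may be base64 or invalid).
    return {"type": "url", "url": data}
```

The sketch calls `resolve()` before `as_uri()` because `Path.as_uri()` raises `ValueError` for relative paths; whether the PR needs the same guard depends on whether callers can pass relative paths.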
|
So, do you think this PR still needs to be merged? |
Summary
Fix audio/voice message processing when an audio block uses a top-level data field instead of a source dict.
This affects the path used by Telegram voice/audio content in v0.1.0, where AudioContent is created with data=... but message_processing.py only looks for source.
What changed
- src/copaw/agents/utils/message_processing.py: when block_type == "audio", if source is missing but data is present, normalize it into a {"type": "url", "url": ...} source
- _extract_source_and_filename() with Telegram-style audio blocks
- _process_single_block() using an audio block with data=file://...
Why this approach
This is a smaller, safer compatibility fix than changing Telegram channel output. It also makes the processing layer more data-agnostic for any path that emits AudioContent(data=...).
Validation
Targeted tests:
Result:
2 passed
Additional sanity slice:
Result:
6 passed
Closes #1516
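As a rough illustration of the compatibility rule in the summary, the sketch below lifts a top-level data value into a URL-style source. It operates on plain dicts; `lift_top_level_data` is a hypothetical name and is not the function in message_processing.py.

```python
def lift_top_level_data(block: dict) -> dict:
    """Move a top-level 'data' value into a URL-style 'source' for audio blocks.

    Hypothetical helper for illustration; the real processing code may differ.
    """
    if block.get("type") != "audio":
        return block  # only audio blocks are affected
    if block.get("source") or not block.get("data"):
        return block  # already has a source, or there is nothing to lift
    normalized = {key: value for key, value in block.items() if key != "data"}
    normalized["source"] = {"type": "url", "url": block["data"]}
    return normalized
```

For example, `{"type": "audio", "data": "file:///tmp/v.oga"}` would become `{"type": "audio", "source": {"type": "url", "url": "file:///tmp/v.oga"}}`, while blocks that already carry a `source` are returned unchanged.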