🎯 The Goal / Use Case
When users send non-image files (documents, audio, video, etc.) via Feishu, PicoClaw currently:
- Downloads the file to a local temp path (e.g. /tmp/picoclaw_media/...), and
- Stores it in MediaStore, but
- Does not expose the local path to the agent in the message text.
As a result, the agent cannot easily use tools like read_file to open the file that the user just sent, because it has no way to discover the corresponding local path from the conversation context.
The goal is to:
- Let the agent see and use the local filesystem path of inbound Feishu files, so it can read, analyze, or transform user-uploaded files via existing file tools (e.g. read_file).
💡 Proposed Solution
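For Feishu inbound messages of type:
- file
- audio
- media (video)
Include the resolved local filesystem path in the human-visible message content in a structured tag format.
For example:
File with filename:
report.pdf [file:/tmp/picoclaw_media/om_xxx-filekey.pdf]
File without filename:
[file:/tmp/picoclaw_media/om_xxx-filekey]
Similarly for audio / video: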
recording.ogg [audio:/tmp/picoclaw_media/...]
video.mp4 [video:/tmp/picoclaw_media/...]
This gives the agent a stable, parseable way to locate the exact file that was received.
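To illustrate what "parseable" could mean in practice, here is a minimal Go sketch of how a consumer might extract these tags from message text. The MediaTag type and ParseMediaTags function are hypothetical names introduced for this example, and the pattern assumes absolute paths that contain no `]` character.

```go
package main

import (
	"fmt"
	"regexp"
)

// MediaTag is a hypothetical type for this sketch: one [kind:path] tag found in message text.
type MediaTag struct {
	Kind string // "file", "audio", or "video"
	Path string // local filesystem path exactly as written in the tag
}

// tagPattern matches [file:/abs/path], [audio:/abs/path], and [video:/abs/path].
// Assumption: paths are absolute and never contain ']'.
var tagPattern = regexp.MustCompile(`\[(file|audio|video):(/[^\]]*)\]`)

// ParseMediaTags returns every structured media tag in content, in order of appearance.
// Generic tags without a path, such as [file], are ignored.
func ParseMediaTags(content string) []MediaTag {
	var tags []MediaTag
	for _, m := range tagPattern.FindAllStringSubmatch(content, -1) {
		tags = append(tags, MediaTag{Kind: m[1], Path: m[2]})
	}
	return tags
}

func main() {
	msg := "report.pdf [file:/tmp/picoclaw_media/om_xxx-filekey.pdf]"
	for _, t := range ParseMediaTags(msg) {
		fmt.Printf("%s -> %s\n", t.Kind, t.Path)
	}
}
```

Because the generic [file] / [audio] / [video] tags carry no colon, a parser like this skips them, so old and new message formats can coexist.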
🛠 Potential Implementation (Optional)
Current Feishu inbound flow (simplified):
- handleMessageReceive:
  - content := extractContent(messageType, rawContent)
  - mediaRefs := c.downloadInboundMedia(...) → returns []string of media:// refs
  - content = appendMediaTags(content, messageType, mediaRefs)
In handleMessageReceive (for MsgTypeFile, MsgTypeAudio, MsgTypeMedia), after resolving mediaRefs, instead of only appending a generic [file] / [audio] / [video] tag, derive the local path via MediaStore.Resolve(ref) and build structured tags:
- Get store := c.GetMediaStore()
- For each ref in mediaRefs: localPath, err := store.Resolve(ref)
- Build tags like: [file:localPath], [audio:localPath], [video:localPath]
- Append these tags to content:
  - If there is filename text already: content = content + " " + strings.Join(parts, " ")
  - Else: content = strings.Join(parts, " ")
- Fallback: if Resolve fails or mediaRefs is empty, keep current appendMediaTags behavior.
This keeps behavior unchanged for images and for cases where the local path is not available, while exposing the path to the agent whenever it can be resolved.
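The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual PicoClaw code: pathResolver is a hypothetical interface standing in for MediaStore (only Resolve(ref) is named in this proposal, so its signature is assumed), the message-type strings are placeholders for MsgTypeFile, MsgTypeAudio, and MsgTypeMedia, and appendPathTags is an invented helper name. The sketch skips refs that fail to resolve and reports a fallback only when none resolve, which is one possible reading of the fallback rule.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pathResolver is a hypothetical stand-in for MediaStore in this sketch.
// Assumption: Resolve maps a media:// ref to a local path or returns an error.
type pathResolver interface {
	Resolve(ref string) (string, error)
}

// tagKind maps a message type to its tag prefix. The string values are assumptions;
// real code would switch on MsgTypeFile, MsgTypeAudio, and MsgTypeMedia.
func tagKind(messageType string) (string, bool) {
	switch messageType {
	case "file":
		return "file", true
	case "audio":
		return "audio", true
	case "media":
		return "video", true
	}
	return "", false
}

// appendPathTags appends [kind:localPath] tags to content. The bool is false when
// no tag could be built, signalling the caller to keep the existing appendMediaTags path.
func appendPathTags(store pathResolver, content, messageType string, mediaRefs []string) (string, bool) {
	kind, supported := tagKind(messageType)
	if !supported || store == nil {
		return content, false
	}
	var parts []string
	for _, ref := range mediaRefs {
		localPath, err := store.Resolve(ref)
		if err != nil || localPath == "" {
			continue // unresolved ref: no tag for this one
		}
		parts = append(parts, fmt.Sprintf("[%s:%s]", kind, localPath))
	}
	if len(parts) == 0 {
		return content, false
	}
	if content != "" {
		return content + " " + strings.Join(parts, " "), true
	}
	return strings.Join(parts, " "), true
}

// mapStore is an in-memory resolver used only to exercise the sketch.
type mapStore map[string]string

func (m mapStore) Resolve(ref string) (string, error) {
	if p, ok := m[ref]; ok {
		return p, nil
	}
	return "", errors.New("unknown media ref: " + ref)
}

func main() {
	store := mapStore{"media://abc": "/tmp/picoclaw_media/om_xxx-filekey.pdf"}
	out, ok := appendPathTags(store, "report.pdf", "file", []string{"media://abc"})
	fmt.Println(out, ok)
}
```

Returning a boolean instead of calling appendMediaTags directly keeps the helper independent of the existing function, so the caller decides how to fall back.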
🚦 Impact & Roadmap Alignment
Rationale:
- File understanding and manipulation is a core capability for personal agents.
- Enabling agents to directly read user-uploaded files via existing tools is central to many workflows (code review, document analysis, data inspection, etc.).
🔄 Alternatives Considered
- Add a dedicated tool like resolve_media_ref exposed to the agent
  - The agent would call a tool with media://... and receive the local path.
  - This is cleaner from a separation-of-concerns perspective but requires adding and wiring a new tool interface.
  - It also assumes the agent always remembers or can see the raw media:// refs, which is not always true.
- Embed file content directly as base64 for all file types
  - Similar to how images are converted to data:image/...;base64,....
  - This can be very heavy for large documents/binaries and is not ideal for tools like read_file that want a real path.
- Store the path only in metadata, not in message content
  - Safer from a user-facing perspective, but the LLM/agent would still lack a direct, parseable reference in the text channel.
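For comparison, the core of the first alternative might look something like the Go sketch below. Everything here is hypothetical: the refResolver interface stands in for MediaStore with an assumed Resolve signature, and resolveMediaRefTool is an invented handler name. The registration and schema wiring that a real tool would need is omitted, and that wiring is the extra cost this alternative carries.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// refResolver is a hypothetical stand-in for MediaStore; only Resolve(ref) is
// named in this proposal, so the signature is an assumption.
type refResolver interface {
	Resolve(ref string) (string, error)
}

// resolveMediaRefTool sketches the handler behind a hypothetical resolve_media_ref tool:
// it validates the scheme, then asks the store for the local path.
func resolveMediaRefTool(store refResolver, ref string) (string, error) {
	if !strings.HasPrefix(ref, "media://") {
		return "", fmt.Errorf("not a media ref: %q", ref)
	}
	return store.Resolve(ref)
}

// fakeStore is an in-memory resolver used only to exercise the sketch.
type fakeStore map[string]string

func (f fakeStore) Resolve(ref string) (string, error) {
	if p, ok := f[ref]; ok {
		return p, nil
	}
	return "", errors.New("unknown media ref: " + ref)
}

func main() {
	store := fakeStore{"media://abc": "/tmp/picoclaw_media/example"}
	path, err := resolveMediaRefTool(store, "media://abc")
	fmt.Println(path, err)
}
```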
The proposed [file:/path] approach is the simplest way to make the local path available to the agent immediately, leveraging existing tools.
💬 Additional Context
- Feishu inbound media is already downloaded to a temp directory via downloadResource (e.g. /tmp/picoclaw_media), and registered in MediaStore with media:// refs.
- For images, a separate pipeline converts refs to base64 data URLs for multi-modal LLMs; this proposal focuses specifically on non-image file types (documents, audio, video), where path-based tools like read_file are more appropriate.
- The structured tag format [file:/absolute/path]:
  - Is easy for the agent to parse,
  - Is backward-compatible with existing plain-text content, and
  - Does not change the underlying MediaStore or storage semantics.