
[Feature] Expose local file path in Feishu inbound messages to enable agent file tools #1506

@huer512


🎯 The Goal / Use Case

When users send non-image files (documents, audio, video, etc.) via Feishu, PicoClaw currently:

  • Downloads the file to a local temp path (e.g. /tmp/picoclaw_media/...), and
  • Stores it in MediaStore, but
  • Does not expose the local path to the agent in the message text.

As a result, the agent cannot easily use tools like read_file to open the file that the user just sent, because it has no way to discover the corresponding local path from the conversation context.

The goal is to:

  • Let the agent see and use the local filesystem path of inbound Feishu files, so it can read, analyze, or transform user-uploaded files via existing file tools (e.g. read_file).

💡 Proposed Solution

For Feishu inbound messages of type:

  • file
  • audio
  • media (video)

Include the resolved local filesystem path in the human-visible message content in a structured tag format.

For example:

  • File with filename:
    report.pdf [file:/tmp/picoclaw_media/om_xxx-filekey.pdf]

  • File without filename:
    [file:/tmp/picoclaw_media/om_xxx-filekey]

Similarly for audio / video:

  • recording.ogg [audio:/tmp/picoclaw_media/...]
  • video.mp4 [video:/tmp/picoclaw_media/...]
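Below is a minimal Go sketch of this tag format. It assumes Feishu's `message_type` strings are `file`, `audio` and `media`, which the issue refers to as MsgTypeFile, MsgTypeAudio and MsgTypeMedia. The helper names `tagKind` and `formatMediaTag` are hypothetical and are not part of PicoClaw.

```go
package main

import "fmt"

// tagKind maps a Feishu inbound message type to the tag prefix proposed in
// this issue. The raw strings "file", "audio" and "media" are an assumption;
// PicoClaw's MsgTypeFile / MsgTypeAudio / MsgTypeMedia constants may differ.
func tagKind(messageType string) (string, bool) {
	switch messageType {
	case "file":
		return "file", true
	case "audio":
		return "audio", true
	case "media": // Feishu "media" messages carry video
		return "video", true
	default:
		return "", false // images and other types keep the existing behavior
	}
}

// formatMediaTag renders one structured tag such as
// "report.pdf [file:/tmp/picoclaw_media/om_xxx-filekey.pdf]".
// When filename is empty, only the bracketed tag is returned.
func formatMediaTag(messageType, filename, localPath string) (string, bool) {
	kind, ok := tagKind(messageType)
	if !ok {
		return "", false
	}
	tag := fmt.Sprintf("[%s:%s]", kind, localPath)
	if filename == "" {
		return tag, true
	}
	return filename + " " + tag, true
}
```

For example, `formatMediaTag("file", "report.pdf", "/tmp/picoclaw_media/om_xxx-filekey.pdf")` returns `"report.pdf [file:/tmp/picoclaw_media/om_xxx-filekey.pdf]", true`, while an `image` message returns `"", false` so the caller can keep the current image pipeline.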

This gives the agent a stable, parseable way to locate the exact file that was received.
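On the agent side, or in a tool that post-processes message text, the tags could be extracted with a single regular expression. This is a sketch rather than an existing PicoClaw helper: `parseMediaTags` and `MediaTag` are hypothetical names, and the pattern assumes local paths never contain `]`.

```go
package main

import "regexp"

// mediaTagRE matches the proposed structured tags, e.g.
// "[file:/tmp/picoclaw_media/om_xxx-filekey.pdf]". The path group stops at
// the first ']', so paths containing ']' are not supported by this sketch.
var mediaTagRE = regexp.MustCompile(`\[(file|audio|video):([^\]]+)\]`)

// MediaTag is one parsed tag: its kind and the local filesystem path.
type MediaTag struct {
	Kind string
	Path string
}

// parseMediaTags extracts every structured media tag from message text in
// order of appearance. Generic tags without a path, such as "[file]", are
// ignored because the pattern requires a ':' followed by a path.
func parseMediaTags(text string) []MediaTag {
	var tags []MediaTag
	for _, m := range mediaTagRE.FindAllStringSubmatch(text, -1) {
		tags = append(tags, MediaTag{Kind: m[1], Path: m[2]})
	}
	return tags
}
```

Because the generic `[file]` / `[audio]` / `[video]` tags do not match, text produced by the current fallback path parses to an empty list instead of a bogus path.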

🛠 Potential Implementation (Optional)

Current Feishu inbound flow (simplified):

  • handleMessageReceive:
    • content := extractContent(messageType, rawContent)
    • mediaRefs := c.downloadInboundMedia(...) → returns []string of media:// refs
    • content = appendMediaTags(content, messageType, mediaRefs)

In handleMessageReceive (for MsgTypeFile, MsgTypeAudio and MsgTypeMedia), once mediaRefs is available, derive each local path via MediaStore.Resolve(ref) and build structured tags instead of appending only a generic [file] / [audio] / [video] tag:

  • Get store := c.GetMediaStore()
  • For each ref in mediaRefs:
    • localPath, err := store.Resolve(ref)
    • Build tags like: [file:localPath], [audio:localPath], [video:localPath]
  • Append these tags to content:
    • If there is filename text already:
      content = content + " " + strings.Join(parts, " ")
    • Else:
      content = strings.Join(parts, " ")
  • Fallback: if Resolve fails or mediaRefs is empty, keep the current appendMediaTags behavior.
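The steps above could look roughly like the following Go sketch. `mediaResolver` is a hypothetical interface whose `Resolve` signature mirrors the `store.Resolve(ref)` call described in this issue; the real MediaStore API may differ. The `fallback` parameter stands in for the existing `appendMediaTags` call, and `mapResolver` and the `media://a` style refs are toy stand-ins for demonstration only.

```go
package main

import (
	"errors"
	"strings"
)

// mediaResolver captures the one method this proposal relies on. The
// signature mirrors `localPath, err := store.Resolve(ref)` from the issue;
// the actual MediaStore interface may differ.
type mediaResolver interface {
	Resolve(ref string) (string, error)
}

// appendLocalPathTags appends one "[kind:localPath]" tag per ref to content.
// If there is no store, refs is empty, or any ref fails to resolve, it
// returns fallback(content), which stands in for today's appendMediaTags.
func appendLocalPathTags(content, kind string, refs []string, store mediaResolver, fallback func(string) string) string {
	if store == nil || len(refs) == 0 {
		return fallback(content)
	}
	parts := make([]string, 0, len(refs))
	for _, ref := range refs {
		localPath, err := store.Resolve(ref)
		if err != nil {
			return fallback(content)
		}
		parts = append(parts, "["+kind+":"+localPath+"]")
	}
	if content != "" { // filename text already present
		return content + " " + strings.Join(parts, " ")
	}
	return strings.Join(parts, " ")
}

// mapResolver is a toy in-memory resolver used only to exercise the sketch;
// it is not part of PicoClaw.
type mapResolver map[string]string

func (m mapResolver) Resolve(ref string) (string, error) {
	if p, ok := m[ref]; ok {
		return p, nil
	}
	return "", errors.New("unknown media ref: " + ref)
}
```

This sketch treats resolution as all-or-nothing: if any ref fails, the whole message falls back to the current behavior, so the agent never sees a partial set of paths. Skipping only the refs that fail would be an equally reasonable choice.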

This keeps behavior unchanged for images and for cases where the local path cannot be resolved, while exposing the path to the agent whenever it is available.

🚦 Impact & Roadmap Alignment

  • This is a Core Feature
  • This is a Nice-to-Have / Enhancement
  • This aligns with the current Roadmap

Rationale:

  • File understanding and manipulation is a core capability for personal agents.
  • Enabling agents to directly read user-uploaded files via existing tools is central to many workflows (code review, document analysis, data inspection, etc.).

🔄 Alternatives Considered

  1. Add a dedicated tool like resolve_media_ref exposed to the agent

    • The agent would call a tool with media://... and receive the local path.
    • This is cleaner from a separation-of-concerns perspective but requires adding and wiring a new tool interface.
    • It also assumes the agent always remembers or can see the raw media:// refs, which is not always true.
  2. Embed file content directly as base64 for all file types

    • Similar to how images are converted to data:image/...;base64,....
    • This can be very heavy for large documents/binaries and is not ideal for tools like read_file that want a real path.
  3. Store the path only in metadata, not in message content

    • Safer from a user-facing perspective, but the LLM/agent would still lack a direct, parseable reference in the text channel.

The proposed [file:/path] approach is the simplest way to give the agent this capability immediately, using the tools it already has.

💬 Additional Context

  • Feishu inbound media is already downloaded to a temp directory via downloadResource (e.g. /tmp/picoclaw_media), and registered in MediaStore with media:// refs.
  • For images, a separate pipeline converts refs to base64 data URLs for multi-modal LLMs; this proposal focuses specifically on non-image file types (documents, audio, video), where path-based tools like read_file are more appropriate.
  • The structured tag format [file:/absolute/path]:
    • Is easy for the agent to parse,
    • Is backward-compatible with existing plain-text content, and
    • Does not change the underlying MediaStore or storage semantics.
