LLM agents run as native DimOS modules. They subscribe to camera, LiDAR, odometry, and spatial memory streams and they control the robot through skills.
```
Human Input ──→ Agent ──→ Skill Calls ──→ Robot
(text/voice)     │           (RPC)
                 │
                 └─ subscribes to streams:
                    color_image, odom, spatial_memory
```
`Agent` (`dimos/agents/agent.py`) is a Module with:

- `human_input: In[str]`: receives text from `humancli`, `WebInput`, or `agent-send`
- `agent: Out[BaseMessage]`: publishes agent responses (text, tool calls, images)
- `agent_idle: Out[bool]`: signals when the agent is waiting for input
The agent uses LangGraph with a configurable LLM; the default is gpt-4o, which requires the `OPENAI_API_KEY` environment variable. On startup, the agent discovers all `@skill`-annotated methods across deployed modules via RPC and exposes them as LangChain tools.
Skills are methods decorated with @skill on any Module. The agent discovers them automatically at startup.
```python
from dimos.agents.annotation import skill
from dimos.core.module import Module

class MySkillContainer(Module):
    @skill
    def wave_hello(self) -> str:
        """Wave at the nearest person."""
        # ... robot control logic ...
        return "Waving!"
```

Rules:
- Parameters must be JSON-serializable primitives (`str`, `int`, `float`, `bool`, `list`, `dict`).
- Docstrings become the tool description the LLM sees. Write them clearly so the agent has sufficient context.
- The function must return a string or an image, which the agent uses to decide what to do next.
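To make the discovery mechanism concrete, here is a self-contained sketch of how `@skill`-style marking and lookup can work. The `skill` marker and the `discover_skills` helper below are illustrative stand-ins, not the actual DimOS implementation; only the decorator name and the docstring-as-description behavior come from the text above.

```python
# Illustrative sketch of @skill marking and discovery; not DimOS code.
import inspect

def skill(fn):
    fn._is_skill = True  # marker the discovery pass looks for
    return fn

class MySkillContainer:
    @skill
    def relative_move(self, forward: float, left: float, degrees: float) -> str:
        """Move the robot relative to its current position."""
        return f"moved forward={forward}, left={left}, rotated {degrees} deg"

def discover_skills(module):
    """Collect name -> description for every @skill-marked method."""
    tools = {}
    for name, fn in inspect.getmembers(module, inspect.ismethod):
        if getattr(fn, "_is_skill", False):
            # The cleaned docstring is what the LLM sees as the tool description.
            tools[name] = inspect.getdoc(fn)
    return tools

print(discover_skills(MySkillContainer()))
```

In DimOS the same idea runs over RPC across all deployed modules rather than over a single local object, but the docstring-to-description mapping is the part that matters when writing skills.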
| Skill | Module | Description |
|---|---|---|
| `relative_move(forward, left, degrees)` | `UnitreeSkillContainer` | Move robot relative to current position |
| `execute_sport_command(command_name)` | `UnitreeSkillContainer` | Unitree sport commands (sit, stand, flip, etc.) |
| `wait(seconds)` | `UnitreeSkillContainer` | Pause execution |
| `observe()` | `GO2Connection` | Capture and return current camera frame |
| `navigate_with_text(query)` | `NavigationSkillContainer` | Navigate to a location by description |
| `tag_location(name)` | `NavigationSkillContainer` | Tag current position for later recall |
| `stop_navigation()` | `NavigationSkillContainer` | Cancel current navigation goal |
| `follow_person(query)` | `PersonFollowSkill` | Visual servoing to follow a described person |
| `stop_following()` | `PersonFollowSkill` | Stop person following |
| `speak(text)` | `SpeakSkill` | Text-to-speech through robot speakers |
| `where_am_i()` | `GoogleMapsSkillContainer` | Current street/area from GPS |
| `get_gps_position_for_queries(queries)` | `GoogleMapsSkillContainer` | Look up GPS coordinates |
| `set_gps_travel_points(points)` | `GPSNavSkill` | Navigate via GPS waypoints |
| `map_query(query)` | `OsmSkill` | Search OpenStreetMap with a VLM |
There is also an MCP implementation. It replaces the Agent with two modules: `McpServer` and `McpClient`.

- `McpServer` exposes the `@skill`-annotated methods as MCP tools. Any external client can connect to the server and use them.
- `McpClient` runs a LangGraph LLM that calls the MCP tools exposed by `McpServer`.
CLI access:

```shell
dimos mcp list-tools                              # List available skills
dimos mcp call relative_move --arg forward=0.5    # Call a skill
dimos mcp status                                  # Server status
```

| Method | How it works |
|---|---|
| `humancli` | Standalone terminal: type messages, see responses |
| `dimos agent-send "text"` | One-shot CLI command via LCM |
| `WebInput` | Web interface at `localhost:7779` with optional Whisper STT |
| Config | Model | Notes |
|---|---|---|
| Default | `gpt-4o` | Best quality, requires `OPENAI_API_KEY` |
| `ollama:llama3.1` | Local Ollama | Requires `ollama serve` running |
| Custom | Any LangChain-compatible | Set via `AgentConfig(model="...")` |
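As a sketch of choosing between these configs at startup: the fallback logic below is illustrative, and only the model strings and `AgentConfig(model=...)` come from the table above.

```python
# Illustrative model selection; the helper is not part of DimOS.
import os

def pick_model() -> str:
    """Prefer gpt-4o when an OpenAI key is set, else fall back to local Ollama."""
    if os.environ.get("OPENAI_API_KEY"):
        return "gpt-4o"
    return "ollama:llama3.1"

# The chosen string would then be passed as AgentConfig(model=pick_model()).
print(pick_model())
```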