API and usage reference for llm-do. For theory, see theory.md. For internals, see architecture.md.
Agent files (.agent) can declare a Pydantic input model so agent calls (and tool-call
planning) use a structured contract:
```
---
name: evaluator
input_model_ref: schemas.py:PitchInput
---
```

Supported forms:

- `module.Class`
- `path.py:Class` (relative to the agent file)
Input models must subclass AgentArgs and implement prompt_messages(). Input is passed as
a dict (validated into the input model) or as an AgentArgs instance:
```python
# With attachments
await ctx.deps.call_agent("agent_name", {"input": "text", "attachments": ["file.pdf"]})
```

If no input model is declared, the default is `PromptInput` (`input` plus optional `attachments`).
For custom input models, subclass AgentArgs:
```python
from llm_do.runtime import PromptContent, AgentArgs

class PitchInput(AgentArgs):
    input: str
    company_name: str

    def prompt_messages(self) -> list[PromptContent]:
        return [f"Evaluate {self.company_name}: {self.input}"]
```

This input model shapes tool-call arguments and validates inputs before the agent runs.
Entry selection is explicit in the manifest:
- `entry.agent` selects an agent name from `.agent` files (runs as an `AgentEntry`)
- `entry.function` selects a Python function via `path.py:function` (must be listed in `python_files`) and wraps it as a `FunctionEntry`
If the target cannot be resolved, loading fails with a descriptive error.
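The `path.py:function` reference shape, and the validation that produces a descriptive error, can be sketched in plain Python. This is an illustration only, not llm-do's actual loader; `parse_function_ref` is a hypothetical helper:

```python
def parse_function_ref(ref: str, python_files: list[str]) -> tuple[str, str]:
    """Split a 'path.py:function' entry reference and validate it (illustrative only)."""
    path, sep, func = ref.partition(":")
    if not sep or not func:
        raise ValueError(f"Expected 'path.py:function', got {ref!r}")
    if path not in python_files:
        raise ValueError(f"{path!r} must be listed in python_files")
    return path, func
```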
Python code can invoke agents in two contexts:
- From entry functions: use `ctx.call_agent()` directly on the `CallContext`
- From within tools: use `ctx.deps.call_agent()`, where `ctx.deps` is the `CallContext`
```python
async def call_agent(spec_or_name: AgentSpec | str, input_data: Any) -> Any
```

Invokes an agent by name (looked up in the registry) or by `AgentSpec` directly.
Parameters:
- `spec_or_name`: Agent name (string) or `AgentSpec` instance
- `input_data`: Input payload; can be:
  - `dict`: validated into the agent's input model (default expects `"input"` plus optional `"attachments"`)
  - `AgentArgs`: custom schema instance
Returns: The agent's output (typically a string)
Raises: RuntimeError if max_depth is exceeded
Example:
```python
# From entry function
async def main(input_data, ctx: CallContext) -> str:
    result = await ctx.call_agent("analyzer", {"input": "data"})
    return result

# From tool
@tools.tool
async def my_tool(ctx: RunContext[CallContext], data: str) -> str:
    return await ctx.deps.call_agent("analyzer", {"input": data})
```

If you pass an `AgentSpec` directly, its model must already be a resolved `Model` instance. Use `resolve_model(...)` (or pass a PydanticAI model object):
```python
from llm_do import resolve_model
from llm_do.runtime import AgentSpec

spec = AgentSpec(
    name="analyzer",
    instructions="Analyze input.",
    model=resolve_model("anthropic:claude-haiku-4-5"),
)
result = await ctx.deps.call_agent(spec, {"input": "input text"})
```

Use `Runtime` to create a shared execution environment and run an entry:
```python
from pathlib import Path

from llm_do.project import (
    EntryConfig,
    build_registry,
    build_registry_host_wiring,
    resolve_entry,
)
from llm_do.runtime import RunApprovalPolicy, Runtime

async def main():
    project_root = Path(".").resolve()
    registry = build_registry(
        ["analyzer.agent"],
        [],
        project_root=project_root,
        **build_registry_host_wiring(project_root),
    )
    entry = resolve_entry(
        EntryConfig(agent="analyzer"),
        registry,
        python_files=[],
        base_path=project_root,
    )
    runtime = Runtime(
        run_approval_policy=RunApprovalPolicy(mode="approve_all"),
        project_root=project_root,
    )
    runtime.register_agents(registry.agents)
    result, ctx = await runtime.run_entry(
        entry,
        input_data={"input": "Analyze this data"},
    )
    print(result)
```

`Runtime.run_entry()`:
- Creates a fresh entry runtime (NullModel, no toolsets) for the entry function
- Reuses runtime-scoped state (usage, approval cache, message log)
- Runtime state is process-scoped (in-memory only, not persisted beyond the process)
- Returns both the result and the runtime context
build_registry() returns an AgentRegistry and requires explicit host wiring.
Use build_registry_host_wiring(project_root) for standard CLI-equivalent host toolsets.
AgentRegistry is a thin container around the agents mapping, so pass the same root
to Runtime and register registry.agents to keep filesystem toolsets and attachment
resolution aligned.
Agent files resolve model identifiers when building the registry (or when dynamic
agents are created). AgentSpec.model always stores a resolved Model instance.
Entry functions run under NullModel (no toolsets), so direct LLM calls from entry
code are not allowed.
Parameters:
| Parameter | Description |
|---|---|
| `entry` | Entry to run (`AgentEntry` or `FunctionEntry`) |
| `input_data` | Input payload (dict validated into the input model, or `AgentArgs`) |
| `message_history` | Pre-seed conversation history for the top-level call scope |
Use Runtime.run() for sync execution when you already have an entry object.
For chat-style flows, carry forward message_history between turns:
```python
from pathlib import Path

from llm_do.project import (
    EntryConfig,
    build_registry,
    build_registry_host_wiring,
    resolve_entry,
)
from llm_do.runtime import Runtime

async def main():
    project_root = Path(".").resolve()
    registry = build_registry(
        ["assistant.agent"],
        [],
        project_root=project_root,
        **build_registry_host_wiring(project_root),
    )
    entry = resolve_entry(
        EntryConfig(agent="assistant"),
        registry,
        python_files=[],
        base_path=project_root,
    )
    runtime = Runtime(project_root=project_root)
    runtime.register_agents(registry.agents)

    result, ctx = await runtime.run_entry(entry, {"input": "turn 1"})
    message_history = list(ctx.frame.messages)
    result, ctx = await runtime.run_entry(
        entry,
        {"input": "turn 2"},
        message_history=message_history,
    )
```

The top-level agent consumes `message_history` on each turn at depth 0.
Tools can access the runtime to call other agents. This enables hybrid patterns where deterministic Python code orchestrates LLM reasoning.
Accepting the Runtime Context:
To access the runtime, accept RunContext[CallContext] as the first parameter:
```python
from pydantic_ai.tools import RunContext
from pydantic_ai.toolsets import FunctionToolset

from llm_do.runtime import CallContext

def build_tools(_ctx: RunContext[CallContext]) -> FunctionToolset:
    tools = FunctionToolset()

    @tools.tool
    async def my_tool(ctx: RunContext[CallContext], data: str) -> str:
        """Tool that can call agents."""
        result = await ctx.deps.call_agent("agent_name", {"input": data})
        return result

    return tools

TOOLSETS = {"tools": build_tools}
```

The `ctx` parameter is automatically injected by PydanticAI and excluded from the tool schema the LLM sees.
Calling Agents:
Use ctx.deps.call_agent(spec_or_name, input_data) to invoke an agent by name or AgentSpec:
```python
@tools.tool
async def orchestrate(ctx: RunContext[CallContext], task: str) -> str:
    # Call an LLM agent
    analysis = await ctx.deps.call_agent("analyzer", {"input": task})
    return analysis
```

`RunContext.prompt` is derived from `AgentArgs.prompt_messages()` for logging/UI only; tools should rely on their typed args and use `ctx.deps` only for delegation.
The input_data argument can be a dict (validated into the input model) or an AgentArgs instance.
Available Runtime State:
Via ctx.deps (a CallContext), tools can access:
| Property | Description |
|---|---|
| `call_agent(spec_or_name, input_data)` | Invoke an agent by name or `AgentSpec` |
| `frame.config.depth` | Current nesting depth |
| `frame.config.model` | Resolved `Model` instance for this call |
| `frame.prompt` | Current prompt string |
| `frame.messages` | Conversation history |
| `config.max_depth` | Maximum allowed depth |
| `config.project_root` | Project root path |
A common pattern is using a Python function as the entry point for deterministic orchestration:
```python
from pathlib import Path

from llm_do.runtime import CallContext

async def main(_input_data, runtime: CallContext) -> str:
    """Orchestrate evaluation of multiple files."""
    files = list(Path("input").glob("*.pdf"))  # deterministic
    results = []
    for f in files:
        # LLM agent handles reasoning
        report = await runtime.call_agent(
            "evaluator",
            {"input": "Analyze this file.", "attachments": [str(f)]},
        )
        Path(f"output/{f.stem}.md").write_text(report)  # deterministic
        results.append(f.stem)
    return f"Processed {len(results)} files"
```

Run with a manifest that includes tools.py and evaluator.agent, and set `entry.function: "tools.py:main"` in project.json. If you want to create the entry manually (outside the manifest flow), wrap it: `FunctionEntry(name="main", fn=main)`.
FunctionEntry fields:
- `name`: Entry name for logging/events
- `fn`: Async function called for the entry
- `input_model`: `AgentArgs` subclass for input normalization (defaults to `PromptInput`)
Convenience helper:
`FunctionEntry.from_function(fn)` creates an entry using `fn.__name__` as the name.
The entry function receives:
- An `AgentArgs` instance validated against the entry's `input_model` (default: `PromptInput`)
- The `CallContext` as its second argument (shown as `runtime` in the examples)
Note: Entry functions are trusted code, but agent calls still go through approval
wrappers and follow the run approval policy. To skip prompts, use approve_all
(or drop to raw Python to bypass the tool plane).
Example with custom input model:
```python
from llm_do.runtime import FunctionEntry, AgentArgs, PromptContent, CallContext

class TaggedInput(AgentArgs):
    input: str
    tag: str

    def prompt_messages(self) -> list[PromptContent]:
        return [f"{self.input}:{self.tag}"]

async def main(args: TaggedInput, _runtime: CallContext) -> str:
    return args.tag

ENTRY = FunctionEntry(
    name="main",
    fn=main,
    input_model=TaggedInput,
)
```

Stabilize stochastic components to deterministic code as patterns emerge.
Agent handles everything with LLM judgment:
```
---
name: filename_cleaner
model: anthropic:claude-haiku-4-5
---
Clean the given filename: remove special characters,
normalize spacing, ensure valid extension.
```

Run it repeatedly. Watch what the LLM consistently does:
- Always lowercases
- Replaces spaces with underscores
- Strips leading/trailing whitespace
- Keeps alphanumerics and `.-_`
Stable patterns become Python:
```python
@tools.tool
def sanitize_filename(name: str) -> str:
    """Remove special characters from filename."""
    name = name.strip().lower()
    return "".join(c if c.isalnum() or c in ".-_" else "_" for c in name)
```

The agent still handles ambiguous cases the code can't:
```
---
name: filename_cleaner
model: anthropic:claude-haiku-4-5
toolsets: [filename_tools]
---
Clean the given filename. Use sanitize_filename for basic cleanup.
For ambiguous cases (is "2024-03" a date or version?), use judgment
to pick the most descriptive format.
```

| Aspect | Before (stochastic) | After (deterministic) |
|---|---|---|
| Cost | Per-token API charges | Effectively free |
| Latency | Network + inference | Microseconds |
| Reliability | May vary | Identical every time |
| Testing | Statistical sampling | Assert equality |
| Approvals | May need user consent | Trusted by default |
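The "Assert equality" row is literal: once the behavior lives in Python, tests are exact comparisons rather than statistical sampling. A self-contained check, restating the `sanitize_filename` logic from above:

```python
def sanitize_filename(name: str) -> str:
    """Same logic as the stabilized tool above, restated so this snippet stands alone."""
    name = name.strip().lower()
    return "".join(c if c.isalnum() or c in ".-_" else "_" for c in name)

# Deterministic code admits exact assertions:
assert sanitize_filename("Report 2024.PDF") == "report_2024.pdf"
assert sanitize_filename(" notes.txt ") == "notes.txt"
```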
The pitchdeck examples demonstrate this:
- `pitchdeck_eval/`: all LLM; the orchestrator decides everything
- `pitchdeck_eval_stabilized/`: extracted `list_pitchdecks()` to Python
- `pitchdeck_eval_code_entry/`: Python orchestration, LLM only for analysis
Soften deterministic code back to stochastic when edge cases multiply or you need new capability.
Need new capability? Write a spec:
```
---
name: sentiment_analyzer
model: anthropic:claude-haiku-4-5
---
Analyze the sentiment of the given text.
Return: positive, negative, or neutral with confidence score.
```

Now it's callable:

```python
result = await ctx.deps.call_agent("sentiment_analyzer", {"input": feedback})
```

Rigid code drowning in edge cases? A function full of if/elif branches handling linguistic variations might be better as an LLM call that handles the variation naturally.
Python handles deterministic logic; agents handle judgment:
```python
@tools.tool
async def evaluate_document(ctx: RunContext[CallContext], path: str) -> dict:
    content = load_file(path)  # deterministic
    if not validate_format(content):  # deterministic
        raise ValueError("Invalid format")
    # Stochastic: LLM judgment for analysis
    analysis = await ctx.deps.call_agent("content_analyzer", {"input": content})
    return {  # deterministic
        "score": compute_score(analysis),
        "analysis": analysis,
    }
```

Think: "deterministic pipeline that uses LLM where judgment is needed."
Toolsets provide tools to agents. There are two approaches:
The simplest way to create tools. Define functions with the @tools.tool decorator:
```python
import httpx
from pydantic_ai.tools import RunContext
from pydantic_ai.toolsets import FunctionToolset

from llm_do.runtime import CallContext

def build_calc_tools(_ctx: RunContext[CallContext]) -> FunctionToolset:
    calc_tools = FunctionToolset()

    @calc_tools.tool
    def calculate(expression: str) -> float:
        """Evaluate a mathematical expression."""
        return eval(expression)  # simplified example; avoid eval on untrusted input

    @calc_tools.tool
    async def fetch_data(url: str) -> str:
        """Fetch data from a URL."""
        async with httpx.AsyncClient() as client:
            response = await client.get(url)
            return response.text

    return calc_tools

TOOLSETS = {"calc_tools": build_calc_tools}
```

Save as tools.py and reference in your agent:
```
---
name: calculator
model: anthropic:claude-haiku-4-5
toolsets:
  - calc_tools
---
You are a helpful calculator...
```

Factories accept a RunContext (you can ignore it) and are called once per agent run; close over any configuration you need when defining the factory (e.g., base paths).
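The close-over-configuration pattern can be shown in plain Python. This is a sketch of the closure idea only: `make_filesystem_factory` is a hypothetical helper, and a dict stands in for the real toolset object so the snippet is self-contained:

```python
def make_filesystem_factory(base_path: str):
    """Hypothetical helper: returns a factory that closes over base_path."""
    def build_filesystem(_ctx):  # the RunContext argument is accepted but ignored
        # Stand-in dict for illustration; a real factory would construct a
        # toolset such as FileSystemToolset(config={"base_path": base_path}).
        return {"base_path": base_path}
    return build_filesystem

TOOLSETS = {"filesystem_data": make_filesystem_factory("./data")}
```

Each call to the registered factory yields an object configured with the captured `base_path`, without any module-level mutable state.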
Accessing the Runtime:
To call other agents from your tool, accept RunContext[CallContext]:
```python
from pydantic_ai.tools import RunContext
from pydantic_ai.toolsets import FunctionToolset

from llm_do.runtime import CallContext

def build_calc_tools(_ctx: RunContext[CallContext]) -> FunctionToolset:
    calc_tools = FunctionToolset()

    @calc_tools.tool
    async def analyze(ctx: RunContext[CallContext], text: str) -> str:
        """Analyze text using another agent."""
        return await ctx.deps.call_agent("sentiment_analyzer", {"input": text})

    return calc_tools
```

For more control over tool behavior, approval logic, and configuration, extend `AbstractToolset`:
```python
from typing import Any

from pydantic_ai.tools import ToolDefinition
from pydantic_ai.toolsets import AbstractToolset, ToolsetTool
from pydantic_ai_blocking_approval import ApprovalResult

class MyToolset(AbstractToolset[Any]):
    """Custom toolset with configuration and approval logic."""

    def __init__(self, config: dict):
        self._config = config
        self._require_approval = config.get("require_approval", True)

    async def get_tools(self, ctx: Any) -> dict[str, ToolsetTool[Any]]:
        """Define available tools."""
        return {
            "my_tool": ToolsetTool(
                toolset=self,
                tool_def=ToolDefinition(
                    name="my_tool",
                    description="Does something useful",
                    parameters_json_schema={
                        "type": "object",
                        "properties": {
                            "input": {"type": "string"}
                        },
                        "required": ["input"],
                    },
                ),
            ),
        }

    async def call_tool(
        self,
        name: str,
        tool_args: dict[str, Any],
        ctx: Any,
        tool: ToolsetTool[Any],
    ) -> Any:
        """Handle tool calls."""
        if name == "my_tool":
            return f"Processed: {tool_args['input']}"
        raise ValueError(f"Unknown tool: {name}")

    def needs_approval(
        self,
        name: str,
        tool_args: dict[str, Any],
        ctx: Any,
        config: Any = None,
    ) -> ApprovalResult:
        """Control which calls need approval."""
        if self._require_approval:
            return ApprovalResult.needs_approval()
        return ApprovalResult.pre_approved()

    def get_approval_description(
        self,
        name: str,
        tool_args: dict[str, Any],
        ctx: Any,
    ) -> str:
        """Human-readable description for approval prompts."""
        return f"{name}({tool_args.get('input', '')})"
```

Register it with a factory so each call gets a fresh instance:
```python
from pydantic_ai.tools import RunContext

from llm_do.runtime import CallContext

def build_my_toolset(_ctx: RunContext[CallContext]) -> MyToolset:
    return MyToolset(config={"require_approval": True})

TOOLSETS = {"my_toolset": build_my_toolset}
```

Toolset configuration lives in the toolset factory in Python. Agent YAML
only references toolset names, so you define any config when building
the toolset in a .py file:
```python
from pydantic_ai.tools import RunContext
from pydantic_ai.toolsets import FunctionToolset

from llm_do.runtime import CallContext
from llm_do.toolsets import FileSystemToolset

def build_calc_tools(_ctx: RunContext[CallContext]) -> FunctionToolset:
    return FunctionToolset()

def build_filesystem(_ctx: RunContext[CallContext]) -> FileSystemToolset:
    return FileSystemToolset(config={"base_path": "./data", "write_approval": True})

TOOLSETS = {
    "calc_tools": build_calc_tools,
    "filesystem_data": build_filesystem,
}
```

Then reference the toolset names in your agent:
```yaml
toolsets:
  - calc_tools
  - filesystem_data
```

If you need to pre-approve specific tools, attach an approval config dict:
```python
from pydantic_ai.toolsets import FunctionToolset

from llm_do.toolsets.approval import set_toolset_approval_config

def build_calc_tools():
    tools = FunctionToolset()
    set_toolset_approval_config(
        tools,
        {
            "add": {"pre_approved": True},
            "multiply": {"pre_approved": True},
        },
    )
    return tools
```

Dependencies:
Toolset instances are created per call in Python, so pass any dependencies directly in the factory (e.g., base paths or sandbox handles).
filesystem_project uses the project root passed to build_registry (the manifest
directory in the CLI).
| Name | Class | Tools |
|---|---|---|
| `filesystem_cwd` | `FileSystemToolset` | `read_file`, `write_file`, `list_files` (base: CWD) |
| `filesystem_cwd_ro` | `ReadOnlyFileSystemToolset` | `read_file`, `list_files` (base: CWD) |
| `filesystem_project` | `FileSystemToolset` | `read_file`, `write_file`, `list_files` (base: project root) |
| `filesystem_project_ro` | `ReadOnlyFileSystemToolset` | `read_file`, `list_files` (base: project root) |
| `shell_readonly` | `ShellToolset` | Read-only shell commands (whitelist) |
| `shell_file_ops` | `ShellToolset` | `ls` (pre-approved) + `mv` (approval required) |
Agents are defined in .agent files with YAML frontmatter:
```
---
name: my_agent
model: anthropic:claude-haiku-4-5
tools:
  - normalize_path
toolsets:
  - filesystem_project
  - shell_readonly
  - calc_tools
---
System prompt goes here...
You have access to filesystem and shell tools.
```

Frontmatter Fields:
| Field | Required | Description |
|---|---|---|
| `name` | Yes | Agent identifier (used for `ctx.deps.call_agent()`) |
| `description` | No | Tool description when the agent is exposed as a tool (falls back to instructions) |
| `model` | No | Model identifier (e.g., `anthropic:claude-haiku-4-5`), resolved on load; falls back to `LLM_DO_MODEL` if omitted |
| `compatible_models` | No | List of acceptable model patterns for the `LLM_DO_MODEL` fallback (mutually exclusive with `model`) |
| `input_model_ref` | No | Input model reference (see Agent Input Models) |
| `server_side_tools` | No | Server-side tool configs (e.g., web search) |
| `tools` | No | List of tool names |
| `toolsets` | No | List of toolset names |
Model Format:
Models use the format `provider:model-name`:

- `anthropic:claude-haiku-4-5`
- `openai:gpt-4o-mini`
- `ollama:llama3`
When constructing AgentSpec in Python, use resolve_model("provider:model-name")
to turn these identifiers into Model instances.
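The identifier shape splits at the first colon. A minimal sketch of that convention in plain Python (`split_model_id` is a hypothetical helper for illustration; `resolve_model` does the real work):

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider:model-name' at the first colon (illustrative only)."""
    provider, sep, name = model_id.partition(":")
    if not sep or not name:
        raise ValueError(f"Expected 'provider:model-name', got {model_id!r}")
    return provider, name
```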
Custom Providers:
To use a custom provider with LLM_DO_MODEL, register a model factory in a Python file that gets
imported when your project loads (e.g., add it to python_files in project.json):
Custom prefixes are only for non-standard providers. Built-in PydanticAI provider prefixes
(openai, anthropic, ollama, etc.) are resolved by PydanticAI directly and cannot be registered.
```python
# providers.py
from pydantic_ai.models.openai import OpenAIChatModel

from llm_do import register_model_factory
from llm_do.providers import OpenAICompatibleProvider

class AcmeProvider(OpenAICompatibleProvider):
    def __init__(self) -> None:
        super().__init__(
            base_url="http://127.0.0.1:11434/v1",
            name="acme",
        )

def build_acme(model_name: str) -> OpenAIChatModel:
    return OpenAIChatModel(model_name, provider=AcmeProvider())

register_model_factory("acme", build_acme)
```

Then set:

```bash
export LLM_DO_MODEL="acme:my-model"
```

Tool & Toolset References:
Tools can be specified as:
- Tool names exported via `TOOLS` (dict or list) or `__all__` in Python files
Toolsets can be specified as:
- Built-in toolset names (e.g., `filesystem_project`, `shell_readonly`)
- Toolset names exported via `TOOLSETS` (dict or list), or module-level `AbstractToolset` instances
- Other agent names from `.agent` files (agents act as toolsets)
Recursive Agents:
Agents can opt into recursion by listing themselves in toolsets:
```
---
name: explainer
model: anthropic:claude-haiku-4-5
toolsets:
  - explainer
---
Explain the topic, and call yourself for missing prerequisites.
```

Recursion is bounded by `max_depth` (default: 5). Use `--max-depth` in the CLI or `Runtime(max_depth=...)` in Python to adjust it.
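The depth bound behaves conceptually like the toy model below: each nested agent call increments a depth counter, and exceeding the limit raises `RuntimeError`. This is an illustration of the semantics, not llm-do's implementation:

```python
def run_recursive(depth: int = 0, max_depth: int = 5) -> int:
    """Toy model of the recursion bound: each nested call increments depth."""
    if depth >= max_depth:
        raise RuntimeError(f"max_depth ({max_depth}) exceeded")
    if depth + 1 < max_depth:
        return run_recursive(depth + 1, max_depth)
    return depth  # deepest call that was still allowed
```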
Compatible Models:
Use compatible_models when you want the agent to accept the LLM_DO_MODEL
fallback if it matches a pattern, rather than hardcoding model. Patterns use glob matching:
```yaml
compatible_models:
  - "*"                          # allow any model
  - "anthropic:*"                # any Anthropic model
  - "anthropic:claude-haiku-*"   # any Claude Haiku variant
```

Compatibility checks run when resolving the LLM_DO_MODEL fallback during
.agent/dynamic agent creation. If you build AgentSpec in Python, call
select_model(...) yourself if you want compatibility validation. If you set
compatible_models, ensure LLM_DO_MODEL is set to a compatible value.
model and compatible_models are mutually exclusive.
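The pattern semantics are ordinary glob matching, which Python's standard `fnmatch` module can illustrate (`is_compatible` is a hypothetical helper, not llm-do's internal code):

```python
from fnmatch import fnmatch

def is_compatible(model_id: str, patterns: list[str]) -> bool:
    """Glob-style compatibility check; illustrates the pattern semantics only."""
    return any(fnmatch(model_id, pattern) for pattern in patterns)
```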
Server-Side Tools:
Use server_side_tools to enable provider-hosted tools:
```yaml
server_side_tools:
  - tool_type: web_search
    max_uses: 3
    allowed_domains: ["example.com"]
```

Supported tool types:

- `web_search` (options: `max_uses`, `blocked_domains`, `allowed_domains`)
- `web_fetch`
- `code_execution`
- `image_generation`
```bash
# Run a manifest
llm-do project.json "prompt"

# Run with input JSON
llm-do project.json --input-json '{"input": "prompt"}'

# Set fallback model via env var
LLM_DO_MODEL=anthropic:claude-haiku-4-5 llm-do project.json "prompt"

# TUI / headless output
llm-do project.json --tui
llm-do project.json --headless "prompt"

# Verbose output
llm-do project.json -v "prompt"   # basic
llm-do project.json -vv "prompt"  # detailed
```

See cli.md for full CLI documentation.