Single-agent loop runtime on iii-engine. 44 narrow workers, all iii-first.
Status: 0.11.9, 0.x experimental. API surface unstable until production-proven. Live-validated: 10 e2e tests against a real iii engine, 41 lib test crates, 6 TUI render snapshots. Tier 1-4 of the composability ladder verified cross-process (real
policy-denylistbinary blocking real Anthropic tool calls). llm-routerproviderfield upstream (iii-hq/workers#57);RoutingRequestschema fix pending (iii-hq/workers#58).
One curl, one shell. Bootstraps the iii engine, rustup if missing, and builds harness, harness-tui, and harnessd into ~/.local/bin.
curl -fsSL https://raw.githubusercontent.com/iii-experimental/harness/main/install.sh | shBuild from source instead
# 1. iii engine prerequisite
curl -fsSL https://install.iii.dev/iii/main/install.sh | sh
# 2. clone + build (crates.io publish lands at v1.0)
git clone https://github.com/iii-experimental/harness && cd harness
cargo build --release --bin harness --bin harness-tui --bin harnessdZero-config smoke (no API key, no provider account needed):
iii --use-default-config &
harness --provider faux --model echo "say hi"
# → "hello from faux — this is the harness zero-config smoke path."That round-trips a session through the iii bus end-to-end in under a second. Once you see it work, swap the provider for a real one:
export ANTHROPIC_API_KEY=sk-ant-...
harness "summarise this repo and list workspace crates using ls."You should see the agent stream events as it reads the README, calls ls on workers/, and produces a final summary.
The whole point of harness: every capability is one iii worker add away. Watch the agent gain power without touching harness code.
# Tier 1 — baseline
./target/release/harness "run uname -a"
# → tool::bash spawns a host child process. Plain.
# Tier 2 — sandboxed bash (microVM isolation)
iii worker add sandbox
./target/release/harness "run uname -a"
# → tool::bash auto-routes through sandbox::exec. Host filesystem untouched.
# → No flag changed in harness. Bus is the source of truth.
# Tier 3 — smart provider routing (production deployment shape)
# `iii worker add llm-router` from the registry is broken at the time of
# writing (engine returns "missing field `name`"; tracked upstream). For
# now run from source:
git clone https://github.com/iii-hq/workers /tmp/iii-workers
cd /tmp/iii-workers/llm-router
cargo run --release --bin iii-llm-router &
cd -
./target/release/harness "say hi"
# → agent::stream_assistant probes for router::decide on the bus once.
# → When present, harness extracts the last user prompt + routing hints,
# calls router::decide (RoutingRequest -> RoutingDecision), and dispatches
# to the routed provider/model. Provider derivation: response.provider
# field (shipped in llm-router via iii-hq/workers PR #57, merged), else
# split on '/' for namespaced model ids, else fall back to caller hint.
# → When llm-router isn't on the bus, harness dispatches directly. No flag.
# Tier 4 — policy enforcement (denylist + audit + DLP)
cargo run --release --bin policy-denylist -- --deny "bash:rm -rf,sudo" &
cargo run --release --bin audit-log -- --log ~/.harness/audit.jsonl &
cargo run --release --bin dlp-scrubber &
./target/release/harness "delete all logs"
# → before_tool_call fan-out. policy-denylist replies { block: true, reason }.
# Loop blocks the call. Agent sees the block, plans around it.
# → after_tool_call fan-out (parallel). audit-log appends a JSONL line per
# call. dlp-scrubber rewrites secrets (AWS / OpenAI / GitHub / Stripe /
# Google) in the result text via merge_after's `content` override.
# All three subscribers are live-validated in the e2e suite as of 0.11.7.
# Tier 5 — sub-agent recursion + context transforms
./target/release/harness "use run_subagent for the file scan, summarise"
# → tool::run_subagent spawns a focused child loop; bounded at depth 3
# (override via max_subagent_depth) to prevent unbounded recursion.
# → transform_context fan-out runs before every stream_assistant call;
# subscribers can rewrite the in-flight messages array (e.g. context
# compaction, redaction, system-prompt injection).Each tier = one worker addition. No code change in harness. This is iii primitives + narrow workers in practice.
What's verified live, end-to-end:
| Tier | Same-process e2e | Cross-process e2e (real deployment shape) |
|---|---|---|
| 1 (baseline) | ✅ | ✅ — harness --provider anthropic ... over a real engine |
| 2 (sandbox) | ✅ host fallback warns | ⏳ requires an iii-sandbox worker; not in the public registry yet |
| 3 (llm-router) | ✅ via test fake | ✅ once upstream PR #58 (drop deny_unknown_fields on RoutingRequest) lands; harness already logs the router-call failure on tracing::warn! and falls back to direct dispatch |
| 4 (policy-denylist) | ✅ | ✅ — verified driving anthropic with the standalone policy-denylist binary; required hook-timeout bump to 30 s (cross-process pubsub fan-out latency on iii v0.11.x is multi-second) |
| 5 (sub-agent + transform_context) | ✅ | ⏳ same-process tested; cross-process driver pending |
The agent loop is a state machine. Every concern outside the state machine is a worker on the iii bus. Both reference binaries (harness-cli, harness-tui) are thin invokers — they connect to a running iii engine, register the runtime + a provider, trigger agent::run_loop, and consume the per-session events stream.
+---------------------------+
| iii engine |
| (ws://localhost:49134) |
+-------------+-------------+
|
| trigger/publish/subscribe
+---------------+ +-----------------+ | +----------------------+
| harness-cli |---+ | harness-tui |---+ | +---| policy-denylist |
| (thin) | | | (ratatui) | | | | | audit-log |
+-------+-------+ | +--------+--------+ | | | | dlp-scrubber |
| | | | | | | hook-example |
| trigger | subscribe | trigger | | | +----------+-----------+
| run_loop | events | run_loop | | | | subscribe
v | v | v v v
+-------------+-----------+--------------+------------+---+---+--+
| iii bus |
+---+--------+--------+--------+--------+--------+--------+-------+
| | | | | | |
v v v v v v v
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
|runtime| |provider| |oauth | |session| |compact| |corpus | |models |
|worker | | x22 | | x5 | | tree | | -ion | | | |catalog|
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
12 agent::* provider:: oauth::* session:: subscriber corpus:: models::
8 tool::* <name>:: login/ fork/ on agent:: scan/ list/
4 HTTP stream_ refresh/ clone/ events -> redact/ get/
triggers assistant status compact/ transform_ review/ supports
tree/ context publish
export_html
create/append/messages
+-----------------+ +----------------+
| iii-sandbox | | document- |
| (auto-route | | extract |
| bash) | | |
+-----------------+ +----------------+
^ ^
| tool::bash discovers | document::extract
+--------------------------+
Streams the runtime publishes:
agent::events/<sid> 11 AgentEvent variants
agent::hook_reply/<eid> collected-pubsub replies
Pubsub topics the runtime publishes on:
agent::before_tool_call collect, first-block-wins
agent::after_tool_call collect, field-by-field merge
agent::transform_context collect, last-reply-wins
State (scope `agent`):
session/<id>/steering | followup | abort_signal
Modern agent harnesses bundle the loop, the tool sandbox, the provider clients, the session storage, and the UI into one process. Works at small scale, fails at ecosystem scale: tools have to live in the harness's language, hooks are limited to one process, sub-agents become subprocess shells, sessions are local files.
harness keeps the loop and nothing else. Every other concern is a worker on the iii bus.
- Worker — process that registers iii functions
- Function — named unit of work
- Trigger — what causes a function to run
The loop adds:
- AgentMessage — transcript entries (LLM + custom-typed)
- AgentEvent — 11 emitted events covering run / turn / message / tool lifecycle
- AssistantMessageEvent — 14 stream variants for incremental assistant output
- AgentTool — schema + execute fn
- 3 hook topics —
agent::before_tool_call,agent::after_tool_call,agent::transform_context - 2 pull points —
get_steering,get_followup - 2 semantic rules — terminate-batch (all-must-true), sequential-override (any forces all)
That is the entire vocabulary. Implementation details (auth, models, providers, storage, hooks) are workers consumed through iii functions or pubsub topics.
# anthropic (default)
export ANTHROPIC_API_KEY=sk-ant-...
./target/release/harness "list every workspace crate using ls."
# pick provider + model
./target/release/harness --provider openai --model gpt-4o "say hi"
./target/release/harness --provider groq --model llama-3.3-70b-versatile "say hi"
# zero-config smoke (no API key needed)
./target/release/harness --provider faux --model echo "say hi"
# experimental providers (endpoint URL not yet verified upstream)
./target/release/harness --provider zai --experimental-providers "..."
# Without the flag, harness refuses with a remediation hint.
# Gated set: opencode-go, minimax, huggingface, opencode-zen, zai,
# vercel-ai-gateway, kimi-coding.Flags worth knowing:
--version,-V— print version and exit.--max-turns <n>— hard cap on assistant turns (default 10). Loop emits a synthetic "loop stopped" assistant message and exits cleanly when reached.--engine-url <ws>/HARNESS_ENGINE_URL— overridews://127.0.0.1:49134. Env var honored as fallback.--json— emit oneAgentEventJSON per line on stdout, then a final{"type":"summary",...}. Pipe tojq, build dashboards, write golden-trace tests.--experimental-providers— opt in to the seven providers whose upstream endpoint URL hasn't been verified.
Provider env vars (each provider reads its own key at the moment it's invoked; harness never touches them):
| Provider | Env var |
|---|---|
anthropic |
ANTHROPIC_API_KEY |
openai / openai-responses |
OPENAI_API_KEY |
azure-openai |
AZURE_OPENAI_KEY + AZURE_OPENAI_ENDPOINT |
google |
GOOGLE_API_KEY |
google-vertex |
GOOGLE_APPLICATION_CREDENTIALS (ADC path) |
bedrock |
AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY (+ AWS_REGION) |
openrouter |
OPENROUTER_API_KEY |
groq |
GROQ_API_KEY |
cerebras |
CEREBRAS_API_KEY |
xai |
XAI_API_KEY |
deepseek |
DEEPSEEK_API_KEY |
mistral |
MISTRAL_API_KEY |
fireworks |
FIREWORKS_API_KEY |
huggingface |
HF_TOKEN |
kimi-coding / minimax / zai / opencode-go / opencode-zen / vercel-ai-gateway |
provider-specific (see crate source — these are the seven --experimental-providers gated entries) |
faux |
none (zero-config smoke) |
Built-in tools the agent can call:
read,write,edit— file ops with diff-style replacels,find,grep— directory walks and substring searchbash—bash -lcon host, or routed throughiii-sandbox::execwhen the sandbox worker is registered. Emits a singletracing::warn!on the first host fallback so the user knows commands aren't sandboxed.run_subagent— spawn a focused sub-agent for a subtask, returns its final answer. Bounded at depth 3 by default (override per-call viamax_subagent_depth); the chain is encoded in the session-id (root::sub-...::sub-...).
CLI prints AgentEvents as they stream so you can watch the agent reason, call tools, iterate.
cargo build --release --bin harness-tui
./target/release/harness-tui --provider anthropic --model claude-sonnet-4-6ratatui interactive UI. Same iii-bus shape as the CLI: connects to engine, registers runtime + provider, triggers agent::run_loop, renders the events stream.
- Multi-line editor with slash commands,
@filefuzzy attachment, inline!bash - Markdown render with collapsible tool / thinking blocks, queue + spinner indicator
- Native Kitty / iTerm2 inline image render via terminal escape protocols (placeholder fallback elsewhere)
- Clipboard image paste
/treeoverlay with parent / child branching glyphs (├─└─│), search, filter, bookmarks/hotkeysoverlay listing every binding- Themes (
dark,light, user-supplied TOML at~/.harness/themes/<name>.toml) - Keybinding overrides at
~/.harness/keybindings.json - Hot-reload via
notifywatcher: edit theme or keybindings file, TUI picks up the change live HARNESS_ENGINE_URLenv var overrides the engine URL
When to reach for which. Use harness (CLI) for one-shot driving — it registers the runtime + a single provider for the duration of the call, then disconnects. Use harnessd for persistent deployments where you want every harness worker on the bus alongside hook subscribers, the TUI, and other consumers; it stays running and lets multiple clients trigger agent::run_loop against the same registered surface.
./target/release/harnessd serve --providers all [--with-hook-example]
./target/release/harnessd statusSingle binary that registers every harness worker on the bus in one process — runtime, 22 providers, 5 oauth flows, auth-storage, models-catalog, sessions, corpus, extract, compaction. Replaces the chain of cargo run -p ... calls.
| Function | Purpose |
|---|---|
agent::run_loop |
Orchestrator. Drives the inner state machine. Persists transcripts to session-tree when registered. |
agent::stream_assistant |
Provider router. Calls provider::<name>::stream_assistant based on payload. |
agent::prepare_tool |
Validate args, fan out before_tool_call via publish_collect. |
agent::execute_tool |
Resolve tool::<name> and dispatch via iii.trigger. |
agent::finalize_tool |
Fan out after_tool_call via publish_collect, merge replies. |
agent::transform_context |
Fan out transform_context via publish_collect, decode last reply. |
agent::convert_to_llm |
AgentMessage[] → provider wire shape. Pass-through default. |
agent::get_steering |
Drain mid-run injection queue. |
agent::get_followup |
Drain post-stop continuation queue. |
agent::abort |
Set abort signal in iii state. |
agent::push_steering |
HTTP-callable enqueue helper. |
agent::push_followup |
HTTP-callable enqueue helper. |
tool::read, tool::write, tool::edit, tool::ls, tool::grep, tool::find, tool::bash, tool::run_subagent. tool::bash runtime-discovers sandbox::exec.
Each provider crate self-registers via register_with_iii(iii):
anthropic, openai, openai-responses, google, google-vertex, azure-openai,
bedrock, openrouter, groq, cerebras, xai, deepseek, mistral, fireworks,
kimi-coding, minimax, zai, huggingface, vercel-ai-gateway, opencode-zen,
opencode-go, faux
PKCE / device-code flows registered by:
oauth-anthropic, oauth-openai-codex, oauth-github-copilot,
oauth-google-gemini-cli, oauth-google-antigravity
session::create,session::append,session::messages— persistent transcriptssession::fork,session::clone— branch off existing sessionssession::compact— append a Compaction entry summarising the active pathsession::tree— return the tree-shapesession::export_html— render the active path as a self-contained HTML doc
When session-tree is registered, agent::run_loop automatically hydrates from the existing transcript on entry and persists every new turn on exit.
auth::get_token,auth::set_token,auth::delete_token— credential vaultauth::list_providers,auth::status
models::list,models::get,models::supports— state-first model capability catalogmodels::register— write a Model to state undermodels:<provider>:<id>(state is the source of truth; embeddeddata/models.jsonis a one-time seed used only when state is empty)
corpus::scan,corpus::redact,corpus::review,corpus::publish— session corpus pipeline
PDF + DOCX text extraction.
policy::denylist— block tool calls by namepolicy::audit_log— JSONL append of every callpolicy::dlp_scrubber— redact AWS / OpenAI / GitHub / Stripe / Google secret shapes
Three pubsub topics. Subscribers are independent workers; the loop fans out via publish_collect (write event with event_id to topic, poll agent::hook_reply/<event_id> for replies, merge).
| Topic | Subscriber reply shape | Merge rule | Live-validated |
|---|---|---|---|
agent::before_tool_call |
{ block: bool, reason: str } |
first-blocker-wins | ✅ policy_denylist_blocks_tool_dispatch + hooks_before_and_after_see_tool_calls |
agent::after_tool_call |
partial ToolResult (content, details, terminate) |
field-by-field | ✅ audit_log_records_after_tool_call + dlp_scrubber_rewrites_secret_in_tool_result |
agent::transform_context |
{ messages: [...] } or bare [...] |
last reply wins | ✅ transform_context_subscriber_mutates_messages |
Subscribers reply via stream::set with (stream_name=agent::hook_reply, group_id=event_id, item_id=<uuid>). iii v0.11.x silently drops writes missing item_id — every shipped harness subscriber mints one per reply.
Add a custom hook by registering an iii function and binding a subscribe trigger. See workers/hook-example/src/lib.rs for the minimal pattern and workers/policy-subscribers/src/lib.rs for production-shape examples.
# Reference subscribers — pick one or all
HOOK_EXAMPLE_DENY=dangerous,rm cargo run --release -p hook-example
cargo run --release --bin policy-denylist -- --deny "bash:rm -rf,sudo"
cargo run --release --bin audit-log -- --log ~/.harness/audit.jsonl
cargo run --release --bin dlp-scrubberworkers/
├── harness-types/ # closed type vocabulary
├── harness-runtime/ # loop + 8 tools + collected pubsub + hook merge
├── harness-cli/ # thin iii-bus CLI binary
├── harness-tui/ # thin iii-bus TUI binary (ratatui)
├── harness-iii-bridge/ # legacy bridge crate (most code lives in harness-runtime now)
├── harnessd/ # all-in-one daemon registering every worker
├── provider-base/ # shared HTTP/SSE/error infra + provider register helper
├── provider-{anthropic, openai, openai-responses, google, google-vertex,
│ azure-openai, bedrock, openrouter, groq, cerebras, xai,
│ deepseek, mistral, fireworks, kimi-coding, minimax, zai,
│ huggingface, vercel-ai-gateway, opencode-zen, opencode-go,
│ faux}/ # 22 provider workers
├── oauth-{anthropic, openai-codex, github-copilot,
│ google-gemini-cli, google-antigravity}/ # 5 oauth flows
├── auth-storage/ # credential vault
├── models-catalog/ # model capability database
├── session-tree/ # 8 session::* fns inc. persistent transcripts
├── context-compaction/ # subscriber on agent::events
├── session-corpus/ # corpus::* pipeline
├── document-extract/ # PDF / DOCX text extraction
├── overflow-classify/ # 20-pattern provider context-overflow detector
├── hook-example/ # reference live subscriber across the 3 hook topics
├── policy-subscribers/ # production-shape denylist + audit + DLP subscribers
├── replay-test/ # replay agent-session fixtures + e2e gated test
└── fixtures-gen/ # dev helper for generating golden fixtures
iii-sdk 0.11 has no native publish_collect. The runtime workaround:
IiiRuntime::publish_collectmintsevent_id(uuid v4).- Publishes
{ event_id, reply_stream: "agent::hook_reply", payload: {...} }to the topic. - Subscribers receive the envelope, do their work, write reply via
stream::seton(stream_name="agent::hook_reply", group_id=event_id, item_id=<fresh uuid>). Theitem_idis mandatory — iii v0.11.x silently drops writes without one. - Runtime polls
stream::listuntiltimeout_ms(default 5000), accepting both the bare-array shape iii v0.11.x ships and the older{items: [...]}envelope. Collects every reply, normalisingitem.get("data")or the bare item. harness_runtime::hooks(merge_before/merge_after/decode_transform) composes the final outcome.
When iii-sdk grows a Stream trigger that fires per new item without polling (the type already exists in iii_sdk::builtin_triggers; semantics around backfill / ordering need validation), the polling falls away. Merge logic stays.
# Lib + integration tests (skips live e2e)
cargo test --workspace --lib --release # 41 lib test crates
cargo test --release -p harness-tui --test render_snapshot # 2 ratatui snapshots
# Live end-to-end against a running iii engine
iii --use-default-config &
IIIX_TEST_ENGINE_URL=ws://127.0.0.1:49134 \
cargo test --release -p replay-test --test end_to_end -- --test-threads=1Nine gated e2e tests cover the bus-mediated paths the unit tests can't:
| Test | What it proves |
|---|---|
faux_round_trip |
runtime + faux provider + run_loop round-trip |
llm_router_swaps_provider_and_model |
tier-3 router delegation rewrites (provider, model) |
policy_denylist_blocks_tool_dispatch |
merge_before short-circuits dispatch when a subscriber blocks |
hooks_before_and_after_see_tool_calls |
hook-example subscribers fire on both topics with counters proof |
audit_log_records_after_tool_call |
policy::audit_log writes one JSONL line per dispatch |
dlp_scrubber_rewrites_secret_in_tool_result |
policy::dlp_scrubber replaces secrets via merge_after.content |
transform_context_subscriber_mutates_messages |
the third hook topic rewrites the in-flight messages array |
run_subagent_refuses_at_depth_limit |
sub-agent depth-3 cap fires before nested run_loop spawn |
oauth_anthropic_register_smoke |
oauth-anthropic registers login/refresh/status and status is callable |
models_catalog_state_register_round_trip |
models::register writes a custom Model to state; models::get and models::supports see it — proves state-first registry, not the embedded baseline |
TUI snapshots render App against a TestBackend at fixed dimensions, capture the trimmed cell buffer (style discarded for stability), and diff via insta. Re-bless with cargo insta test -p harness-tui --review.
Apache-2.0. v0.11.7 — full live e2e for all three hook topics + sub-agent depth + oauth registration smoke + TUI render snapshots. See release notes. Specs in repo: ARCHITECTURE.md, PHASES.md. Remaining iii-sdk gaps tracked in docs/SDK-BLOCKED.md.
Both harness-cli and harness-tui are iii-first thin invokers as of v0.8. Hook subscribers can block tool calls, modify tool results, and rewrite context as of v0.10.
| Bucket | Status |
|---|---|
| 12 canonical agent fns | ✅ shipped |
| 8 builtin tools | ✅ shipped |
| 3 hook topics live e2e | ✅ 3/3 |
| 3 policy subscribers live e2e | ✅ 3/3 (denylist + audit + dlp) |
| Sub-agent depth cap (default 3) | ✅ shipped |
| max_turns enforcement | ✅ shipped |
| Provider workers compiled + registered | 22/22 (15 verified, 7 gated) |
| OAuth flows | 5 registered, 1 e2e smoke |
| Models-catalog: state-first | ✅ shipped + e2e |
| TUI snapshot tests | ✅ 6 (idle, after-message, running+tool-call, tool-end, error, wide) |
llm-router provider field upstream |
✅ merged (iii-hq/workers#57) |
stream::listenvelope drift: CLI/TUI/runtime now accept the bare-array shape iii v0.11.x ships with the{items: [...]}fallback (CLI was silent before v0.11.5).- Hook replies were silently dropped: every harness subscriber now mints
item_id: uuid::v4()onstream::set(v0.11.6). agent::run_loopSDK-default 30s timeout: CLI/TUI cap at 600s, provider dispatch at 300s, run_subagent at 600s (v0.11.5/v0.11.6).- Audit-log byte-interleave: append_jsonl serialises writes per path with a process-wide tokio Mutex map (v0.11.7).
RouterPresenceCacheblocked concurrent first-callers behind the bus probe; refactored to atomic fast path + double-checked-lock (v0.11.7).- Cross-process pubsub fan-out latency: iii v0.11.x routes subscribers asynchronously after publish returns; in-process fan-out is sub-100 ms but cross-process can take seconds. Default hook timeout bumped 5 s → 30 s so cross-process subscribers (denylist + audit + dlp binaries) actually reach
merge_before/merge_afterbefore the loop dispatches (v0.11.9). - Router silent fall-through:
agent::stream_assistantused to swallowrouter::decideerrors. Now logstracing::warn!so operators can see schema rejects (e.g. iii-sdk's_caller_worker_idinjection vs llm-router'sdeny_unknown_fields— fix upstream at iii-hq/workers#58) (v0.11.9).
- Apache-2.0 only
- No external agent-harness product names in code, comments, commits, or PR text
- Provider names (Anthropic, OpenAI, Google, etc.) are APIs we authenticate against and may be referenced
- No emojis in any committed text
- Commit per concern, not per file
- No
Cargo.lockin workspace root (library workspace)
Apache-2.0. See LICENSE.