Background
Research into OpenClaw (the open-source AI assistant that inspired IronClaw's workspace/memory system) reveals a mature testing architecture we can learn from. OpenClaw has ~243k LOC with a three-tier test suite, shared lifecycle helpers, architecture-enforcement linting, and CI sharding. IronClaw currently has over 820 mod tests {} blocks, with no tier separation and no shared test infrastructure.
OpenClaw's Testing Architecture
OpenClaw uses three tiers of increasing realism:
| Tier | Pattern | Scope | Cost |
|---|---|---|---|
| Unit/Integration | *.test.ts (colocated) | Pure logic, no external deps | Free, fast |
| E2E | *.e2e.test.ts | Full gateway startup, WebSocket protocol, multi-instance | Moderate |
| Live | *.live.test.ts | Real LLM providers, real API keys, sequential | Expensive |
Key infrastructure:
- Shared server lifecycle helpers (withGatewayServer(), startServerWithClient(), port allocation, temp directories, cleanup)
- Stub channel plugins for all 6 messaging platforms so the agent loop can be tested without real channels
- Pre-extraction security testing for skills installer (path traversal, symlinks validated before extraction)
- Architecture-enforcement linting via custom Oxlint rules (e.g., "no raw channel fetch", "channel-agnostic boundaries")
- CI sharding via custom orchestration script splitting tests across runners
- Pragmatic coverage thresholds (70% lines/functions, 55% branches, with explicit exclusions for daemon/interactive code)
Proposed Changes for IronClaw
1. Test Tier Separation
Currently all tests run in a single cargo test invocation with no distinction between unit tests, integration tests needing a database, and live tests needing LLM API keys.
Proposal:
- Default cargo test = unit tests only (no DB, no network, no API keys)
- cargo test --features integration = tests requiring PostgreSQL or libSQL
- cargo test --features live = tests requiring real LLM API keys
- Or: adopt cargo nextest with test groups/partitions for the same effect without feature flags
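A minimal sketch of how the feature-flag variant could look inside a single module. It assumes `integration = []` and `live = []` are declared under `[features]` in Cargo.toml; the function `normalize_query` and the environment variable names `DATABASE_URL` and `LLM_API_KEY` are hypothetical placeholders, not existing IronClaw names.

```rust
/// Pure logic under test: needs no database, network, or API keys.
pub fn normalize_query(input: &str) -> String {
    input
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Tier 1: always compiled, runs under a plain `cargo test`.
    #[test]
    fn unit_normalizes_whitespace_and_case() {
        assert_eq!(normalize_query("  Hello   World "), "hello world");
    }

    // Tier 2: compiled only with `cargo test --features integration`.
    #[cfg(feature = "integration")]
    #[test]
    fn integration_requires_database() {
        // Hypothetical variable name; a real test would connect to
        // PostgreSQL or libSQL here.
        let url = std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");
        assert!(!url.is_empty());
    }

    // Tier 3: compiled only with `cargo test --features live`.
    #[cfg(feature = "live")]
    #[test]
    fn live_requires_api_key() {
        // Hypothetical variable name; a real test would call the provider.
        let key = std::env::var("LLM_API_KEY").expect("LLM_API_KEY must be set");
        assert!(!key.is_empty());
    }
}
```

Because the gated tests are compiled out entirely, a default run cannot accidentally depend on a database or burn API credits. The trade-off is that gated code is not type-checked unless CI also builds with the features enabled.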
2. Stub Channel Implementations
IronClaw has TUI, HTTP, WASM, and web channels. Create stub Channel trait implementations that can be used in tests to exercise the agent loop, message routing, and session management without spawning real channel processes.
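As a rough illustration, a stub could replay scripted input and record whatever the agent sends back. The `Channel` trait below is a simplified, synchronous stand-in written for this sketch; IronClaw's real trait may have a different shape, and `StubChannel` and `run_echo_loop` are hypothetical names.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified stand-in for IronClaw's `Channel` trait.
pub trait Channel {
    fn name(&self) -> &str;
    /// Returns the next incoming message, or `None` once drained.
    fn receive(&mut self) -> Option<String>;
    fn send(&mut self, reply: &str);
}

/// In-memory stub: replays scripted messages and records every reply.
pub struct StubChannel {
    name: String,
    incoming: VecDeque<String>,
    pub sent: Vec<String>,
}

impl StubChannel {
    pub fn new(name: &str, scripted: &[&str]) -> Self {
        Self {
            name: name.to_string(),
            incoming: scripted.iter().map(|s| s.to_string()).collect(),
            sent: Vec::new(),
        }
    }
}

impl Channel for StubChannel {
    fn name(&self) -> &str {
        &self.name
    }

    fn receive(&mut self) -> Option<String> {
        self.incoming.pop_front()
    }

    fn send(&mut self, reply: &str) {
        self.sent.push(reply.to_string());
    }
}

/// Toy loop standing in for the agent loop: echoes each message back.
/// Returns the number of messages handled.
pub fn run_echo_loop(channel: &mut dyn Channel) -> usize {
    let mut handled = 0;
    while let Some(msg) = channel.receive() {
        let reply = format!("[{}] echo: {}", channel.name(), msg);
        channel.send(&reply);
        handled += 1;
    }
    handled
}
```

A test then constructs `StubChannel::new("tui", &["hi", "bye"])`, runs the loop, and asserts on `sent`, with no real channel process involved.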
3. Gateway/Server Test Helpers
Inspired by OpenClaw's test-helpers.server.ts, build shared Rust test utilities for:
- Spinning up the web gateway on a random free port
- Connecting SSE/WebSocket clients
- Sending authenticated API requests
- Temp directory setup/teardown for workspace isolation
- Cleanup on test completion (even on panic)
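Two of these helpers can be sketched with the standard library alone: free-port allocation and a temp directory that cleans itself up. The names `bind_free_port` and `TempWorkspace` are hypothetical; the SSE/WebSocket and auth helpers depend on the gateway's actual API and are left out.

```rust
use std::fs;
use std::net::TcpListener;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Asks the OS for a free port by binding to port 0.
/// The listener is returned so it can be handed to the server directly,
/// which avoids another process taking the port between probe and bind.
pub fn bind_free_port() -> std::io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    Ok((listener, port))
}

// Per-process counter so parallel tests get distinct directories.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Temporary workspace directory that is removed when dropped.
/// With the default unwinding panic strategy, `drop` also runs when a
/// test panics, so cleanup happens on failure too.
pub struct TempWorkspace {
    path: PathBuf,
}

impl TempWorkspace {
    pub fn new(prefix: &str) -> std::io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let path = std::env::temp_dir()
            .join(format!("{}-{}-{}", prefix, std::process::id(), id));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempWorkspace {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored so drop never panics.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

Tying cleanup to `Drop` rather than an explicit teardown call means a test cannot forget it. The guard does not run if the process is built with `panic = "abort"`.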
4. Security Testing for Installers
Add tests for:
- src/tools/wasm/loader.rs: validate WASM modules before loading (oversized, malformed)
- src/skills/registry.rs: path traversal, symlink attacks in skill archives
- Pre-validation before extraction, not post-extraction scanning
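One possible shape for the pre-extraction check is sketched below. It inspects entry metadata only and rejects the whole archive before any file is written. `ArchiveEntry`, `EntryKind`, and both function names are hypothetical; a real implementation would map them onto whatever archive library the skills registry uses.

```rust
use std::path::{Component, Path};

/// Returns true only if `entry` is a non-empty relative path with no
/// `..`, root, or drive-prefix components.
pub fn is_safe_entry_path(entry: &Path) -> bool {
    if entry.as_os_str().is_empty() {
        return false;
    }
    entry
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

#[derive(Debug, PartialEq)]
pub enum EntryKind {
    File,
    Directory,
    Symlink,
}

/// Hypothetical archive entry metadata, read from the archive index
/// without extracting any content.
pub struct ArchiveEntry<'a> {
    pub path: &'a str,
    pub kind: EntryKind,
}

/// Validates every entry up front; a single bad entry rejects the archive.
pub fn validate_archive(entries: &[ArchiveEntry<'_>]) -> Result<(), String> {
    for entry in entries {
        if entry.kind == EntryKind::Symlink {
            return Err(format!("symlink entry rejected: {}", entry.path));
        }
        if !is_safe_entry_path(Path::new(entry.path)) {
            return Err(format!("unsafe path rejected: {}", entry.path));
        }
    }
    Ok(())
}
```

Rejecting symlinks outright is the most conservative policy and is an assumption of this sketch. Note that `Path` parsing is platform-dependent: on Unix a backslash is an ordinary filename character, so archives built on Windows may need separator normalization before this check.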
5. Architecture Boundary Enforcement
Add CI checks (clippy lints, grep-based scripts, or a custom build script) that enforce:
- No direct LLM calls outside src/llm/
- No database calls outside src/db/
- No raw secret access outside src/secrets/
- Channel implementations use the Channel trait, not ad-hoc messaging
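The grep-based option could be as small as the sketch below: each rule names a substring and the one directory where it is allowed. `BoundaryRule` and `find_violations` are hypothetical names, and the real patterns depend on which crates and modules IronClaw actually uses. Files are passed in as (path, contents) pairs so the check can be unit-tested without touching the filesystem.

```rust
/// One boundary rule: `pattern` may appear only under `allowed_prefix`.
pub struct BoundaryRule {
    pub pattern: &'static str,
    pub allowed_prefix: &'static str,
    pub message: &'static str,
}

/// Scans (path, contents) pairs and returns one `path:line: message`
/// string per offending line. Files under a rule's allowed prefix are
/// skipped for that rule.
pub fn find_violations(rules: &[BoundaryRule], files: &[(&str, &str)]) -> Vec<String> {
    let mut violations = Vec::new();
    for (path, contents) in files {
        for rule in rules {
            if path.starts_with(rule.allowed_prefix) {
                continue;
            }
            for (idx, line) in contents.lines().enumerate() {
                if line.contains(rule.pattern) {
                    // Line numbers are reported 1-based, as editors show them.
                    violations.push(format!("{}:{}: {}", path, idx + 1, rule.message));
                }
            }
        }
    }
    violations
}
```

A CI binary would walk `src/`, feed each file through `find_violations`, and exit non-zero if the result is non-empty. Plain substring matching will also flag comments and string literals, so a production check may need an allowlist or a proper lint.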
6. CI Sharding (longer-term)
As the test suite grows, adopt cargo nextest with native sharding (--partition hash:1/N) to distribute tests across CI runners, matching OpenClaw's OPENCLAW_TEST_SHARDS pattern.
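To make the idea behind hash partitioning concrete, the sketch below assigns each test name to exactly one of N shards using a deterministic hash. This illustrates the general technique only and is not nextest's actual algorithm; `shard_for` and `tests_in_shard` are hypothetical names.

```rust
/// FNV-1a 64-bit hash. Implemented by hand because the algorithm behind
/// the standard library's `DefaultHasher` is unspecified and may change
/// between Rust releases, which would reshuffle shards.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Returns the 1-based shard index for a test name.
pub fn shard_for(test_name: &str, total: u64) -> u64 {
    assert!(total > 0, "shard count must be positive");
    fnv1a(test_name.as_bytes()) % total + 1
}

/// Selects the tests that belong to shard `shard` of `total`.
pub fn tests_in_shard<'a>(names: &[&'a str], shard: u64, total: u64) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| shard_for(name, total) == shard)
        .collect()
}
```

Because assignment depends only on the test name, every runner can compute its own slice independently, and adding a test does not move existing tests between shards. The cost is that shards are balanced by count only on average, not by runtime.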
7. Memory/Search Test Suite
OpenClaw's memory system was mature enough to be extracted as a standalone library (memsearch). IronClaw's src/workspace/search.rs RRF hybrid search implementation deserves a focused test suite with known-answer queries covering:
- FTS-only queries, vector-only queries, hybrid queries
- RRF score fusion correctness
- Edge cases (empty results, single-source results, duplicate documents)
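A known-answer test needs a reference to check against. The sketch below implements textbook Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the lists it appears in, with 1-based ranks. The function name `rrf_fuse`, the tie-breaking rule, and the handling of duplicates are assumptions of this sketch; src/workspace/search.rs may differ on all three.

```rust
use std::collections::{HashMap, HashSet};

/// Fuses ranked lists of document IDs with Reciprocal Rank Fusion.
/// Returns (id, score) pairs sorted by descending score, with ties
/// broken by ID so the output order is deterministic.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        let mut seen: HashSet<&str> = HashSet::new();
        for (idx, doc) in ranking.iter().enumerate() {
            // A document repeated within one list counts once, at its
            // first (best) position.
            if !seen.insert(*doc) {
                continue;
            }
            let rank = (idx + 1) as f64;
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(doc, score)| (doc.to_string(), score))
        .collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For example, with k = 60 (a commonly used constant, assumed here), an FTS ranking of a, b, c and a vector ranking of b, d gives b a score of 1/62 + 1/61, ahead of a at 1/61, d at 1/62, and c at 1/63. Passing a single list covers the FTS-only and vector-only cases, and passing no lists covers empty results.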