Adopt OpenClaw-inspired testing strategy: tier separation, stub channels, gateway helpers #466

@zmanian

Background

Research into OpenClaw (the open-source AI assistant that inspired IronClaw's workspace/memory system) reveals a mature testing architecture we can learn from. OpenClaw has ~243k LOC with a three-tier test suite, shared lifecycle helpers, architecture-enforcement linting, and CI sharding. IronClaw currently has 820+ mod tests {} blocks but no tier separation and no shared test infrastructure.

OpenClaw's Testing Architecture

OpenClaw uses three tiers of increasing realism:

| Tier | Pattern | Scope | Cost |
|---|---|---|---|
| Unit/Integration | *.test.ts (colocated) | Pure logic, no external deps | Free, fast |
| E2E | *.e2e.test.ts | Full gateway startup, WebSocket protocol, multi-instance | Moderate |
| Live | *.live.test.ts | Real LLM providers, real API keys, sequential | Expensive |

Key infrastructure:

  • Shared server lifecycle helpers (withGatewayServer(), startServerWithClient(), port allocation, temp directories, cleanup)
  • Stub channel plugins for all 6 messaging platforms so the agent loop can be tested without real channels
  • Pre-extraction security testing for skills installer (path traversal, symlinks validated before extraction)
  • Architecture-enforcement linting via custom Oxlint rules (e.g., "no raw channel fetch", "channel-agnostic boundaries")
  • CI sharding via custom orchestration script splitting tests across runners
  • Pragmatic coverage thresholds (70% lines/functions, 55% branches, with explicit exclusions for daemon/interactive code)

Proposed Changes for IronClaw

1. Test Tier Separation

Currently all tests run in a single cargo test invocation with no distinction between unit tests, integration tests needing a database, and live tests needing LLM API keys.

Proposal:

  • Default cargo test = unit tests only (no DB, no network, no API keys)
  • cargo test --features integration = tests requiring PostgreSQL or libSQL
  • cargo test --features live = tests requiring real LLM API keys
  • Or: adopt cargo nextest with test groups/partitions for the same effect without feature flags
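A minimal sketch of the feature-flag variant, assuming the feature names integration and live from the list above. The Tier enum, enabled_tiers(), and the DATABASE_URL and LLM_API_KEY environment variables are hypothetical names chosen for illustration, not existing IronClaw code.

```rust
// Cargo.toml (assumed feature names, taken from the proposal above):
//
// [features]
// integration = []
// live = []

/// Hypothetical tier names mirroring the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Unit,
    Integration,
    Live,
}

/// Reports which tiers were compiled into this build.
/// `cfg!` evaluates to a constant `true`/`false` at compile time.
pub fn enabled_tiers() -> Vec<Tier> {
    let mut tiers = vec![Tier::Unit];
    if cfg!(feature = "integration") {
        tiers.push(Tier::Integration);
    }
    if cfg!(feature = "live") {
        tiers.push(Tier::Live);
    }
    tiers
}

#[cfg(test)]
mod tier_examples {
    use super::*;

    // Runs under a plain `cargo test`: no DB, no network, no API keys.
    #[test]
    fn unit_tier_is_always_compiled() {
        assert!(enabled_tiers().contains(&Tier::Unit));
    }

    // Only compiled with `cargo test --features integration`.
    #[cfg(feature = "integration")]
    #[test]
    fn integration_tier_has_database_url() {
        // Hypothetical variable name; the real setup may differ.
        let url = std::env::var("DATABASE_URL").expect("integration tier needs DATABASE_URL");
        assert!(!url.is_empty());
    }

    // Only compiled with `cargo test --features live`.
    #[cfg(feature = "live")]
    #[test]
    fn live_tier_has_api_key() {
        // Hypothetical variable name; the real setup may differ.
        let key = std::env::var("LLM_API_KEY").expect("live tier needs LLM_API_KEY");
        assert!(!key.is_empty());
    }
}
```

Because the gated tests are removed at compile time, a default cargo test never links or runs them, which keeps the unit tier free of database and network dependencies.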

2. Stub Channel Implementations

IronClaw has TUI, HTTP, WASM, and web channels. Create stub Channel trait implementations that can be used in tests to exercise the agent loop, message routing, and session management without spawning real channel processes.
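As a rough illustration, a stub channel could look like the sketch below. The Channel trait shown here is a simplified, synchronous stand-in written for this example; IronClaw's real trait will have a different (likely async) signature. StubChannel and run_echo_loop are likewise hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified stand-in for IronClaw's Channel trait.
pub trait Channel {
    fn name(&self) -> &str;
    /// Returns the next inbound message, or None when the channel is drained.
    fn recv(&mut self) -> Option<String>;
    /// Delivers an outbound reply.
    fn send(&mut self, reply: String);
}

/// In-memory channel: inbound messages are scripted up front and every
/// outbound reply is recorded so a test can assert on it.
pub struct StubChannel {
    inbox: VecDeque<String>,
    pub sent: Vec<String>,
}

impl StubChannel {
    pub fn with_messages(messages: Vec<&str>) -> Self {
        StubChannel {
            inbox: messages.into_iter().map(String::from).collect(),
            sent: Vec::new(),
        }
    }
}

impl Channel for StubChannel {
    fn name(&self) -> &str {
        "stub"
    }

    fn recv(&mut self) -> Option<String> {
        self.inbox.pop_front()
    }

    fn send(&mut self, reply: String) {
        self.sent.push(reply);
    }
}

/// Toy stand-in for the agent loop: echoes each message back.
/// Returns the number of messages handled.
pub fn run_echo_loop(channel: &mut dyn Channel) -> usize {
    let mut handled = 0;
    while let Some(message) = channel.recv() {
        channel.send(format!("echo: {}", message));
        handled += 1;
    }
    handled
}
```

A test would build StubChannel::with_messages(vec!["hi"]), pass it to the loop under test, and then inspect the sent field, with no channel process involved.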

3. Gateway/Server Test Helpers

Inspired by OpenClaw's test-helpers.server.ts, build shared Rust test utilities for:

  • Spinning up the web gateway on a random free port
  • Connecting SSE/WebSocket clients
  • Sending authenticated API requests
  • Temp directory setup/teardown for workspace isolation
  • Cleanup on test completion (even on panic)
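Two of the items above, random free port allocation and panic-safe workspace cleanup, can be sketched with the standard library alone. TestGateway is a hypothetical name, and this sketch only reserves a port and a temp directory; starting the actual web gateway and connecting clients is left out.

```rust
use std::fs;
use std::io;
use std::net::{SocketAddr, TcpListener};
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Counter that keeps workspace names unique within one test process.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical test fixture: owns a bound listener and an isolated
/// workspace directory that is removed when the fixture is dropped.
pub struct TestGateway {
    listener: TcpListener,
    workspace: PathBuf,
}

impl TestGateway {
    pub fn start() -> io::Result<Self> {
        // Port 0 asks the OS to pick any free port.
        let listener = TcpListener::bind("127.0.0.1:0")?;
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let workspace = std::env::temp_dir()
            .join(format!("ironclaw-test-{}-{}", std::process::id(), id));
        fs::create_dir_all(&workspace)?;
        Ok(TestGateway { listener, workspace })
    }

    /// Address the OS actually assigned, for clients to connect to.
    pub fn addr(&self) -> io::Result<SocketAddr> {
        self.listener.local_addr()
    }

    pub fn workspace(&self) -> &Path {
        &self.workspace
    }
}

impl Drop for TestGateway {
    // Drop also runs while a panic unwinds, so a failing assertion
    // still cleans up the workspace.
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.workspace);
    }
}
```

Tying cleanup to Drop rather than to an explicit teardown call is what gives the "even on panic" guarantee, as long as the test binary unwinds on panic rather than aborting.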

4. Security Testing for Installers

Add tests for:

  • src/tools/wasm/loader.rs — validate WASM modules before loading (oversized, malformed)
  • src/skills/registry.rs — path traversal, symlink attacks in skill archives
  • Pre-validation before extraction, not post-extraction scanning
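The kind of pre-validation meant here might look like the following sketch. The function names, error enums, and the size limit parameter are assumptions for illustration and do not reflect the current contents of loader.rs or registry.rs. Symlink entries are not covered: they would need a separate check on the archive entry type before extraction.

```rust
use std::path::{Component, Path};

#[derive(Debug, PartialEq, Eq)]
pub enum EntryError {
    Empty,
    Absolute,
    Traversal,
}

/// Rejects archive entry names that could escape the extraction root.
/// Intended to run on every entry before any file is written.
pub fn validate_entry_path(entry: &str) -> Result<(), EntryError> {
    if entry.is_empty() {
        return Err(EntryError::Empty);
    }
    for component in Path::new(entry).components() {
        match component {
            Component::Normal(_) | Component::CurDir => {}
            Component::ParentDir => return Err(EntryError::Traversal),
            Component::RootDir | Component::Prefix(_) => return Err(EntryError::Absolute),
        }
    }
    Ok(())
}

/// The four magic bytes that open every WebAssembly binary: "\0asm".
pub const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];

#[derive(Debug, PartialEq, Eq)]
pub enum WasmError {
    TooLarge,
    Malformed,
}

/// Cheap pre-check before handing bytes to the WASM runtime.
/// It only checks size and header; full validation remains the
/// runtime's job.
pub fn precheck_wasm(bytes: &[u8], max_len: usize) -> Result<(), WasmError> {
    if bytes.len() > max_len {
        return Err(WasmError::TooLarge);
    }
    // A valid module has at least the 4-byte magic plus a 4-byte version.
    if bytes.len() < 8 || !bytes.starts_with(&WASM_MAGIC) {
        return Err(WasmError::Malformed);
    }
    Ok(())
}
```

Tests for the real code could then feed in hostile names such as ../etc/passwd and oversized or truncated modules and assert that each is rejected before anything touches the filesystem.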

5. Architecture Boundary Enforcement

Add CI checks (clippy lints, grep-based scripts, or a custom build script) that enforce:

  • No direct LLM calls outside src/llm/
  • No database calls outside src/db/
  • No raw secret access outside src/secrets/
  • Channel implementations use the Channel trait, not ad-hoc messaging
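One way the grep-based option could work is sketched below: each rule names a text pattern and the only directory prefix where it is allowed. Rule, Violation, and check_file are hypothetical, and the patterns a real check would search for depend on which crates IronClaw uses, which this sketch does not assume.

```rust
/// A boundary rule: `needle` may only appear in files under `allowed_prefix`.
pub struct Rule {
    pub needle: &'static str,
    pub allowed_prefix: &'static str,
    pub message: &'static str,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Violation {
    pub path: String,
    /// 1-based line number of the offending line.
    pub line: usize,
    pub message: &'static str,
}

/// Scans one file's source text against every rule and returns all
/// violations. Plain substring matching: simple, but it will also flag
/// matches inside comments and string literals.
pub fn check_file(path: &str, source: &str, rules: &[Rule]) -> Vec<Violation> {
    let mut violations = Vec::new();
    for rule in rules {
        if path.starts_with(rule.allowed_prefix) {
            continue;
        }
        for (index, line) in source.lines().enumerate() {
            if line.contains(rule.needle) {
                violations.push(Violation {
                    path: path.to_string(),
                    line: index + 1,
                    message: rule.message,
                });
            }
        }
    }
    violations
}
```

A CI job would walk src/, call check_file on each file, and fail the build if the combined list is non-empty. A clippy-based or AST-based check would be more precise, at the cost of more setup.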

6. CI Sharding (longer-term)

As the test suite grows, adopt cargo nextest with native sharding (--partition hash:1/N) to distribute tests across CI runners, matching OpenClaw's OPENCLAW_TEST_SHARDS pattern.
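To illustrate the idea behind hash partitioning, the sketch below assigns each test name to one of N shards so that every runner computes the same split independently. This is not nextest's implementation: the FNV-1a hash and the function names are stand-ins chosen so the example needs only the standard library.

```rust
/// FNV-1a 64-bit hash. Used here only because it is tiny and stable
/// across platforms; nextest uses its own hashing internally.
fn fnv1a(input: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in input.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Returns the 1-based shard a test belongs to.
pub fn shard_for(test_name: &str, total_shards: u64) -> u64 {
    assert!(total_shards > 0, "total_shards must be at least 1");
    fnv1a(test_name) % total_shards + 1
}

/// Filters the full test list down to the tests one runner should execute.
pub fn tests_in_shard<'a>(all: &[&'a str], shard: u64, total_shards: u64) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|name| shard_for(name, total_shards) == shard)
        .collect()
}
```

Because the assignment depends only on the test name and the shard count, runners need no coordination, and every test lands in exactly one shard. The trade-off is that shards are balanced by test count on average, not by test duration.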

7. Memory/Search Test Suite

OpenClaw's memory system was mature enough to be extracted as a standalone library (memsearch). IronClaw's src/workspace/search.rs RRF hybrid search implementation deserves a focused test suite with known-answer queries covering:

  • FTS-only queries, vector-only queries, hybrid queries
  • RRF score fusion correctness
  • Edge cases (empty results, single-source results, duplicate documents)
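A known-answer test needs a reference to compare against. The sketch below is a standalone Reciprocal Rank Fusion function, where each document scores the sum of 1 / (k + rank) over the lists it appears in. It is not the code in src/workspace/search.rs: the function name, the choice to skip repeated documents within one list, and the tie-break by document id are all assumptions, and k = 60 in the usage note is a commonly used default rather than a confirmed IronClaw setting.

```rust
use std::collections::{HashMap, HashSet};

/// Fuses several ranked lists of document ids with Reciprocal Rank Fusion.
/// Returns (doc_id, score) pairs sorted by descending score, with ties
/// broken by doc id so the output is deterministic.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        // Assumption: a document repeated within one list counts once,
        // at its best position, and repeats do not consume a rank.
        let mut seen: HashSet<&str> = HashSet::new();
        let mut rank = 0.0;
        for doc in ranking {
            if !seen.insert(*doc) {
                continue;
            }
            rank += 1.0;
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .expect("scores are finite")
            .then_with(|| a.0.cmp(&b.0))
    });
    fused
}
```

For example, with k = 60, an FTS ranking of [a, b, c] and a vector ranking of [b, d], document b scores 1/62 + 1/61 and ranks first, followed by a (1/61), d (1/62), and c (1/63). Hand-computed cases like this cover the hybrid path, while passing a single list or no lists covers the single-source and empty-result edge cases.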
