Adopt OpenClaw-inspired testing strategy: tier separation, stub channels, gateway helpers

## Background

Research into [OpenClaw](https://github.com/openclaw/openclaw) (the open-source AI assistant that inspired IronClaw's workspace/memory system) reveals a mature testing architecture we can learn from. OpenClaw has ~243k LOC with a three-tier test suite, shared lifecycle helpers, architecture-enforcement linting, and CI sharding. IronClaw currently has 820+ `mod tests {}` blocks but no tier separation or shared test infrastructure.

## OpenClaw's Testing Architecture

OpenClaw uses three tiers of increasing realism:

| Tier | Pattern | Scope | Cost |
|------|---------|-------|------|
| Unit/Integration | `*.test.ts` colocated | Pure logic, no external deps | Free, fast |
| E2E | `*.e2e.test.ts` | Full gateway startup, WebSocket protocol, multi-instance | Moderate |
| Live | `*.live.test.ts` | Real LLM providers, real API keys, sequential | Expensive |

Key infrastructure:
- **Shared server lifecycle helpers** (`withGatewayServer()`, `startServerWithClient()`, port allocation, temp directories, cleanup)
- **Stub channel plugins** for all 6 messaging platforms so the agent loop can be tested without real channels
- **Pre-extraction security testing** for skills installer (path traversal, symlinks validated before extraction)
- **Architecture-enforcement linting** via custom Oxlint rules (e.g., "no raw channel fetch", "channel-agnostic boundaries")
- **CI sharding** via custom orchestration script splitting tests across runners
- **Pragmatic coverage thresholds** (70% lines/functions, 55% branches, with explicit exclusions for daemon/interactive code)

## Proposed Changes for IronClaw

### 1. Test Tier Separation

Currently all tests run in a single `cargo test` invocation with no distinction between unit tests, integration tests needing a database, and live tests needing LLM API keys.

**Proposal:**
- Default `cargo test` = unit tests only (no DB, no network, no API keys)
- `cargo test --features integration` = tests requiring PostgreSQL or libSQL
- `cargo test --features live` = tests requiring real LLM API keys
- Or: adopt `cargo nextest` and select tiers with filter expressions (`-E`), keyed on test-binary or test-name conventions, for the same effect without feature flags
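A minimal sketch of the feature-flag variant, assuming `integration` and `live` are declared as empty features under `[features]` in `Cargo.toml` (they are not today). The module names, the `live_api_key` helper, and the `OPENAI_API_KEY` variable are illustrative only. Gated modules are not even compiled unless their feature is enabled, so plain `cargo test` stays free of DB, network, and API-key dependencies.

```rust
use std::env;

/// Hypothetical helper for the live tier: returns the key only if the
/// variable is set and non-blank, so a live test can skip instead of fail.
pub fn live_api_key(var: &str) -> Option<String> {
    env::var(var).ok().filter(|v| !v.trim().is_empty())
}

// Tier 1: always compiled by `cargo test`; no DB, network, or API keys.
#[cfg(test)]
mod unit_tests {
    use super::live_api_key;

    #[test]
    fn missing_key_means_skip() {
        assert!(live_api_key("IRONCLAW_TEST_SURELY_UNSET_VAR").is_none());
    }
}

// Tier 2: only compiled with `cargo test --features integration`.
#[cfg(all(test, feature = "integration"))]
mod integration_tests {
    #[test]
    fn round_trips_through_database() {
        // Connect to a disposable PostgreSQL or libSQL instance here.
    }
}

// Tier 3: only compiled with `cargo test --features live`.
#[cfg(all(test, feature = "live"))]
mod live_tests {
    use super::live_api_key;

    #[test]
    fn calls_real_provider() {
        let Some(_key) = live_api_key("OPENAI_API_KEY") else {
            eprintln!("skipping: OPENAI_API_KEY is not set");
            return;
        };
        // Issue a real (billable) LLM request here.
    }
}
```

`cargo test --features integration,live` would then run all three tiers in one invocation.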

### 2. Stub Channel Implementations

IronClaw has TUI, HTTP, WASM, and web channels. Create stub `Channel` trait implementations that can be used in tests to exercise the agent loop, message routing, and session management without spawning real channel processes.
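A sketch of such a stub, assuming a simplified synchronous `Channel` trait. IronClaw's real trait and message types differ (and are likely async), so `IncomingMessage`, `OutgoingResponse`, `receive`, `respond`, and the toy `run_echo_loop` are all hypothetical stand-ins. The stub is scripted with inbound messages and records every response so a test can assert on them.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical message types; the real ones carry more metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct IncomingMessage {
    pub session_id: String,
    pub content: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct OutgoingResponse {
    pub session_id: String,
    pub content: String,
}

/// Hypothetical, synchronous simplification of IronClaw's `Channel` trait.
pub trait Channel {
    fn name(&self) -> &str;
    fn receive(&self) -> Option<IncomingMessage>;
    fn respond(&self, response: OutgoingResponse);
}

/// Scripted inbox plus a recorder for outbound responses. Cloning shares the
/// same queues, so a test can keep a handle while the agent owns another.
#[derive(Clone, Default)]
pub struct StubChannel {
    inbox: Arc<Mutex<VecDeque<IncomingMessage>>>,
    sent: Arc<Mutex<Vec<OutgoingResponse>>>,
}

impl StubChannel {
    pub fn new() -> Self {
        Self::default()
    }

    /// Queue a message as if a user had sent it on a real channel.
    pub fn push(&self, session_id: &str, content: &str) {
        self.inbox.lock().unwrap().push_back(IncomingMessage {
            session_id: session_id.to_string(),
            content: content.to_string(),
        });
    }

    /// Snapshot of everything the agent has sent so far.
    pub fn sent(&self) -> Vec<OutgoingResponse> {
        self.sent.lock().unwrap().clone()
    }
}

impl Channel for StubChannel {
    fn name(&self) -> &str {
        "stub"
    }

    fn receive(&self) -> Option<IncomingMessage> {
        self.inbox.lock().unwrap().pop_front()
    }

    fn respond(&self, response: OutgoingResponse) {
        self.sent.lock().unwrap().push(response);
    }
}

/// Toy stand-in for the agent loop: upper-cases each message until the inbox
/// is empty and returns how many messages it handled.
pub fn run_echo_loop<C: Channel>(channel: &C) -> usize {
    let mut handled = 0;
    while let Some(msg) = channel.receive() {
        channel.respond(OutgoingResponse {
            session_id: msg.session_id,
            content: msg.content.to_uppercase(),
        });
        handled += 1;
    }
    handled
}
```

Because the loop is generic over `Channel`, the same test can later target the real agent loop by swapping `run_echo_loop` for it while keeping the stub.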

### 3. Gateway/Server Test Helpers

Inspired by OpenClaw's `test-helpers.server.ts`, build shared Rust test utilities for:
- Spinning up the web gateway on a random free port
- Connecting SSE/WebSocket clients
- Sending authenticated API requests
- Temp directory setup/teardown for workspace isolation
- Cleanup on test completion (even on panic)
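A std-only sketch of two of these helpers, under stated assumptions. `TempWorkspace` is an RAII guard whose `Drop` deletes the directory; with the default `panic = "unwind"`, `Drop` also runs when a failing test panics. `spawn_one_shot_gateway` stands in for the real web gateway: it binds `127.0.0.1:0` so the OS picks a free port, then answers a single request with 200 or 401 depending on the bearer token. All names are hypothetical; a real helper would start IronClaw's actual gateway and would likely use an HTTP client crate rather than raw sockets.

```rust
use std::fs;
use std::io::{self, BufRead, BufReader, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread::{self, JoinHandle};

static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical RAII workspace, unique per process and per call.
pub struct TempWorkspace {
    path: PathBuf,
}

impl TempWorkspace {
    pub fn new(prefix: &str) -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let path = std::env::temp_dir().join(format!("{prefix}-{}-{id}", std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempWorkspace {
    fn drop(&mut self) {
        // Best effort: a failed cleanup should not mask the test's own failure.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Hypothetical gateway stand-in: serves one request on a background thread.
pub fn spawn_one_shot_gateway(token: &str) -> io::Result<(SocketAddr, JoinHandle<()>)> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0 = any free port
    let addr = listener.local_addr()?;
    let expected = format!("Bearer {token}");
    let handle = thread::spawn(move || {
        let (stream, _) = listener.accept().expect("accept failed");
        let mut reader = BufReader::new(stream);
        let mut authorized = false;
        let mut line = String::new();
        // Read request headers up to the blank line that terminates them.
        while reader.read_line(&mut line).unwrap_or(0) > 0 {
            let header = line.trim_end();
            if header.is_empty() {
                break;
            }
            if let Some((name, value)) = header.split_once(':') {
                if name.eq_ignore_ascii_case("authorization") && value.trim() == expected {
                    authorized = true;
                }
            }
            line.clear();
        }
        let status = if authorized { "200 OK" } else { "401 Unauthorized" };
        let mut stream = reader.into_inner();
        let _ = write!(stream, "HTTP/1.1 {status}\r\nContent-Length: 0\r\nConnection: close\r\n\r\n");
    });
    Ok((addr, handle))
}

/// Sends a GET with an optional bearer token and returns the HTTP status code.
pub fn get_status(addr: SocketAddr, path: &str, token: Option<&str>) -> io::Result<u16> {
    let mut stream = TcpStream::connect(addr)?;
    let auth = token.map(|t| format!("Authorization: Bearer {t}\r\n")).unwrap_or_default();
    write!(stream, "GET {path} HTTP/1.1\r\nHost: {addr}\r\n{auth}Connection: close\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    // The status line looks like "HTTP/1.1 200 OK".
    response
        .split_whitespace()
        .nth(1)
        .and_then(|code| code.parse().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed status line"))
}
```

Binding to port 0 and reading back `local_addr()` avoids the race of picking a "free" port first and binding it later, which matters once tests run in parallel.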

### 4. Security Testing for Installers

Add tests for:
- `src/tools/wasm/loader.rs` — validate WASM modules before loading (oversized, malformed)
- `src/skills/registry.rs` — path traversal, symlink attacks in skill archives
- Pre-validation before extraction, not post-extraction scanning
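A sketch of pre-extraction validation, assuming the installer can list each archive entry's path, type, and size before writing anything to disk. `ArchiveEntry`, `EntryKind`, `InstallError`, and the size limits are hypothetical. Every entry is checked first, and extraction should proceed only if the whole list passes. The WASM check inspects only the 8-byte header (`\0asm` magic plus version 1), so it rejects truncated or non-WASM files cheaply; full validation still needs a real parser.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical entry metadata, read from the archive index before extraction.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EntryKind {
    File,
    Dir,
    Symlink,
}

#[derive(Debug)]
pub struct ArchiveEntry {
    pub path: String,
    pub kind: EntryKind,
    pub size: u64,
}

#[derive(Debug, PartialEq)]
pub enum InstallError {
    EmptyPath,
    AbsolutePath(String),
    PathTraversal(String),
    Symlink(String),
    TooLarge(String),
}

/// Accepts only relative paths made of normal components (`.` is ignored).
pub fn safe_relative_path(raw: &str) -> Result<PathBuf, InstallError> {
    let mut out = PathBuf::new();
    for component in Path::new(raw).components() {
        match component {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {}
            Component::ParentDir => return Err(InstallError::PathTraversal(raw.to_string())),
            Component::RootDir | Component::Prefix(_) => {
                return Err(InstallError::AbsolutePath(raw.to_string()))
            }
        }
    }
    if out.as_os_str().is_empty() {
        Err(InstallError::EmptyPath)
    } else {
        Ok(out)
    }
}

/// Validates every entry up front; stops at the first offending entry.
pub fn validate_entries(entries: &[ArchiveEntry], max_entry_bytes: u64) -> Result<Vec<PathBuf>, InstallError> {
    entries
        .iter()
        .map(|entry| {
            if entry.kind == EntryKind::Symlink {
                return Err(InstallError::Symlink(entry.path.clone()));
            }
            if entry.size > max_entry_bytes {
                return Err(InstallError::TooLarge(entry.path.clone()));
            }
            safe_relative_path(&entry.path)
        })
        .collect()
}

/// Cheap header check: WASM binaries start with "\0asm" then version 1 (LE).
pub fn looks_like_wasm(bytes: &[u8], max_bytes: usize) -> bool {
    bytes.len() <= max_bytes && bytes.starts_with(b"\0asm\x01\0\0\0")
}
```

A real extractor should additionally confirm that each joined destination stays under the install root, since this sketch only rejects `..`, absolute paths, and symlink entries.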

### 5. Architecture Boundary Enforcement

Add CI checks (clippy lints, grep-based scripts, or a custom build script) that enforce:
- No direct LLM calls outside `src/llm/`
- No database calls outside `src/db/`
- No raw secret access outside `src/secrets/`
- Channel implementations use the `Channel` trait, not ad-hoc messaging
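One way to implement the grep-based option, sketched in Rust so it can run as an ordinary test or a small CI binary. The patterns in `RULES` are placeholders: `tokio_postgres::` and `libsql::` assume those crates are the DB drivers, `expose_secret(` assumes secrets are wrapped with the `secrecy` crate, and `/v1/chat/completions` assumes raw OpenAI-style endpoints mark direct LLM calls. Substring matching is crude (it also flags comments), which is the usual trade-off of grep-style checks versus a custom lint. The `Channel`-trait rule is harder to express this way and is not covered here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// A forbidden substring and the only directory allowed to contain it.
pub struct BoundaryRule {
    pub pattern: &'static str,
    pub allowed_prefix: &'static str,
    pub reason: &'static str,
}

/// Placeholder rules; real patterns depend on the crates IronClaw uses.
pub const RULES: &[BoundaryRule] = &[
    BoundaryRule { pattern: "/v1/chat/completions", allowed_prefix: "src/llm/", reason: "LLM calls belong in src/llm/" },
    BoundaryRule { pattern: "tokio_postgres::", allowed_prefix: "src/db/", reason: "database access belongs in src/db/" },
    BoundaryRule { pattern: "libsql::", allowed_prefix: "src/db/", reason: "database access belongs in src/db/" },
    BoundaryRule { pattern: "expose_secret(", allowed_prefix: "src/secrets/", reason: "raw secrets belong in src/secrets/" },
];

/// Returns one "path:line: `pattern` (reason)" message per violation.
pub fn check_file(rel_path: &str, contents: &str, rules: &[BoundaryRule]) -> Vec<String> {
    let mut violations = Vec::new();
    for rule in rules {
        if rel_path.starts_with(rule.allowed_prefix) {
            continue;
        }
        for (i, line) in contents.lines().enumerate() {
            if line.contains(rule.pattern) {
                violations.push(format!("{rel_path}:{}: `{}` ({})", i + 1, rule.pattern, rule.reason));
            }
        }
    }
    violations
}

/// Recursively collects `.rs` files under `dir`.
pub fn collect_rs_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_rs_files(&path, out)?;
        } else if path.extension().map_or(false, |ext| ext == "rs") {
            out.push(path);
        }
    }
    Ok(())
}

/// Checks every file under `<root>/src`, using forward-slash relative paths.
pub fn check_tree(root: &Path, rules: &[BoundaryRule]) -> io::Result<Vec<String>> {
    let mut files = Vec::new();
    collect_rs_files(&root.join("src"), &mut files)?;
    files.sort();
    let mut all = Vec::new();
    for file in files {
        let rel = file.strip_prefix(root).unwrap_or(file.as_path()).to_string_lossy().replace('\\', "/");
        let contents = fs::read_to_string(&file)?;
        all.extend(check_file(&rel, &contents, rules));
    }
    Ok(all)
}
```

In CI, a test could call `check_tree(Path::new(env!("CARGO_MANIFEST_DIR")), RULES)` and assert the result is empty, printing the violations on failure.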

### 6. CI Sharding (longer-term)

As the test suite grows, adopt `cargo nextest` with native sharding (`--partition hash:1/N`) to distribute tests across CI runners, matching OpenClaw's `OPENCLAW_TEST_SHARDS` pattern.
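To illustrate why hash partitioning suits CI: each test's shard depends only on its own name, so adding or removing a test never moves any other test to a different shard. This sketch uses 64-bit FNV-1a purely for illustration; it is not the hash nextest uses internally. `std`'s `DefaultHasher` is avoided because its algorithm is not guaranteed to stay the same across Rust releases.

```rust
/// 64-bit FNV-1a: simple, and stable across platforms and Rust releases.
pub fn fnv1a64(s: &str) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for byte in s.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// 1-based shard index, mirroring the `M/N` numbering of `--partition hash:M/N`.
pub fn shard_of(test_name: &str, total_shards: u64) -> u64 {
    fnv1a64(test_name) % total_shards + 1
}

/// The tests a runner should execute when it is shard `shard` of `total_shards`.
pub fn tests_for_shard<'a>(tests: &[&'a str], shard: u64, total_shards: u64) -> Vec<&'a str> {
    tests
        .iter()
        .copied()
        .filter(|name| shard_of(name, total_shards) == shard)
        .collect()
}
```

The trade-off is that hash shards are only balanced on average; with few tests, one runner may get noticeably more work than another.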

### 7. Memory/Search Test Suite

OpenClaw's memory system was mature enough to be extracted as a standalone library ([memsearch](https://github.com/zilliztech/memsearch)). IronClaw's Reciprocal Rank Fusion (RRF) hybrid search in `src/workspace/search.rs` deserves a focused test suite with known-answer queries covering:
- FTS-only queries, vector-only queries, hybrid queries
- RRF score fusion correctness
- Edge cases (empty results, single-source results, duplicate documents)
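Known-answer tests need a reference definition to compute expected scores. A sketch of standard RRF, where each document scores the sum of 1/(k + rank) over the result lists it appears in, with ranks starting at 1 and k commonly set to 60. Whether `src/workspace/search.rs` uses k = 60, this rank convention, or this duplicate and tie-breaking policy is an assumption to check against the code; `rrf_fuse` is a hypothetical name.

```rust
use std::collections::{HashMap, HashSet};

/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
/// with 1-based ranks. A document missing from a list contributes nothing
/// for that list. Duplicates within one list count once, at first position.
pub fn rrf_fuse(result_lists: &[&[&str]], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in result_lists {
        let mut seen = HashSet::new();
        let mut rank = 0u32;
        for &doc in list.iter() {
            if !seen.insert(doc) {
                continue; // skip repeats so they neither score nor consume a rank
            }
            rank += 1;
            *scores.entry(doc).or_insert(0.0) += 1.0 / (k + f64::from(rank));
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(doc, score)| (doc.to_string(), score))
        .collect();
    // Highest score first; ties broken by id so test expectations are deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For example, with k = 60, FTS results `[a, b, c]` and vector results `[b, d]` fuse to `b` (1/62 + 1/61), then `a` (1/61), `d` (1/62), and `c` (1/63): appearing in both lists outweighs a single first place.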

## References

- [OpenClaw repo](https://github.com/openclaw/openclaw)
- [OpenClaw testing docs](https://docs.openclaw.ai/help/testing)
- [OpenClaw AGENTS.md](https://github.com/openclaw/openclaw/blob/main/AGENTS.md)
- [memsearch (extracted memory library)](https://github.com/zilliztech/memsearch)
