Skip to content

chore: promote staging to staging-promote/4dea5dd5-24736995931 (2026-04-21 19:40 UTC)#2811

Open
ironclaw-ci[bot] wants to merge 1 commit intostaging-promote/4dea5dd5-24736995931from
staging-promote/e29429d7-24742613700
Open

chore: promote staging to staging-promote/4dea5dd5-24736995931 (2026-04-21 19:40 UTC)#2811
ironclaw-ci[bot] wants to merge 1 commit intostaging-promote/4dea5dd5-24736995931from
staging-promote/e29429d7-24742613700

Conversation

@ironclaw-ci
Copy link
Copy Markdown
Contributor

@ironclaw-ci ironclaw-ci Bot commented Apr 21, 2026

Auto-promotion from staging CI

Batch range: 7fb41555a9e55677d1aaea29ca567a5b369c2b05..e29429d727bc3f8ddaaeb509a29c798e927cafbe
Promotion branch: staging-promote/e29429d7-24742613700
Base: staging-promote/4dea5dd5-24736995931
Triggered by: Staging CI batch at 2026-04-21 19:40 UTC

Commits in this batch (68):

Current commits in this promotion (1)

Current base: staging-promote/4dea5dd5-24736995931
Current head: staging-promote/e29429d7-24742613700
Current range: origin/staging-promote/4dea5dd5-24736995931..origin/staging-promote/e29429d7-24742613700

Auto-updated by staging promotion metadata workflow

Waiting for gates:

  • Tests: pending
  • E2E: pending
  • Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

…2790)

* fix(e2e): fix 5 test failures — multi-tenant widget isolation + portfolio nudge recovery

Widget customization: three tests expected multi-tenant behavior (CSS/widget/CSP
isolation) but ran against the single-tenant default server. Add a session-scoped
`multi_tenant_gateway_server` fixture with AGENT_MULTI_TENANT=true and its own
libSQL database, and rewire the three failing tests to use it.

Portfolio: the mock LLM's nudge response ("I found the information you
requested.") swallowed portfolio context when the engine sent a tool-intent
nudge. Add context-aware nudge recovery in match_response() that checks prior
user messages for portfolio/wallet keywords before falling through to the
generic nudge pattern. Also add word boundaries to the hello|hi|hey canned
pattern to prevent "hi" from matching inside "this".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings (iteration 1)

Forward cargo-llvm-cov env vars in multi_tenant_gateway_server fixture
so code coverage from the 3 rewired widget tests is captured in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size: XS < 10 changed lines (excluding docs) risk: low Changes to docs, tests, or low-risk modules contributor: core 20+ merged PRs labels Apr 21, 2026
@claude
Copy link
Copy Markdown

claude Bot commented Apr 21, 2026

Code review

Found 10 issues:

  1. [HIGH:90] Port Binding TOCTOU Race Condition

Between closing the port-reservation sockets and passing those port numbers to create_subprocess_exec(), another process could bind to either port, causing subprocess failures or binding to unintended ports. The pattern should pass socket file descriptors directly or use SO_REUSEADDR, rather than close-then-use.

reserved = []
for _ in range(2):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
reserved.append(sock)
gateway_port = reserved[0].getsockname()[1]
http_port = reserved[1].getsockname()[1]
for sock in reserved:
sock.close()
env = {
"PATH": os.environ.get("PATH", "/usr/bin:/bin"),
"HOME": home_dir,
"IRONCLAW_BASE_DIR": os.path.join(home_dir, ".ironclaw"),
"RUST_LOG": "ironclaw=info",
"RUST_BACKTRACE": "1",
"IRONCLAW_OWNER_ID": "e2e-widget-multi-tenant",
"AGENT_MULTI_TENANT": "true",
"GATEWAY_ENABLED": "true",
"GATEWAY_HOST": "127.0.0.1",
"GATEWAY_PORT": str(gateway_port),
"GATEWAY_AUTH_TOKEN": AUTH_TOKEN,
"GATEWAY_USER_ID": "e2e-widget-multi-tenant",
"HTTP_HOST": "127.0.0.1",
"HTTP_PORT": str(http_port),
"CLI_ENABLED": "false",
"LLM_BACKEND": "openai_compatible",
"LLM_BASE_URL": mock_llm_server,
"LLM_MODEL": "mock-model",
"DATABASE_BACKEND": "libsql",
"LIBSQL_PATH": os.path.join(db_tmpdir.name, "multi-tenant.db"),
"SANDBOX_ENABLED": "false",
"SKILLS_ENABLED": "true",
"ROUTINES_ENABLED": "true",
"HEARTBEAT_ENABLED": "false",
"EMBEDDING_ENABLED": "false",
"WASM_ENABLED": "false",
"ONBOARD_COMPLETED": "true",
}
# Forward cargo-llvm-cov env vars so coverage data is captured in CI.
cov_prefixes = ("CARGO_LLVM_COV", "LLVM_")
cov_extras = ("CARGO_ENCODED_RUSTFLAGS", "CARGO_INCREMENTAL")
for key, val in os.environ.items():
if key.startswith(cov_prefixes) or key in cov_extras:
env[key] = val
proc = await asyncio.create_subprocess_exec(
ironclaw_binary,
"--no-onboard",
stdin=asyncio.subprocess.DEVNULL,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
env=env,
)
base_url = f"http://127.0.0.1:{gateway_port}"
try:

  1. [HIGH:90] Fixture Code Duplication — DRY Violation

The multi_tenant_gateway_server and single_tenant_gateway_server fixtures are 95% identical (lines 122–192 vs 201–283). Both duplicate environment dict construction, port reservation logic, process spawning, error handling, and cleanup. This violates DRY and creates future maintenance risk when gateway env setup must be edited in multiple places.

Extract a shared _create_dedicated_gateway_server() helper function or move this logic to conftest.py to allow reuse across test files.

@pytest.fixture(scope="session")
async def multi_tenant_gateway_server(ironclaw_binary, mock_llm_server):
"""Dedicated gateway with AGENT_MULTI_TENANT=true for multi-tenant isolation tests."""
home_tmpdir = tempfile.TemporaryDirectory(prefix="ironclaw-widget-multi-tenant-home-")
home_dir = home_tmpdir.name
db_tmpdir = tempfile.TemporaryDirectory(prefix="ironclaw-widget-multi-tenant-db-")
os.makedirs(os.path.join(home_dir, ".ironclaw"), exist_ok=True)
reserved = []
for _ in range(2):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
reserved.append(sock)
gateway_port = reserved[0].getsockname()[1]
http_port = reserved[1].getsockname()[1]
for sock in reserved:
sock.close()
env = {
"PATH": os.environ.get("PATH", "/usr/bin:/bin"),
"HOME": home_dir,
"IRONCLAW_BASE_DIR": os.path.join(home_dir, ".ironclaw"),
"RUST_LOG": "ironclaw=info",
"RUST_BACKTRACE": "1",
"IRONCLAW_OWNER_ID": "e2e-widget-multi-tenant",
"AGENT_MULTI_TENANT": "true",
"GATEWAY_ENABLED": "true",
"GATEWAY_HOST": "127.0.0.1",
"GATEWAY_PORT": str(gateway_port),
"GATEWAY_AUTH_TOKEN": AUTH_TOKEN,
"GATEWAY_USER_ID": "e2e-widget-multi-tenant",
"HTTP_HOST": "127.0.0.1",
"HTTP_PORT": str(http_port),
"CLI_ENABLED": "false",
"LLM_BACKEND": "openai_compatible",
"LLM_BASE_URL": mock_llm_server,
"LLM_MODEL": "mock-model",
"DATABASE_BACKEND": "libsql",
"LIBSQL_PATH": os.path.join(db_tmpdir.name, "multi-tenant.db"),
"SANDBOX_ENABLED": "false",
"SKILLS_ENABLED": "true",
"ROUTINES_ENABLED": "true",
"HEARTBEAT_ENABLED": "false",
"EMBEDDING_ENABLED": "false",
"WASM_ENABLED": "false",
"ONBOARD_COMPLETED": "true",
}
# Forward cargo-llvm-cov env vars so coverage data is captured in CI.
cov_prefixes = ("CARGO_LLVM_COV", "LLVM_")
cov_extras = ("CARGO_ENCODED_RUSTFLAGS", "CARGO_INCREMENTAL")
for key, val in os.environ.items():
if key.startswith(cov_prefixes) or key in cov_extras:
env[key] = val
proc = await asyncio.create_subprocess_exec(
ironclaw_binary,
"--no-onboard",
stdin=asyncio.subprocess.DEVNULL,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
env=env,
)
base_url = f"http://127.0.0.1:{gateway_port}"
try:
await wait_for_ready(f"{base_url}/api/health", timeout=60)
yield base_url
except TimeoutError:
stderr_text = ""
if proc.stderr:
try:
stderr_text = (await asyncio.wait_for(proc.stderr.read(8192), timeout=2)).decode(
"utf-8",
errors="replace",
)
except asyncio.TimeoutError:
pass
pytest.fail(f"multi-tenant widget server failed to start:\n{stderr_text}")
finally:
await _stop_proc(proc)
home_tmpdir.cleanup()
db_tmpdir.cleanup()

  1. [HIGH:85] Unbounded Regex Compilation on Every match_response() Call

The _nudge_re pattern is compiled inside the function body (line 795–798) on every invocation. Since match_response() is called once per chat completion request (potentially hundreds of times per test run), this wastes CPU compiling the same regex repeatedly instead of caching it at module-level. Move the regex pattern definition outside the function.

_nudge_re = re.compile(
r"You said you would perform an action|You expressed intent",
re.IGNORECASE,
)

  1. [MEDIUM:85] Incomplete Async Fixture Scope Documentation

The multi_tenant_gateway_server is session-scoped with cleanup in the finally block (lines 282–283), but the relationship between session-scoped server lifespan and function-scoped cleanup fixture (clean_multi_tenant_customizations) is not documented. Add docstring clarifying expected fixture interaction and temp-dir cleanup ordering.

@pytest.fixture(scope="session")
async def multi_tenant_gateway_server(ironclaw_binary, mock_llm_server):
"""Dedicated gateway with AGENT_MULTI_TENANT=true for multi-tenant isolation tests."""
home_tmpdir = tempfile.TemporaryDirectory(prefix="ironclaw-widget-multi-tenant-home-")
home_dir = home_tmpdir.name
db_tmpdir = tempfile.TemporaryDirectory(prefix="ironclaw-widget-multi-tenant-db-")

  1. [MEDIUM:75] Multiple Sequential Regex Searches Without Early Return

The nudge recovery block (lines 800–813) iterates through all user messages and performs two regex searches per message without short-circuiting when a match is found. For conversations with 10+ user messages, this performs 20+ regex searches. Add an early return after the first successful domain-context match to avoid redundant searches.

for msg in messages:
if msg.get("role") == "user":
msg_text = _message_text(msg)
if re.search(r"portfolio|defi|rebalance|yield.*positions", msg_text, re.IGNORECASE):
return (
"I'll analyze your DeFi portfolio. The portfolio skill is active and I can scan "
"your wallet addresses across chains to discover positions, check yields, and "
"suggest rebalancing opportunities."
)
if re.search(r"0x[a-fA-F0-9]{40}", msg_text, re.IGNORECASE):
return (
"I found your wallet address. Let me scan your portfolio across all supported "
"chains to discover DeFi positions and classify them against known protocols."
)

  1. [MEDIUM:75] Mock LLM Pattern Matching — Unclear Fallthrough Behavior

The new nudge-recovery code (lines 789–813) runs before the CANNED_RESPONSES loop. If neither portfolio nor wallet context is found, code falls through to the generic nudge pattern in CANNED_RESPONSES (line 57). This "special-case-before-fallback" structure works but intent is not explicit. Document the fallthrough behavior in a comment above line 815 to clarify pattern matching precedence.

# Nudge recovery: when the engine sends a "you expressed intent but
# didn't call a tool" nudge, check whether the conversation has
# portfolio/wallet context from an earlier user message and return a
# portfolio-relevant response so the nudge pattern (which matches
# before the portfolio patterns in CANNED_RESPONSES) doesn't swallow
# the domain context.
_nudge_re = re.compile(
r"You said you would perform an action|You expressed intent",
re.IGNORECASE,
)
if _nudge_re.search(content):
for msg in messages:
if msg.get("role") == "user":
msg_text = _message_text(msg)
if re.search(r"portfolio|defi|rebalance|yield.*positions", msg_text, re.IGNORECASE):
return (
"I'll analyze your DeFi portfolio. The portfolio skill is active and I can scan "
"your wallet addresses across chains to discover positions, check yields, and "
"suggest rebalancing opportunities."
)
if re.search(r"0x[a-fA-F0-9]{40}", msg_text, re.IGNORECASE):
return (
"I found your wallet address. Let me scan your portfolio across all supported "
"chains to discover DeFi positions and classify them against known protocols."
)
for pattern, response in CANNED_RESPONSES:

  1. [MEDIUM:75] Session-Scoped Fixture with Implicit Resource Cleanup

The multi_tenant_gateway_server subprocess persists for the entire pytest session. If pytest crashes or is forcefully terminated before the finally block, the ironclaw process will leak as an orphan. In CI, this could accumulate processes across test runs. Either make the fixture function-scoped (per-test recreation) or add explicit process cleanup guarantees via pytest plugins.

proc = await asyncio.create_subprocess_exec(
ironclaw_binary,
"--no-onboard",
stdin=asyncio.subprocess.DEVNULL,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
env=env,
)
base_url = f"http://127.0.0.1:{gateway_port}"
try:
await wait_for_ready(f"{base_url}/api/health", timeout=60)
yield base_url
except TimeoutError:
stderr_text = ""
if proc.stderr:
try:
stderr_text = (await asyncio.wait_for(proc.stderr.read(8192), timeout=2)).decode(
"utf-8",
errors="replace",
)
except asyncio.TimeoutError:
pass
pytest.fail(f"multi-tenant widget server failed to start:\n{stderr_text}")
finally:
await _stop_proc(proc)
home_tmpdir.cleanup()
db_tmpdir.cleanup()

  1. [MEDIUM:70] Unconstrained Environment Variable Forwarding

Coverage environment variables are forwarded from the outer environment using key.startswith(cov_prefixes) without explicit allowlisting (lines 249–254). While intentional for CI, this forwards any key starting with LLVM_ or CARGO_LLVM_COV, including potentially sensitive ones. Explicitly enumerate allowed keys (e.g., LLVM_PROFILE_FILE) rather than prefix-matching.

# Forward cargo-llvm-cov env vars so coverage data is captured in CI.
cov_prefixes = ("CARGO_LLVM_COV", "LLVM_")
cov_extras = ("CARGO_ENCODED_RUSTFLAGS", "CARGO_INCREMENTAL")
for key, val in os.environ.items():
if key.startswith(cov_prefixes) or key in cov_extras:
env[key] = val

  1. [MEDIUM:70] Duplicate Coverage Env Forwarding Logic

The _forward_coverage_env() logic (lines 249–254) duplicates identical code from conftest.py. Call the existing shared helper from conftest.py instead of re-implementing inline. This indicates the fixture should live in conftest.py for reuse.

# Forward cargo-llvm-cov env vars so coverage data is captured in CI.
cov_prefixes = ("CARGO_LLVM_COV", "LLVM_")
cov_extras = ("CARGO_ENCODED_RUSTFLAGS", "CARGO_INCREMENTAL")
for key, val in os.environ.items():
if key.startswith(cov_prefixes) or key in cov_extras:
env[key] = val

  1. [MEDIUM:70] Temporary Directory Cleanup Order Issue

In the fixture finally-block (lines 282–283), home_tmpdir.cleanup() is called before ensuring the subprocess has fully released all file handles. If the subprocess is still writing logs/state during the graceful shutdown window, cleanup could fail silently or leave locked files. Reorder: close the process context first, then cleanup directories.

stderr_text = (await asyncio.wait_for(proc.stderr.read(8192), timeout=2)).decode(
"utf-8",
errors="replace",
)
except asyncio.TimeoutError:
pass
pytest.fail(f"multi-tenant widget server failed to start:\n{stderr_text}")
finally:
await _stop_proc(proc)
home_tmpdir.cleanup()
db_tmpdir.cleanup()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: low Changes to docs, tests, or low-risk modules size: XS < 10 changed lines (excluding docs) staging-promotion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant