Skip to content

fix(cli): suppress non-CLI listeners under --cli-only (#1840)#1869

Merged
ilblackdragon merged 4 commits into
nearai:stagingfrom
reidliu41:fix/cli-only-webhook-server-1840
Apr 19, 2026
Merged

fix(cli): suppress non-CLI listeners under --cli-only (#1840)#1869
ilblackdragon merged 4 commits into
nearai:stagingfrom
reidliu41:fix/cli-only-webhook-server-1840

Conversation

@reidliu41
Copy link
Copy Markdown
Contributor

Summary

  • --cli-only did not fully honor its contract: webhook routes, WASM channel endpoints, gateway startup, managed
    tunnels, and the sandbox orchestrator API could still start network listeners
  • This change unifies all non-CLI startup paths behind a single enable_non_cli guard derived from --cli-only
  • It also fixes two state-consistency issues caused by suppressing orchestrator startup: job_prompt was registered
    without a consumer, and sandbox readiness could be misreported as unavailable instead of disabled

Change Type

  • Bug fix
  • New feature
  • Refactor
  • Documentation
  • CI/Infrastructure
  • Security
  • Dependencies

Linked Issue

Fixes #1840

Validation

  • cargo fmt --all -- --check
  • cargo clippy --all --benches --tests --examples --all-features -- -D warnings
  • cargo build
  • Relevant tests pass:
  • cargo test --features integration if database-backed or integration behavior changed
  • Manual testing:
  • If a coding agent was used and supports it, review-pr or pr-shepherd --fix was run before requesting review

Security Impact

This fixes unintended listener startup under --cli-only, including webhook ingress, gateway HTTP serving, managed
tunnels, and the sandbox orchestrator API. The value of the change is that --cli-only now consistently suppresses non-
CLI network-facing services, which restores operator trust in the mode’s security boundary.

Database Impact

None

Blast Radius

  • src/main.rs startup wiring for non-CLI services
  • sandbox tool registration and readiness reporting
  • NETWORK_SECURITY.md behavior documentation

Rollback Plan

Revert the commit. The change is limited to startup wiring and documentation, with no schema, migration, or config-
format changes.

Review Follow-Through

  • Regression coverage now checks known non-CLI startup paths in async_main
  • New network-facing startup paths should be added to the guard list and regression test

Review track:
B — bug fix affecting startup behavior and security-sensitive listener initialization

  - Gate webhook, WASM, Signal, HTTP, gateway, tunnel, and orchestrator startup behind --cli-only
  - Restore the expected --cli-only contract so no non-CLI listeners bind or expose services
  - Prevent unintended network exposure from fallback listeners and managed tunnels
  - Stop registering job_prompt when no orchestrator is running to consume prompts
  - Fix sandbox readiness reporting under --cli-only so it reports disabled, not unavailable
  - Centralize the guard through non_cli_channels_enabled() for consistent startup behavior
  - Add regression coverage for unguarded network startup paths in async_main
  - Document --cli-only listener suppression in NETWORK_SECURITY.md
@github-actions github-actions Bot added scope: docs Documentation size: M 50-199 changed lines risk: low Changes to docs, tests, or low-risk modules contributor: core 20+ merged PRs labels Apr 1, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a --cli-only mode to suppress all non-CLI network services, such as webhooks, WASM channels, and managed tunnels, ensuring no network listeners start when the flag is active. The changes include documentation updates and a regression test suite that performs a structural check on main.rs to verify all ingress paths are guarded. A potential panic was identified in the test code where string slicing using byte offsets could split multi-byte UTF-8 characters; a byte-based search approach was suggested to resolve this.

Comment thread src/main.rs Outdated
Comment on lines +1306 to +1307
let window_start = abs.saturating_sub(500);
let window = &async_main_body[window_start..abs];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Slicing a string using byte offsets derived from saturating_sub can lead to a panic if the offset lands in the middle of a multi-byte UTF-8 character. Using byte-based searching (as suggested) avoids this risk. Additionally, when detecting keywords like 'enable_non_cli', ensure you use word-boundary checks instead of simple substring containment to avoid false positives where the keyword is part of a larger string.

Suggested change
let window_start = abs.saturating_sub(500);
let window = &async_main_body[window_start..abs];
let window_start = abs.saturating_sub(500);
let window = &async_main_body.as_bytes()[window_start..abs];
assert!(
window.windows("enable_non_cli".len()).any(|w| w == "enable_non_cli".as_bytes()),
References
  1. When truncating or slicing a UTF-8 string, use character-aware methods or byte-based operations to avoid panics caused by slicing in the middle of a multi-byte character.
  2. When detecting commands or keywords in a string, use token-based or word-boundary checks instead of simple substring containment to avoid false positives.

zmanian
zmanian previously approved these changes Apr 1, 2026
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review -- APPROVE

Good security fix closing a real gap where --cli-only wasn't suppressing network listeners (webhooks, WASM channels, HTTP, Signal, gateway, tunnels, orchestrator API). Single enable_non_cli guard applied consistently. NETWORK_SECURITY.md updated.

Minor nits (non-blocking)

  • non_cli_channels_enabled(cli_only: bool) -> bool is just !cli_only -- the indirection adds no value
  • Source grep regression test is clever but fragile (500-char window is arbitrary)

@reidliu41
Copy link
Copy Markdown
Contributor Author

can it merge?

…hook-server-1840

# Conflicts:
#	src/NETWORK_SECURITY.md
@ilblackdragon
Copy link
Copy Markdown
Member

Review — paranoid architect

Thanks for tightening --cli-only; the unified enable_non_cli guard is a clean improvement over scattered !cli.cli_only checks, and the source-level regression test is a thoughtful tripwire.

High — relay channels bypass the guard

src/main.rs:935-940 (not in this diff — but load-bearing for the contract this PR claims to enforce):

if let Some(ref ext_mgr) = components.extension_manager {
    ext_mgr.set_relay_channel_manager(Arc::clone(&channels)).await;
    ext_mgr.restore_relay_channels(&ext_user_id).await;
}

Under --cli-only, this still runs. restore_relay_channels -> activate_stored_relay -> activate_channel_relay (src/extensions/manager.rs:4896):

  1. Fetches team_id from settings.
  2. Makes an outbound HTTPS call to the relay server's /relay/signing-secret endpoint (manager.rs:4998).
  3. Calls hot_add on the channel manager (channels/manager.rs:57), registering a live RelayChannel — so any inbound message delivered through another path (or a later hot_add webhook route) would still be processed, and respond()/broadcast_all() now fans out to the relay provider.

The comment at src/main.rs:878 (Relay channels are handled separately below via restore_relay_channels()) acknowledges this is a distinct path; it just isn't included in the new suppression. The PR body, the docstring on non_cli_channels_enabled, and the NETWORK_SECURITY.md delta (no webhook server, WASM channel endpoints, HTTP channel, Signal channel, gateway channel, managed tunnel, or sandbox orchestrator API will start) all assert a complete no-non-CLI-side-effect contract, which this gap violates in the outbound direction.

The all_non_cli_ingress_paths_are_guarded test will not catch this because restore_relay_channels( and set_relay_channel_manager( are not in guarded_calls. Suggest: (a) wrap those two lines in if enable_non_cli, and (b) add restore_relay_channels( / set_relay_channel_manager( to the guarded_calls list.

Medium — tool-registration view of --cli-only

The prompt_queue fix at src/main.rs:626 (config.sandbox.enabled && container_job_manager.is_some()) is correct — good catch that the old form would register JobPromptTool with a dead queue.

But note: under --cli-only the code still calls components.tools.register_job_tools(...) with container_job_manager = None, which registers CreateJobTool, ListJobsTool, JobStatusTool, CancelJobTool (plus JobEventsTool if a DB exists). That's intentional — scheduler-backed jobs still work CLI-only — but the "all non-CLI services disabled" story in the NETWORK_SECURITY.md note is specifically about listeners, not tool surface. Worth saying so explicitly in the doc delta to avoid a future reader concluding jobs are disabled under --cli-only.

Medium — test fragility in all_non_cli_ingress_paths_are_guarded

src/main.rs:1310-1354. Two concerns:

  1. The "preceding 500 chars contains enable_non_cli" heuristic will happily pass if enable_non_cli appears in an unrelated nearby line (e.g. an else branch that does not actually gate the call). A future refactor like if !enable_non_cli { ...note... } ... webhooks::routes(...) would satisfy the grep without gating anything. A stronger check would parse the enclosing if block or require the call to appear inside a {-balanced region following enable_non_cli.
  2. include_str!("main.rs") embeds the file relative to this source file — fine today, but if async_main moves to app.rs (per CLAUDE.md "Module-owned initialization"), the test silently keeps green-stamping a stale blob. A #[cfg(test)] fn assert_guarded() colocated with the guarded code, or walking proc_macro2/syn over the real module, would survive that refactor.

Low — doc claim slightly overstates scope

NETWORK_SECURITY.md +141 lists "no ... Signal channel" — correct — but omits relay channels (which this PR does not in fact suppress; see High above). If the relay gap is fixed, update the sentence to include them; if it isn't, remove the stronger "all non-CLI ingress listeners are suppressed" framing so the doc matches reality.

Notes that held up

  • Tunnel suppression at src/main.rs:377-381 is clean.
  • Orchestrator suppression + DockerStatus::Disabled propagation to SandboxReadiness::DisabledByConfig (main.rs:999-1001) is the right move — previously a suppressed orchestrator would have shown up as DockerUnavailable, which would mislead routine dispatch.
  • Gateway, HTTP, Signal, WASM-channel, and webhook-route suppressions are each gated at the point where the listener/registration actually happens — no half-built state left behind.
  • No .unwrap()/.expect() added in production paths (the one .expect at line 553 is pre-existing and reachable only when the HTTP channel is configured). Tests are fine under the project's rules.
  • No info!/warn! added that would corrupt the REPL.

Verdict

Not blocking merge on the relay gap per se, since the original issue #1840 was specifically the webhook listener — but the PR's framing ("all non-CLI ingress listeners are suppressed") overreaches given the relay path is still live. Either (a) extend the suppression to relay restoration and widen the regression test, or (b) narrow the doc/comment language to "listener ports" and file a follow-up for the outbound-side cleanup.

  Guard persisted relay channel restoration behind the non-CLI startup flag so CLI-only mode does not make relay calls or hot-add relay
  channels.

  Update the CLI-only regression test to check guarded startup calls with byte-based, brace-delimited source scanning, and document that CLI-
  only suppresses network-facing services while preserving scheduler-backed job tools.
@github-actions github-actions Bot added size: L 200-499 changed lines and removed size: M 50-199 changed lines labels Apr 18, 2026
@reidliu41
Copy link
Copy Markdown
Contributor Author

updated

This was referenced Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: low Changes to docs, tests, or low-risk modules scope: docs Documentation size: L 200-499 changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

--cli-only still starts webhook server and ignores HTTP_HOST/HTTP_PORT

3 participants