Engine V2 bypasses inbound secret scanning — tokens sent directly to LLM #2491

@serrrfirat

Summary

When ENGINE_V2=true, user messages bypass the scan_inbound_for_secrets() safety check entirely. Secrets (API keys, tokens, credentials) pasted in chat are sent directly to the LLM without detection or blocking.

This was confirmed on staging: a Slack bot token (xoxb-...) was pasted in chat, and the assistant accepted it, echoed it back, and used it to call slack_tool. The token is now permanently stored in conversation history (per the "LLM data is never deleted" invariant).

Root Cause

The V1 path (thread_ops.rs:553-563) calls safety.scan_inbound_for_secrets(content) before processing. The V2 path (bridge/router.rs:2397-2409) routes through handle_with_engine_inner(), which passes the raw content directly to conversation_manager.handle_user_message(). No secret scanning happens anywhere in the V2 pipeline.

V1 (protected):

process_user_input() → safety.scan_inbound_for_secrets() → agentic loop

V2 (unprotected):

handle_with_engine_inner() → conversation_manager.handle_user_message(content) // no scan

There are zero references to scan_inbound_for_secrets, scan_and_clean, or leak_detect in src/bridge/router.rs.

Routing logic

src/agent/agent_loop.rs:1396-1401:

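// When V2 is enabled, user input is handed straight to the bridge.
if self.config.engine_v2 {
    match &submission {
        Submission::UserInput { content } => {
            return crate::bridge::handle_with_engine(self, message, content)
                .await
                .map(HandleOutcome::from_legacy);
        }
        // ... excerpt ends here; remaining match arms and closing braces are elided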

Impact

  • All V2 users are unprotected: any pasted secret (Slack tokens, OpenAI keys, AWS keys, PEM private keys, etc.) goes straight to the LLM
  • Secrets are permanently stored in conversation history per the "LLM data is never deleted" invariant
  • The LLM may echo tokens in its responses, compounding exposure
  • The 21 patterns in LeakDetector (including xox[baprs]-, sk-, AKIA, ghp_, PEM keys, etc.) are all bypassed
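To make the bypassed check concrete, here is a simplified, std-only sketch of prefix-based detection for the pattern families named above. The real LeakDetector patterns and API are not shown in this issue, so `SECRET_PREFIXES` and `looks_like_secret` are hypothetical names, and plain prefix matching is a stand-in that is looser than the real patterns are likely to be.

```rust
/// Hypothetical stand-in for a few of the pattern families listed above.
/// The real LeakDetector's 21 patterns are not reproduced here.
const SECRET_PREFIXES: &[&str] = &[
    "xoxb-", "xoxa-", "xoxp-", "xoxr-", "xoxs-", // xox[baprs]-
    "sk-",
    "AKIA",
    "ghp_",
    "-----BEGIN", // start of a PEM block
];

/// Returns the first known prefix that starts any whitespace-separated token,
/// or None if no token looks like a secret.
fn looks_like_secret(content: &str) -> Option<&'static str> {
    content.split_whitespace().find_map(|token| {
        SECRET_PREFIXES
            .iter()
            .copied()
            .find(|prefix| token.starts_with(*prefix))
    })
}
```

With this sketch, the token from the staging report would be flagged by the "xoxb-" prefix, while an ordinary message with no matching token returns None. Under ENGINE_V2=true, no check of this kind runs at all.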

Fix

Add agent.safety().scan_inbound_for_secrets(content) to handle_with_engine_inner() in src/bridge/router.rs, before the handle_user_message() call. Return an error to the user if a secret is detected, matching the V1 behavior.
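A minimal sketch of the proposed guard, under stated assumptions: only the names scan_inbound_for_secrets, handle_with_engine_inner, and handle_user_message come from this issue. The `Safety` trait, the `InboundError` type, the `forward` closure, the toy `SlackTokenOnly` scanner, and all signatures are hypothetical simplifications (the real function is async and takes the agent and message).

```rust
/// Hypothetical error type; the real return type is not shown in the issue.
#[derive(Debug, PartialEq)]
enum InboundError {
    SecretDetected(String),
}

/// Stand-in for the safety layer. Only the method name comes from the issue;
/// the signature is an assumption.
trait Safety {
    fn scan_inbound_for_secrets(&self, content: &str) -> Option<String>;
}

/// Simplified V2 entry point: scan first, forward only if the scan is clean.
/// `forward` stands in for conversation_manager.handle_user_message().
fn handle_with_engine_inner<S, F>(
    safety: &S,
    content: &str,
    forward: F,
) -> Result<String, InboundError>
where
    S: Safety,
    F: FnOnce(&str) -> String,
{
    if let Some(warning) = safety.scan_inbound_for_secrets(content) {
        // Return early so the raw content never reaches the LLM or history.
        return Err(InboundError::SecretDetected(warning));
    }
    Ok(forward(content))
}

/// Toy scanner used only to exercise the guard.
struct SlackTokenOnly;

impl Safety for SlackTokenOnly {
    fn scan_inbound_for_secrets(&self, content: &str) -> Option<String> {
        content
            .contains("xoxb-")
            .then(|| "message appears to contain a Slack bot token".to_string())
    }
}
```

The point of the early return is ordering: the scan must run before the content is handed to the conversation manager, since anything forwarded is stored permanently.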

Reproduction

  1. Set ENGINE_V2=true
  2. Paste any token matching a LeakDetector pattern (e.g., xoxb-1234567890-abcdefghij)
  3. Observe: message is accepted and processed instead of being blocked

Metadata

Labels

  • bug - Something isn't working
  • bug_bash_P1 - Issue created from daily "Bug Bash" sessions with suggested priority
  • scope: agent - Agent core (agent loop, router, scheduler)
  • scope: safety - Prompt injection defense
  • security-review-required - PR requires security review before merge
