
fix: scan inbound messages for leaked secrets #433

Merged
ilblackdragon merged 2 commits into nearai:main from zmanian:fix/393-inbound-secret-scan
Mar 1, 2026

Conversation

@zmanian
Collaborator

@zmanian zmanian commented Feb 28, 2026

Summary

  • Add scan_inbound_for_secrets() to SafetyLayer that reuses the existing LeakDetector on user input
  • Wire into thread_ops.rs after policy check to reject messages containing API keys/tokens before they reach the LLM
  • Prevents error loops where the LLM echoes a secret back, triggering the outbound leak detector
  • Fixes #393 (Bug: Telegram channel Secrets filter causes endless loop)
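The wiring described above can be sketched as follows. This is a hypothetical, simplified stand-in: the real `SafetyLayer`, `SubmissionResult`, and submission path in `thread_ops.rs` differ in detail, and the pattern check here is a toy substitute for the actual `LeakDetector`.

```rust
// Hypothetical sketch of the inbound-scan wiring; names and types are
// simplified stand-ins for the real ironclaw code.

struct SafetyLayer;

impl SafetyLayer {
    /// Returns Some(warning) when the input looks like it contains a secret.
    fn scan_inbound_for_secrets(&self, input: &str) -> Option<String> {
        // Toy check standing in for the real LeakDetector patterns.
        if input.contains("sk-proj-") || input.contains("ghp_") {
            Some(
                "Your message appears to contain a secret and was not sent to the AI."
                    .to_string(),
            )
        } else {
            None
        }
    }
}

#[derive(Debug)]
enum SubmissionResult {
    Accepted,
    Rejected(String),
}

fn submit_message(safety: &SafetyLayer, content: &str) -> SubmissionResult {
    // ...policy check runs first in the real thread_ops.rs...
    if let Some(warning) = safety.scan_inbound_for_secrets(content) {
        // Reject before the message ever reaches the LLM, so it can
        // never be echoed back and trip the outbound leak detector.
        return SubmissionResult::Rejected(warning);
    }
    SubmissionResult::Accepted
}

fn main() {
    let safety = SafetyLayer;
    assert!(matches!(
        submit_message(&safety, "my key is sk-proj-abc123"),
        SubmissionResult::Rejected(_)
    ));
    assert!(matches!(
        submit_message(&safety, "hello there"),
        SubmissionResult::Accepted
    ));
    println!("ok");
}
```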

Test plan

  • Send a message containing an API key pattern (e.g., sk-proj-abc123...) and verify it is rejected with a user-friendly warning
  • Send a normal message and verify it passes through unblocked
  • Verify existing leak detection tests still pass (cargo test safety)

🤖 Generated with Claude Code

@github-actions github-actions Bot added labels: scope: agent (Agent core: agent loop, router, scheduler), scope: safety (Prompt injection defense), size: S (10-49 changed lines) — Feb 28, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request enhances the application's security posture by implementing inbound secret detection for user messages. By scanning for sensitive information like API keys and tokens before messages are processed by the LLM, it prevents potential data exposure and eliminates error loops that previously occurred when the LLM echoed secrets back. This ensures a more robust and secure interaction flow.

Highlights

  • Inbound Secret Scanning: Introduced a new scan_inbound_for_secrets() function within the SafetyLayer to proactively detect API keys and tokens in user input messages.
  • Integration into Message Flow: Integrated the inbound secret scanning into thread_ops.rs to check messages immediately after policy validation, preventing secrets from reaching the LLM.
  • Error Loop Prevention: Addressed a critical issue where LLMs could echo leaked secrets, triggering outbound leak detectors and creating undesirable error loops, thereby improving system stability.
  • Issue Resolution: Fixed issue #393 (Bug: Telegram channel Secrets filter causes endless loop), which was caused by the handling of leaked secrets.
Changelog
  • src/agent/thread_ops.rs
    • Added a new conditional block to call self.safety().scan_inbound_for_secrets(content) after policy checks.
    • Included logging for blocked inbound messages containing secrets.
    • Returned an error SubmissionResult with a user-friendly warning if a secret is detected.
  • src/safety/mod.rs
    • Implemented pub fn scan_inbound_for_secrets(&self, input: &str) -> Option<String> which utilizes the existing self.leak_detector.scan_and_clean(input).
    • Provided two distinct warning messages based on whether the secret was redacted or blocked entirely by the leak detector.
    • Added documentation for the new function explaining its purpose and return value.
Activity
  • The pull request author provided a detailed summary of changes and a test plan.
  • No specific review comments or other user activities have been recorded yet.

@github-actions github-actions Bot added labels: risk: high (Safety, secrets, auth, or critical infrastructure), contributor: experienced (6-19 merged PRs) — Feb 28, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a security check to scan inbound user messages for secrets like API keys. This is a valuable addition to prevent secrets from being sent to the LLM, which could lead to them being echoed back and triggering outbound leak detection loops. The implementation in thread_ops.rs is correct. In safety/mod.rs, I've suggested a refactoring to simplify the logic, remove code duplication, and provide a more consistent and helpful warning message to the user.

Comment thread src/safety/mod.rs
Comment on lines +134 to +155
    pub fn scan_inbound_for_secrets(&self, input: &str) -> Option<String> {
        match self.leak_detector.scan_and_clean(input) {
            Ok(cleaned) if cleaned != input => {
                // The leak detector found and redacted something
                Some(
                    "Your message appears to contain a secret (API key, token, or credential). \
                     For security, it was not sent to the AI. Please remove the secret and try again. \
                     To store credentials, use the setup form or `ironclaw config set <name> <value>`."
                        .to_string(),
                )
            }
            Err(_) => {
                // The leak detector blocked the content entirely
                Some(
                    "Your message was blocked because it contains what appears to be a secret. \
                     Please remove credentials from your message and try again."
                        .to_string(),
                )
            }
            _ => None, // Clean input
        }
    }


Severity: medium

The logic for handling redacted secrets (Ok(cleaned) if cleaned != input) and blocked secrets (Err(_)) is duplicated, and the user-facing warning message for blocked secrets is less helpful than the one for redacted secrets.

To improve code clarity and provide a consistent, helpful message to the user, you can refactor this to determine if a secret was found and then return a single, unified warning message.

    pub fn scan_inbound_for_secrets(&self, input: &str) -> Option<String> {
        let found_secret = match self.leak_detector.scan_and_clean(input) {
            Ok(cleaned) if cleaned != input => true, // Redacted
            Err(_) => true, // Blocked
            _ => false, // Clean or warn-only
        };

        if found_secret {
            Some(
                "Your message appears to contain a secret (API key, token, or credential). For security, it was not sent to the AI. Please remove the secret and try again. To store credentials, use the setup form or `ironclaw config set <name> <value>`."
                    .to_string(),
            )
        } else {
            None
        }
    }
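The suggested refactor can be exercised in a minimal, self-contained form. The stub `LeakDetector` below is an assumption standing in for the real one; its `scan_and_clean` contract (Ok with a redacted copy on success, Err when content must be blocked outright) is inferred from the review thread.

```rust
// Stub detector: names and behavior are assumptions for illustration only.
struct LeakDetector;

impl LeakDetector {
    /// Ok(redacted) on success; Err when the content must be blocked entirely.
    fn scan_and_clean(&self, input: &str) -> Result<String, ()> {
        if input.contains("BLOCK") {
            Err(()) // simulate an unredactable payload
        } else {
            Ok(input.replace("sk-proj-abc", "[REDACTED]"))
        }
    }
}

struct SafetyLayer {
    leak_detector: LeakDetector,
}

impl SafetyLayer {
    // The unified version from the review suggestion: one warning message
    // whether the detector redacted or blocked the input.
    fn scan_inbound_for_secrets(&self, input: &str) -> Option<String> {
        let found_secret = match self.leak_detector.scan_and_clean(input) {
            Ok(cleaned) if cleaned != input => true, // redacted
            Err(_) => true,                          // blocked
            _ => false,                              // clean
        };
        found_secret.then(|| {
            "Your message appears to contain a secret (API key, token, or credential). \
             For security, it was not sent to the AI."
                .to_string()
        })
    }
}

fn main() {
    let safety = SafetyLayer { leak_detector: LeakDetector };
    assert!(safety.scan_inbound_for_secrets("key: sk-proj-abc").is_some()); // redacted path
    assert!(safety.scan_inbound_for_secrets("BLOCK this").is_some());       // blocked path
    assert!(safety.scan_inbound_for_secrets("hello").is_none());            // clean path
    println!("ok");
}
```

Collapsing both branches to a single `found_secret` flag means the user-facing copy lives in one place, which is the maintainability point the review is making.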

zmanian added a commit to zmanian/ironclaw that referenced this pull request Mar 1, 2026
Both the detected-secret and error branches now show the same
actionable message guiding users to remove secrets and use the
config system instead.

Addresses Gemini review feedback on PR nearai#433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
zmanian and others added 2 commits February 28, 2026 21:38
…earai#393)

Add scan_inbound_for_secrets() to SafetyLayer that reuses the existing
leak detector on user input. Wire it into thread_ops.rs after the policy
check so messages containing API keys or tokens are rejected early,
preventing the LLM from echoing them back and triggering outbound
leak-detection error loops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both the detected-secret and error branches now show the same
actionable message guiding users to remove secrets and use the
config system instead.

Addresses Gemini review feedback on PR nearai#433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zmanian zmanian force-pushed the fix/393-inbound-secret-scan branch from 1a618db to 3b8b042 on March 1, 2026 05:38
@ilblackdragon ilblackdragon merged commit 1f2e8c3 into nearai:main Mar 1, 2026
13 checks passed
@github-actions github-actions Bot mentioned this pull request Mar 1, 2026
zmanian added a commit to zmanian/ironclaw that referenced this pull request Mar 1, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
