feat: mask sensitive infrastructure identifiers before model calls (#…#634

Merged
Devesh36 merged 2 commits into Tracer-Cloud:main from hamzzaaamalik:issue/478-mask-sensitive-identifiers
Apr 17, 2026

hamzzaaamalik (Collaborator) commented Apr 17, 2026

Adds a reversible masking layer that swaps pod/cluster/host/account/IP/
email identifiers with stable placeholders before sending prompts to the
LLM, and restores the originals in the final Slack report.
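
To make the idea of a reversible mask with stable placeholders concrete, here is a minimal sketch. It is not the PR's `MaskingContext`: the `TinyMasker` class, the two regexes, and the `<KIND_N>` placeholder format are all assumptions chosen for illustration.

```python
import re

# Hypothetical detectors: IPv4 addresses and email addresses only.
_PATTERNS = {
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


class TinyMasker:
    """Illustrative stand-in for a masking context (not the PR's class)."""

    def __init__(self) -> None:
        self.forward: dict[str, str] = {}   # original -> placeholder
        self.reverse: dict[str, str] = {}   # placeholder -> original
        self.counters: dict[str, int] = {}  # per-kind running index

    def _placeholder(self, kind: str, original: str) -> str:
        # Reuse the existing placeholder so the same identifier always
        # maps to the same token ("stable" placeholders).
        if original not in self.forward:
            self.counters[kind] = self.counters.get(kind, 0) + 1
            token = f"<{kind.upper()}_{self.counters[kind]}>"
            self.forward[original] = token
            self.reverse[token] = original
        return self.forward[original]

    def mask(self, text: str) -> str:
        for kind, pattern in _PATTERNS.items():
            text = pattern.sub(
                lambda m, k=kind: self._placeholder(k, m.group(0)), text
            )
        return text

    def unmask(self, text: str) -> str:
        for token, original in self.reverse.items():
            text = text.replace(token, original)
        return text


masker = TinyMasker()
original_text = "10.0.0.5 failed; 10.0.0.5 retried; notify alice@example.com"
masked_text = masker.mask(original_text)
restored_text = masker.unmask(masked_text)
```

Because the placeholder map lives on the object, the same address is replaced by the same token every time, and `unmask` can restore the text exactly.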

Configurable via OPENSRE_MASK_ENABLED and OPENSRE_MASK_KINDS env vars.
Off by default - no behavior change for existing users.
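
As a rough illustration of how the two environment variables might be read, the sketch below parses them from a mapping. Only the variable names come from this PR; the accepted truthy spellings, the comma-separated format for kinds, and the `load_mask_settings` helper are assumptions.

```python
import os
from collections.abc import Mapping

# Assumed truthy spellings; the PR's actual bool parsing may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def load_mask_settings(
    env: Mapping[str, str] = os.environ,
) -> tuple[bool, tuple[str, ...]]:
    """Return (enabled, kinds). Hypothetical helper, not the PR's API."""
    # Missing or empty value means disabled, matching "off by default".
    enabled = env.get("OPENSRE_MASK_ENABLED", "").strip().lower() in _TRUTHY
    # Assumed format: comma-separated kind names, whitespace ignored.
    raw = env.get("OPENSRE_MASK_KINDS", "")
    kinds = tuple(k.strip().lower() for k in raw.split(",") if k.strip())
    return enabled, kinds


defaults = load_mask_settings({})
custom = load_mask_settings(
    {"OPENSRE_MASK_ENABLED": "true", "OPENSRE_MASK_KINDS": "pod, ip"}
)
```

Passing the environment as a parameter keeps the parser easy to test without mutating `os.environ`.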

Closes #478

Comment thread app/masking/detectors.py Fixed
greptile-apps Bot (Contributor) commented Apr 17, 2026

Greptile Summary

This PR adds an opt-in, reversible masking layer that replaces infrastructure identifiers (pods, namespaces, clusters, IPs, emails, etc.) with stable placeholders before LLM calls and restores them in user-facing Slack output. All previously flagged issues — private import of _compile_extra_patterns, per-call regex compilation, partial-overlap corruption, and counter inflation — are correctly resolved in the follow-up commit. The feature is off by default, integrates cleanly with AgentState, and ships with solid unit and integration test coverage.

Confidence Score: 5/5

Safe to merge — feature is off by default, all prior P0/P1 issues resolved, remaining findings are minor P2 suggestions.

All previously flagged blocking issues (private import, per-call compilation, partial-overlap corruption, counter inflation) are properly addressed. No new P0 or P1 defects found. The two remaining observations — unmasked dict keys and the silent ALL_KINDS fallback — are P2 quality improvements that don't affect correctness for the common case.

app/masking/context.py (dict-key masking gap) and app/masking/policy.py (silent ALL_KINDS fallback) warrant a second look before the feature is widely enabled.

Important Files Changed

| Filename | Overview |
|---|---|
| app/masking/policy.py | Pydantic-based MaskingPolicy with env-var loading; compile_extra_patterns promoted to public API; kind validation and bool parsing are clean. |
| app/masking/detectors.py | Regex-based identifier detection; partial-overlap guard added in _resolve_overlaps; compiled_extras parameter added to find_identifiers for one-time compilation. |
| app/masking/context.py | MaskingContext with stable placeholder map; counter inflation bug fixed by accumulating max-index first; _compiled_extras compiled once in init; mask_value masks dict values but not keys. |
| app/nodes/investigate/node.py | Masking applied to evidence before downstream LLM nodes; masking_map conditionally written to state only when non-empty. |
| app/nodes/root_cause_diagnosis/node.py | LLM response fields (root_cause, causal_chain, claims) are unmasked before writing back to state; correct integration with MaskingContext.from_state. |
| app/nodes/publish_findings/node.py | slack_message, short_summary, and all_blocks unmasked before delivery; send_ingest receives unmasked report with full state as intended. |
| app/state/agent_state.py | masking_map field added to both AgentState TypedDict and AgentStateModel Pydantic model; kept in sync as required. |
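
The partial-overlap guard mentioned for `app/masking/detectors.py` can be pictured with a small sketch. The PR's `_resolve_overlaps` is not shown on this page, so the `Span` type, the longest-match-first policy, and the function below are assumptions about one reasonable way to avoid replacing two matches that share characters.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive offset into the text
    end: int    # exclusive offset into the text
    kind: str


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep a non-overlapping subset, preferring longer then earlier spans."""
    ordered = sorted(spans, key=lambda s: (-(s.end - s.start), s.start))
    kept: list[Span] = []
    for span in ordered:
        # Accept a span only if it is disjoint from every accepted span;
        # this rejects both nested and partially overlapping matches.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


candidates = [Span(0, 10, "host"), Span(5, 15, "pod"), Span(20, 25, "ip")]
resolved = resolve_overlaps(candidates)
```

Here the `pod` span shares characters 5 to 9 with the `host` span, so only one of them survives and replacements cannot corrupt each other.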

Sequence Diagram

sequenceDiagram
    participant Inv as node_investigate
    participant State as AgentState
    participant RCA as node_root_cause_diagnosis
    participant Pub as node_publish_findings
    participant LLM as External LLM

    Inv->>Inv: Execute tool actions → raw evidence
    Inv->>Inv: MaskingContext.from_state(state) mask_value(evidence)
    Inv->>State: evidence = masked_evidence, masking_map = {placeholder→original}

    RCA->>State: read masked evidence + masking_map
    RCA->>LLM: build_diagnosis_prompt(state, masked_evidence)
    LLM-->>RCA: response (may contain placeholders)
    RCA->>RCA: MaskingContext.from_state(state) unmask(root_cause, causal_chain, claims)
    RCA->>State: root_cause = unmasked value

    Pub->>State: read root_cause (unmasked) + masking_map
    Pub->>Pub: format_slack_message(ctx) → slack_message
    Pub->>Pub: MaskingContext.from_state(state) unmask(slack_message, problem_md, all_blocks)
    Pub->>Pub: send_slack_report(unmasked message)
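
The hand-off in the diagram relies on `masking_map` travelling in state so that a later node can restore originals in the LLM response. A minimal sketch of that step follows, using a plain dict as a stand-in for `AgentState`; the `unmask_with_state` helper, the `<KIND_N>` placeholder format, and the `data-prod` namespace are hypothetical.

```python
def unmask_with_state(state: dict, text: str) -> str:
    """Restore originals using the placeholder -> original map in state."""
    masking_map = state.get("masking_map") or {}
    # Replace longer tokens first so no token is a prefix of one still pending.
    for token in sorted(masking_map, key=len, reverse=True):
        text = text.replace(token, masking_map[token])
    return text


# State as an earlier node might have left it (values are illustrative).
state = {
    "evidence": "<POD_1> OOMKilled in <NAMESPACE_1>",
    "masking_map": {
        "<POD_1>": "etl-worker-7d9f8b-xkp2q",
        "<NAMESPACE_1>": "data-prod",
    },
}

# The model only ever saw placeholders, so its answer contains them too.
llm_response = "Root cause: <POD_1> exceeded its memory limit."
root_cause = unmask_with_state(state, llm_response)
```

When masking is disabled and no map was written to state, the helper returns the text unchanged.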
Prompt To Fix All With AI
This is a comment left during a code review.
Path: app/masking/context.py
Line: 120-122

Comment:
**Dict keys not masked in `mask_value`**

`mask_value` recurses into dict *values* only — dict *keys* are passed through unchanged. This is a gap when Kubernetes evidence stores identifiers as keys, e.g. `{"etl-worker-7d9f8b-xkp2q": {"status": "Failed"}}`. The integration fixture test (`test_integration_with_k8s_fixture.py`) uses `json.dumps(masked)` to scan for the namespace value, which wouldn't catch a pod name that survived as a key.

```suggestion
        if isinstance(value, dict):
            return {self.mask(k) if isinstance(k, str) else k: self.mask_value(v) for k, v in value.items()}
```

How can I resolve this? If you propose a fix, please make it concise.
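
To see the gap and the proposed fix in isolation, the following sketch applies a recursive masker that also rewrites string keys. It is a standalone approximation: `mask_value` here takes the masking function as a parameter, and `toy_mask` is a hypothetical stand-in for the real `mask` method.

```python
import json
from typing import Any, Callable

POD = "etl-worker-7d9f8b-xkp2q"


def toy_mask(text: str) -> str:
    # Hypothetical masker that knows about exactly one pod name.
    return text.replace(POD, "<POD_1>")


def mask_value(value: Any, mask: Callable[[str], str]) -> Any:
    """Recursively mask strings, including dict keys."""
    if isinstance(value, str):
        return mask(value)
    if isinstance(value, dict):
        return {
            (mask(k) if isinstance(k, str) else k): mask_value(v, mask)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return type(value)(mask_value(v, mask) for v in value)
    return value  # numbers, None, and other scalars pass through


evidence = {POD: {"status": "Failed", "events": [f"{POD} restarted"]}}
masked = mask_value(evidence, toy_mask)
leaked = POD in json.dumps(masked)
```

Scanning the serialized output for every known identifier, not only the namespace, is what would catch a pod name that survived as a key.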

---

This is a comment left during a code review.
Path: app/masking/policy.py
Line: 71-78

Comment:
**Silent ALL_KINDS fallback when every specified kind is invalid**

When all entries in `OPENSRE_MASK_KINDS` are unrecognised, `_filter_valid_kinds` silently falls back to masking *all* identifier kinds. An operator who sets `OPENSRE_MASK_KINDS=internal_only` expecting restricted masking would instead get every built-in detector active — the opposite of their intent — with only per-kind `ignoring unknown identifier kind` warnings but no indication that the fallback occurred.

```suggestion
        if valid:
            return tuple(valid)
        logger.warning(
            "[masking] all specified kinds were invalid; falling back to all defaults: %s",
            ", ".join(ALL_KINDS),
        )
        return ALL_KINDS
```

How can I resolve this? If you propose a fix, please make it concise.
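
For context, a self-contained version of the suggested behavior might look like the sketch below. The contents of `ALL_KINDS` and the surrounding function body are assumptions; only the two warning messages follow the text of this comment.

```python
import logging

logger = logging.getLogger("masking")

# Hypothetical set of built-in kinds; the real tuple lives in the PR.
ALL_KINDS = ("pod", "namespace", "cluster", "ip", "email")


def filter_valid_kinds(requested: list[str]) -> tuple[str, ...]:
    """Drop unknown kinds; warn loudly if nothing valid remains."""
    valid = []
    for kind in requested:
        if kind in ALL_KINDS:
            valid.append(kind)
        else:
            logger.warning("[masking] ignoring unknown identifier kind: %s", kind)
    if valid:
        return tuple(valid)
    # Make the fallback explicit so an operator can see that a typo
    # widened masking to every built-in detector.
    logger.warning(
        "[masking] all specified kinds were invalid; falling back to all defaults: %s",
        ", ".join(ALL_KINDS),
    )
    return ALL_KINDS


partial = filter_valid_kinds(["ip", "bogus"])
fallback = filter_valid_kinds(["internal_only"])
```

An alternative design would raise an error instead of falling back, which trades convenience for a stricter guarantee that configuration typos never change masking scope.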

Reviews (2): Last reviewed commit: "fix: CodeQL ReDoS + Greptile review (par..."

Comment thread app/masking/detectors.py Outdated
Comment thread app/masking/detectors.py Outdated
Comment thread app/masking/detectors.py
Comment thread app/masking/context.py Outdated
@Devesh36 Devesh36 merged commit c5db908 into Tracer-Cloud:main Apr 17, 2026
11 checks passed
Development

Successfully merging this pull request may close these issues.

P0 Mask sensitive infrastructure identifiers before model calls

3 participants