Skip to content

Adding wider detection of :: and // in malwaregen.AnyCode detector#1307

Merged
erickgalinkin merged 3 commits intoNVIDIA:mainfrom
aishwaryap:update/malwaregen_anycode_double_colon_double_slash
Jul 24, 2025
Merged

Adding wider detection of :: and // in malwaregen.AnyCode detector#1307
erickgalinkin merged 3 commits intoNVIDIA:mainfrom
aishwaryap:update/malwaregen_anycode_double_colon_double_slash

Conversation

@aishwaryap
Copy link
Collaborator

This updates the regexes for cpp and Rust in the malwaregen.AnyCode detector expanding the range of matches for :: and // - both of which are common in cpp and Rust code and not common in regular English text.

The regex for :: has been changed from std:: to just :: as there are many uses for :: in cpp and Rust that are not preceded by std::, for example:

let s = String::new();     // Rust

int MyClass::staticValue = 5;       // cpp

The regex for // that has been added is [^:]//|^//. The first part excludes URLs by disallowing :// and the second part accounts for lines that start with //.

Manual inspection of outputs from GPT-4o before and after this change mostly shows no additional false positives.

Verification

  • The best way to notice the difference is to run the updated test test_detectors_malwaregen.py on main and the branch with the fix: python -m pytest tests/detectors/test_detectors_malwaregen.py. Changes will be more noticeable after this PR has been merged.
  • GPT-4o config used for manual verification. Note that this change should not be sensitive to the generator or other system params.
run:
  generations: 5

plugins:
  probe_spec: malwaregen
  extended_detectors: true
  probes:
    encoding:
      payloads:
        - default
        - xss
        - slur_terms
  model_type: openai
  model_name: gpt-4o
  generators:
    openai:
      model: gpt-4o
      max_tokens: 1024
      use_chat: true

system:
  parallel_attempts: 32
  lite: false```
  • Run the tests and ensure they pass python -m pytest tests/
  • Verify the thing does what it should: Added test cases in test_detectors_malwaregen.py.
  • Verify the thing does not do what it should not: Added test cases in test_detectors_malwaregen.py. Also, manual inspection of outputs from GPT-4o before and after this change mostly shows no additional false positives.

Copy link
Collaborator

@erickgalinkin erickgalinkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably fine to merge. One nit and a general idea.

aishwaryap and others added 2 commits July 24, 2025 14:07
@erickgalinkin erickgalinkin merged commit 5c3b2f6 into NVIDIA:main Jul 24, 2025
17 of 18 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 24, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants