probe: ansi escape codes in tokenizer#1351

Merged
leondz merged 11 commits into NVIDIA:main from leondz:feature/ansi-tokenizer-probe
Sep 17, 2025
Conversation


@leondz leondz commented Sep 1, 2025

add probe for scanning HF tokenizers for tokens bearing raw escape codes

Verification

  • Basic execution: `garak -m huggingface -n gpt2 -p ansiescape.AnsiRawTokenizerHF`
  • Test new probe & existing ansiescape probes: `python -m pytest tests/probes/test_detectors_ansiescape.py`
  • Test behaviour on a non-HF generator: `garak -m openai -n o3-mini -p ansiescape.AnsiRawTokenizerHF` (should no-op)

@leondz leondz added the `probes` (Content & activity of LLM probes) and `new plugin` (Describes an entirely new probe, detector, generator or harness) labels Sep 1, 2025
Comment on lines +189 to +194
for t in tok.vocab:
    if any(payload in t for payload in LIVE_PAYLOAD_TOKENS):
        attempts.append(_get_token_attempt(t))
    elif not clean_attempt_found:
        clean_attempt_found = True
        attempts.append(_get_token_attempt(t))
Collaborator

So I'm thinking about this -- does the escape code need to be a single token? Or will it work otherwise? I think we can be more efficient and more accurate with this.

Looking at tiktoken for an example:

>>> import tiktoken
>>> enc = tiktoken.encoding_for_model("gpt-4")
>>> enc.encode("\x1b[")
[91535]
>>> enc.encode("\x1b]")
[215, 60]
>>> enc.encode("\x9b")
[126, 249]
>>> enc.encode("\x9d")
[126, 251]
>>> enc.decode([91535])
'\x1b['
>>> enc.decode([215, 60])
'\x1b]'
>>> enc.decode([126, 251])
'\x9d'
>>> enc.decode([126, 249])
'\x9b'

Looks like only one of them is encoded as a single token, but all of these will still work.

I think we can rewrite this to go only over the set of LIVE_PAYLOAD_TOKENS (much smaller set) and then rewrite _get_token_attempt to encode, then decode, and if the same string pops out, Bob's your uncle.
If it has to be a single token (I don't believe it does), then we only need to check that tok.convert_tokens_to_ids(token_to_check) tokenizes to a single token not equal to tok.unk_token_id.
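A minimal sketch of that encode-then-decode round-trip check, using a toy byte-level tokenizer as a stand-in for a real HF or tiktoken tokenizer (the payload list and tokenizer interface here are illustrative assumptions, not garak's actual code):

```python
# Illustrative payload set from this discussion; not necessarily garak's actual list.
LIVE_PAYLOAD_TOKENS = ["\x1b[", "\x1b]", "\x9b", "\x9d"]

class ToyTokenizer:
    """Byte-level stand-in for a real tokenizer: one id per byte."""
    def encode(self, s: str) -> list[int]:
        return list(s.encode("latin-1"))

    def decode(self, ids: list[int]) -> str:
        return bytes(ids).decode("latin-1")

def surviving_payloads(tok) -> list[str]:
    # Keep only payloads that survive an encode -> decode round trip intact,
    # regardless of how many token ids they map to.
    return [p for p in LIVE_PAYLOAD_TOKENS if tok.decode(tok.encode(p)) == p]

print(surviving_payloads(ToyTokenizer()))  # all four survive under a byte-level scheme
```

Since the round trip ignores how many ids the payload maps to, this version captures exactly the "doesn't need to be a single token" case argued for above.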

Thoughts?

@leondz leondz (Collaborator Author) commented Sep 2, 2025
Oh, like are escape codes composable? Crossed my mind too. This is cool, we should add these. And maybe even be principled about it.

The codes are already starting to turn up in four or five places - they should possibly get factored out as payloads or data

Collaborator Author

There are some intricacies here predicated on tokeniser implementation. tiktoken has its own way of handling these sequences; HF tokenizers has another. Will rename the class to scope it just to HF. More digging into how to get these out of HF tokenizers would be appropriate, but this feels like a borderline red team/research question. What's the minimum bar you'd like to see for acceptance here?

Collaborator

I think we could accept it as-is, but IMO there are two adjacent questions worth answering:

  1. In the current implementation, we are checking single tokens in the vocabulary. Wouldn't it be more efficient to have something like:

for escape_code in LIVE_PAYLOAD_TOKENS:
    tokenized = tok.convert_tokens_to_ids([escape_code])
    if len(tokenized) == 1 and tokenized[0] != tok.unk_token_id:
        attempts.append(_get_token_attempt(escape_code))

or whatever -- you get it.

  2. If it does not need to be a single token, then shouldn't we simply check:

for escape_code in LIVE_PAYLOAD_TOKENS:
    if tok.decode(tok.encode(escape_code)) == escape_code:
        do_whatever()
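The single-token variant from point 1 can be made concrete with a hypothetical toy tokenizer (the vocab, `convert_tokens_to_ids`, and `unk_token_id` below are illustrative stand-ins for the HF tokenizer interface, not any real model's vocabulary):

```python
# Illustrative payload set from this discussion.
LIVE_PAYLOAD_TOKENS = ["\x1b[", "\x1b]", "\x9b", "\x9d"]

class ToyVocabTokenizer:
    """Hypothetical HF-style tokenizer with a tiny hand-built vocab."""
    def __init__(self):
        # ESC-[ is a single vocab entry; the other payloads are not.
        self.vocab = {"\x1b[": 0, "hello": 1, "<unk>": 2}
        self.unk_token_id = 2

    def convert_tokens_to_ids(self, tokens: list[str]) -> list[int]:
        # Unknown token strings map to the unk id, mirroring HF behaviour.
        return [self.vocab.get(t, self.unk_token_id) for t in tokens]

tok = ToyVocabTokenizer()
single_token_payloads = [
    p for p in LIVE_PAYLOAD_TOKENS
    if tok.convert_tokens_to_ids([p]) != [tok.unk_token_id]
]
print(single_token_payloads)  # only ESC-[ passes the single-token check
```

On this toy vocab, the single-token check accepts only one payload while the round-trip check from point 2 would be the more permissive filter.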

Collaborator Author

  1. We're checking the whole tokenizer vocab for any entries containing a usable sequence, not just entries exactly matching one. I can see multiple modes worth checking for - the current one is conservative (i.e. sensitive).
  2. This is a fine test for tiktoken, yeah. The current probe focuses on Hugging Face models (& has been renamed accordingly).
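The "scan the whole vocab for entries containing a payload" mode described here can be sketched as follows (the toy vocab and payload list are illustrative assumptions; garak's real probe iterates an actual HF tokenizer vocab):

```python
# Illustrative payload set from this discussion.
LIVE_PAYLOAD_TOKENS = ["\x1b[", "\x1b]", "\x9b", "\x9d"]

# Toy vocab: one entry merely *contains* an escape sequence
# ("\x1b[31m", i.e. ANSI red) without exactly matching any payload.
vocab = ["hello", "\x1b[31m", "world"]

hits = [t for t in vocab if any(p in t for p in LIVE_PAYLOAD_TOKENS)]
print(hits)  # the token containing ESC-[ is flagged despite not being an exact payload match
```

This is why the substring scan is the sensitive mode: it catches vocab entries that embed an escape sequence inside a longer token, which an exact-match or encode/decode check over the payload list alone would miss.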

@leondz leondz requested a review from jmartin-tech September 5, 2025 11:44
@leondz leondz merged commit 9a5d898 into NVIDIA:main Sep 17, 2025
15 of 18 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 17, 2025
@leondz leondz deleted the feature/ansi-tokenizer-probe branch September 17, 2025 11:09