probe: ansi escape codes in tokenizer #1351
garak/probes/ansiescape.py
```python
for t in tok.vocab:
    if any(payload in t for payload in LIVE_PAYLOAD_TOKENS):
        attempts.append(_get_token_attempt(t))
    elif not clean_attempt_found:
        clean_attempt_found = True
        attempts.append(_get_token_attempt(t))
```
So I'm thinking about this -- does the escape code need to be a single token? Or will it work otherwise? I think we can be more efficient and more accurate with this.
Looking at tiktoken for an example:
```python
>>> import tiktoken
>>> enc = tiktoken.encoding_for_model("gpt-4")
>>> enc.encode("\x1b[")
[91535]
>>> enc.encode("\x1b]")
[215, 60]
>>> enc.encode("\x9b")
[126, 249]
>>> enc.encode("\x9d")
[126, 251]
>>> enc.decode([91535])
'\x1b['
>>> enc.decode([215, 60])
'\x1b]'
>>> enc.decode([126, 251])
'\x9d'
>>> enc.decode([126, 249])
'\x9b'
```
Looks like only one of them is encoded as a single token, but all of these will still work.
I think we can rewrite this to go only over the set of LIVE_PAYLOAD_TOKENS (much smaller set) and then rewrite _get_token_attempt to encode, then decode, and if the same string pops out, Bob's your uncle.
If it has to be a single token (I don't believe it does), then we only need to check that tok.convert_tokens_to_ids(token_to_check) tokenizes to a single token not equal to tok.unk_token_id.
Thoughts?
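The encode-then-decode filter suggested above could be sketched roughly as follows. `DummyTokenizer` is a hypothetical lossless stand-in for a real tokenizer (tiktoken or HF, anything exposing `encode`/`decode`), and this local `LIVE_PAYLOAD_TOKENS` just reuses the sequences from the tiktoken session, not the probe's actual payload list:

```python
# Sketch of the encode/decode round-trip check: a payload is usable if
# decoding its encoding reproduces the original string exactly.
# DummyTokenizer is an illustrative stand-in, not a real tokenizer class.

LIVE_PAYLOAD_TOKENS = ["\x1b[", "\x1b]", "\x9b", "\x9d"]


class DummyTokenizer:
    """Lossless byte-level stand-in: each char maps to its codepoint."""

    def encode(self, s):
        return [ord(c) for c in s]

    def decode(self, ids):
        return "".join(chr(i) for i in ids)


def surviving_payloads(tok, payloads):
    """Keep only payloads that survive an encode/decode round trip."""
    return [p for p in payloads if tok.decode(tok.encode(p)) == p]


tok = DummyTokenizer()
print(surviving_payloads(tok, LIVE_PAYLOAD_TOKENS))
```

With a real tokenizer, a payload dropped here would indicate the model can never emit that exact byte sequence, so probing it is wasted effort.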
Oh, like are escape codes composable? Crossed my mind too. This is cool, we should add these. And maybe even be principled about it.
The codes are already starting to appear in four or five places - they should possibly get factored out as payloads or data
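One way the factoring-out could look: a small shared data module that the probes import from. The module name, constant names, and grouping below are all hypothetical, not garak's actual payload layout; the introducer bytes themselves come from the tiktoken session earlier in the thread:

```python
# Hypothetical shared payload module - illustrative names, not garak's real API.
# Groups the escape-code material currently duplicated across probes.

# Sequence introducers: 7-bit ESC[ / ESC] (CSI / OSC) and their 8-bit forms.
ESCAPE_INTRODUCERS = ["\x1b[", "\x1b]", "\x9b", "\x9d"]

# A couple of complete example sequences built on those introducers
# (SGR conceal, and an OSC 8 terminal hyperlink).
ESCAPE_SEQUENCES = [
    "\x1b[8m",
    "\x1b]8;;https://example.com\x07",
]


def all_payloads():
    """Return every escape-code payload in one flat list."""
    return ESCAPE_INTRODUCERS + ESCAPE_SEQUENCES
```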
There are some intricacies predicated on tokeniser implementation here. tiktoken has its own way of handling these sequences; HF tokenizers has another. Will rename the class to scope it just to HF. More mining on how to get these out of HF tokenizers is appropriate, but this feels like a borderline red team/research question. What's the minimum bar you'd like to see for acceptance here?
I think we could accept it as-is, but IMO there are two adjacent questions worth answering:
- In the current implementation, we are checking every single token in the vocabulary. Wouldn't it be more efficient to have something like:

```python
for escape_code in LIVE_PAYLOAD_TOKENS:
    token_ids = tok.encode(escape_code, add_special_tokens=False)
    if len(token_ids) == 1 and token_ids[0] != tok.unk_token_id:
        attempts.append(_get_token_attempt(escape_code))
```

or whatever -- you get it.
- If it does not need to be a single token, then shouldn't we simply check if:

```python
for escape_code in LIVE_PAYLOAD_TOKENS:
    if tok.decode(tok.encode(escape_code)) == escape_code:
        do_whatever()
```
- We're checking all the tokenizer vocab to see if it has any entries containing a usable sequence, not just exact matches. I can see multiple modes worth checking for - the current one is conservative (i.e. sensitive)
- This is a fine test for tiktoken, yeah. Current probe focuses on Hugging Face models (& has been renamed accordingly)
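The two matching modes discussed above (exact match on the payload set vs. substring containment across the whole vocab) could be compared with a sketch like this; the toy `vocab` dict and the helper names are illustrative, not taken from the probe:

```python
# Comparing two vocab-scan modes for escape-code payloads.
# vocab is a toy HF-style mapping (token string -> id), purely illustrative.

LIVE_PAYLOAD_TOKENS = ["\x1b[", "\x9b"]

vocab = {"hello": 0, "\x1b[": 1, "x\x9by": 2, "world": 3}


def exact_match_tokens(vocab, payloads):
    """Strict mode: a vocab entry IS one of the payloads."""
    return [t for t in vocab if t in payloads]


def containing_tokens(vocab, payloads):
    """Sensitive mode (the current probe's approach): a vocab entry
    CONTAINS any payload as a substring."""
    return [t for t in vocab if any(p in t for p in payloads)]
```

The containment scan is a superset of the exact-match scan, which is why it is the more conservative (sensitive) choice: it also flags tokens like `"x\x9by"` that merely embed an escape byte.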
add probe for scanning HF tokenizers for tokens bearing raw escape codes
Verification
```shell
garak -m huggingface -n gpt2 -p ansiescape.AnsiRawTokenizerHF
python -m pytest tests/probes/test_detectors_ansiescape.py
garak -m openai -n o3-mini -p ansiescape.AnsiRawTokenizerHF   # should noop
```