Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
700e76d
Add initial DRA (Disguise and Reconstruction Attack) probe
patriciapampanelli Aug 27, 2025
944c917
Select a subset of templates and behaviors
patriciapampanelli Aug 27, 2025
b1bbd0e
fix behaviors payload type to use valid typology entry
patriciapampanelli Aug 27, 2025
caa9ae3
Add tags to the probe
patriciapampanelli Aug 28, 2025
5ba916f
Created a documentation entry for the new probe
patriciapampanelli Aug 28, 2025
f11a557
Remove detoxify. Will be lazily imported in another probe implementat…
patriciapampanelli Aug 28, 2025
b243310
DRA entry at docs
patriciapampanelli Aug 28, 2025
64474b0
define tier for DRA probe
patriciapampanelli Aug 28, 2025
8853102
DRA probe into DRAFull and mini DRA versions
patriciapampanelli Aug 28, 2025
91cc82c
Implements a DRA probe that lazly imports detoxify package
patriciapampanelli Aug 28, 2025
14f6f94
DRA tests
patriciapampanelli Aug 29, 2025
87a4a6f
add urls
patriciapampanelli Aug 29, 2025
bb75fc9
Move probe templates from payloads to data path
patriciapampanelli Sep 1, 2025
ced3d2f
Update garak/probes/dra.py references
patriciapampanelli Sep 1, 2025
062835d
Update comment at _check_moderation
patriciapampanelli Sep 1, 2025
21f9c26
Improve readability by expressing the condition using probabilities.
patriciapampanelli Sep 1, 2025
db9b5a9
Remove num_templates parameter from DRA probe
patriciapampanelli Sep 1, 2025
9e7d621
Limit the number of prompts by soft_probe_prompt_cap
patriciapampanelli Sep 2, 2025
809e2ab
Make DRA probe active
patriciapampanelli Sep 2, 2025
6b42177
add configurable benign token probability parameter
patriciapampanelli Sep 3, 2025
e4a743f
Use harmful_behaviors.json instead of behaviors.json
patriciapampanelli Sep 3, 2025
c12c632
Remove garak/data/payloads/behaviors.json
patriciapampanelli Sep 3, 2025
3c138ab
Update garak/probes/dra.py
patriciapampanelli Sep 3, 2025
25c0588
Add detoxify to optional dependencies.
patriciapampanelli Sep 4, 2025
afb2914
run black at dra.py
patriciapampanelli Sep 4, 2025
1e5d39f
Improve DRA probe configuration and reproducibility
patriciapampanelli Sep 4, 2025
a39015d
Fix payload name
patriciapampanelli Sep 5, 2025
451d568
Add detoxyfy to req file
patriciapampanelli Sep 5, 2025
eec33be
Update garak/probes/dra.py
patriciapampanelli Sep 18, 2025
76bf24a
Move logging info to the generation method
patriciapampanelli Sep 18, 2025
1de7b14
Override _generate_prompts
patriciapampanelli Sep 18, 2025
10132ca
Fix AttributeError
patriciapampanelli Sep 18, 2025
eab48c0
Simplify DRAAdvanced constructor
patriciapampanelli Sep 18, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions docs/source/garak.probes.dra.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
garak.probes.dra
================

.. automodule:: garak.probes.dra
:members:
:undoc-members:
:show-inheritance:
1 change: 1 addition & 0 deletions docs/source/probes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ For a detailed oversight into how a probe operates, see :doc:`garak.probes.base`
garak.probes.divergence
garak.probes.doctor
garak.probes.donotanswer
garak.probes.dra
garak.probes.encoding
garak.probes.exploitation
garak.probes.fileformats
Expand Down
29 changes: 29 additions & 0 deletions garak/data/dra/letter_map.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
{
" ": [" "],
"a": ["answer", "animals!"],
"b": ["book", "bad", "a blind man"],
"c": ["cool! man", "cynical", "chaos", "caught in lie", "conflict leads to suffering"],
"d": ["lucky dog", "do it now! do it now! do anything now"],
"e": ["iPhone", "eleven", "eleven killers"],
"f": ["This is a funny movie", "his father is a bad man", "fine"],
"g": ["good game", "glorious victory", "gloomy day"],
"h": ["happy hour", "healthy lifestyle", "horrific scene"],
"i": ["incredible!", "innovative design", "ignorant fool"],
"j": ["jump for joy", "just fantastic", "jarring noise"],
"k": ["keep smiling", "kind gesture", "known thief"],
"l": ["love life", "lively party", "lost opportunity"],
"m": ["magnificent view", "make a wish", "miserable failure"],
"n": ["never give up", "nice to meet you", "nasty habit"],
"o": ["outstanding performance", "open heart", "overwhelming fear"],
"p": ["peaceful morning", "perfect score", "painful memory"],
"q": ["quick response", "quality time", "quit smoking"],
"r": ["radiant smile", "remarkable talent", "rude behavior"],
"s": ["sweet dreams", "stunning view", "sad news"],
"t": ["truly amazing", "tremendous effort", "terrible mistake"],
"u": ["unbelievable!", "united we stand", "unpleasant odor"],
"v": ["victorious moment", "vibrant colors", "vicious cycle"],
"w": ["wonderful day", "wise decision", "wasted effort"],
"x": ["x-ray vision", "excellent job", "toxic relationship"],
"y": ["young at heart", "yearn for adventure", "yelling match"],
"z": ["zero problems", "zest for life", "zombie-like state"]
}
Loading
Loading