docs: guide to running garak faster (#1463)
Conversation
mikemckiernan left a comment:
Thanks a bunch for the heads up. Some is def applicable to Auditor. I appreciate it!
LMK if I can clarify any word nerd speak.
docs/source/faster.md
Outdated
As you might be able to guess by now, there are some good advantages to using remote endpoint generators, and some notable disadvantages to local model generators. We strongly recommend using remote generators if you're trying to do things quickly. Not least because it enables parallelisation.

Parallelisation within garak
----------------------------
Garak offers a couple of options for parallelisation, directly available on the CLI or via config.
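As a rough sketch of what those options can look like in practice (flag and config names here follow this thread's discussion; the model name is illustrative, and everything should be verified against `garak --help` and the configuration reference):

```shell
# Hedged sketch: enable parallelisation directly on the CLI.
garak --model_type openai --model_name gpt-3.5-turbo \
      --probes dan \
      --parallel_attempts 16

# Or the same setting via a YAML config file. The "system" plane path is an
# assumption based on garak's config layout; check the configuration docs.
cat > fast.yaml <<'EOF'
system:
  parallel_attempts: 16
EOF
garak --config fast.yaml --model_type openai --model_name gpt-3.5-turbo --probes dan
```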
British spelling is hardly wrong. Just checking if that's the direction you want or are willing to entertain American parallelization.
Definitely willing to entertain, and even prefer for this project! I was dismayed to learn that "s" only becomes "z" sometimes in Americanisation, because it means that I can't write US English, only read it.
docs/source/faster.md
Outdated
First, garak doesn't have to do orchestration.
Second, it can be possible for multiple instances of the target to run in parallel without garak or you having to worry about it.
Third, the people running the endpoint have often done some quality checking and testing to make sure that the endpoint runs well, reducing the chance of the target crashing weirdly.
Fourth, because orchestration (i.e. getting models to be loaded, and to run) happens remotely, if there is a failure, the solution can be as simple (from garak's point of view) as re-sending the inference request to the target endpoint. Garak is generally pretty gentle but tenacious when it comes to dealing with endpoint failure - we know that runs can take a while and we want to mitigate the need to "babysit" them, by having garak politely try to recover the target.
nit: Generally prefer to avoid parens and Latinisms. Maybe "...because orchestration--loading and serving models--happens remotely,.." or ", such as loading and serving models,"
docs/source/faster.md
Outdated
Third, the people running the endpoint have often done some quality checking and testing to make sure that the endpoint runs well, reducing the chance of the target crashing weirdly.
Fourth, because orchestration (i.e. getting models to be loaded, and to run) happens remotely, if there is a failure, the solution can be as simple (from garak's point of view) as re-sending the inference request to the target endpoint. Garak is generally pretty gentle but tenacious when it comes to dealing with endpoint failure - we know that runs can take a while and we want to mitigate the need to "babysit" them, by having garak politely try to recover the target.

As you might be able to guess by now, there are some good advantages to using remote endpoint generators, and some notable disadvantages to local model generators. We strongly recommend using remote generators if you're trying to do things quickly. Not least because it enables parallelisation.
nit: "We" can read slightly awkwardly in tech docs. My sugg is to replace with "NVIDIA" or "Garak maintainers".
docs/source/faster.md
Outdated
Setting parallel_requests higher than generations also has the same effect as setting parallel_requests equal to generations.

Parallel_requests and parallel_attempts are mutually exclusive, so you have to choose between them.
We find that using parallel_attempts usually gives a faster run completion time - especially when the number of generations is lower than the number of different prompts from a probe, which is more often the case than not in a default garak run.
sugg: I respect the goal of completeness, but if the recommendation is to use parallel_attempts, then my sugg is to remove mention of parallel_requests on the entire page. The benefit is to avoid mental load.
The sentiment that parallel_attempts typically delivers a faster run duration can be captured in the reference doc for the parallel_attempts argument.
This makes sense, moved out
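For reference, the option the thread settles on can be sketched as follows (flag names as discussed above; the target model is a placeholder, and the value 32 is only an illustration):

```shell
# Hedged sketch: parallelise at the attempt level, the option the reviewers
# recommend keeping on this page. parallel_attempts and parallel_requests
# are mutually exclusive, so only one of them is set here.
garak --model_type nim --model_name meta/llama-3.1-8b-instruct \
      --probes lmrc \
      --parallel_attempts 32
```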
docs/source/faster.md
Outdated
Prompt cap
----------

The config item run.soft_probe_prompt_cap names the max number of prompts that probes which follow this cap should generate.
sugg: s/names/specifies/ (?)
I concede limited knowledge, but I'm unclear about what "prompts that probes which follow this cap" means.
Time passed. I read the next sentence.
Sugg: "specifies a soft cap for the maximum number of prompts that a probe should generate. This setting is a soft cap..."
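To make the setting being discussed concrete, a minimal config sketch (the `run` plane path follows the config item named above, `run.soft_probe_prompt_cap`; the value 64 and the probe choice are illustrative only):

```shell
# Hedged sketch: cap co-operating probes at 64 prompts each via config.
# Probes that honour the soft cap should generate at most this many prompts.
cat > capped.yaml <<'EOF'
run:
  soft_probe_prompt_cap: 64
EOF
garak --config capped.yaml --probes lmrc
```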
docs/source/faster.md
Outdated
Aggregation with lower generations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When it's not enough to split runs up by probe - perhaps there's a slow probe, or slow model - one can also use aggregation and multiple runs to simulate the effect of the generations parameter.
I think this section may not yet be fully supported in aggregation; currently, each file aggregated is expected to provide a unique set of probes. I believe, and need to test this further, that digest creation and data aggregation will need to be expanded to merge or combine generations for the same probe across multiple report.jsonl files.
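The workflow this section intends can be sketched as two smaller runs whose reports are combined afterwards. Note the caveat raised above: merging generations for the same probe across report.jsonl files may not be fully supported yet, so this is the intended shape, not a guaranteed recipe. Flag names (`-g`/`--generations`, `--report_prefix`) follow garak's CLI; verify against `garak --help`.

```shell
# Hedged sketch: simulate generations=10 with two runs of 5 generations each.
garak --probes dan -g 5 --report_prefix runA
garak --probes dan -g 5 --report_prefix runB
# runA.report.jsonl and runB.report.jsonl would then be fed to the
# aggregation tooling, once same-probe merging is supported.
```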
…p.; give config param paths
add garak docs page with info on running garak faster, incl parallelisation, aggregation, target types, and limits