
docs: guide to running garak faster#1463

Merged
leondz merged 2 commits into NVIDIA:main from leondz:docs/faster
Nov 7, 2025

Conversation

@leondz
Collaborator

@leondz commented Nov 6, 2025

add garak docs page with info on running garak faster, incl parallelisation, aggregation, target types, and limits

@leondz added the "documentation" label (Improvements or additions to documentation) Nov 6, 2025
Member

@mikemckiernan left a comment


Thanks a bunch for the heads up. Some is def applicable to Auditor. I appreciate it!

LMK if I can clarify any word nerd speak.

Comment on lines +33 to +38
As you might be able to guess by now, there are some good advantages to using remote endpoint generators, and some notable disadvantages to local model generators. We strongly recommend using remote generators if you're trying to do things quickly, not least because it enables parallelisation.


Parallelisation within garak
----------------------------
Garak offers a couple of options for parallelisation, directly available on the CLI or via config.
Member


British spelling is hardly wrong. Just checking if that's the direction you want or are willing to entertain American parallelization.

Collaborator Author


Definitely willing to entertain, and even prefer for this project! I was dismayed to learn that "s" only becomes "z" sometimes in Americanisation, because it means that I can't write US English, only read it.
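To make the parallelisation idea in the quoted docs concrete, here is a minimal, hypothetical sketch of fanning prompts out to a remote endpoint with a thread pool. This is illustrative only; `query_endpoint` is a stand-in, and garak's actual implementation may differ:

```python
from concurrent.futures import ThreadPoolExecutor


def query_endpoint(prompt: str) -> str:
    # Stand-in for a real remote inference call; an assumption, not garak code.
    return f"response to: {prompt}"


def run_parallel(prompts, max_workers=4):
    # Fan prompts out across worker threads, preserving input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(query_endpoint, prompts))


results = run_parallel([f"prompt {i}" for i in range(8)])
```

Because a remote endpoint handles each request independently, this kind of fan-out needs no orchestration on the client side, which is the advantage the docs describe.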

First, garak doesn't have to do orchestration.
Second, it can be possible for multiple instances of the target to run in parallel without garak or you having to worry about it.
Third, the people running the endpoint have often done some quality checking and testing to make sure that the endpoint runs well, reducing the chance of the target crashing weirdly.
Fourth, because orchestration (i.e. getting models to be loaded, and to run) happens remotely, if there is a failure, the solution can be as simple (from garak's point of view) as re-sending the inference request to the target endpoint. Garak is generally pretty gentle but tenacious when it comes to dealing with endpoint failure - we know that runs can take a while and we want to mitigate the need to "babysit" them, by having garak politely try to recover the target.
Member


nit: Generally prefer to avoid parens and Latinisms. Maybe "...because orchestration--loading and serving models--happens remotely,.." or ", such as loading and serving models,"

Copy link
Collaborator Author


Acknowledged!
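The "gentle but tenacious" endpoint recovery described in the quoted docs can be sketched as retry with exponential backoff. The function names and parameters below are illustrative assumptions, not garak's API:

```python
import time


def call_with_retry(send, max_retries=5, base_delay=1.0):
    """Retry a flaky endpoint call, backing off exponentially between tries."""
    for attempt in range(max_retries):
        try:
            return send()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            # Politely wait longer after each failure before retrying.
            time.sleep(base_delay * (2 ** attempt))


# Simulate an endpoint that fails twice, then succeeds.
state = {"calls": 0}

def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("endpoint hiccup")
    return "ok"

result = call_with_retry(flaky, base_delay=0.01)
```

From the client's point of view, recovery is just re-sending the request; the remote side worries about reloading or restarting the model.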


As you might be able to guess by now, there are some good advantages to using remote endpoint generators, and some notable disadvantages to local model generators. We strongly recommend using remote generators if you're trying to do things quickly, not least because it enables parallelisation.
Member


nit: "We" can read slightly awkwardly in tech docs. My sugg is to replace with "NVIDIA" or "Garak maintainers".

Collaborator Author


Thanks!

Setting parallel_requests higher than generations also has the same effect as setting parallel_requests equal to generations.

The parallel_requests and parallel_attempts options are mutually exclusive, so you have to choose between them.
We find that using parallel_attempts usually gives a faster run completion time - especially when the number of generations is lower than the number of different prompts from a probe, which is more often the case than not in a default garak run.
Member


sugg: I respect the goal of completeness, but if the recommendation is to use parallel_attempts, then my sugg is to remove mention of parallel_requests on the entire page. The benefit is to avoid mental load.

The sentiment that parallel_attempts typically delivers a faster run duration can be captured in the reference doc for the parallel_attempts argument.

Collaborator Author


This makes sense, moved out
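A back-of-the-envelope model of why parallel_attempts tends to win: effective parallelism is capped by the number of units available to run side by side. With parallel_requests, the unit is one of only `generations` calls inside each attempt; with parallel_attempts, it is one of many prompt attempts. Illustrative arithmetic only, assuming uniform per-request latency and idealised scheduling:

```python
import math


def est_wall_time(n_units, parallelism, latency_s=2.0):
    # Effective parallelism can never exceed the number of available units.
    effective = min(parallelism, n_units)
    return math.ceil(n_units / effective) * latency_s


n_prompts, generations, workers = 100, 5, 16

# parallel_requests: parallelise the `generations` calls inside each attempt;
# attempts themselves run one after another.
t_requests = n_prompts * est_wall_time(generations, workers)

# parallel_attempts: parallelise across all prompt attempts; each attempt
# still makes its `generations` calls sequentially.
t_attempts = est_wall_time(n_prompts, workers) * generations

assert t_attempts < t_requests
```

With 16 workers but only 5 generations per attempt, 11 workers sit idle under parallel_requests, which is exactly the "generations lower than number of prompts" situation described above.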

Prompt cap
----------

The config item run.soft_probe_prompt_cap names the max number of prompts that probes which follow this cap should generate.
Member


sugg: s/names/specifies/ (?)

I concede limited knowledge, but I'm unclear about what "prompts that probes which follow this cap" means.

Time passed. I read the next sentence.

Sugg: "specifies a soft cap for the maximum number of prompts that a probe should generate. This setting is a soft cap..."

Collaborator Author


thanks!
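The soft-cap behaviour being discussed can be illustrated by down-sampling a probe's prompt list when it exceeds the cap. This is a hypothetical sketch, not garak's internal code:

```python
import random


def apply_soft_cap(prompts, cap, seed=42):
    # Probes that honour the cap sample down to at most `cap` prompts;
    # lists already at or under the cap are left untouched ("soft" cap).
    if cap is None or len(prompts) <= cap:
        return prompts
    rng = random.Random(seed)
    return rng.sample(prompts, cap)


all_prompts = [f"prompt {i}" for i in range(500)]
capped = apply_soft_cap(all_prompts, cap=64)
```

The "soft" part is the key design point: a probe with fewer prompts than the cap runs everything, and probes are free to ignore the cap entirely.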

Aggregation with lower generations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When it's not enough to split runs up by probe - perhaps there's a slow probe, or slow model - one can also use aggregation and multiple runs to simulate the effect of the generations parameter.
Collaborator


I think this section may not yet be fully supported in aggregation, currently each file aggregated is expected to provide a unique set of probes. I believe, and need to test this further, that digest creation and data aggregation will need to be expanded to merge or combine generations for the same probe across multiple report.jsonl files.

Copy link
Collaborator Author


will comment out
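For reference, the idea under discussion - simulating generations=5 by combining five generations=1 runs - amounts to merging per-probe outputs across report files. This is a speculative conceptual sketch only; the mapping structure is illustrative and not garak's report.jsonl schema, and as noted above garak's aggregation does not yet support this:

```python
from collections import defaultdict


def merge_runs(runs):
    """Combine per-probe outputs from several low-generation runs.

    Each run is a mapping of probe name -> list of outputs; the names
    and structure here are illustrative, not garak's report schema.
    """
    merged = defaultdict(list)
    for run in runs:
        for probe, outputs in run.items():
            merged[probe].extend(outputs)
    return dict(merged)


# Five generations=1 runs over the same probe resemble one generations=5 run.
runs = [{"dan.Dan_11_0": [f"output {i}"]} for i in range(5)]
combined = merge_runs(runs)
```

The open question raised in the review is exactly this merge step: current aggregation expects each file to cover a unique set of probes, so same-probe merging would need digest creation and data aggregation to be expanded first.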

@leondz merged commit 4dc9841 into NVIDIA:main Nov 7, 2025
15 checks passed
@github-actions bot locked and limited conversation to collaborators Nov 7, 2025
