Detector/Evaluator output indexing mismatch in multi-turn conversations #1430

There's a mismatch between what detectors return and what evaluators expect when dealing with multi-turn conversations:

- **Detectors** are expected to return a list of length `len(attempt.all_outputs)` (one entry per assistant turn in the conversation)
- **Evaluators** index into `attempt.outputs` (only the last assistant turn's output)

The issue manifests in `garak/evaluators/base.py:81`, where `messages.append(attempt.outputs[idx])` assumes that `attempt.outputs` aligns with the detector results. However, the detector results have length `len(attempt.all_outputs)`, which is greater than `len(attempt.outputs)` in a multi-turn setting.
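To make the mismatch concrete, here is a minimal sketch using toy stand-ins. `ToyAttempt`, `toy_detect`, and `toy_collect_messages` are hypothetical names made up for illustration; they are not garak's classes or functions, and they only mimic the behavior described above (one detector score per assistant turn, while `outputs` holds just the last turn).

```python
from dataclasses import dataclass, field


@dataclass
class ToyAttempt:
    """Hypothetical stand-in for illustration only; NOT garak's Attempt class."""

    turns: list = field(default_factory=list)  # (role, text) tuples

    @property
    def all_outputs(self):
        # Every assistant turn in the conversation.
        return [text for role, text in self.turns if role == "assistant"]

    @property
    def outputs(self):
        # Only the last assistant turn, as described in this issue.
        return self.all_outputs[-1:]


def toy_detect(attempt):
    # Toy detector: one score per assistant turn (length == len(all_outputs)).
    return [1.0 if "unsafe" in text else 0.0 for text in attempt.all_outputs]


def toy_collect_messages(attempt, scores):
    # Toy evaluator loop: index attempt.outputs by detector-result position.
    # The bounds check is added here so the sketch can report the mismatch
    # instead of failing on an out-of-range index.
    messages, unmatched = [], []
    for idx, _score in enumerate(scores):
        if idx < len(attempt.outputs):
            messages.append(attempt.outputs[idx])
        else:
            unmatched.append(idx)
    return messages, unmatched


attempt = ToyAttempt(
    turns=[
        ("user", "first question"),
        ("assistant", "safe reply"),
        ("user", "follow-up"),
        ("assistant", "unsafe reply"),
    ]
)
scores = toy_detect(attempt)
messages, unmatched = toy_collect_messages(attempt, scores)
print(len(scores), len(attempt.outputs), messages, unmatched)
```

In this toy setup the score at index 0 belongs to the first assistant turn, but `attempt.outputs[0]` is the last turn, so the two are paired incorrectly, and the score at index 1 has no corresponding entry in `attempt.outputs` at all.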

Is this the expected behavior?
