Anti-Evaluation Awareness Steering #3619

@AMindToThink

Description

It would be extremely useful if the harness had a way to steer the LLM away from eval-awareness before running the tests, in a way similar to *Steering Evaluation-Aware Language Models to Act Like They Are Deployed*. Eval-awareness risks interfering with our measurements, so implementing preventative measures in the harness would be valuable.

This could look like:

  • Calculating steering vectors
  • Caching them locally (though ideally in the cloud, so that users can skip a step if they want to steer the same model again)
  • Adding a flag to evaluations that determines whether to apply deployment steering
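The three steps above could be sketched roughly as follows. This is a minimal illustration, not the harness's actual API: the function names, the `steering_cache` directory, and the contrastive mean-difference construction (mean "deployment" activation minus mean "evaluation" activation, as in the paper's approach) are all assumptions for the sake of the example.

```python
import hashlib
from pathlib import Path

import numpy as np

# Hypothetical local cache location (the issue suggests a cloud cache ideally).
CACHE_DIR = Path("steering_cache")


def compute_steering_vector(eval_acts: np.ndarray, deploy_acts: np.ndarray) -> np.ndarray:
    """Contrastive steering vector: mean activation on 'deployment-framed'
    prompts minus mean activation on 'evaluation-framed' prompts.
    Both arrays have shape (n_prompts, hidden_dim)."""
    return deploy_acts.mean(axis=0) - eval_acts.mean(axis=0)


def cached_steering_vector(
    model_id: str, layer: int, eval_acts: np.ndarray, deploy_acts: np.ndarray
) -> np.ndarray:
    """Cache the vector on disk, keyed by model and layer, so reruns skip
    the (expensive) activation-collection step."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(f"{model_id}:{layer}".encode()).hexdigest()[:16]
    path = CACHE_DIR / f"{key}.npy"
    if path.exists():
        return np.load(path)
    vec = compute_steering_vector(eval_acts, deploy_acts)
    np.save(path, vec)
    return vec


def apply_steering(
    hidden: np.ndarray, vec: np.ndarray, strength: float = 1.0, enabled: bool = True
) -> np.ndarray:
    """Add the steering vector to a hidden state during generation.
    `enabled` mirrors the proposed per-evaluation flag."""
    return hidden + strength * vec if enabled else hidden
```

In a real implementation the activations would come from forward hooks on a chosen transformer layer while running paired eval/deploy prompts, and `apply_steering` would run inside the forward pass rather than on a bare array.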
