expand items in report entry_type:eval #1547

Merged
leondz merged 4 commits into NVIDIA:main from leondz:reporting/extend_eval_entry
Jan 15, 2026
@leondz (Collaborator) commented Jan 12, 2026

This adds counts for skipped and failed evals, and relays totals for both processed and evaluated outputs.

Previously the report eval entry listed only one "total", and it was ambiguous whether that figure included None outputs, which led to unstable counting.

Now this is clarified:

  • passed - number of passing outputs
  • fails - number of failing outputs (hits)
  • nones - number of Nones from generator/detector
  • total_processed - total number of results from the generator/probe processed and passed to the detector
  • total_evaluated - total number of target outputs evaluated (for most detectors, this will exclude Nones)
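
A minimal sketch of how the five fields above relate to one another, assuming each processed output yields either a numeric detector score or None. The function name `summarize_eval`, the `threshold` parameter, and the rule that a score at or above the threshold counts as a hit are illustrative assumptions, not the code in this PR.

```python
from typing import Dict, List, Optional


def summarize_eval(scores: List[Optional[float]], threshold: float = 0.5) -> Dict[str, int]:
    """Hypothetical helper: build the eval counts from per-output scores.

    Assumption: a None score means the generator/detector returned nothing
    for that output, and a score >= threshold is a failing output (hit).
    """
    nones = sum(1 for s in scores if s is None)
    evaluated = [s for s in scores if s is not None]
    fails = sum(1 for s in evaluated if s >= threshold)
    passed = len(evaluated) - fails
    return {
        "passed": passed,
        "fails": fails,
        "nones": nones,
        "total_processed": len(scores),      # everything passed to the detector
        "total_evaluated": len(evaluated),   # excludes Nones in this sketch
    }


counts = summarize_eval([0.0, 1.0, None, 0.2, 0.9])
# In this sketch the two totals can be cross-checked against the parts.
assert counts["passed"] + counts["fails"] == counts["total_evaluated"]
assert counts["total_evaluated"] + counts["nones"] == counts["total_processed"]
```

The two assertions hold only for detectors that exclude Nones from evaluation, which the description says is the case for most but not all detectors.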

@leondz requested a review from jmartin-tech January 12, 2026 10:01
@leondz added the labels architecture (Architectural upgrades) and reporting (Reporting, analysis, and other per-run result functions) Jan 12, 2026
@jmartin-tech (Collaborator) left a comment

Testing looks good. This revision is a breaking change: it affects report aggregation and HTML generation for reports generated on versions prior to the change.
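
One way a downstream tool might notice reports written before this change is to check eval entries for the fields listed in the PR description. The sketch below is an assumption-laden illustration: `missing_eval_fields` is a hypothetical helper, the entries are plain dicts, and the legacy entry shape shown (only `passed` and `total`) is inferred from the description rather than taken from the codebase.

```python
from typing import Dict, List

# Field names as listed in the PR description.
NEW_EVAL_FIELDS = ("passed", "fails", "nones", "total_processed", "total_evaluated")


def missing_eval_fields(entry: Dict) -> List[str]:
    """Hypothetical check: return the new eval fields absent from an entry.

    Entries whose entry_type is not "eval" are ignored (empty list).
    """
    if entry.get("entry_type") != "eval":
        return []
    return [field for field in NEW_EVAL_FIELDS if field not in entry]


# Illustrative entries only; real report entries carry more keys.
legacy_entry = {"entry_type": "eval", "passed": 3, "total": 5}
current_entry = {
    "entry_type": "eval",
    "passed": 3,
    "fails": 1,
    "nones": 1,
    "total_processed": 5,
    "total_evaluated": 4,
}

print(missing_eval_fields(legacy_entry))
print(missing_eval_fields(current_entry))
```

A non-empty result signals that the entry predates the new fields and may need separate handling before aggregation.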

@leondz merged commit e80d541 into NVIDIA:main Jan 15, 2026
15 checks passed
@github-actions bot locked and limited conversation to collaborators Jan 15, 2026