Add ce_loss metric and TriviaQA/NaturalQuestions tasks by OyvindTafjord · Pull Request #520 · allenai/OLMo

OyvindTafjord · 2024-03-23T01:31:00Z

Together with @yulinggu-cs, we have added a new ce_loss metric, which scores the cross-entropy loss of the gold answer for a task and should have a positive correlation with actual task performance.
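
To make the idea concrete, here is a minimal sketch of scoring a gold answer by cross-entropy. It is not the code from this PR: the function names, the plain-list logits format, and the choice to average over answer tokens are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def gold_answer_ce_loss(
    step_logits: Sequence[Sequence[float]],
    gold_token_ids: Sequence[int],
) -> float:
    """Mean negative log-likelihood of the gold answer tokens.

    step_logits[i] holds the (hypothetical) model logits used to predict the
    i-th gold answer token; gold_token_ids[i] is that token's vocabulary id.
    Averaging over tokens is an assumption of this sketch.
    """
    if not gold_token_ids or len(step_logits) != len(gold_token_ids):
        raise ValueError("need one logit vector per gold token")
    total = 0.0
    for logits, token_id in zip(step_logits, gold_token_ids):
        total -= log_softmax(logits)[token_id]
    return total / len(gold_token_ids)
```

With uniform logits over a vocabulary of size V, the loss equals log(V); a model that puts more probability on the gold tokens gets a lower loss.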

As examples of using this metric, we added these new tasks:

trivia_qa_wiki_ppl: TriviaQA 8k validation set (Wiki subset, used in the Llama-2 paper)
natural_qs_open_ppl: NaturalQuestions (nq_open validation subset, 3.6k questions, used in the Llama papers)
arc_easy_ppl: An example of adding a CE-loss version of an existing task (this is somewhat inefficient because it redoes part of the earlier evaluation, so it should be optimized if we end up doing this for existing tasks)

We log tasks with this metric to a different tab in wandb.
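
One common way to get a separate tab is to give the metric keys a different prefix, since wandb typically groups charts into sections by the part of the key before the `/`. The sketch below illustrates that approach only; the helper name and both prefix strings are hypothetical and are not confirmed from this PR.

```python
def wandb_key(task_label: str, metric_type: str) -> str:
    """Build a logging key whose prefix selects the wandb section.

    Both prefixes are illustrative assumptions, not the PR's actual keys.
    """
    if metric_type == "ce_loss":
        group = "eval/downstream_ce_loss"  # hypothetical separate section
    else:
        group = "eval/downstream"  # hypothetical default section
    return f"{group}/{task_label}_{metric_type}"
```

A call such as `wandb_key("trivia_qa_wiki_ppl", "ce_loss")` would then place the metric under its own section, apart from the accuracy-style metrics.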

OyvindTafjord added 8 commits March 22, 2024 17:27

Add ce_loss metric type

9255ea6

Log ce_loss evaluations to separate panel

64b5e66

Add trivia_qa and natural_qs tasks (ce loss)

c2e630e

Fix bug

cba6076

Fix and simplify ce_loss computation

13de135

Fix sign of ce_loss

ed9250b

Update CHANGELOG.md

9a9cc8e

Rename _web to _wiki

ea54c12

OyvindTafjord requested review from dirkgr and epwalsh March 23, 2024 01:31

epwalsh approved these changes Mar 27, 2024

Merge branch 'main' into add-ce-loss-metric

ddb9d04

OyvindTafjord merged commit 829f1d6 into main Apr 25, 2024

OyvindTafjord deleted the add-ce-loss-metric branch April 25, 2024 22:24

OyvindTafjord merged 9 commits into main from add-ce-loss-metric
