Commit e70a45e

final commit
1 parent 7c93c13 commit e70a45e

3 files changed (+4 -4 lines changed)

eval/chat_benchmarks/AIME24/utils.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 from typing import Dict, List

 import datasets
-from lm_eval.tasks.hendrycks_math.utils import is_equiv
+from lm_eval.tasks.hendrycks_math.utils import remove_boxed, last_boxed_only_string, is_equiv

 def extract_answer(output: str) -> str:
     '''

eval/chat_benchmarks/AMC23/utils.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 from typing import Dict, List

 import datasets
-from lm_eval.tasks.hendrycks_math.utils import is_equiv
+from lm_eval.tasks.hendrycks_math.utils import remove_boxed, last_boxed_only_string, is_equiv

 def extract_answer(output: str) -> str:
     '''
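
Both hunks above add `remove_boxed` and `last_boxed_only_string` to the import next to `is_equiv`. The body of `extract_answer` is not shown in this diff, so the following is only a minimal sketch of how boxed-answer extraction of this kind typically works. `last_boxed_span`, `strip_boxed`, and `extract_boxed_answer` are hypothetical stand-ins written for illustration; they are not the `lm_eval` implementations and may differ from them in edge cases.

```python
from typing import Optional

BOXED_PREFIX = "\\boxed{"


def last_boxed_span(text: str) -> Optional[str]:
    # Hypothetical stand-in: return the last "\boxed{...}" span, or None.
    start = text.rfind(BOXED_PREFIX)
    if start < 0:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: return the whole span.
                return text[start : i + 1]
    return None  # braces never balanced


def strip_boxed(span: str) -> str:
    # Hypothetical stand-in: drop the "\boxed{" wrapper and its closing "}".
    if not (span.startswith(BOXED_PREFIX) and span.endswith("}")):
        raise ValueError("expected a string of the form \\boxed{...}")
    return span[len(BOXED_PREFIX) : -1]


def extract_boxed_answer(output: str) -> str:
    # Assumed behaviour: an empty string when no boxed answer is present.
    span = last_boxed_span(output)
    return strip_boxed(span) if span is not None else ""


if __name__ == "__main__":
    print(extract_boxed_answer("first \\boxed{1}, finally \\boxed{204}"))  # 204
```

Taking the last boxed span rather than the first means intermediate boxed values in a long solution do not shadow the final answer.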

reproduced_benchmarks.md

Lines changed: 2 additions & 2 deletions
@@ -51,5 +51,5 @@
 | ZeroEval | Negin | meta-llama/Llama-3.1-8B-Instruct |crux | 40.75 | 39.88
 | | | |math-l5 | 24.69 | 22.19
 | | | |zebra | 11.70 | 12.8
-| AMC23 | Hritik | Qwen/Qwen2.5-7B-Instruct | | 20/40 | 24/40 |
-| AIME24 | Hritik | Qwen/Qwen2.5-7B-Instruct | | 4/30 | 3/30 |
+| AMC23 | Hritik | Qwen/Qwen2.5-7B-Instruct | | 21/40 | 24/40 |
+| AIME24 | Hritik | Qwen/Qwen2.5-7B-Instruct | | 3/30 | 3/30 |
