- Add an example for another model (DeepScaleR-1.5B-Preview) and dataset to
docs/guides/eval.md.
(The default eval config evaluates Qwen2.5-Math-1.5B-Instruct on AIME.)
- Test and add an eval link in the training guides:
docs/guides/dpo|grpo|sft.md.
I think there is no need to add the conversion guide to the training guides, since it is already in the eval guide.
- Support Pass@1 accuracy averaged over n samples.
Link docs/guides/eval.md from the front-page README once all of the above are done.
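
For the Pass@1 item, one possible reading is: draw n samples per problem, take the fraction graded correct for each problem, then average those fractions over the dataset. The sketch below illustrates that reading only; the function name `pass_at_1_avg` and the nested-list input format are hypothetical and not taken from the existing eval code.

```python
from statistics import mean


def pass_at_1_avg(correct: list[list[bool]]) -> float:
    """Pass@1 averaged over n samples (illustrative sketch, hypothetical API).

    correct[i][j] is True when sample j for problem i was graded correct.
    Each problem contributes the fraction of its samples that are correct;
    the result is the mean of those fractions over all problems.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # sum() counts True values; dividing by n gives per-problem Pass@1.
        per_problem.append(sum(samples) / len(samples))
    return mean(per_problem)


# Two problems, n = 4 samples each: 3/4 correct and 0/4 correct.
print(pass_at_1_avg([[True, False, True, True], [False] * 4]))  # 0.375
```

With n = 1 this reduces to plain accuracy; larger n mainly reduces the variance of the estimate on small benchmarks.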