[Dataset] Add R-Bench (ICML 2025) by uyzhang · Pull Request #2091 · open-compass/opencompass

uyzhang · 2025-05-11T05:28:24Z

R-Bench PR Description

Motivation

This PR adds support for the R-Bench dataset to OpenCompass. R-Bench is a graduate-level, multi-disciplinary benchmark designed to evaluate the complex reasoning capabilities of both large language models (LLMs) and multimodal large language models (MLLMs). Incorporating R-Bench into OpenCompass enables comprehensive evaluation of model performance on challenging reasoning tasks across 19 academic disciplines and over 100 subjects, available in both English and Chinese.

Modification

This PR adds the file opencompass/configs/datasets/R-Bench/R-Bench.md, which includes:

Detailed introduction of the R-Bench benchmark
Links to the official paper and resources
Current evaluation results from top models (both text-only and multimodal)
Citation information for proper reference

The file follows the standard OpenCompass dataset documentation format, similar to other benchmark configurations like QuALITY.

Use cases

R-Bench can be used to:

Evaluate advanced reasoning capabilities of LLMs across multiple disciplines
Compare model performance on complex graduate-level problems requiring deep reasoning
Test reasoning abilities in both English and Chinese
Assess multimodal reasoning through its dedicated multimodal test set
Provide a more challenging benchmark that even state-of-the-art models struggle with (top model achieves only 53.2% on multimodal tasks)

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
The documentation has been modified accordingly, like docstring or example tutorials.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests. (Not applicable, as this is a new feature.)

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
CLA has been signed and all committers have signed the CLA in this PR.

tonysy · 2025-05-14T10:15:27Z

Please update the lint

uyzhang · 2025-05-19T03:34:20Z

> Please update the lint

I've updated it. Could you help run CI?

tonysy · 2025-06-05T13:10:21Z

Hi, have you tried using OpenCompass to reproduce your reported performance?

tonysy · 2025-06-05T13:11:05Z

Also please check the pre-commit again. Thanks.

uyzhang · 2025-06-05T15:12:23Z

> Hi, have you tried using OpenCompass to reproduce your reported performance?

Yes, we ran the experiment with OpenCompass and reproduced the previously reported results on this PR.

tonysy

LGTM

MaiziXiao

Tested. LGTM

* [Dataset] Add R-Bench (ICML 2025)
* fixed lint
* format rbench.py by isort
* rbench fix
* r-bench fix
* update

---------

Co-authored-by: leoyizhang <leoyizhang@tencent.com>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>

[Dataset] Add R-Bench (ICML 2025)

5d8c96b

fixed lint

01af69a

format rbench.py by isort

d679be0

rbench fix

3fe5f9b

r-bench fix

4a7e4c2

tonysy approved these changes Jun 6, 2025

tonysy requested review from MaiziXiao, Myhs-phz and liushz June 6, 2025 16:29

update

91c4997

MaiziXiao approved these changes Jun 13, 2025

MaiziXiao merged commit 4f42c12 into open-compass:main Jun 13, 2025
8 checks passed

MaiziXiao merged 6 commits into open-compass:main from uyzhang:main