MolRecBench-Wild

A Real-World Benchmark for Optical Chemical Structure Recognition

🔥 News

🚀 [04/07/2026] Our paper has been accepted to CVPRF 2026!

📊 Dataset Statistics

Feature	MolRecBench-Wild	Traditional Benchmarks (e.g., USPTO, Staker)
Source	Academic Articles	Patents / Synthetic
Sample Count	5029	Varies (usually larger but simpler)
Visual Difficulty Labels	18 Categories	< 10 Categories
Chemical Difficulty Labels	19 Categories (MOSAIC subset)	< 3 Categories
Ground Truth	CARBON, Graph, SMILES	SMILES, MolFile
Complex Structure Support	Non-standard bonds, icon groups, mixed valences	Standard structures only

CARBON Notation

{
  "symbols": ["[R]", "C", "[R']", "C", "C", "H", "C", "[Ar]", "C", "C"],
  "charges": [null, null, null, null, null, null, null, null, null, null],
  "radicals": [null, null, null, null, null, null, null, null, null, null],
  "valences": [null, null, null, null, null, null, null, null, null, null],
  "isotopes": [null, null, null, null, null, null, null, null, null, null],
  "attach_points": [null, null, null, null, null, null, null, null, null, null],
  "coords": [
    [10.8075, -9.3566],
    [11.7673, -9.3302],
    [11.4253, -10.2699],
    [12.6333, -9.8302],
    [13.4993, -9.3302],
    [14.3654, -9.8302],
    [13.4993, -8.3302],
    [14.3654, -7.8302],
    [12.6333, -7.8302],
    [11.7673, -8.3302]
  ],
  "bonds": [
    [0, 1, 1],
    [1, 2, 1],
    [1, 3, 1],
    [1, 9, 7],
    [3, 4, 1],
    [4, 5, 1],
    [4, 6, 2],
    [6, 7, 1],
    [6, 8, 1],
    [8, 9, 7]
  ],
  "brackets": [
    {
      "alias": "n",
      "atoms": [3],
      "display_rects": [
        [11.9503, -10.0132, 12.4503, -9.1472],
        [12.8163, -9.1472, 13.3163, -10.0132]
      ]
    }
  ]
}
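To make the CARBON layout concrete, here is a minimal sketch that loads a record like the one above and builds an adjacency list. It assumes, from the example rather than from a published specification, that every per-atom array is parallel to "symbols" and that each bond is a [begin_atom, end_atom, bond_type] triple with 0-based atom indices. Bond-type codes such as 7 are passed through without interpretation. The helper names load_carbon and adjacency are hypothetical.

```python
import json
from collections import Counter, defaultdict

# Arrays assumed to hold one entry per atom, parallel to "symbols".
PER_ATOM_FIELDS = ("charges", "radicals", "valences",
                   "isotopes", "attach_points", "coords")


def load_carbon(text):
    """Parse a CARBON JSON string and check that its arrays are consistent."""
    mol = json.loads(text)
    n_atoms = len(mol["symbols"])
    for field in PER_ATOM_FIELDS:
        if field in mol and len(mol[field]) != n_atoms:
            raise ValueError(f"{field!r} has {len(mol[field])} entries, expected {n_atoms}")
    # Assumption: each bond is [begin_atom, end_atom, bond_type] with 0-based indices.
    for begin, end, _ in mol.get("bonds", []):
        if not (0 <= begin < n_atoms and 0 <= end < n_atoms):
            raise ValueError(f"bond ({begin}, {end}) refers to a missing atom")
    return mol


def adjacency(mol):
    """Map each atom index to a sorted list of (neighbor_index, bond_type) pairs."""
    adj = defaultdict(list)
    for begin, end, bond_type in mol.get("bonds", []):
        adj[begin].append((end, bond_type))
        adj[end].append((begin, bond_type))
    return {atom: sorted(pairs) for atom, pairs in adj.items()}


# Symbols and bonds copied from the example above (other fields omitted).
example = json.dumps({
    "symbols": ["[R]", "C", "[R']", "C", "C", "H", "C", "[Ar]", "C", "C"],
    "bonds": [[0, 1, 1], [1, 2, 1], [1, 3, 1], [1, 9, 7], [3, 4, 1],
              [4, 5, 1], [4, 6, 2], [6, 7, 1], [6, 8, 1], [8, 9, 7]],
})
mol = load_carbon(example)
adj = adjacency(mol)
bond_type_counts = Counter(bond_type for _, _, bond_type in mol["bonds"])
print(adj[1])  # neighbors of atom 1, the central carbon
print(bond_type_counts)
```

Keeping the bond type as an opaque integer means the sketch stays correct even for the non-standard bond codes the benchmark supports.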

⚡ Quick Start

Step 1: Setup Environment

git clone https://github.com/your-username/MolRecBench-Wild.git
cd MolRecBench-Wild

# Install dependencies
conda create -n molrec python=3.10 -y
conda activate molrec
pip install -r requirements.txt

Step 2: Setup VLMEvalKit

We use VLMEvalKit as the inference backend, with minimal patches that add chemistry-specific model adapters and datasets. Our patches are provided in patches/ for full transparency; we do not redistribute VLMEvalKit itself.

Run the one-click setup script:

bash setup_vlmevalkit.sh

After setup, create a file named ".env" in the VLMEvalKit directory and configure your API keys:

# VLMEvalKit/.env
OPENAI_API_BASE=https://your-api-base-url
OPENAI_API_KEY=your-api-key
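Before a long API run, it can help to confirm that the .env file actually sets the two keys above. The sketch below assumes a simple KEY=VALUE format with # comments and blank lines. It is a rough stand-in, not VLMEvalKit's own loader, and the helper names parse_env_text and missing_keys are hypothetical.

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_BASE", "OPENAI_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop optional quotes
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent or set to an empty value."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("VLMEvalKit/.env")
    if env_path.exists():
        values = parse_env_text(env_path.read_text(encoding="utf-8"))
        print("missing keys:", missing_keys(values) or "none")
    else:
        print(f"{env_path} not found; create it as described above")
```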

Step 3: Download & Convert Data

Download the dataset from HuggingFace and convert it to VLMEvalKit TSV format in one step:

# Default: download the dataset and generate TSVs for all three tracks
python download_and_convert.py --prompt all

# Download the dataset and generate only the SMILES-track TSV
python download_and_convert.py --prompt smiles

# Skip the download if ./dataset/ already exists
python download_and_convert.py --prompt smiles --skip-download

The script will:

Download images to ./dataset/images/ and save ground truth to ./dataset/annotation.jsonl
Generate TSV files in ./LMUData/
Automatically register the LMUData path in VLMEvalKit/.env so VLMEvalKit can find the TSV files
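A quick way to confirm that the download finished is to count the annotation records and images against the 5029 samples listed in the statistics table. This sketch assumes one JSON object per non-empty line in annotation.jsonl and does not rely on any field names. It also reports whether VLMEvalKit/.env contains an LMUData entry, assuming that is the variable name the script registers (it is the variable VLMEvalKit reads for its data root). The function name check_dataset and the list of image suffixes are hypothetical.

```python
import json
from pathlib import Path

EXPECTED_SAMPLES = 5029  # sample count from the Dataset Statistics table
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumption: typical image formats


def count_jsonl_records(path):
    """Count non-empty lines, failing loudly on any line that is not valid JSON."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no} is not valid JSON") from err
            count += 1
    return count


def check_dataset(root="dataset", env_path="VLMEvalKit/.env"):
    """Summarize what the download step left on disk."""
    root, env_path = Path(root), Path(env_path)
    n_records = count_jsonl_records(root / "annotation.jsonl")
    n_images = sum(1 for p in (root / "images").rglob("*")
                   if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    env_lines = env_path.read_text(encoding="utf-8").splitlines() if env_path.exists() else []
    return {
        "records": n_records,
        "images": n_images,
        "records_match_expected": n_records == EXPECTED_SAMPLES,
        # Assumption: the registered variable is named LMUData.
        "lmudata_registered": any(line.strip().startswith("LMUData") for line in env_lines),
    }


if __name__ == "__main__":
    if Path("dataset/annotation.jsonl").exists():
        print(check_dataset())
```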

Step 4: Run Inference

cd VLMEvalKit

# Run a single task (SMILES)
python run.py --data smiles --model GPT4o_20241120

# Run all three tasks at once (SMILES, Simplified Graph, Graph)
python run.py --data smiles simple_graph carbon --model GPT4o_20241120

# Increase parallel API calls for faster inference
python run.py --data smiles --model GPT4o_20241120 --api-nproc 32

# Resume an interrupted run (skip already completed samples)
python run.py --data smiles --model GPT4o_20241120 --reuse

Key arguments:

Argument	Description
`--data`	Recognition task(s) to run: `smiles` (SMILES), `simple_graph` (Simplified Graph), or `carbon` (Graph); several can be passed at once
`--model`	Model name as defined in `vlmeval/config.py`
`--work-dir`	Output directory (default: `./outputs`)
`--api-nproc`	Number of parallel API calls (default: 4, increase for faster inference)
`--reuse`	Reuse existing prediction files to resume interrupted runs

Prediction results will be saved to VLMEvalKit/outputs/<model_name>/.
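To compare several models, the documented flags can be composed in a small driver. This sketch only builds the command lines (a dry run by default) from --data, --model, --api-nproc, and --reuse; the track names are the ones used in the examples above. "YOUR_MODEL_NAME" is a placeholder for another entry in vlmeval/config.py, and the helper names are hypothetical.

```python
import shlex
import subprocess

TRACKS = ["smiles", "simple_graph", "carbon"]  # dataset names from the examples above


def build_run_command(model, tracks=TRACKS, api_nproc=None, reuse=False):
    """Build one run.py invocation for a model over several tracks."""
    cmd = ["python", "run.py", "--data", *tracks, "--model", model]
    if api_nproc is not None:
        cmd += ["--api-nproc", str(api_nproc)]
    if reuse:
        cmd.append("--reuse")  # resume from existing prediction files
    return cmd


def run_all(models, dry_run=True, **kwargs):
    """Print (and optionally execute) the commands; run from inside VLMEvalKit/."""
    for model in models:
        cmd = build_run_command(model, **kwargs)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "YOUR_MODEL_NAME" is a placeholder, not a real config entry.
    run_all(["GPT4o_20241120", "YOUR_MODEL_NAME"], api_nproc=32)
```

Running the models one after another keeps each model's parallelism under the --api-nproc limit; set dry_run=False once the printed commands look right.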

Testing with your own model:

To evaluate a custom model, you need to implement a model wrapper in VLMEvalKit. At minimum, create a class with a generate_inner(msgs, dataset=None) method that takes a multi-modal message list and returns the model's prediction string. Then register it in vlmeval/config.py. For details, see the VLMEvalKit Development Guide.
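A minimal sketch of such a wrapper follows. It assumes VLMEvalKit's message convention of a list of dicts with "type" ("image" or "text") and "value" keys. In a real integration you would subclass VLMEvalKit's BaseModel (found in vlmeval/vlm/base.py in recent versions) instead of the stand-alone class used here, which only exists so the sketch runs on its own. The class name ChemOCSRWrapper, the predict_fn backend, the image path, and the prompt text are all hypothetical.

```python
from functools import partial


class ChemOCSRWrapper:
    """Hypothetical adapter; in VLMEvalKit, subclass its BaseModel instead."""

    def __init__(self, predict_fn, **kwargs):
        # predict_fn(image_paths, prompt, **kwargs) -> str stands in for your model.
        self.predict_fn = predict_fn
        self.kwargs = kwargs

    def generate_inner(self, msgs, dataset=None):
        """Split a multi-modal message list and return the prediction string."""
        images = [m["value"] for m in msgs if m["type"] == "image"]
        prompt = "\n".join(m["value"] for m in msgs if m["type"] == "text")
        prediction = self.predict_fn(images, prompt, **self.kwargs)
        return prediction.strip()


def dummy_predict(image_paths, prompt, **kwargs):
    """Placeholder backend that always answers benzene."""
    return " c1ccccc1 \n"


# Registration sketch: vlmeval/config.py maps model names to constructors,
# typically via functools.partial (check the layout in your VLMEvalKit version).
my_models = {"MyChemModel": partial(ChemOCSRWrapper, predict_fn=dummy_predict)}

model = my_models["MyChemModel"]()
msgs = [
    {"type": "image", "value": "dataset/images/example.png"},
    {"type": "text", "value": "Convert the molecule in the image to SMILES."},
]
print(model.generate_inner(msgs, dataset="smiles"))
```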

Step 5: Convert Results

VLMEvalKit writes one XLSX prediction file per model and dataset. Convert each file to the JSONL format expected by the Evaluator:

# Convert XLSX → Evaluator JSONL
python convert_result.py \
    -i "VLMEvalKit/outputs/GPT4o_20241120/T20260413_G/GPT4o_20241120_chem_smiles.xlsx" \
    -o "results/GPT4o_20241120_chem_smiles.jsonl"
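When several runs and tracks are involved, the same command can be generated for every XLSX file. This sketch assumes the output layout shown in the example path (VLMEvalKit/outputs/<model>/<run_dir>/<model>_<dataset>.xlsx) and names each JSONL after its XLSX file. It uses only the -i and -o options shown above. If your VLMEvalKit version writes auxiliary XLSX files with the same prefix, filter them out first. The helper name conversion_commands is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def conversion_commands(outputs_dir="VLMEvalKit/outputs", model="GPT4o_20241120",
                        results_dir="results"):
    """Build one convert_result.py command per XLSX file of a model.

    If several run directories contain the same file name, the one whose
    directory name sorts last (assumed to be the newest timestamp) wins.
    """
    latest = {}
    for xlsx in sorted(Path(outputs_dir, model).glob(f"*/{model}_*.xlsx")):
        latest[xlsx.name] = xlsx  # later run directories overwrite earlier ones
    commands = []
    for name, xlsx in sorted(latest.items()):
        out = Path(results_dir) / (Path(name).stem + ".jsonl")
        commands.append([sys.executable, "convert_result.py", "-i", str(xlsx), "-o", str(out)])
    return commands


if __name__ == "__main__":
    Path("results").mkdir(exist_ok=True)
    for cmd in conversion_commands():
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```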

Step 6: Evaluation

After inference, use the Evaluator to compute accuracy on the three tracks. The Evaluator takes two JSONL files: the ground truth and the predictions.

Evaluation metrics:

Metric	What it compares	Description
SMILES Accuracy	SMILES strings	Converts both the ground truth and the prediction to SMILES, then compares the canonical SMILES strings.
Simplified Graph Accuracy	Atom symbols + bond types	Graph isomorphism on the simplified molecular graph (ignoring charges, radicals, valences, isotopes, attachment points, and brackets).
Graph Accuracy	Full CARBON graph	Graph isomorphism on the complete molecular graph, including all attributes.
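To illustrate what graph isomorphism on the simplified molecular graph means, the sketch below treats a molecule as atom symbols plus (begin, end, bond_type) triples, as in the CARBON example, and checks whether some one-to-one atom mapping preserves every symbol and bond type. It is a plain backtracking search written for clarity, not the Evaluator's implementation, which may differ (for example in how it normalizes symbols), and it is only practical for molecule-sized graphs. The function names are hypothetical.

```python
def _bond_map(bonds):
    """Undirected bonds as {frozenset({i, j}): bond_type}."""
    return {frozenset((i, j)): t for i, j, t in bonds}


def is_isomorphic(symbols_a, bonds_a, symbols_b, bonds_b):
    """True if a symbol- and bond-type-preserving bijection maps A onto B."""
    bmap_a, bmap_b = _bond_map(bonds_a), _bond_map(bonds_b)
    if sorted(symbols_a) != sorted(symbols_b) or len(bmap_a) != len(bmap_b):
        return False
    n = len(symbols_a)
    mapping = {}  # atom index in A -> atom index in B
    used = set()

    def consistent(a, b):
        # Bond (or absence of a bond) to every mapped atom must agree in A and B.
        return all(bmap_a.get(frozenset((a, a2))) == bmap_b.get(frozenset((b, b2)))
                   for a2, b2 in mapping.items())

    def extend(a):
        if a == n:
            return True
        for b in range(n):
            if b not in used and symbols_a[a] == symbols_b[b] and consistent(a, b):
                mapping[a] = b
                used.add(b)
                if extend(a + 1):
                    return True
                del mapping[a]
                used.discard(b)
        return False

    return extend(0)


def graph_accuracy(pairs):
    """Fraction of (prediction, ground_truth) pairs whose graphs match."""
    if not pairs:
        return 0.0
    hits = sum(is_isomorphic(p["symbols"], p["bonds"], g["symbols"], g["bonds"])
               for p, g in pairs)
    return hits / len(pairs)


gt = {"symbols": ["C", "C", "O"], "bonds": [[0, 1, 1], [1, 2, 1]]}
reordered = {"symbols": ["O", "C", "C"], "bonds": [[0, 1, 1], [1, 2, 1]]}
wrong_bond = {"symbols": ["C", "C", "O"], "bonds": [[0, 1, 1], [1, 2, 2]]}
print(graph_accuracy([(reordered, gt), (wrong_bond, gt)]))
```

Comparing graphs rather than strings is what lets a prediction with a different atom order still count as correct, while a single wrong bond type does not.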

Running evaluation:

python evaluate/eval_SMILES.py --gt_path dataset/annotation.jsonl --pred_path results/GPT4o_20241120_chem_smiles.jsonl
# Output:
# SMILES Precision: 0.0797

python evaluate/eval_S_GRAPH.py --gt_path dataset/annotation.jsonl --pred_path results/GPT4o_20241120_chem_graph_simple.jsonl
# Output:
# Simplified Graph Precision: 0.0374

python evaluate/eval_GRAPH.py --gt_path dataset/annotation.jsonl --pred_path results/GPT4o_20241120_chem.jsonl
# Output:
# SMILES Precision          : 0.0
# Simplified Graph Precision: 0.0344
# Graph Precision           : 0.0298
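The scripts print scores as fractions, while the Benchmark Results table reports them on a 0 to 100 scale. The sketch below collects the printed "... Precision: value" lines, optionally converting to percentages; it assumes the output format shown above, and the helper names parse_scores and run_eval are hypothetical. run_eval passes only the documented --gt_path and --pred_path flags.

```python
import re
import subprocess
import sys

# Matches lines such as "Graph Precision           : 0.0298" (a leading "#" is optional).
SCORE_LINE = re.compile(r"^\s*#?\s*(?P<name>[A-Za-z ]*Precision)\s*:\s*(?P<value>\d+(?:\.\d+)?)\s*$")


def parse_scores(stdout, as_percent=True):
    """Collect 'X Precision: value' lines from an evaluator's printed output."""
    scores = {}
    for line in stdout.splitlines():
        match = SCORE_LINE.match(line)
        if match:
            value = float(match.group("value"))
            scores[match.group("name").strip()] = round(value * 100, 2) if as_percent else value
    return scores


def run_eval(script, gt_path, pred_path):
    """Run one evaluate/*.py script with the documented flags and parse its scores."""
    result = subprocess.run([sys.executable, script, "--gt_path", gt_path, "--pred_path", pred_path],
                            capture_output=True, text=True, check=True)
    return parse_scores(result.stdout)


sample = """SMILES Precision          : 0.0
Simplified Graph Precision: 0.0344
Graph Precision           : 0.0298"""
print(parse_scores(sample))
```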

Benchmark Results

We evaluated 18 mainstream models (their inference results are saved in the results folder) and found that existing methods suffer significant performance drops in real-world scenarios. The best overall scores are 41.05 on SMILES (MolScribe), 35.22 on Simplified Graph (GTR-Mol-VLM), and 13.04 on Graph (Gemini 2.5 Pro).

Method	SMILES (%)	Simplified Graph (%)	Graph (%)
SMILES-based Expert Models
OCSU	6.06	-	-
DECIMERv2.2	22.84	-	-
Graph-based Expert Models
MolGrapher	20.33	22.81	-
MolNexTR	40.9	34.42	-
MolScribe	41.05	34.74	-
GTR-Mol-VLM	40.43	35.22	-
Vision Language Models
GPT-4o	7.94	3.74	2.94
Qwen-VL-Max	6.95	5.83	3.66
InternVL3.5	25.6	6.88	3.08
ChemVLM†	4.79	-	-
ChemDFM-X†	9.75	-	-
Vision Reasoning Models
GPT-5	19.68	10.0	8.19
Seed1.6-Thinking	15.6	7.14	4.61
Intern-S1	18.98	6.62	3.46
Gemini 2.5 Pro	30.06	15.67	13.04
GLM-4.5V	12.13	7.89	4.26
Tools
Mathpix	27.88	-	-
Logics-Parsing	15.47	-	-

Please refer to the paper for complete results.
