MolRecBench-Wild

A Real-World Benchmark for Optical Chemical Structure Recognition

🇨🇳 Chinese documentation (中文文档)

Why MolRecBench-Wild

🔥 News

  • 🚀 [04/07/2026] Our paper has been accepted to CVPRF 2026!

📊 Dataset Statistics

| Feature | MolRecBench-Wild | Traditional Benchmarks (e.g., USPTO, Staker) |
|---|---|---|
| Source | Academic Articles | Patents / Synthetic |
| Sample Count | 5029 | Varies (usually larger but simpler) |
| Visual Difficulty Labels | 18 Categories | < 10 Categories |
| Chemical Difficulty Labels | 19 Categories (MOSAIC subset) | < 3 Categories |
| Ground Truth | CARBON, Graph, SMILES | SMILES, MolFile |
| Complex Structure Support | Non-standard bonds, icon groups, mixed valences | Standard structures only |

CARBON Notation

Example Image

{
  "symbols": ["[R]", "C", "[R']", "C", "C", "H", "C", "[Ar]", "C", "C"], 
  "charges": [null, null, null, null, null, null, null, null, null, null], 
  "radicals": [null, null, null, null, null, null, null, null, null, null], 
  "valences": [null, null, null, null, null, null, null, null, null, null], 
  "isotopes": [null, null, null, null, null, null, null, null, null, null], 
  "attach_points": [null, null, null, null, null, null, null, null, null, null], 
  "coords": [
    [10.8075, -9.3566],
    [11.7673, -9.3302],
    [11.4253, -10.2699],
    [12.6333, -9.8302],
    [13.4993, -9.3302],
    [14.3654, -9.8302],
    [13.4993, -8.3302],
    [14.3654, -7.8302],
    [12.6333, -7.8302],
    [11.7673, -8.3302]
  ],
  "bonds": [
    [0, 1, 1],
    [1, 2, 1],
    [1, 3, 1],
    [1, 9, 7],
    [3, 4, 1],
    [4, 5, 1],
    [4, 6, 2],
    [6, 7, 1],
    [6, 8, 1],
    [8, 9, 7]
  ],
  "brackets": [
    {
      "alias": "n",
      "atoms": [3],
      "display_rects": [
        [11.9503, -10.0132, 12.4503, -9.1472],
        [12.8163, -9.1472, 13.3163, -10.0132]
      ]
    }
  ]
}
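
The record above stores per-atom attributes as parallel lists indexed by atom position, and each bond as a triple. The following Python sketch loads a record in this layout and checks its internal consistency. It assumes, from the example only, that every per-atom list has the same length as symbols and that a bond reads [begin_atom, end_atom, bond_type]. The helper names check_carbon and neighbors are hypothetical and not part of this repository, and the meaning of individual bond type codes (such as 7) is deliberately not interpreted.

```python
import json
from collections import defaultdict

# Per-atom fields seen in the example record; assumed to be parallel lists.
ATOM_FIELDS = ["symbols", "charges", "radicals", "valences",
               "isotopes", "attach_points", "coords"]


def check_carbon(record):
    """Return a list of problems found in a CARBON-style dict (empty if none)."""
    problems = []
    n_atoms = len(record["symbols"])
    for field in ATOM_FIELDS:
        if len(record.get(field, [])) != n_atoms:
            problems.append(f"{field}: expected {n_atoms} entries")
    for begin, end, _bond_type in record.get("bonds", []):
        if not (0 <= begin < n_atoms and 0 <= end < n_atoms):
            problems.append(f"bond ({begin}, {end}) references a missing atom")
    for bracket in record.get("brackets", []):
        for atom in bracket.get("atoms", []):
            if not 0 <= atom < n_atoms:
                problems.append(f"bracket atom {atom} is out of range")
    return problems


def neighbors(record):
    """Build an undirected adjacency map: atom index -> [(neighbor, bond_type)]."""
    adjacency = defaultdict(list)
    for begin, end, bond_type in record["bonds"]:
        adjacency[begin].append((end, bond_type))
        adjacency[end].append((begin, bond_type))
    return dict(adjacency)


# A three-atom fragment in the same layout as the example above (illustrative only).
record = json.loads("""
{
  "symbols": ["[R]", "C", "[Ar]"],
  "charges": [null, null, null],
  "radicals": [null, null, null],
  "valences": [null, null, null],
  "isotopes": [null, null, null],
  "attach_points": [null, null, null],
  "coords": [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
  "bonds": [[0, 1, 1], [1, 2, 1]],
  "brackets": []
}
""")

print(check_carbon(record))   # list of problems; empty when the record is consistent
print(neighbors(record)[1])   # neighbors of the central carbon
```

A check like this can be useful before scoring model output on the Graph track, since a prediction with mismatched list lengths or dangling bond indices cannot be compared as a graph at all.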

⚡ Quick Start

Step 1: Setup Environment

git clone https://github.com/opendatalab/MolRecBench-Wild.git
cd MolRecBench-Wild

# Create and activate a dedicated environment, then install dependencies
conda create -n molrec python=3.10 -y
conda activate molrec
pip install -r requirements.txt

Step 2: Setup VLMEvalKit

We use VLMEvalKit as the inference backend, with minimal patches to add chemistry-specific model adapters and datasets. Our patches are provided in patches/ for full transparency; we do not redistribute VLMEvalKit itself.

Run the one-click setup script:

bash setup_vlmevalkit.sh

After setup, create a file named ".env" in the VLMEvalKit directory and configure your API keys:

# VLMEvalKit/.env
OPENAI_API_BASE=https://your-api-base-url
OPENAI_API_KEY=your-api-key

Step 3: Download & Convert Data

Download the dataset from HuggingFace and convert it to VLMEvalKit TSV format in one step:

# Default: download the dataset and generate TSVs for all three tracks
python download_and_convert.py --prompt all

# Download the dataset and generate the SMILES-track TSV only
python download_and_convert.py --prompt smiles

# Skip the download if dataset/ already exists
python download_and_convert.py --prompt smiles --skip-download

The script will:

  1. Download images to ./dataset/images/ and save ground truth to ./dataset/annotation.jsonl
  2. Generate TSV files to ./LMUData/
  3. Automatically register the LMUData path in VLMEvalKit/.env so VLMEvalKit can find the TSV files
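
After the download step it can help to confirm that the annotation file parses cleanly. The sketch below reads a JSON Lines file and reports how many records it holds and which keys they use. It assumes only that annotation.jsonl follows the usual one-JSON-object-per-line convention; the README does not document the field names, so the keys in the demonstration are made up, and read_jsonl and summarize are hypothetical helpers rather than part of this repository.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from error


def summarize(path):
    """Return (record count, sorted list of keys seen across all records)."""
    count = 0
    keys = set()
    for record in read_jsonl(path):
        count += 1
        keys.update(record)
    return count, sorted(keys)


# Demonstration on a throwaway file; the keys below are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "annotation.jsonl"
    demo.write_text(
        '{"id": 1, "smiles": "CCO"}\n\n{"id": 2, "smiles": "c1ccccc1"}\n',
        encoding="utf-8",
    )
    demo_summary = summarize(demo)
    print(demo_summary)
```

On real data, the same call would be summarize("dataset/annotation.jsonl"), run from the repository root once the download has finished.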

Step 4: Run Inference

cd VLMEvalKit

# Run a single task (SMILES)
python run.py --data smiles --model GPT4o_20241120

# Run all three tasks at once (SMILES, Simplified Graph, Graph)
python run.py --data smiles simple_graph carbon --model GPT4o_20241120

# Increase parallel API calls for faster inference
python run.py --data smiles --model GPT4o_20241120 --api-nproc 32

# Resume an interrupted run (skip already completed samples)
python run.py --data smiles --model GPT4o_20241120 --reuse

Key arguments:

| Argument | Description |
|---|---|
| --data | Recognition task(s) to run: smiles (SMILES), simple_graph (Simplified Graph), or carbon (Graph) |
| --model | Model name as defined in vlmeval/config.py |
| --work-dir | Output directory (default: ./outputs) |
| --api-nproc | Number of parallel API calls (default: 4; increase for faster inference) |
| --reuse | Reuse existing prediction files to resume interrupted runs |

Prediction results will be saved to VLMEvalKit/outputs/<model_name>/.

Testing with your own model:

To evaluate a custom model, you need to implement a model wrapper in VLMEvalKit. At minimum, create a class with a generate_inner(msgs, dataset=None) method that takes a multi-modal message list and returns the model's prediction string. Then register it in vlmeval/config.py. For details, see the VLMEvalKit Development Guide.
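
As a rough illustration of the interface described above, the following standalone sketch shows the shape of such a wrapper. It assumes that each message is a dict with a type of "image" or "text" and a value holding a file path or prompt string, which is how VLMEvalKit messages are commonly structured; verify this against the Development Guide. The class name ConstantSmilesModel and its predict method are hypothetical, and the class deliberately does not import VLMEvalKit so that it runs on its own. In practice the wrapper would normally inherit from the toolkit's base model class before being registered in vlmeval/config.py.

```python
class ConstantSmilesModel:
    """Toy wrapper exposing the generate_inner interface described above."""

    def __init__(self, fallback="C"):
        # String returned for every input; stands in for a real model.
        self.fallback = fallback
        self.last_inputs = None

    def generate_inner(self, msgs, dataset=None):
        # Split the multi-modal message list into image paths and prompt text.
        image_paths = [m["value"] for m in msgs if m.get("type") == "image"]
        prompt = "\n".join(m["value"] for m in msgs if m.get("type") == "text")
        self.last_inputs = (image_paths, prompt, dataset)
        return self.predict(image_paths, prompt)

    def predict(self, image_paths, prompt):
        # Placeholder: a real wrapper would run model inference here
        # and return the prediction as a string.
        return self.fallback


model = ConstantSmilesModel()
msgs = [
    {"type": "image", "value": "dataset/images/example.png"},  # illustrative path
    {"type": "text", "value": "Return the SMILES string of the molecule."},
]
prediction = model.generate_inner(msgs, dataset="smiles")
print(prediction)
```

Keeping the message parsing in generate_inner and the inference in a separate method makes it easier to swap in a real model without touching the part that VLMEvalKit calls.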

Step 5: Convert Results

VLMEvalKit outputs an XLSX file per run. Convert it to the JSONL format expected by the Evaluator:

# Convert XLSX → Evaluator JSONL
python convert_result.py \
    -i "VLMEvalKit/outputs/GPT4o_20241120/T20260413_G/GPT4o_20241120_chem_smiles.xlsx" \
    -o "results/GPT4o_20241120_chem_smiles.jsonl"

Step 6: Evaluation

After inference, use the Evaluator to compute accuracy on the three tracks. The Evaluator takes two JSONL files: the ground truth and the predictions.

Evaluation metrics:

| Metric | What it compares | Description |
|---|---|---|
| SMILES Accuracy | SMILES strings | Converts both the ground truth and the prediction to SMILES, then compares the canonical SMILES strings. |
| Simplified Graph Accuracy | Atom symbols + bond types | Graph isomorphism on the simplified molecular graph (ignoring charges, radicals, valences, isotopes, attachment points, and brackets). |
| Graph Accuracy | CARBON | Graph isomorphism on the complete molecular graph, including all attributes. |
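
To make the Simplified Graph criterion concrete, the sketch below tests whether two molecular graphs are isomorphic when only atom symbols and bond types are compared. It is a plain backtracking search written for illustration, not the Evaluator's implementation, which this README does not describe. The function names build_graph and is_isomorphic are hypothetical, bonds are assumed to be [begin_atom, end_atom, bond_type] triples as in the CARBON example, and the search can be slow on large or highly symmetric molecules.

```python
from collections import Counter


def build_graph(symbols, bonds):
    """Return (symbols, adjacency) where adjacency[i][j] is the bond type."""
    adjacency = [dict() for _ in symbols]
    for begin, end, bond_type in bonds:
        adjacency[begin][end] = bond_type
        adjacency[end][begin] = bond_type
    return symbols, adjacency


def is_isomorphic(graph_a, graph_b):
    """Check labeled-graph isomorphism by backtracking over atom assignments."""
    symbols_a, adj_a = graph_a
    symbols_b, adj_b = graph_b
    # Cheap necessary conditions before searching.
    if Counter(symbols_a) != Counter(symbols_b):
        return False
    if sum(len(n) for n in adj_a) != sum(len(n) for n in adj_b):
        return False

    mapping = {}   # atom index in A -> atom index in B
    used = set()   # atoms of B that are already assigned

    def compatible(a, b):
        if symbols_a[a] != symbols_b[b] or len(adj_a[a]) != len(adj_b[b]):
            return False
        # Each already-mapped neighbor of a must be bonded to b with the same type.
        for neighbor, bond_type in adj_a[a].items():
            if neighbor in mapping and adj_b[b].get(mapping[neighbor]) != bond_type:
                return False
        return True

    def extend(a):
        if a == len(symbols_a):
            return True
        for b in range(len(symbols_b)):
            if b not in used and compatible(a, b):
                mapping[a] = b
                used.add(b)
                if extend(a + 1):
                    return True
                del mapping[a]
                used.remove(b)
        return False

    return extend(0)


# C-C-O written in two atom orders, and C-O-C (same atoms, different connectivity).
chain_cco = build_graph(["C", "C", "O"], [[0, 1, 1], [1, 2, 1]])
chain_occ = build_graph(["O", "C", "C"], [[0, 1, 1], [1, 2, 1]])
chain_coc = build_graph(["C", "O", "C"], [[0, 1, 1], [1, 2, 1]])

same_molecule = is_isomorphic(chain_cco, chain_occ)
different_molecule = is_isomorphic(chain_cco, chain_coc)
print(same_molecule, different_molecule)
```

Atom ordering does not affect the result, which is why a prediction that lists atoms in a different order from the ground truth can still count as correct under a graph-based metric.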

Running evaluation:

python evaluate/eval_SMILES.py --gt_path dataset/annotation.jsonl --pred_path results/GPT4o_20241120_chem_smiles.jsonl
# Output:
# SMILES Precision: 0.0797

python evaluate/eval_S_GRAPH.py --gt_path dataset/annotation.jsonl --pred_path results/GPT4o_20241120_chem_graph_simple.jsonl
# Output:
# Simplified Graph Precision: 0.0374

python evaluate/eval_GRAPH.py --gt_path dataset/annotation.jsonl --pred_path results/GPT4o_20241120_chem.jsonl
# Output:
# SMILES Precision          : 0.0
# Simplified Graph Precision: 0.0344
# Graph Precision           : 0.0298

Benchmark Results

We evaluated 18 mainstream models (the inference results are saved in the results folder) and found that existing methods suffer significant performance drops in real-world scenarios. Underlined values indicate the best results within each class, and bold values represent the overall best results across all classes.

| Method | SMILES | Simplified Graph | Graph |
|---|---|---|---|
| SMILES-based Expert Models | | | |
| OCSU | 6.06 | - | - |
| DECIMERv2.2 | 22.84 | - | - |
| Graph-based Expert Models | | | |
| MolGrapher | 20.33 | 22.81 | - |
| MolNexTR | 40.9 | 34.42 | - |
| MolScribe | 41.05 | 34.74 | - |
| GTR-Mol-VLM | 40.43 | 35.22 | - |
| Vision Language Models | | | |
| GPT-4o | 7.94 | 3.74 | 2.94 |
| Qwen-VL-Max | 6.95 | 5.83 | 3.66 |
| InternVL3.5 | 25.6 | 6.88 | 3.08 |
| ChemVLM† | 4.79 | - | - |
| ChemDFM-X† | 9.75 | - | - |
| Vision Reasoning Models | | | |
| GPT-5 | 19.68 | 10.0 | 8.19 |
| Seed1.6-Thinking | 15.6 | 7.14 | 4.61 |
| Intern-S1 | 18.98 | 6.62 | 3.46 |
| Gemini 2.5 Pro | 30.06 | 15.67 | 13.04 |
| GLM-4.5V | 12.13 | 7.89 | 4.26 |
| Tools | | | |
| Mathpix | 27.88 | - | - |
| Logics-Parsing | 15.47 | - | - |

Please refer to the paper for complete results.
