OneIG-Bench might be a better image generation benchmark? #62

@wchengad

Description

DPG-Bench introduced dense-prompt evaluation for text-to-image (T2I) model benchmarking and has become one of the most widely used benchmarks in the field. However, as stronger image generation models continue to emerge, evaluation needs to extend beyond dense prompts alone: aspects such as stylization, text rendering, reasoning, and multilingual support now require detailed assessment.

To address this, the newly proposed OneIG-Bench (https://arxiv.org/abs/2506.07977) conducts an Omni-dimensional Nuanced Evaluation of the image generation task.

Key Features of OneIG-Bench:

  1. Comprehensive Prompt Sets:

    • Six specialized categories:
      • 245 Anime & Stylization prompts (EN/ZH)
      • 244 Portrait prompts (EN/ZH)
      • 206 General Object prompts (EN/ZH)
      • 200 Text Rendering prompts (EN/ZH)
      • 225 Knowledge & Reasoning prompts (EN/ZH)
      • 200 Multilingualism prompts
    • Bilingual coverage: First five sets available in both English and Chinese
    • Designed for holistic evaluation of modern text-to-image models
  2. Systematic Quantitative Framework:

    • Enables objective capability ranking via standardized metrics
    • Ensures direct cross-model comparability
    • Dimension-specific evaluation protocol:
      • Models generate images only for prompts within one evaluation dimension
      • Performance assessed exclusively within that targeted dimension
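To make the dimension-specific protocol concrete, here is a minimal sketch of how per-dimension scoring could work: a model generates images only for the prompts of one dimension, and its score is averaged exclusively within that dimension. All names, the toy prompt set, and the scoring function are illustrative assumptions, not OneIG-Bench's actual API.

```python
from statistics import mean

# Toy prompt set keyed by evaluation dimension (hypothetical examples;
# the real benchmark has six categories with ~200-245 prompts each).
PROMPTS = {
    "text_rendering": ["A sign that reads 'OPEN'", "A mug printed with 'HELLO'"],
    "reasoning": ["Three apples, one of them bitten", "A clock showing quarter past nine"],
}

def evaluate_model(generate, score, dimension):
    """Generate images only for prompts in `dimension` and average
    the scores, so each ranking stays within a single dimension."""
    prompts = PROMPTS[dimension]
    return mean(score(p, generate(p)) for p in prompts)

if __name__ == "__main__":
    # Dummy generator and scorer, just to show the control flow.
    fake_generate = lambda prompt: f"image_for({prompt})"
    fake_score = lambda prompt, image: 1.0 if prompt in image else 0.0
    print(evaluate_model(fake_generate, fake_score, "text_rendering"))
```

The key design point this illustrates is isolation: a model's text-rendering score is never influenced by its reasoning prompts, which is what makes the per-dimension rankings directly comparable across models.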

Here is the evaluation visualization for the most representative SOTA T2I models⬇️

[Image: OneIG-Bench evaluation results for representative SOTA T2I models]
