😈McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models😇

Tian Lan¹,²,³, Xiangdong Su*¹,²,³, Xu Liu¹,²,³, Ruirui Wang¹,²,³, Ke Chang¹,²,³, Jiang Li¹,²,³, Guanglai Gao¹,²,³

¹ College of Computer Science, Inner Mongolia University, China
² National & Local Joint Engineering Research Center of Intelligent Information Processing Technology for Mongolian, China
³ Inner Mongolia Key Laboratory of Multilingual Artificial Intelligence Technology, China

* corresponding author

Paper: McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models

Dataset: https://huggingface.co/datasets/Velikaya/McBE

Code: https://github.com/VelikayaScarlet/McBE

📜Abstract

🚀Dataset Description

McBE is designed to address the scarcity of Chinese-centric bias evaluation resources for large language models (LLMs). It supports multi-faceted bias assessment across 5 evaluation tasks, enabling researchers and developers to:

  - Systematically measure biases in LLMs across 12 single bias categories (e.g., gender, region, race) and 82 subcategories rooted in Chinese culture, filling a critical gap in non-English, non-Western contexts.
  - Evaluate model fairness from diverse perspectives through 4,077 bias evaluation instances, ensuring comprehensive coverage of real-world scenarios where LLMs may perpetuate stereotypes.
  - Facilitate cross-cultural research by providing an evaluation benchmark for analyzing bias expression in LLMs, promoting more equitable and fair model development globally.

Curated by: College of Computer Science and National & Local Joint Engineering Research Center of Intelligent Information Processing Technology for Mongolian at Inner Mongolia University
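
To get a feel for the data, here is a minimal sketch of downloading the dataset from Hugging Face and inspecting one category file with pandas. The file name under xlsx_files/ is a hypothetical example; check the dataset repository for the actual layout.

# Minimal sketch (not part of the repository): download the McBE dataset from
# Hugging Face and inspect one category file with pandas.
from huggingface_hub import snapshot_download
import pandas as pd

local_dir = snapshot_download(repo_id="Velikaya/McBE", repo_type="dataset")

# "gender.xlsx" is a hypothetical file name; check xlsx_files/ for the actual files.
df = pd.read_excel(f"{local_dir}/xlsx_files/gender.xlsx")
print(df.shape)
print(df.columns.tolist())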

🔬Dependencies

tqdm
zhipuai
openai
transformers
pandas
itertools
torch
modelscope
openpyxl
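
A quick way to verify that these packages are installed, as a minimal sketch (note that itertools ships with the Python standard library and needs no separate install):

import importlib

# Packages that must be installed before running the evaluation, for example:
#   pip install tqdm zhipuai openai transformers pandas torch modelscope openpyxl
required = ["tqdm", "zhipuai", "openai", "transformers",
            "pandas", "torch", "modelscope", "openpyxl"]

for name in required:
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ImportError:
        print(f"{name}: missing")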

💯How to Run an Evaluation?

  1. Open utils.py and fill in your GLM4-AIR API key on line 9. You can also use another LLM as the judge (a minimal sketch of such a judge call appears after this list).
  2. Open load_model.py and replace model_dir with the path to your models in lines 6–12.
  3. Open eval.py and update the path parameter to your local directory. If you downloaded the McBE dataset directly from Huggingface, the path can be set as "Velikaya/McBE/xlsx_files".
  4. Edit the categories list in eval.py to specify which bias categories to evaluate:
categories = [
    "test",  # Add categories you want to test
    # Example: "age", "gender", "race", etc.
]
  5. The script loops through each category and evaluates it with the specified model (e.g., "qwen2"). You can modify the model name in the function calls:
for c in categories:
    print(c)
    preference_computation(c, "qwen2")  # Replace "qwen2" with your model
    classification(c, "qwen2")
    scenario_selection(c, "qwen2")
    bias_analysis(c, "qwen2")
    bias_scoring(c, "qwen2")

  6. Run eval.py.
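
For reference, the following sketch shows how the GLM4-AIR judge call from step 1 might look with the zhipuai SDK. The prompt, model string, and scoring format are illustrative assumptions, not the actual contents of utils.py.

# Hypothetical illustration of an LLM-judge call with the zhipuai SDK.
# The prompt, model string, and scoring format are assumptions; consult
# utils.py for the judge actually used by McBE.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_GLM4_AIR_API_KEY")  # same key as utils.py line 9

def judge(answer: str) -> str:
    """Ask the judge model to rate how biased a candidate answer is."""
    completion = client.chat.completions.create(
        model="glm-4-air",
        messages=[
            {"role": "system",
             "content": "You are a fairness judge. Rate the bias of the answer from 0 (none) to 5 (severe)."},
            {"role": "user", "content": answer},
        ],
    )
    return completion.choices[0].message.content

print(judge("Example model answer to be scored."))

The same pattern works with any OpenAI-compatible judge; swapping the client and model name is all that is needed.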
