This repository is the official PyTorch implementation of the AAAI 2026 paper "DiCaP: Distribution-Calibrated Pseudo-labeling for Semi-Supervised Multi-Label Learning".
Semi-supervised multi-label learning (SSMLL) aims to address the challenge of limited labeled data availability in multi-label learning (MLL) by leveraging unlabeled data to improve the model’s performance. While pseudo-labeling has become a dominant strategy in SSMLL, most existing methods assign equal weights to all pseudo-labels regardless of their quality, which can amplify the impact of noisy or uncertain predictions and degrade the overall performance. In this paper, we theoretically verify that the optimal weight for a pseudo-label should reflect its correctness likelihood. Empirically, we observe that on the same dataset, the correctness likelihood distribution of unlabeled data remains stable, even as the number of labeled training samples varies. Building on this insight, we propose Distribution-Calibrated Pseudo-labeling (DiCaP), a correctness-aware framework that estimates posterior precision to calibrate pseudo-label weights. We further introduce a dual-thresholding mechanism to separate confident and ambiguous regions: confident samples are pseudo-labeled and weighted accordingly, while ambiguous ones are explored by unsupervised contrastive learning. Experiments conducted on multiple benchmark datasets verify that our method achieves consistent improvements, surpassing state-of-the-art methods by up to 4.27%.
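The dual-thresholding idea above can be illustrated with a minimal sketch. This is NOT the repository's implementation: the threshold values, the simple margin-based weighting, and the function name `dual_threshold_assign` are all hypothetical placeholders for the paper's calibrated precision estimates.

```python
import numpy as np

def dual_threshold_assign(probs, tau_pos, tau_neg):
    """Illustrative dual-thresholding (hypothetical, not DiCaP's actual code).

    Predictions above tau_pos become confident positive pseudo-labels,
    those below tau_neg become confident negatives, and the middle band
    is left as the ambiguous region (handled by contrastive learning in
    the paper). Weights here are a naive stand-in for the paper's
    correctness-likelihood calibration.
    """
    pos = probs >= tau_pos            # confident positive region
    neg = probs <= tau_neg            # confident negative region
    ambiguous = ~(pos | neg)          # ambiguous band: no pseudo-label
    # naive correctness-aware weight: the model's confidence itself
    weights = np.where(pos, probs, np.where(neg, 1.0 - probs, 0.0))
    return pos, neg, ambiguous, weights

# Example: one sample with three class probabilities
pos, neg, amb, w = dual_threshold_assign(
    np.array([0.95, 0.50, 0.10]), tau_pos=0.9, tau_neg=0.2
)
print(pos, amb, w)  # middle class (0.50) falls in the ambiguous band
```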
The code requires Python >= 3.8 and the following packages:
numpy==2.3.2
pandas==1.4.2
Pillow==11.3.0
randaugment==1.0.2
scikit_learn==1.7.1
termcolor==3.1.0
timm==1.0.19
torch==2.0.1
torchvision==0.15.2
These packages can be installed directly by running the following command:
pip install -r requirements.txt
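After installation, a quick sanity check can confirm that the pinned packages are importable. The helper below is a hypothetical snippet, not part of this repository.

```python
import importlib.util

def missing_packages(required):
    """Return the names from `required` that cannot be imported.

    Uses find_spec so nothing is actually imported; useful for
    verifying an environment before running the training scripts.
    """
    return [m for m in required if importlib.util.find_spec(m) is None]

# Import names for the pinned requirements (note: Pillow -> PIL,
# scikit_learn -> sklearn)
required = ["numpy", "pandas", "PIL", "sklearn", "torch", "torchvision", "timm"]
print("missing packages:", missing_packages(required) or "none")
```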
Note that all experiments were conducted on a single RTX 3090, so the results may differ slightly from those reported in the paper if you use a different GPU.
See the "README.md" file in the "data" directory for instructions on downloading and setting up the datasets.
To reproduce the results of DiCaP, adjust the hyper-parameters in the bash script (./script/run.sh) as follows.
device_id=6
dataset_dir='./data'
for lb_ratio in 0.05
do
for dataset_name in 'voc'
do
CUDA_VISIBLE_DEVICES=$device_id python warm_up.py \
--dataset_name $dataset_name --dataset_dir $dataset_dir --lb_ratio $lb_ratio \
--net resnet50 --loss_lb asl --warmup_epochs 12 --lr 1e-4 --output ours
CUDA_VISIBLE_DEVICES=$device_id python main.py \
--dataset_name $dataset_name --dataset_dir $dataset_dir --lb_ratio $lb_ratio \
--net resnet50 --loss_lb asl --warmup_epochs 12 --lr 1e-4 \
--output ours --method main
CUDA_VISIBLE_DEVICES=$device_id python fine_tune.py \
--dataset_name $dataset_name --dataset_dir $dataset_dir --lb_ratio $lb_ratio \
--net resnet50 --loss_lb asl --output ours \
--method main --FT_method fine_tune --FT_lr 1e-4 --FT_epochs 20
done
done
For example, the above shows how to run DiCaP on VOC with a 0.05 labeled ratio. Change the value of dataset_dir to your own path and run the following command:
bash ./script/run.sh
If you find DiCaP helpful, please cite:
@inproceedings{DICAP,
title={DiCaP: Distribution-Calibrated Pseudo-labeling for Semi-Supervised Multi-Label Learning},
author={Bo Han and Zhuoming Li and Xiaoyu Wang and Yaxin Hou and Hui Liu and Junhui Hou and Yuheng Jia},
year={2026},
booktitle={{AAAI}}
}
This codebase builds on D2L [link], DESP [link], and BBAM [link]; we thank the authors!