Olivialyt/CLIP-PCQA

CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment

Yating Liu*, Yujie Zhang*, Ziyu Shan, Yiling Xu
Shanghai Jiao Tong University
* Equal Contribution     Corresponding author
AAAI 2025

🚩 Introduction

In recent years, No-Reference Point Cloud Quality Assessment (NR-PCQA) research has achieved significant progress. However, existing methods mostly seek a direct mapping function from visual data to the Mean Opinion Score (MOS), which is contradictory to the mechanism of practical subjective evaluation. To address this, we propose a novel language-driven PCQA method named CLIP-PCQA. Considering that human beings prefer to describe visual quality using discrete quality descriptions (e.g., "excellent" and "poor") rather than specific scores, we adopt a retrieval-based mapping strategy to simulate the process of subjective assessment. More specifically, based on the philosophy of CLIP, we calculate the cosine similarity between the visual features and multiple textual features corresponding to different quality descriptions, during which an effective contrastive loss and learnable prompts are introduced to enhance the feature extraction. Meanwhile, given the personal limitations and biases in subjective experiments, we further convert the feature similarities into probabilities and consider the Opinion Score Distribution (OSD), rather than a single MOS, as the final target. Experimental results show that our CLIP-PCQA outperforms other State-Of-The-Art (SOTA) approaches.
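The retrieval-based mapping described above can be sketched as follows. This is a minimal illustration, not the repo's implementation: the five quality levels, their numeric anchors, the fixed temperature, and the random features are all assumptions (in CLIP-PCQA the prompts are learnable and the features come from the trained encoders).

```python
import numpy as np

# Hypothetical 5-level quality descriptions and numeric anchors (assumptions).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]
LEVEL_SCORES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def cosine_sim(v, T):
    """Cosine similarity between one visual feature v and text features T (rows)."""
    v = v / np.linalg.norm(v)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    return T @ v

def quality_probabilities(v, T, temperature=0.07):
    """Softmax over similarities -> probability of each quality description."""
    s = cosine_sim(v, T) / temperature
    s = s - s.max()                      # numerical stability
    p = np.exp(s) / np.exp(s).sum()
    return p

def predicted_mos(p):
    """Expected score under the predicted opinion score distribution."""
    return float(p @ LEVEL_SCORES)

rng = np.random.default_rng(0)
v = rng.normal(size=512)                 # stand-in visual feature
T = rng.normal(size=(5, 512))            # one text feature per quality level
p = quality_probabilities(v, T)
score = predicted_mos(p)
```

During training, the predicted distribution `p` would be compared against the ground-truth OSD rather than collapsed to a single score.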

Framework overview figure.

🛠️ Installation

All experiments are conducted on Ubuntu 20.04 and CUDA 12.4.

```shell
conda create --name clip_pcqa python=3.8
conda activate clip_pcqa
git clone https://github.com/Olivialyt/CLIP-PCQA.git
cd CLIP-PCQA
pip install -r requirements.txt
```

📦 Data Preparation

We provide the download links of the projected images, which can be accessed here (BaiduYunpan, OneDrive).

You can also run get_projections.py to generate the projected color maps and depth maps. Note that you need to install pytorch3d to use this script.
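To illustrate the idea behind the projected color and depth maps, here is a conceptual NumPy sketch of an orthographic projection with a simple z-buffer. It is not the repo's `get_projections.py` (which uses pytorch3d for rendering); the resolution, normalization, and z-buffer scheme here are illustrative assumptions.

```python
import numpy as np

def project_point_cloud(points, colors, resolution=128):
    """Orthographically project XYZ+RGB points onto the XY plane,
    keeping the nearest point (largest z) per pixel.
    Returns a color map and a depth map normalized to [0, 1]."""
    # normalize XY coordinates into pixel indices
    xy = points[:, :2]
    xy = (xy - xy.min(axis=0)) / (np.ptp(xy, axis=0) + 1e-8)
    px = np.clip((xy * (resolution - 1)).astype(int), 0, resolution - 1)

    color_map = np.zeros((resolution, resolution, 3), dtype=np.float32)
    depth_map = np.full((resolution, resolution), -np.inf, dtype=np.float32)

    z = points[:, 2]
    for (u, v), zi, c in zip(px, z, colors):
        if zi > depth_map[v, u]:          # z-buffer: keep the closest point
            depth_map[v, u] = zi
            color_map[v, u] = c

    # normalize depth to [0, 1]; empty pixels take the background value
    valid = np.isfinite(depth_map)
    if valid.any():
        depth_map[~valid] = depth_map[valid].min()
        depth_map = (depth_map - depth_map.min()) / (np.ptp(depth_map) + 1e-8)
    return color_map, depth_map
```

Rotating the point cloud before projecting and repeating this six times would yield one color/depth pair per viewpoint, matching the `6view` / `6view_depth` layout below.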

After unzipping the files, you should get a file structure like:

```
├── LS-PCQA_maps
│   ├── 6view
│   │   ├── Wood_Octree_3_view_0.png
│   │   ├── Wood_Octree_3_view_1.png
│   │   ├── Wood_Octree_3_view_2.png
│   │   ├── ...
│   ├── 6view_depth
│   │   ├── Wood_Octree_3_view_0.png
│   │   ├── Wood_Octree_3_view_1.png
│   │   ├── Wood_Octree_3_view_2.png
│   │   ├── ...
```

Given the limited size of the databases, k-fold cross-validation is employed to provide a more accurate estimate of the proposed method's performance. We partition the databases according to content (reference point clouds), and the K-fold Training and Test Set Files are provided in the csvfiles folder.
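A content-based k-fold split like the one above can be sketched as follows: every distorted version of the same reference point cloud must land in the same fold, so train and test never share content. This is an illustrative stdlib-only sketch, not the repo's partitioning code (the actual splits are the CSV files in `csvfiles`).

```python
from collections import defaultdict

def content_kfold(sample_contents, k=5):
    """Yield (train, test) index lists for k folds, grouping samples by
    reference content so no content appears in both train and test.
    `sample_contents[i]` names the reference point cloud of sample i."""
    groups = defaultdict(list)
    for idx, content in enumerate(sample_contents):
        groups[content].append(idx)

    # assign whole contents to folds round-robin (deterministic order)
    folds = [[] for _ in range(k)]
    for j, content in enumerate(sorted(groups)):
        folds[j % k].extend(groups[content])

    for i in range(k):
        test = folds[i]
        train = [idx for j in range(k) if j != i for idx in folds[j]]
        yield train, test
```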

🚆 Training

Taking LS-PCQA as an example, you can train CLIP-PCQA (see train.sh) with the following command:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -u main.py \
--learning_rate 0.000004 \
--batch_size 16 \
--database LS_PCQA_part \
--img_length_read 6 \
--data_dir_color ./dataset/LS-PCQA_maps/6view \
--data_dir_depth ./dataset/LS-PCQA_maps/6view_depth \
--num_epochs 50 \
--k_fold_num 5 \
>> logs/LS_PCQA_part.log
```

Remember to set data_dir_color and data_dir_depth to your local paths path.../LS-PCQA_maps/6view and path.../LS-PCQA_maps/6view_depth, respectively.

If you want to train the model on the complete database, simply run the provided script train_total.sh.

🧺 Pretrained models

We provide the pretrained models trained on three datasets with available raw opinion scores: SJTU-PCQA, LS-PCQA Part I and BASICS. The pretrained models can be downloaded here (BaiduYunpan, OneDrive).

Using these pretrained models, you can evaluate cross-database generalizability by running cross_validation.py.

📈 Results

| Metric | SJTU-PCQA | WPC | LS-PCQA | BASICS |
|--------|-----------|-----|---------|--------|
| PLCC   | 0.956     | 0.894 | 0.755 | 0.932 |
| SRCC   | 0.936     | 0.890 | 0.736 | 0.872 |
| RMSE   | 0.693     | 10.112 | 0.533 | 0.382 |
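The PLCC, SRCC, and RMSE metrics reported above can be computed as sketched below. Note this is the raw computation; published PCQA evaluations often fit a logistic mapping between predictions and MOS before computing PLCC/RMSE, which this sketch omits.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def pcqa_metrics(pred, mos):
    """Pearson (PLCC), Spearman (SRCC) correlations and RMSE
    between predicted scores and ground-truth MOS."""
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    plcc = pearsonr(pred, mos)[0]
    srcc = spearmanr(pred, mos)[0]
    rmse = float(np.sqrt(np.mean((pred - mos) ** 2)))
    return plcc, srcc, rmse
```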

📖 Acknowledgement

This repo is built upon several open-source codebases; many thanks to their authors for their excellent work.

🔖 Citation

If you find this work useful in your research, please cite

```bibtex
@article{liu2025clip,
  title={CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  author={Liu, Yating and Zhang, Yujie and Shan, Ziyu and Xu, Yiling},
  url={https://ojs.aaai.org/index.php/AAAI/article/view/32607},
  year={2025},
  month={Apr.},
  volume={39},
  number={6},
  pages={5694-5702}
}
```

🔍 Bugs

If you find any bugs in this repo, please let me know!
