Identifying and Classifying Multiple Sourced and Categorized Self-Admitted Technical Debts: A Pipeline

This repository contains the models, dataset, and experimental code described in the paper. The experimental code covers the implementation of our pipeline and the training process. The dataset is the complete preprocessed dataset used for training and testing. The models are those used in the pipeline and in RQ1-3.

dataset and models

Dataset / Model	Description	Link
Dataset-satd_aug	Complete augmented dataset, including all SATD sentences, instructions, and categories.	Link1
Model-glm4-9b-chat-sft-9class	Model trained to classify all SATD sentences into 9 categories.	Link2
Model-glm4-9b-chat-sft	Model trained for use in the pipeline to classify isSATD sentences into 8 categories.	Link3
Model-satd-glm4-9b-chat-sft-noaug	Model trained to classify isSATD sentences into 8 categories using non-augmented data.	Link4
Model-MT-Bert	Model trained to classify all SATD sentences into 2 categories (isSATD or nonSATD).	Link4

code

The following overview of the code is intended to make it easier to use.

The files main_pipline_0shot.py and main_pipline_fewshot.py are the entry points for running our pipeline. The other files are tools used by these two files. The folder bert_config contains the configuration of our BERT model. The folder train_bert contains the training details of our BERT model.

bert_config
comparison_work
train_bert
K_sample.py
README.md
get_isSATD.py
glm_knn.py
main_pipline_0shot.py
main_pipline_fewshot.py
modeling_multitask_predict.py
process.py
query_with_nl.py
tokenization.py
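
The two-stage flow implied by the model table (a binary isSATD/nonSATD identifier followed by a category classifier applied only to isSATD sentences) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the repository's implementation: `run_pipeline`, `toy_is_satd`, `toy_classify`, and the placeholder category labels are all hypothetical, and the toy functions are keyword stand-ins for Model-MT-Bert and Model-glm4-9b-chat-sft.

```python
from typing import Callable, List, Tuple

NON_SATD = "nonSATD"  # label name taken from the model table above


def run_pipeline(
    sentences: List[str],
    is_satd: Callable[[str], bool],
    classify: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Label each sentence: filter with the identifier, then classify the rest."""
    labeled = []
    for sentence in sentences:
        if is_satd(sentence):
            # Only sentences flagged as SATD reach the category classifier.
            labeled.append((sentence, classify(sentence)))
        else:
            labeled.append((sentence, NON_SATD))
    return labeled


def toy_is_satd(sentence: str) -> bool:
    # Hypothetical stand-in for the BERT identifier: a keyword check.
    return any(marker in sentence.lower() for marker in ("todo", "fixme", "hack"))


def toy_classify(sentence: str) -> str:
    # Hypothetical stand-in for the fine-tuned GLM classifier.
    # The labels are placeholders, not the paper's category names.
    if "test" in sentence.lower():
        return "placeholder_category_1"
    return "placeholder_category_2"


demo = run_pipeline(
    [
        "TODO: add tests for the parser",
        "Returns the parsed value.",
        "FIXME: hack around the timeout",
    ],
    toy_is_satd,
    toy_classify,
)
for sentence, label in demo:
    print(f"{label}\t{sentence}")
```

Passing the two stages in as callables keeps the control flow independent of how each model is loaded, so the stand-ins can be swapped for real model wrappers without changing `run_pipeline`.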
