We fine-tuned Baichuan2 on a corpus of 14,988 short stories from STORAL, 6,500 news articles from THUCNews, 919 Wikipedia documents, and 27 novels from modern Chinese literature.
$ git clone git@github.com:xgao922/Baichuan2-finetuning.git
$ pip install -r requirements.txt

The training data should be placed under /data. We preprocess the corpus by stripping punctuation and normalizing whitespace.
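The actual cleanup is done by preprocess_corpus.py; a minimal sketch of this kind of preprocessing might look as follows. The punctuation character set and function name here are illustrative assumptions, not the repo's exact rules.

```python
import re

# Common Chinese and ASCII punctuation to strip. This character set is an
# illustrative assumption, not the exact list used by preprocess_corpus.py.
PUNCT = r"[，。！？；：、“”‘’（）《》【】…—,.!?;:'\"()<>\[\]]"

def clean_text(text: str) -> str:
    """Remove punctuation and collapse runs of blanks into single spaces."""
    text = re.sub(PUNCT, "", text)    # strip punctuation
    text = re.sub(r"\s+", " ", text)  # normalize whitespace
    return text.strip()

print(clean_text("你好，  世界！ Hello,   world!"))  # → 你好 世界 Hello world
```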
$ cd /scripts
$ python preprocess_corpus.py
$ cd /fine-tune
$ bash train.sh
$ cd /inference
$ bash run_predict.sh

The metric used for evaluation is top-k accuracy.
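The text does not spell out how top-k is computed; a common reading for language-model evaluation is top-k accuracy, the fraction of positions where the gold token appears among the model's k highest-scoring predictions. A minimal NumPy sketch under that assumption (function and variable names are our own, not from the repo):

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, targets: np.ndarray, k: int = 5) -> float:
    """Fraction of positions whose gold token id is among the k
    highest-scoring entries of the corresponding logit row.

    logits:  (n_positions, vocab_size) scores
    targets: (n_positions,) gold token ids
    """
    # indices of the k largest logits per row (order within the k is arbitrary)
    top_k = np.argpartition(logits, -k, axis=-1)[:, -k:]
    hits = (top_k == targets[:, None]).any(axis=-1)
    return float(hits.mean())

logits = np.array([[0.1, 0.9, 0.3, 0.2],
                   [0.8, 0.1, 0.05, 0.05]])
targets = np.array([1, 2])
print(top_k_accuracy(logits, targets, k=2))  # → 0.5 (hit in row 0, miss in row 1)
```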