Project for this article

Python scripts for editing FASTA files. s_trim_fasta_seq.py is the original script used in the publication. GUI_trim_fasta_seq.py performs the same processing from a PySide6 graphical interface.

Purpose

For every FASTA file contained in a folder, rewrite each sequence label and remove unwanted sequences.

Description

The NCBI Nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore/) allows nucleotide sequences related to a specific gene to be downloaded.

A sequence is considered unwanted and is removed if:

1 - its label contains a family name (a word ending in "idae");

2 - its label contains "sp.", "sp", "cf", "cf." or "mitochondrion";

3 - it is not among the 3 longest sequences sharing the same name (only the 3 longest exemplars of each name are kept).
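Rules 1 and 2 can be sketched as a single predicate on a header line. This is a hypothetical illustration, not the published code: the function name is_unwanted, the choice to test the original NCBI header (before relabeling), and the whole-word, case-insensitive matching are all assumptions. The actual script may, for example, match these strings as substrings instead.

```python
# Hypothetical sketch of removal rules 1 and 2 (not the published implementation).
UNWANTED_TOKENS = {"sp", "sp.", "cf", "cf.", "mitochondrion"}


def is_unwanted(header: str) -> bool:
    """Return True if a FASTA header matches removal rule 1 or 2.

    Assumption: the header is split on whitespace and each word is compared
    case-insensitively, after stripping surrounding punctuation such as , ; : ( ).
    """
    for raw in header.split():
        token = raw.strip(",;:()").lower()
        # Rule 2: open nomenclature ("sp.", "cf.") or whole mitochondrial genomes.
        if token in UNWANTED_TOKENS:
            return True
        # Rule 1: a family name, i.e. a word ending in "idae".
        if token.endswith("idae"):
            return True
    return False
```

With these assumptions, the example header below is kept (it ends with "mitochondrial", not "mitochondrion"), while a header such as ">X1.1 Lumbrineris sp. ABC" would be removed.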

Example

Searching for "Lumbrineris coi" (direct link: https://www.ncbi.nlm.nih.gov/nuccore/?term=Lumbrineris+coi) returns 65 results, which can be downloaded via Send to: > Complete record > File > FASTA format.

The beginning of the file is:

>HQ932670.1 Lumbrineris japonica voucher BIOUG<CAN_:BP2010-346 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial

The script rewrites each header line in this form:

>HQ932670_Lumbrineris_japonica
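The relabeling step can be sketched as follows, assuming that the accession number is the first word of the header (its version suffix, such as ".1", is dropped) and that the genus and species are the second and third words. The function name make_label is hypothetical.

```python
# Hypothetical sketch of the header relabeling step.
def make_label(header: str) -> str:
    """Turn an NCBI FASTA header into '>ACCESSION_Genus_species'.

    Assumption: words 1-3 of the header are accession.version, genus, species.
    """
    words = header.lstrip(">").split()
    if len(words) < 3:
        raise ValueError(f"header has fewer than 3 words: {header!r}")
    accession = words[0].split(".", 1)[0]  # "HQ932670.1" -> "HQ932670"
    genus, species = words[1], words[2]
    return f">{accession}_{genus}_{species}"
```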

Next, every sequence whose label contains "sp.", "sp", "cf", "cf." or "mitochondrion" (in upper or lower case), or a family name ending in "idae", is removed. Finally, only the 3 longest sequences sharing the same name are kept.
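The final "keep the 3 longest per name" step can be sketched like this. It assumes records are (label, sequence) pairs that have already been relabeled as ">ACCESSION_Genus_species", so that the "name" is everything after the first underscore. The function name keep_longest, the tie-breaking rule (earlier records win) and the choice to preserve the input order of the kept records are assumptions.

```python
from collections import defaultdict


# Hypothetical sketch: keep at most n of the longest sequences for each name.
def keep_longest(records, n=3):
    """records: list of (label, sequence) with labels like '>HQ932670_Genus_species'."""
    by_name = defaultdict(list)
    for idx, (label, seq) in enumerate(records):
        name = label.lstrip(">").split("_", 1)[1]  # drop the accession number
        by_name[name].append((len(seq), idx))
    keep = set()
    for entries in by_name.values():
        # Longest first; on equal length, the earlier record wins.
        entries.sort(key=lambda e: (-e[0], e[1]))
        keep.update(idx for _, idx in entries[:n])
    # Return the kept records in their original order.
    return [rec for idx, rec in enumerate(records) if idx in keep]
```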

Usage

To use the script, Python 3 must be installed; it can be downloaded from the official Python website (https://www.python.org). Once Python is installed, run s_trim_fasta_seq.py from the command line by typing the following in a shell or terminal:

python Path_to_script/s_trim_fasta_seq.py Path_to_folder_to_be_processed

To run the GUI script (GUI_trim_fasta_seq.py), PySide6 must also be installed, for example with: pip install PySide6

Every FASTA file in the selected folder is processed, and for each one a corresponding output file, with relabeled headers and the unwanted sequences removed, is created in the same folder.
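The folder-level processing can be outlined as below. This is a hypothetical sketch, not the published s_trim_fasta_seq.py: the accepted extensions (.fasta, .fas, .fa), the "_trimmed" output suffix, the 70-character line width and the names read_fasta, write_fasta, process_folder and main are all assumptions. The transform argument is where the relabeling and filtering steps would be plugged in; main uses an identity transform as a placeholder.

```python
import sys
from pathlib import Path

FASTA_SUFFIXES = {".fasta", ".fas", ".fa"}  # assumed input extensions
OUTPUT_TAG = "_trimmed"                     # assumed output-file suffix


def read_fasta(path):
    """Return a list of (header, sequence) pairs; multi-line sequences are joined."""
    records, header, chunks = [], None, []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line, []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(path, records, width=70):
    """Write (header, sequence) pairs, wrapping sequences at `width` characters."""
    with open(path, "w", encoding="utf-8") as handle:
        for header, seq in records:
            handle.write(header + "\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def process_folder(folder, transform):
    """Apply `transform` to every FASTA file in `folder`; return the output paths."""
    outputs = []
    # sorted() lists the folder before any output file is written into it.
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in FASTA_SUFFIXES or path.stem.endswith(OUTPUT_TAG):
            continue  # skip non-FASTA files and outputs of a previous run
        out_path = path.with_name(path.stem + OUTPUT_TAG + path.suffix)
        write_fasta(out_path, transform(read_fasta(path)))
        outputs.append(out_path)
    return outputs


def main(argv):
    """Command-line entry point: expects exactly one argument, the folder path."""
    if len(argv) != 1:
        print("usage: python s_trim_fasta_seq.py Path_to_folder_to_be_processed",
              file=sys.stderr)
        return 2
    # Placeholder transform: the relabeling and filtering steps would go here.
    for out in process_folder(argv[0], transform=lambda records: records):
        print("wrote", out)
    return 0
```

To run such a file as a script, it could end with if __name__ == "__main__": sys.exit(main(sys.argv[1:])). Skipping files that already carry the output suffix keeps a second run from processing its own results.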

Contributing

If you would like to contribute to this script, please feel free to submit a pull request or write to thomas.guilment@gmail.com.

Author: Thomas Guilment
Contributor: Rannyele Passos Ribeiro

