209 changes: 209 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,209 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be added to the global gitignore or merged into this project gitignore. For a PyCharm
# project, it is recommended to include the following files:
# .idea/
# *.iml
# *.ipr
# *.iws
.idea/
*.iml
*.ipr
*.iws

# VS Code
.vscode/

# uv
# Note: uv.lock should be committed for reproducible builds
# .uv_cache/ is created by uv and should be ignored
.uv_cache/

# Project-specific ignores
# Large data files that shouldn't be in version control
*.tsv.gz
*.fastq
*.fastq.gz
*.fq
*.fq.gz
*.bam
*.sam
*.vcf
*.vcf.gz

# Output directories
output/
results/
tmp/
temp/

# Log files
*.log

# OS-specific files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Temporary files
*.tmp
*.temp
*~
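
The project-specific data patterns above can be sanity-checked against typical file names. The sketch below is only an approximation: it uses Python's `fnmatchcase` on base names, which does not reproduce full gitignore semantics (directory rules, negation, anchoring), and the helper name `is_ignored_data_file` is hypothetical. `git check-ignore -v <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase

# Project-specific data patterns copied from the .gitignore in this diff.
DATA_PATTERNS = [
    "*.tsv.gz", "*.fastq", "*.fastq.gz", "*.fq", "*.fq.gz",
    "*.bam", "*.sam", "*.vcf", "*.vcf.gz",
]


def is_ignored_data_file(path: str) -> bool:
    """Return True if the base name of `path` matches a data pattern.

    Hypothetical helper: approximates gitignore matching on base names only.
    """
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in DATA_PATTERNS)


if __name__ == "__main__":
    for name in ["data/sample.fastq.gz", "tutorial.txt",
                 "TestReal-ADIRP0000023_TCRB.tsv"]:
        print(name, is_ignored_data_file(name))
```

Under this approximation, uncompressed `.tsv` and `.txt` inputs such as the README's example files stay trackable, while only the compressed `.tsv.gz` form is ignored.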
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.12
85 changes: 77 additions & 8 deletions README.md
@@ -12,9 +12,78 @@ GIANA is written in Python3, with the following dependencies:

After installing these dependencies, please download the latest version of GIANA source code (currently v4), query.py and the associated TRBV allele data (Imgt_Human_TRBV.fasta).

## Installation with uv

This project uses [uv](https://docs.astral.sh/uv/) for fast Python package management. If you don't have uv installed, you can install it with:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### Setting up the environment

1. **Clone the repository and navigate to the project directory:**
```bash
git clone <repository-url>
cd GIANA
```

2. **Create and activate a virtual environment with uv:**
```bash
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```

3. **Install dependencies:**
```bash
uv sync
```

This will install all dependencies specified in `pyproject.toml` and create a lock file (`uv.lock`) for reproducible builds.

### Alternative: Run without activating the environment

You can also run GIANA directly with uv without manually activating the environment:

```bash
uv run python GIANA4.1.py -h
```

### Managing dependencies

- **Add a new dependency:**
```bash
uv add package-name
```

- **Add a development dependency:**
```bash
uv add --dev package-name
```

- **Remove a dependency:**
```bash
uv remove package-name
```

- **Update all dependencies:**
```bash
uv lock --upgrade
```

- **Sync environment with lock file:**
```bash
uv sync
```

## Usage

Type `python GIANA.py -h` to display all the commandline options:
**Note:** This project contains multiple versions of GIANA:
- `GIANA4.1.py` - Latest version (recommended)
- `GIANA4.py` - Previous version
- `GIANAsv.py` - Alternative version

Type `python GIANA4.1.py -h` to display all the commandline options:

|Commands|Description|
|--|--|
@@ -52,39 +121,39 @@ Input of GIANA is flexible. The first column is kept for CDR3 amino acid sequenc

The following command performs standard TCR clustering on the input data, using the TRBV allele information:

`python GIANA.py -f tutorial.txt`
`python GIANA4.1.py -f tutorial.txt`

GIANA can also be applied to a folder containing a list of input files:

`python GIANA.py -d input_dir/`
`python GIANA4.1.py -d input_dir/`

### 3. Clustering without TRBV variable gene

Even with TRBV gene as the second column, GIANA can perform clustering without the TRBV allele information:

`python GIANA.py -f tutorial.txt -v`
`python GIANA4.1.py -f tutorial.txt -v`

This option is useful when TRBV gene information is not available (in that case, the input data can contain a single column of CDR3s). The user can choose a more stringent cut-off for the Smith-Waterman alignment score (a higher score is more stringent):

`python GIANA.py -f tutorial.txt -v -S 4`
`python GIANA4.1.py -f tutorial.txt -v -S 4`

### 4. Clustering in non-exact mode

By default, GIANA will run Smith-Waterman clustering for all the pre-clusters identified in the isometric nearest neighbor search, which is referred to as the 'exact' mode. The user can choose to disable the exact mode to gain over 10X computational efficiency, at the cost of less specific TCR clustering:

`python GIANA.py -f tutorial.txt -e`
`python GIANA4.1.py -f tutorial.txt -e`

This mode is not necessary when processing fewer than 1 million sequences. In non-exact mode, users are advised to apply a more stringent isometric distance cut-off to increase specificity:

`python GIANA.py -f tutorial.txt -e -t 5`
`python GIANA4.1.py -f tutorial.txt -e -t 5`

Here, a smaller `-t` value is more stringent.
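
The flag combinations described in sections 2 to 4 can be assembled programmatically. The following is a minimal sketch, not part of GIANA: `build_giana_command` and its parameter names are hypothetical, and it only uses the flags shown in this README (`-f`, `-v`, `-S`, `-e`, `-t`). It does not validate which combinations GIANA itself accepts.

```python
import shlex


def build_giana_command(input_file, use_trbv=True, sw_cutoff=None,
                        exact=True, iso_threshold=None,
                        script="GIANA4.1.py"):
    """Build the argument list for a clustering run (hypothetical helper)."""
    cmd = ["python", script, "-f", input_file]
    if not use_trbv:
        cmd.append("-v")                      # cluster without TRBV alleles
    if sw_cutoff is not None:
        cmd += ["-S", str(sw_cutoff)]         # higher score is more stringent
    if not exact:
        cmd.append("-e")                      # disable exact mode
    if iso_threshold is not None:
        cmd += ["-t", str(iso_threshold)]     # smaller value is more stringent
    return cmd


if __name__ == "__main__":
    # Non-exact mode with a stricter isometric distance cut-off.
    print(shlex.join(build_giana_command("tutorial.txt", exact=False,
                                         iso_threshold=5)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, or prefixed with `uv run` when the environment is not activated.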

### 5. Running GIANA in query mode

Assume the reference TCR data is ref.txt. After running clustering (for example, mode 2), GIANA produces a cluster file ref--RotationEncodingBL62.txt. Put this file in the same directory as ref.txt. GIANA will automatically search for this file when running in query mode, for example:

`python GIANA.py -q TestReal-ADIRP0000023_TCRB.tsv -r hc10s10.txt -S 3.3 -o tmp/`
`python GIANA4.1.py -q TestReal-ADIRP0000023_TCRB.tsv -r hc10s10.txt -S 3.3 -o tmp/`

The query input is given with the `-q` option, which accepts either a file or, if the path ends with '/', a directory of files. The reference file is given with the `-r` option.
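
Because query mode expects the cluster file to sit next to the reference file, it can help to check for it before launching a run. The sketch below assumes the naming rule generalizes from the single example above (`ref.txt` gives `ref--RotationEncodingBL62.txt`); the helper names are hypothetical and not part of GIANA.

```python
from pathlib import Path

# Suffix taken from the README example; assumed to apply to any reference file.
CLUSTER_SUFFIX = "--RotationEncodingBL62.txt"


def expected_cluster_file(reference: str) -> Path:
    """Return the cluster file path expected beside the reference file."""
    ref = Path(reference)
    return ref.with_name(ref.stem + CLUSTER_SUFFIX)


def cluster_file_ready(reference: str) -> bool:
    """Return True if the expected cluster file already exists on disk."""
    return expected_cluster_file(reference).is_file()


if __name__ == "__main__":
    ref = "hc10s10.txt"
    if not cluster_file_ready(ref):
        print(f"Missing {expected_cluster_file(ref)}; run clustering on {ref} first.")
```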

13 changes: 13 additions & 0 deletions pyproject.toml
@@ -0,0 +1,13 @@
[project]
name = "giana"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"biopython>=1.85",
"faiss-cpu>=1.12.0",
"numpy>=2.3.3",
"pandas>=2.3.2",
"scikit-learn>=1.7.2",
]
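
The constraints in this `pyproject.toml` can be checked against a running environment. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the minimums are copied by hand from the file above, and the version comparison only looks at the first three numeric components (pre-release tags are ignored). The third-party `packaging` library offers more robust version parsing.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the pyproject.toml in this diff.
MINIMUMS = {
    "biopython": "1.85",
    "faiss-cpu": "1.12.0",
    "numpy": "2.3.3",
    "pandas": "2.3.2",
    "scikit-learn": "1.7.2",
}


def parse_version(text: str) -> tuple:
    """Return the first three numeric components, padded with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", text)][:3]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings numerically (simplified)."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment() -> list:
    """Return a list of human-readable problems; empty if all checks pass."""
    problems = []
    if sys.version_info < (3, 12):
        problems.append("Python >= 3.12 is required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```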