Affinity Map: Few-Shot Protein Classification

Affinity Map is a meta-learning framework that classifies proteins into functional families from only a handful of labeled examples per family ($K$-shot learning). It combines Prototypical Networks with ESM-2 protein language models to annotate rare or novel protein sequences, a setting where traditional profile-HMM methods (such as those behind Pfam) struggle because too few sequences are available.

Key Highlights

State-of-the-Art Foundation Models: Utilizes Meta’s ESM-2 (8M to 650M parameters) as a sequence encoder.
Novel Research Insight: LoRA (Low-Rank Adaptation) episodic fine-tuning shows a $K$-dependent effect. It improves single-shot ($K=1$) accuracy by 2.5% but requires specific regularization in multi-shot scenarios.
Rigorous Benchmarking: Evaluated against BLAST (bioinformatics gold standard) and k-mer compositional baselines.
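
To make the k-mer compositional baseline concrete, here is a minimal sketch of how a sequence can be turned into a k-mer frequency vector and compared with another sequence. The function names (`kmer_profile`, `cosine`), the choice of $k=2$, and the example sequences are illustrative assumptions, not taken from this repository's code.

```python
import math
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_profile(sequence: str, k: int = 2) -> List[float]:
    """Return the normalized k-mer frequency vector of a protein sequence.

    k-mers containing non-standard residues (e.g. X) are ignored.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy sequences, only to show the call pattern.
seq_a = "MKTAYIAKQRQISFVKSHFSRQ"
seq_b = "MKTAYIAKQRNISFVKSHFSRQ"
print(round(cosine(kmer_profile(seq_a), kmer_profile(seq_b)), 3))
```

Vectors like these can stand in for learned embeddings in the same prototype-based classifier, which is one plausible reading of the "k-mer ProtoNet" row in the results table.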

Methodology

The pipeline treats protein classification as an episodic task:

Encoding: Raw amino acid sequences are embedded into a high-dimensional metric space.
Prototyping: For each family, a class prototype is computed as the mean embedding of its $K$ support sequences.
Classification: Each query sequence is assigned to the family whose prototype is nearest under cosine similarity.
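
The three steps above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: embeddings are given as ready-made lists of floats (in the real pipeline they would come from the encoder), and the helper names `sample_episode`, `build_prototypes`, and `classify` are hypothetical rather than the repository's API.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def sample_episode(dataset: Dict[str, List[Vector]], n_way: int, k_shot: int,
                   n_query: int, rng: random.Random
                   ) -> Tuple[Dict[str, List[Vector]], List[Tuple[Vector, str]]]:
    """Draw an N-way K-shot episode: K support and n_query query items per family."""
    families = rng.sample(sorted(dataset), n_way)
    support: Dict[str, List[Vector]] = {}
    queries: List[Tuple[Vector, str]] = []
    for fam in families:
        items = rng.sample(dataset[fam], k_shot + n_query)
        support[fam] = items[:k_shot]
        queries.extend((x, fam) for x in items[k_shot:])
    return support, queries


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """One prototype per family: the mean of its support embeddings."""
    return {fam: mean_vector(vecs) for fam, vecs in support.items()}


def classify(query: Vector, prototypes: Dict[str, Vector]) -> str:
    """Return the family whose prototype has the highest cosine similarity."""
    return max(prototypes, key=lambda fam: cosine_similarity(query, prototypes[fam]))


# Toy 2-D "embeddings" for two made-up families, only to exercise the code.
toy = {
    "fam_a": [[1.0, 0.1], [0.9, 0.0], [1.0, 0.2]],
    "fam_b": [[0.1, 1.0], [0.0, 0.9], [0.2, 1.0]],
}
support, queries = sample_episode(toy, n_way=2, k_shot=2, n_query=1,
                                  rng=random.Random(0))
prototypes = build_prototypes(support)
accuracy = sum(classify(q, prototypes) == fam for q, fam in queries) / len(queries)
print(f"episode accuracy: {accuracy:.2f}")
```

Reported few-shot accuracy is typically the average of this per-episode accuracy over many randomly sampled episodes.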

Model Tiers Evaluated:

Encoder	Params	Accuracy (5-way 5-shot)
1D-CNN (From Scratch)	228K	71.0%
k-mer ProtoNet	N/A	86.2%
ESM-2 (Frozen)	8M	88.7%
ESM-2 + LoRA	8M + 61K	91.3% ($K=1$ Optimized)
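
To give a feel for where an adapter budget of roughly 61K parameters can come from, the short calculation below counts LoRA parameters for the 8M-parameter ESM-2 model, which has 6 transformer layers and a hidden size of 320. The rank (8) and the choice of adapting two projection matrices per layer are assumptions picked for illustration; the source states only the 61K total, not the configuration that produced it.

```python
def lora_param_count(d_in: int, d_out: int, rank: int, n_matrices: int) -> int:
    """Trainable parameters added by LoRA.

    Each adapted (d_out x d_in) weight gains two low-rank factors:
    A with shape (rank x d_in) and B with shape (d_out x rank).
    """
    return n_matrices * rank * (d_in + d_out)


hidden_size = 320        # ESM-2 8M hidden dimension
num_layers = 6           # ESM-2 8M transformer layers
rank = 8                 # ASSUMPTION: LoRA rank, not stated in the source
adapted_per_layer = 2    # ASSUMPTION: e.g. query and value projections

total = lora_param_count(hidden_size, hidden_size, rank,
                         num_layers * adapted_per_layer)
print(total)  # 61440
```

Under these assumed settings the adapters add 61,440 parameters, which is under 1% of the 8M-parameter encoder.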

Results & Visualization

The model learns a biologically meaningful embedding space where proteins cluster by structural and functional similarity.

Figure (top): PCA projection of protein embeddings. Figure (bottom): confusion matrix showing structural overlaps between families such as Immunoglobulins and Cupins.

Installation & Usage

1. Setup Environment

git clone https://github.com/mderaznasr/Protein-fewshot.git
cd Protein-fewshot
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Run Inference / Evaluation

# Evaluate the best ESM-2 LoRA checkpoint
python3 script/run_experiments.py --model esm2_lora --k_shot 5

📄 Documentation & Paper

For a deep dive into the mathematical framework and statistical significance tests, see the full paper:

Paper: paper/affinity_map_paper.pdf
Tech Stack: PyTorch, Hugging Face (PEFT/Transformers), Biopython, scikit-learn, UMAP, Streamlit.

Developed by Mohammed El-Raznasr at Georgia Institute of Technology.

