Arpeggia

This is a port of the Arpeggio library to Rust, with a focus on identifying certain protein-protein interactions in PDB and mmCIF files.

Features

Installation

Python Package (Recommended)

Install using pip:

pip install arpeggia

Or install from source using maturin:

git clone https://github.com/y1zhou/arpeggia.git
cd arpeggia
pip install maturin
maturin develop -v --release --features python

Rust Binary

For the command-line tool, you can install pre-built binaries from the GitHub Releases page, or build from source:

git clone https://github.com/y1zhou/arpeggia.git
cd arpeggia
cargo install --path .

This will install the arpeggia binary to your Cargo binary directory (usually ~/.cargo/bin).

Usage

Python API

import arpeggia

# Analyze protein contacts
contacts_df = arpeggia.contacts(
    "structure.pdb",
    groups="/",                    # All-to-all chain interactions
    vdw_comp=0.1,                 # VdW radii compensation
    dist_cutoff=6.5,              # Distance cutoff in Ångströms
    ignore_zero_occupancy=False   # Set True to ignore zero occupancy atoms
)
print(f"Found {len(contacts_df)} contacts")
print(contacts_df.head())

# Calculate solvent accessible surface area
# Atom-level (default)
sasa_df = arpeggia.sasa("structure.pdb", level="atom", probe_radius=1.4, n_points=100, model_num=0)
print(f"Calculated SASA for {len(sasa_df)} atoms")

# Residue-level SASA
residue_sasa = arpeggia.sasa("structure.pdb", level="residue")
print(f"Calculated SASA for {len(residue_sasa)} residues")

# Chain-level SASA for specific chains only
chain_sasa = arpeggia.sasa("structure.pdb", level="chain", chains="A,B")
print("Calculated SASA for chains A and B")

# Calculate relative SASA (RSA) normalized by Tien et al. (2013) MaxASA values
rsa_df = arpeggia.relative_sasa("structure.pdb")
print(f"Calculated RSA for {len(rsa_df)} residues")

# Calculate Spatial Aggregation Propensity (SAP) scores for aggregation prediction
sap_df = arpeggia.sap_score("antibody.pdb", level="residue")
print(f"Calculated SAP for {len(sap_df)} residues")

# SAP for specific chains (e.g., antibody heavy and light chains)
sap_hl = arpeggia.sap_score("antibody.pdb", chains="H,L", sap_radius=5.0)
print("Calculated SAP for H and L chains")

# Calculate buried surface area at the interface
bsa = arpeggia.dsasa("structure.pdb", groups="A,B/C,D")
print(f"Buried surface area: {bsa:.2f} Å²")

# Calculate Shape Complementarity at an interface
sc_score = arpeggia.sc("antibody_antigen.pdb", groups="H,L/A")
print(f"Shape Complementarity: {sc_score:.3f}")  # Typical values: 0.5-0.7

# Extract protein sequences
sequences = arpeggia.pdb2seq("structure.pdb")
for chain_id, seq in sequences.items():
    print(f"Chain {chain_id}: {seq}")

The functions return Polars DataFrames for efficient data manipulation. You can easily convert to pandas if needed:

import polars as pl

# Convert to pandas
contacts_pd = contacts_df.to_pandas()

# Or save directly to various formats
contacts_df.write_csv("contacts.csv")
contacts_df.write_parquet("contacts.parquet")

Command-Line Interface

The CLI provides the same functionality:

# Analyze contacts
arpeggia contacts -i structure.pdb -o output_dir -g "A,B/C,D" -t csv

# Analyze contacts, ignoring atoms with zero occupancy
arpeggia contacts -i structure.pdb -o output_dir --ignore-zero-occupancy

# Calculate SASA at different levels (atom, residue, chain)
arpeggia sasa -i structure.pdb -o output_dir --level atom
arpeggia sasa -i structure.pdb -o output_dir --level residue
arpeggia sasa -i structure.pdb -o output_dir --level chain

# Calculate SASA for specific chains only
arpeggia sasa -i structure.pdb -o output_dir --level residue --chains "A,B"

# Calculate relative SASA (RSA) for each residue
arpeggia relative-sasa -i structure.pdb -o output_dir

# Calculate SAP scores for aggregation prediction
arpeggia sap -i antibody.pdb -o output_dir --level residue

# Calculate SAP for specific chains (e.g., antibody H and L chains)
arpeggia sap -i antibody.pdb -o output_dir --chains "H,L"

# Calculate buried surface area at the interface
arpeggia dsasa -i structure.pdb -g "A,B/C,D"

# Calculate Shape Complementarity at an interface
arpeggia sc -i antibody_antigen.pdb -g "H,L/A"

# Extract sequences
arpeggia seq structure.pdb

To see all available options:

arpeggia help
arpeggia contacts --help
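
For batch runs over many structures, the CLI can be driven from a short script. The following is a minimal sketch, not part of arpeggia itself: the helper name build_contacts_commands and the one-output-directory-per-structure layout are assumptions made for illustration, and it only uses the -i, -o, -g, and -t flags shown above. It also assumes the CLI accepts the same "/" groups string as the Python API.

```python
from pathlib import Path


def build_contacts_commands(inputs, out_root, groups="/", file_type="csv"):
    """Build one `arpeggia contacts` argv list per input file.

    Hypothetical helper: it only assembles the commands, nothing is executed.
    Each structure gets its own output directory under `out_root`.
    """
    commands = []
    for path in inputs:
        path = Path(path)
        out_dir = Path(out_root) / path.stem  # e.g. results/1abc for 1abc.pdb
        commands.append(
            ["arpeggia", "contacts",
             "-i", str(path),
             "-o", str(out_dir),
             "-g", groups,
             "-t", file_type]
        )
    return commands


if __name__ == "__main__":
    for cmd in build_contacts_commands(["1abc.pdb", "2xyz.cif"], "results", groups="A,B/C,D"):
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) once the arpeggia binary is on your PATH.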

Chain Groups Specification

The groups parameter specifies which sets of chains are checked for interactions with each other. Chains on the left of the slash are compared against chains on the right:

"/" - All chains interact with all chains (including self)
"A,B/C,D" - Chains A,B interact with chains C,D
"A/" - Chain A interacts with all other chains
"A,B/" - Chains A,B interact with all remaining chains
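
The rules above can be sketched in a few lines of Python. This is an illustration of the documented behaviour only, not arpeggia's actual parser: the function name resolve_groups is hypothetical, and the handling of a spec with an empty left side (such as "/C") is an assumption, since that case is not described here.

```python
from __future__ import annotations


def resolve_groups(spec: str, all_chains: list[str]) -> tuple[list[str], list[str]]:
    """Expand a groups string into (left_chains, right_chains).

    Hypothetical helper mirroring the four documented cases.
    """
    if spec.count("/") != 1:
        raise ValueError(f"expected exactly one '/' in {spec!r}")

    left_raw, right_raw = spec.split("/")
    left = [c.strip() for c in left_raw.split(",") if c.strip()]
    right = [c.strip() for c in right_raw.split(",") if c.strip()]

    if not left and not right:
        # "/": every chain against every chain, including itself
        return list(all_chains), list(all_chains)
    if not right:
        # "A/" or "A,B/": named chains against all remaining chains
        right = [c for c in all_chains if c not in left]
    elif not left:
        # "/C": not documented; treated symmetrically here (assumption)
        left = [c for c in all_chains if c not in right]
    return left, right


if __name__ == "__main__":
    chains = ["A", "B", "C", "D"]
    for spec in ["/", "A,B/C,D", "A/", "A,B/"]:
        print(spec, resolve_groups(spec, chains))
```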

Development

To build the Python package in development mode:

pip install maturin polars
maturin develop -v --release --features python
python python/test_arpeggia.py

To run Rust tests:

cargo test

License

MIT License - see LICENSE file for details.

Credit

This project would not be possible without the following resources:

Arpeggio: Original Python library for protein-protein interaction analysis.
pdbtbx: The structural file parser doing all the heavy lifting.
RustSASA: Library for calculating solvent accessible surface area.
sc-rs: Library for calculating the Shape Complementarity statistic of Lawrence & Colman (1993).
Rosetta: Inspiration for the Spatial Aggregation Propensity (SAP) score calculations.
