Molformer

Introduction

This is the official repository for Molformer, a motif-based Transformer on 3D heterogeneous molecular graphs (AAAI 2023).


Installation

# Install packages (note: the PyTorch pip package is named `torch`, not `pytorch`)
pip install torch scikit-learn mendeleev
pip install rdkit-pypi
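
A quick way to confirm the environment imports cleanly (a minimal sketch; the version attributes below are standard for these packages, not part of this repository):

>>> import torch, sklearn
>>> from rdkit import rdBase
>>> from mendeleev import element
>>> torch.__version__, sklearn.__version__, rdBase.rdkitVersion
>>> element('C').symbol  # mendeleev element lookup, here for carbon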

Dataset

We evaluate our model in three different domains: quantum chemistry, physiology, and biophysics. We also provide information on the material-science datasets used in our earlier 3D-Transformer. You can download the raw datasets from the following links.

Quantum Chemistry

Physiology

Biophysics

Material Science

  • COREMOF
    Download (Baidu Drive): https://pan.baidu.com/s/12N8gM8_TQ1mpBGx6gdkAog (password: l41s)
    Reproduction of PointNet++: python coremof/reproduce/main_pn_coremof.py
    Reproduction of MPNN: python coremof/reproduce/main_mpnn_coremof.py
    Reproduction of SchNet:
    1. Load COREMOF: python coremof/reproduce/main_sch_coremof.py
    2. Run SchNet: spk_run.py train schnet custom ../../coremof.db ./coremof --split 900 100 --property LCD --features 16 --batch_size 20 --cuda
       (Note: the official SchNet script cannot be reproduced successfully due to memory limitations.)
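
The coremof.db produced in step 1 should follow the ASE database format that SchNetPack's custom datasets expect, so it can be inspected directly with ASE (a hedged sketch; the path and the stored keys are assumptions):

>>> from ase.db import connect
>>> db = connect('coremof.db')  # path assumed from the spk_run.py command above
>>> db.count()                  # number of stored structures
>>> row = next(db.select())     # first entry
>>> row.natoms, row.data        # atom count and stored property data (e.g., LCD)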

Models

models/tr_spe: 3D-Transformer with Sinusoidal Position Encoding (SPE)
models/tr_cpe: 3D-Transformer with Convolutional Position Encoding (CPE)
models/tr_msa: 3D-Transformer with Multi-scale Self-attention (MSA)
models/tr_afps: 3D-Transformer with Attentive Farthest Point Sampling (AFPS)
models/tr_full: 3D-Transformer with CPE + MSA + AFPS

Quick Tour

Model Usage

After processing the dataset, it is time to build the model. Suppose there are N types of atoms and n downstream tasks; if you only need to predict a single property, set n = 1. For multi-scale self-attention, a dist_bar is needed to define the different scales of the local regions, e.g., dist_bar=[1, 3, 5]. You can also specify the number of attention heads, the number of encoder layers, the hidden dimension, the dropout rate, etc. Here, we simply adopt the defaults.

>>> import torch
>>> from model.tr_spe import build_model

# Initialize the model (illustrative values: 100 atom types, a single task)
>>> N, n = 100, 1
>>> model = build_model(N, n).cuda()

# Take a 4-atom molecule for example
>>> x = torch.tensor([[1, 1, 6, 8]]).cuda()
>>> pos = torch.tensor([[[7.356203877, 9.058198382, 3.255188164],
                         [5.990730587, 3.951633382, 9.784664946],
                         [1.048332315, 3.912215133, 9.827313903],
                         [2.492201352, 9.097616820, 3.297837121]]]).cuda()
>>> mask = (x != 0).unsqueeze(1)  # atom index 0 is treated as padding
>>> out = model(x.long(), mask, pos)

The MSA variant consumes pairwise distances instead of raw coordinates:

>>> import torch
>>> from model.tr_msa import build_model

# Initialize the model (dist_bar sets the scales of the local regions)
>>> dist_bar = [1, 3, 5]
>>> model = build_model(N, n, dist_bar).cuda()

# Take a 4-atom molecule for example
>>> x = torch.tensor([[1, 1, 6, 8]]).cuda()
>>> pos = torch.tensor([[[7.356203877, 9.058198382, 3.255188164],
                         [5.990730587, 3.951633382, 9.784664946],
                         [1.048332315, 3.912215133, 9.827313903],
                         [2.492201352, 9.097616820, 3.297837121]]]).cuda()
>>> mask = (x != 0).unsqueeze(1)
>>> dist = torch.cdist(pos, pos).float()  # (1, 4, 4) pairwise Euclidean distances
>>> out = model(x.long(), mask, dist)
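
Molecules of different sizes can be batched by padding the atom-index tensor with 0, which the mask already treats as padding (a minimal sketch; the padding convention is inferred from mask = (x != 0) above, and the coordinates here are placeholders):

>>> import torch
# Two molecules: 4 atoms and 2 atoms, padded to length 4 with index 0
>>> x = torch.tensor([[1, 1, 6, 8],
                      [6, 8, 0, 0]]).cuda()
>>> pos = torch.zeros(2, 4, 3).cuda()     # substitute real 3D coordinates in practice
>>> mask = (x != 0).unsqueeze(1)          # shape (batch, 1, atoms); False at padding
>>> dist = torch.cdist(pos, pos).float()  # (batch, atoms, atoms) distance matrices
>>> out = model(x.long(), mask, dist)     # same call as the single-molecule example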

Motif Extraction

We rely on RDKit to extract motifs from small molecules. Given the SMILES representation of any molecule, we can manually define substructures using SMARTS patterns.

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles('CC(=O)O')               # e.g., acetic acid
>>> pattern = Chem.MolFromSmarts('C(=O)')
>>> mol.HasSubstructMatch(pattern)    # check whether the molecule contains the motif 'C(=O)'
>>> mol.GetSubstructMatches(pattern)  # get the atom indices that belong to the motif 'C(=O)'
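
The matched atom indices can then be converted into a per-atom motif mask aligned with the atom tensor x (a hedged sketch; this encoding is an illustration, not necessarily the exact pipeline used in the repository):

>>> import torch
>>> matches = mol.GetSubstructMatches(pattern)        # ((1, 2),) for 'CC(=O)O'
>>> motif_mask = torch.zeros(mol.GetNumAtoms(), dtype=torch.bool)
>>> for match in matches:
...     motif_mask[list(match)] = True                # mark atoms covered by the motif
>>> motif_mask                                        # tensor([False, True, True, False])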

Citation

If you find our paper helpful, please cite it!

@inproceedings{wu2023molformer,
  title={Molformer: Motif-based transformer on 3d heterogeneous molecular graphs},
  author={Wu, Fang and Radev, Dragomir and Li, Stan Z},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={37},
  number={4},
  pages={5312--5320},
  year={2023}
}

Contact

Questions and collaborations are welcome. Please contact Fang Wu.
