Aligning proteins
First, download the DeepBLAST pretrained model from https://figshare.com/s/e414d6a52fd471d86d69
Once the checkpoint is downloaded, you can load the DeepBLAST model:
from deepblast.utils import load_model
model = load_model("deepblast-v3.ckpt")
If you already have the ProtTrans model downloaded, you can specify its path directly. This is useful if you are going to run TM-vec multiple times. You can download ProtTrans as follows (assuming git-lfs is already installed):
git lfs install
git clone https://huggingface.co/Rostlab/prot_t5_xl_uniref50
If the ProtTrans model is in the same directory as the DeepBLAST model, you can run one of the following:
# on GPU, passing the local ProtTrans directory
model = load_model("deepblast-v3.ckpt", "prot_t5_xl_uniref50", device='cuda')
# or on CPU, without passing a ProtTrans path
model = load_model("deepblast-v3.ckpt", device='cpu')
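If the same script runs on machines that may or may not have the ProtTrans directory, the choice of arguments can be made at runtime. The sketch below is only an illustration: resolve_load_args is a hypothetical helper, not part of DeepBLAST, and it assumes that the second positional argument of load_model is the ProtTrans directory, as in the GPU call above.

```python
from pathlib import Path


def resolve_load_args(checkpoint, protrans_dir="prot_t5_xl_uniref50"):
    """Return the positional arguments to pass to load_model.

    Hypothetical helper: the ProtTrans directory is included only if it
    exists next to the checkpoint file.
    """
    candidate = Path(checkpoint).parent / protrans_dir
    if candidate.is_dir():
        return (str(checkpoint), str(candidate))
    return (str(checkpoint),)


args = resolve_load_args("deepblast-v3.ckpt")
# With DeepBLAST installed, the result would be used as:
# model = load_model(*args, device="cpu")
```

Keeping the lookup separate from the load call makes it easy to check which arguments will be used before the (slow) model load starts.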
Note that the load_model function also has an alignment_mode option that specifies which type of alignment to use for inference. You can specify either needleman-wunsch for global alignment or smith-waterman for local alignment.
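To illustrate the difference between the two modes, here is a minimal sketch of the classical Needleman-Wunsch and Smith-Waterman scoring recurrences on plain strings. It is not DeepBLAST's implementation; the function names and the match, mismatch and gap scores are assumptions chosen for illustration.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score aligning all of a against all of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score over any pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 lets a local alignment start anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


a, b = "GGGGKEEIQ", "KEEIQTTTT"
print(global_score(a, b))  # -3: the flanking residues must be paid for as gaps
print(local_score(a, b))   # 5: only the shared KEEIQ segment is scored
```

The global score penalizes the unmatched flanks, while the local score ignores them, which is why local alignment is the usual choice when only part of each sequence is expected to correspond.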
Once the model is loaded, we can test out DeepBLAST by structurally aligning two proteins using only their sequences:
x = 'IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQQFVANVEEEEAWINEKMTLVASED'
y = 'QQNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSDA'
# obtains alignment string specifying structural superposition
pred_alignment = model.align(x, y)
The resulting alignment string specifies which residues are aligned, one character per alignment column: ':' indicates a pair of matched residues, '1' indicates a residue of sequence 1 aligned to a gap in sequence 2 (aka an insertion), and '2' indicates a residue of sequence 2 aligned to a gap in sequence 1 (aka a deletion). To make this more human readable, we can directly visualize the alignment:
from deepblast.dataset.utils import states2alignment
x_aligned, y_aligned = states2alignment(pred_alignment, x, y)
print(x_aligned)
print(pred_alignment)
print(y_aligned)
Output
-IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQ-QFVANVEEEEAWINEKMTLVASED
21:::::::::::::::::::::::::::::::::::::2::::::::::::::::::::::1:
Q-QNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSD-A
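To make the state encoding concrete, the conversion from a state string to gapped sequences can be sketched in a few lines. This is an illustrative reimplementation that is consistent with the output above, not the code of states2alignment; render_states is a hypothetical name.

```python
def render_states(states, x, y):
    """Build gapped strings from a state string (illustrative helper).

    ':' consumes one residue from each sequence,
    '1' consumes a residue from x only (gap in y),
    '2' consumes a residue from y only (gap in x).
    """
    i = j = 0
    x_out, y_out = [], []
    for s in states:
        if s == ":":
            x_out.append(x[i])
            y_out.append(y[j])
            i += 1
            j += 1
        elif s == "1":
            x_out.append(x[i])
            y_out.append("-")
            i += 1
        elif s == "2":
            x_out.append("-")
            y_out.append(y[j])
            j += 1
        else:
            raise ValueError(f"unknown state {s!r}")
    if i != len(x) or j != len(y):
        raise ValueError("state string does not consume both sequences")
    return "".join(x_out), "".join(y_out)


x_gapped, y_gapped = render_states("2::1:", "KEIQ", "AKEQ")
print(x_gapped)  # -KEIQ
print(y_gapped)  # AKE-Q
```

Every column of the state string consumes a residue from at least one sequence, so the number of ':' and '1' characters equals the length of sequence 1, and the number of ':' and '2' characters equals the length of sequence 2.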