Aligning proteins

Jamie Morton edited this page Nov 4, 2024 · 17 revisions

Downloading models

First, download the DeepBLAST pretrained model from https://figshare.com/s/e414d6a52fd471d86d69

Once the checkpoint is downloaded, you can load the DeepBLAST model.

GPU model loading

from deepblast.utils import load_model
model = load_model("deepblast-v3.ckpt")

If you have already downloaded the ProtTrans model, you can specify its path directly. This is beneficial if you are going to run TM-vec multiple times. You can download ProtTrans as follows (assuming git-lfs is already installed):

git lfs install
git clone https://huggingface.co/Rostlab/prot_t5_xl_uniref50

If the ProtTrans model is in the same directory as the DeepBLAST checkpoint, you can run:

model = load_model("deepblast-v3.ckpt", "prot_t5_xl_uniref50", device='cuda')

CPU model loading

model = load_model("deepblast-v3.ckpt", device='cpu')
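If the same script has to run on machines with and without a GPU, the device string can be chosen at runtime instead of being hard-coded. The sketch below is only an illustration and is not part of DeepBLAST: pick_device is a hypothetical helper name, and it relies solely on PyTorch's torch.cuda.is_available().

```python
import importlib.util


def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'.

    Hypothetical helper for illustration; not provided by deepblast.
    """
    # Fall back to CPU if PyTorch cannot be imported in this environment.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


# Usage, following the load_model calls shown above:
#   from deepblast.utils import load_model
#   model = load_model("deepblast-v3.ckpt", device=pick_device())
```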

Note that the load_model function also has an alignment_mode option that lets you specify which type of alignment to use for inference. You can specify either needleman-wunsch for global alignment or smith-waterman for local alignment.
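As a rough sketch of how this option might be passed, the snippet below validates the mode name before calling load_model. The keyword name alignment_mode and the two mode strings are taken from the description above; the helper names load_model_kwargs and load_local_aligner are hypothetical, and the exact signature should be checked against your installed version of deepblast.

```python
# Mode strings as named in the text above; treat them as assumptions
# until checked against the installed deepblast version.
VALID_MODES = ("needleman-wunsch", "smith-waterman")


def load_model_kwargs(device="cpu", alignment_mode="needleman-wunsch"):
    """Build keyword arguments for load_model, rejecting unknown modes early."""
    if alignment_mode not in VALID_MODES:
        raise ValueError(
            f"alignment_mode must be one of {VALID_MODES}, got {alignment_mode!r}"
        )
    return {"device": device, "alignment_mode": alignment_mode}


def load_local_aligner(checkpoint="deepblast-v3.ckpt", device="cpu"):
    """Load DeepBLAST configured for local (smith-waterman) alignment."""
    # Imported lazily so the validation helper works without deepblast installed.
    from deepblast.utils import load_model

    return load_model(checkpoint, **load_model_kwargs(device, "smith-waterman"))
```

Calling load_local_aligner() would then return a model set up for local alignment, while leaving alignment_mode at needleman-wunsch gives a global alignment.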

Visualizing alignments

Once the model is loaded, we can test DeepBLAST by structurally aligning two proteins using only their sequences:

x = 'IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQQFVANVEEEEAWINEKMTLVASED'
y = 'QQNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSDA'
# obtains alignment string specifying structural superposition
pred_alignment = model.align(x, y)

The resulting alignment string specifies how the residues are aligned, one character per alignment column. : indicates a match between the two sequences, 1 indicates a residue in sequence 1 that is aligned to a gap in sequence 2 (an insertion), and 2 indicates a residue in sequence 2 that is aligned to a gap in sequence 1 (a deletion). To make this more human-readable, we can visualize the alignment directly.

from deepblast.dataset.utils import states2alignment
x_aligned, y_aligned = states2alignment(pred_alignment, x, y)
print(x_aligned)
print(pred_alignment)
print(y_aligned)

Output

-IGKEEIQQRLAQFVDHWKELKQLAAARGQRLEESLEYQ-QFVANVEEEEAWINEKMTLVASED
21:::::::::::::::::::::::::::::::::::::2::::::::::::::::::::::1:
Q-QNKELNFKLREKQNEIFELKKIAETLRSKLEKYVDITKKLEDQNLNLQIKISDLEKKLSD-A
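
To make the state encoding concrete, here is a minimal pure-Python sketch that expands a state string into two gapped sequences using the convention visible in the output above. It is not the implementation of states2alignment; render_alignment is a hypothetical name used only for illustration.

```python
def render_alignment(states, x, y):
    """Expand a state string into gapped versions of x and y.

    ':' consumes one residue from both sequences (a match),
    '1' consumes a residue from x and puts a gap in y,
    '2' consumes a residue from y and puts a gap in x.
    """
    x_out, y_out = [], []
    i = j = 0  # index of the next unread residue in x and in y
    for s in states:
        if s == ":":
            x_out.append(x[i])
            y_out.append(y[j])
            i += 1
            j += 1
        elif s == "1":
            x_out.append(x[i])
            y_out.append("-")
            i += 1
        elif s == "2":
            x_out.append("-")
            y_out.append(y[j])
            j += 1
        else:
            raise ValueError(f"unexpected state character: {s!r}")
    return "".join(x_out), "".join(y_out)


# Toy example (made-up sequences, not model output):
# render_alignment("21:", "AB", "CB") returns ("-AB", "C-B")
```

Reading the three printed lines column by column follows the same rule: a 2 sits under a gap in the first sequence, and a 1 sits above a gap in the second.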