DAQ is a computational tool using deep learning that can estimate the residue-wise local quality for protein models from cryo-Electron Microscopy (EM) maps.
Copyright (C) 2021 Genki Terashi*, Xiao Wang*, Sai Raghavendra Maddhuri Venkata Subramaniya, John J. G. Tesmer, Daisuke Kihara, and Purdue University.
License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)
Contact: Daisuke Kihara (dkihara@purdue.edu)
For technical problems or questions, please reach out to Xiao Wang (wang3702@purdue.edu).
@article{terashi2022residue,
title={Residue-wise local quality estimation for protein models from cryo-EM maps},
author={Terashi, Genki and Wang, Xiao and Maddhuri Venkata Subramaniya, Sai Raghavendra and Tesmer, John JG and Kihara, Daisuke},
journal={Nature Methods},
volume={19},
number={9},
pages={1116--1125},
year={2022},
publisher={Nature Publishing Group US New York}
}
Online Server: https://em.kiharalab.org/algorithm/daqscore
Colab Website (Online platform): https://bit.ly/daq-score or https://github.com/kiharalab/DAQ/blob/main/DAQ_Score.ipynb
All functions in this GitHub repository are also available on the Colab platform. Related instructions are included in the Colab notebook.
An increasing number of protein structures are determined by cryogenic electron microscopy (cryo-EM). Although the resolution of determined cryo-EM density maps is improving in general, there are still many cases where amino acids of a protein are assigned with different levels of confidence, including those assigned with relatively high ambiguity. Here, we developed a method that identifies potential misassignment of residues in the map, including residue shifts along an otherwise correct main-chain trace. The score, named DAQ, computes the likelihood that the local density corresponds to different amino acids, atoms, and secondary structures from the map density distribution and assesses how well amino acids in the reconstructed model structure agree with the likelihood. DAQ is complementary to existing model validation scores for cryo-EM that examine local density gradient in the map or stereochemical geometry of the structure model. When DAQ was applied to different versions of model structure entries in PDB that were derived from the same density maps, a clear improvement of DAQ-score was observed in the newer versions of the models. The DAQ-score also found potential misassignment errors in a substantial number of over 4400 deposited protein structure models built into cryo-EM maps.
The DAQ score can be interpreted as follows:
* If the assignment is correct, DAQ will be positive; if the assignment may be incorrect, DAQ will be negative.
* If a position in the map does not have a distinct density pattern for the assigned amino acid (or secondary structure, or Calpha atom), DAQ will be close to 0.
Python 3 : https://www.python.org/downloads/
PyMOL (for visualization): https://pymol.org/2/
1. Install git
git clone https://github.com/kiharalab/DAQ && cd DAQ
3.1 Install conda
conda create -n daq python=3.8.5
conda activate daq
conda install conda-forge::gxx
pip install -r requirements.txt
Each time you want to run the code, activate the environment with
conda activate daq
conda deactivate (if you want to exit)
python3 main.py -h:
-h, --help show this help message and exit
-F F Map file path
-M M QA deep learning model path, default:"best_model/qa_model/Multimodel.pth"
-P P PDB file path
--mode MODE Running Mode
--stride STRIDE Stride size for scanning maps (default:1)
--voxel_size input voxel size (default:11)
--gpu GPU specify the gpu to use
--batch_size batch size for inference (default:256)
--cardinality ResNeXt cardinality
--window WINDOW half window size to smooth the score for output (default:9)
Since DAQ(AA) yields the best score
python main.py --mode=0 -F [Map_path] -P [Structure_path] --window [half_window_size] --stride [stride_size]
Please run the script from the "DAQ" directory; otherwise it may raise errors because of a compilation failure.
Here [Map_path] is the path of the cryo-EM map file on your computer, in *.mrc or *.mrc.gz format; [Structure_path] is the protein structure in PDB format; [half_window_size] is half of the window size used to smooth the residue-wise score with a sliding window that scans the entire sequence, that is, half_window_size=(window_size-1)/2; [stride_size] is the stride step used to scan the map.
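The relation between the window size and the half window size, and the effect of smoothing, can be illustrated with a short Python sketch. This is not the code used in main.py: it assumes that the smoothed score of a residue is the plain average of the raw scores within half_window residues on either side, with the window truncated at the ends of the sequence. The function name smooth_scores is hypothetical.

```python
from typing import List


def smooth_scores(raw: List[float], half_window: int) -> List[float]:
    """Average each score with its neighbours within +/- half_window positions.

    Assumption for illustration only: a plain mean, with the window
    truncated at both ends of the sequence.
    """
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    n = len(raw)
    smoothed = []
    for i in range(n):
        start = max(0, i - half_window)        # first index inside the window
        stop = min(n, i + half_window + 1)     # one past the last index
        window = raw[start:stop]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    half_window = 9
    # half_window_size = (window_size - 1) / 2, so the full window is:
    print("full window size:", 2 * half_window + 1)
    print(smooth_scores([1.0, -1.0, 1.0, -1.0, 1.0], half_window=1))
```

With --window 9, the full window under this convention spans 19 residues (9 on each side plus the residue itself).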
Output will be saved in "Predict_Result_WithPDB/[Input_Map_Name]".
python main.py --mode=0 -F example/2566_3J6B_9.mrc -P example/3J6B_9.pdb --window 9 --stride 2
Results of this example are saved in 2566_Result
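If you want to run DAQ on several map/model pairs, the command above can be assembled from Python. The sketch below is only an illustration: build_daq_command and run_daq are hypothetical helper names that are not part of the repository, and the only flags used are the ones listed in the help output (--mode, -F, -P, --window, --stride).

```python
import subprocess
import sys
from typing import List


def build_daq_command(map_path: str, pdb_path: str,
                      window: int = 9, stride: int = 1) -> List[str]:
    """Assemble the argument list for a mode-0 run of main.py."""
    return [
        sys.executable, "main.py",
        "--mode=0",
        "-F", map_path,
        "-P", pdb_path,
        "--window", str(window),
        "--stride", str(stride),
    ]


def run_daq(map_path: str, pdb_path: str, daq_dir: str = ".",
            window: int = 9, stride: int = 1) -> subprocess.CompletedProcess:
    """Run main.py with daq_dir as the working directory.

    Relative map and structure paths are resolved against daq_dir,
    because the script is started from that directory.
    """
    cmd = build_daq_command(map_path, pdb_path, window, stride)
    return subprocess.run(cmd, cwd=daq_dir, check=True)


if __name__ == "__main__":
    # Print the command for the bundled example without running it.
    print(" ".join(build_daq_command(
        "example/2566_3J6B_9.mrc", "example/3J6B_9.pdb", window=9, stride=2)))
```

Setting cwd to the DAQ directory follows the note above about running the script from that directory.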
If the cryo-EM map grid spacing is not 1, our script resamples the map to a grid spacing of 1, which typically takes longer. To speed this up, you can resample the map in ChimeraX and provide the resampled map to the script:
1. Open your map in ChimeraX.
2. In the command line at the bottom, type: vol resample #1 spacing 1.0
3. In the command line at the bottom, type: save newmap.mrc model #2
4. You can then use the resampled map as the input map.
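To decide whether resampling is worthwhile, you can check the grid spacing of a map before running DAQ. The following is a minimal sketch and not part of DAQ: it assumes a little-endian MRC2014 header, in which the grid sampling (MX, MY, MZ) is stored as three 32-bit integers at byte offset 28 and the cell dimensions in Å as three 32-bit floats at byte offset 40. All function names are hypothetical.

```python
import gzip
import struct
from typing import Tuple

Voxel = Tuple[float, float, float]


def voxel_size_from_header(header: bytes) -> Voxel:
    """Compute the grid spacing (cell length / sampling) along x, y, z."""
    if len(header) < 52:
        raise ValueError("header is too short to contain the cell dimensions")
    # Assumed layout: little-endian MRC2014 header.
    mx, my, mz = struct.unpack_from("<3i", header, 28)   # grid sampling
    cx, cy, cz = struct.unpack_from("<3f", header, 40)   # cell lengths in angstroms
    if min(mx, my, mz) <= 0:
        raise ValueError("invalid grid sampling in header")
    return (cx / mx, cy / my, cz / mz)


def read_voxel_size(path: str) -> Voxel:
    """Read the header of a *.mrc or *.mrc.gz file and return its grid spacing."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return voxel_size_from_header(handle.read(1024))


def needs_resampling(voxel: Voxel, tol: float = 1e-3) -> bool:
    """True if any axis deviates from a spacing of 1 by more than tol."""
    return any(abs(v - 1.0) > tol for v in voxel)
```

For example, needs_resampling(read_voxel_size("newmap.mrc")) should return False for a map saved after the ChimeraX steps above, provided the file follows the assumed header layout.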
In PyMOL, open the "daq_score_w9.pdb" file and type the following command:
spectrum b, red_white_blue, all, -1,1
Blue regions indicate acceptable quality, while red regions indicate lower quality. The DAQ(AA) score is stored in the B-factor column. Detailed explanation is here.
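Because the score is stored in the B-factor column, it can also be read back outside PyMOL. The sketch below is an illustration under stated assumptions rather than a DAQ utility: it assumes the standard fixed-column PDB layout (atom name in columns 13-16, chain ID in column 22, residue number in columns 23-26, B-factor in columns 61-66) and reads the value from the CA atom of each residue. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

ResidueKey = Tuple[str, int]  # (chain ID, residue number)


def read_ca_bfactors(pdb_lines: Iterable[str]) -> Dict[ResidueKey, float]:
    """Collect the B-factor value of every CA atom, keyed by chain and residue."""
    scores: Dict[ResidueKey, float] = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue                      # skip HETATM, TER, REMARK, ...
        if line[12:16].strip() != "CA":
            continue                      # one value per residue
        chain = line[21]
        resnum = int(line[22:26])
        scores[(chain, resnum)] = float(line[60:66])
    return scores


def negative_residues(scores: Dict[ResidueKey, float]) -> List[ResidueKey]:
    """Return residues whose stored score is below 0, sorted by chain and number."""
    return sorted(key for key, value in scores.items() if value < 0.0)


def load_scores(path: str) -> Dict[ResidueKey, float]:
    """Read a PDB file such as daq_score_w9.pdb from disk."""
    with open(path) as handle:
        return read_ca_bfactors(handle)
```

Calling negative_residues(load_scores("daq_score_w9.pdb")) then lists the residues that appear on the red side of the PyMOL spectrum, which are candidates for closer inspection.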
The detailed instructions are here.
To build the Docker image, change the current directory to DAQ_container. Use the following command to create the image: sudo docker build -t daq .
To use the image, please follow the rest of the user manual available at https://kiharalab.org/emsuites/daq.php.
For possible errors and solutions, please check QA.md