PSSMGen: Generates Consistent PSSM and/or PDB Files for Protein-Protein Complexes
- Make sure BLAST is installed and its database is available on your machine. Otherwise, install BLAST and download its databases by following the BLAST guide. To calculate PSSM, the recommended database is the non-redundant protein sequences `nr` (i.e. the `nr.*.tar.gz` files from the FTP site).
- Install PSSMGen with `pip install PSSMGen`.
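Before running anything, it can help to confirm that the PSI-BLAST executable is reachable. The following is a minimal sketch using only the Python standard library; the helper name `find_psiblast` is hypothetical and not part of PSSMGen, and it assumes the executable is called `psiblast`, as in the configuration example in this README.

```python
import shutil


def find_psiblast(executable="psiblast"):
    """Return the full path of the executable, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of the PSSMGen API.
    """
    return shutil.which(executable)


path = find_psiblast()
if path is None:
    print("psiblast not found on PATH; install BLAST first or pass its full path")
else:
    print(f"psiblast found at {path}")
```

If the executable lives outside `PATH`, its full path can be passed to PSSMGen directly instead.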
PSSMGen is geared toward computing the PSSM files for all models of a particular protein-protein complex.
This tool assumes your files have the following structure:
workdir
|_ pdb
|_ fasta
|_ pssm_raw
|_ pssm
|_ pdb_nonmatch
- `workdir` is your working directory for one specific protein-protein complex.
- The `pdb` folder contains the PDB files (consistent PDB files).
- The `fasta` folder contains the protein sequence FASTA files. The code can generate the FASTA files by extracting sequences from the `pdb` file, or you can manually create this folder and put customised FASTA files there.
- The `pssm_raw` folder stores the raw PSSM files. The code can automatically generate them, or you can manually create this folder and put customised PSSM files there.
- The `pssm` folder stores consistent PSSM files, whose sequences are aligned with those of the PDB files. This folder and its files are created automatically.
- The `pdb_nonmatch` folder stores the inconsistent PDB files, while the related consistent PDB files are in the `pdb` folder. This folder and its files are created automatically.
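The layout above can be prepared by hand or with a few lines of Python. This is a minimal sketch, assuming you only need the two input folders (`pdb` and `fasta`), since the other folders are created by the tool; `make_workdir` is a hypothetical helper, not part of PSSMGen.

```python
from pathlib import Path


def make_workdir(case_id, root="."):
    """Create <root>/<case_id> with the input folders and return its path.

    Hypothetical helper for illustration; not part of the PSSMGen API.
    """
    work_dir = Path(root) / case_id
    # Only the input folders are created here; existing folders are left as-is.
    for name in ("pdb", "fasta"):
        (work_dir / name).mkdir(parents=True, exist_ok=True)
    return work_dir
```

For example, `make_workdir("7CEI")` creates `7CEI/pdb` and `7CEI/fasta` in the current directory, ready for you to copy your PDB and FASTA files into.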
The code assumes you follow the naming rules for different file types:
- PDB files: caseID_*.chainID.pdb
- FASTA files: caseID.chainID.fasta
- PSSM files: caseID.chainID.pssm, caseID_*.chainID.pdb.pssm
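To make the FASTA and PSSM naming rules concrete, the sketch below builds the file names they imply for a given case, set of chains, and list of model names. The function `expected_files` and its arguments are hypothetical and exist only for illustration; the model names are assumed to follow the `caseID_*` pattern.

```python
def expected_files(case_id, chains, models=()):
    """Build file names that follow the FASTA and PSSM naming rules.

    Hypothetical helper for illustration; not part of the PSSMGen API.
    `models` are model names of the form caseID_*, e.g. "7CEI_1w".
    """
    return {
        # caseID.chainID.fasta
        "fasta": [f"{case_id}.{chain}.fasta" for chain in chains],
        # caseID.chainID.pssm
        "pssm_raw": [f"{case_id}.{chain}.pssm" for chain in chains],
        # caseID_*.chainID.pdb.pssm
        "pssm": [f"{model}.{chain}.pdb.pssm" for model in models for chain in chains],
    }


names = expected_files("7CEI", ("A", "B"), models=("7CEI_1w",))
print(names["fasta"])
print(names["pssm"])
```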
Here are some examples for the complex `7CEI`.
The file structure and input files should look like this:
7CEI
├── pdb
│ ├── 7CEI_1w.pdb
│ ├── 7CEI_2w.pdb
│ └── 7CEI_3w.pdb
└── fasta
├── 7CEI.A.fasta
└── 7CEI.B.fasta
from pssmgen import PSSM
# initiate the PSSM object
gen = PSSM(work_dir='7CEI')
# set the psiblast executable, database and other psiblast parameters (the defaults are shown here)
gen.configure(blast_exe='/home/software/blast/bin/psiblast',
              database='/data/DBs/blast_dbs/nr_v20180204/nr',
              num_threads=4, evalue=0.0001, comp_based_stats='T',
              max_target_seqs=2000, num_iterations=3, outfmt=7,
              save_each_pssm=True, save_pssm_after_last_round=True)
# generate raw PSSM files by running BLAST on the FASTA files
gen.get_pssm(fasta_dir='fasta', out_dir='pssm_raw', run=True, save_all_psiblast_output=True)
The code will automatically create the `pssm_raw` folder to store the generated PSSM files.
After getting the raw PSSMs from the previous example, you can map them to the PDB files to get consistent PSSM and PDB files as follows:
# map PSSM and PDB to get consistent/mapped PSSM files
gen.map_pssm(pssm_dir='pssm_raw', pdb_dir='pdb', out_dir='pssm', chain=('A','B'))
# write consistent/mapped PDB files and move inconsistent ones to another folder for backup
gen.get_mapped_pdb(pdbpssm_dir='pssm', pdb_dir='pdb', pdbnonmatch_dir='pdb_nonmatch')
The code will automatically create the `pssm` and `pdb_nonmatch` folders and related files.
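After the mapping step it can be useful to see which models were kept and which were moved aside. The sketch below simply lists the files in the relevant folders with the standard library; `summarize_outputs` is a hypothetical helper, not part of PSSMGen, and it assumes the default folder names used in this README.

```python
from pathlib import Path


def summarize_outputs(work_dir):
    """Map each folder name to the sorted list of file names it contains.

    Hypothetical helper for illustration; not part of the PSSMGen API.
    Folders that do not exist are reported as empty lists.
    """
    work_dir = Path(work_dir)
    summary = {}
    for name in ("pssm", "pdb", "pdb_nonmatch"):
        folder = work_dir / name
        if folder.is_dir():
            summary[name] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            summary[name] = []
    return summary
```

For example, `summarize_outputs('7CEI')['pdb_nonmatch']` lists the PDB files that were found inconsistent and moved out of the `pdb` folder.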
If the FASTA files are not provided, you can generate them from a PDB file.
The file structure and input files should look like this:
7CEI
└── pdb
├── 7CEI_1w.pdb
├── 7CEI_2w.pdb
└── 7CEI_3w.pdb
from pssmgen import PSSM
# initiate the PSSM object
gen = PSSM('7CEI')
# extract FASTA files from the reference PDB file.
# if `pdbref` is not set, the code will randomly select one PDB file as reference.
gen.get_fasta(pdb_dir='pdb', pdbref='7CEI_1w.pdb', chain=('A','B'), out_dir='fasta')
The code will automatically create the `fasta` and `pssm_raw` folders for the FASTA files and raw PSSM files, respectively.
You can provide raw PSSM files instead of calculating them.
The file structure and input files should look like this:
7CEI
├── pdb
│ ├── 7CEI_1w.pdb
│ ├── 7CEI_2w.pdb
│ └── 7CEI_3w.pdb
└── pssm_raw
├── 7CEI.A.pssm
└── 7CEI.B.pssm
from pssmgen import PSSM
# initiate the PSSM object
gen = PSSM('7CEI')
# map PSSM and PDB to get consistent/mapped PSSM files
gen.map_pssm()
# write consistent/mapped PDB files and move inconsistent ones to another folder for backup
gen.get_mapped_pdb()