local pdb + dssp = error #171

Closed
avivko opened this issue May 14, 2022 · 3 comments · Fixed by #172 or #176

Comments

avivko (Contributor) commented May 14, 2022

Describe the bug
When using a local pdb file and a ProteinGraphConfig that contains a DSSPConfig, DSSPConfig always tries to download the pdb file, even if it is in the pdb_dir, and throws an error if the name of the pdb file is not a PDB ID.

To Reproduce

from functools import partial
from graphein.protein.graphs import construct_graph
from graphein.protein.subgraphs import extract_subgraph_from_chains
from graphein.protein.config import ProteinGraphConfig, DSSPConfig
from graphein.protein.features.nodes.amino_acid import expasy_protein_scale, meiler_embedding
from graphein.protein.features.nodes import asa, rsa
from graphein.protein.edges.distance import (add_peptide_bonds,
                                             add_hydrogen_bond_interactions,
                                             add_disulfide_interactions,
                                             add_ionic_interactions,
                                             add_aromatic_interactions,
                                             add_aromatic_sulphur_interactions,
                                             add_cation_pi_interactions
                                            )


conf_functions = {
    "edge_construction_functions": [
        add_peptide_bonds,
        add_aromatic_interactions,
        add_hydrogen_bond_interactions,
        add_disulfide_interactions,
        add_ionic_interactions,
        add_aromatic_sulphur_interactions,
        add_cation_pi_interactions,
    ],
    # Add ASA and RSA features.
    "graph_metadata_functions": [asa, rsa],
    # Add expasy features (partial: each feature is added under a separate key).
    "node_metadata_functions": [
        meiler_embedding,
        partial(expasy_protein_scale, add_separate=True),
    ],
    # Add DSSP config in order to compute ASA and RSA.
    "dssp_config": DSSPConfig(),
    "pdb_dir": "/vol/tmp/kormanav/pdb_dir",
}
batch_config = ProteinGraphConfig(**conf_functions)

construct_graph(config=batch_config, pdb_path="/vol/tmp/kormanav/6rew_copy.pdb")

Results in:

DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1365 total nodes
INFO:graphein.protein.edges.distance:Found: 234 aromatic-aromatic interactions
INFO:graphein.protein.edges.distance:Found 532 hbond interactions.
INFO:graphein.protein.edges.distance:Found 55 hbond interactions.
DEBUG:graphein.protein.edges.distance:0 CYS residues found. Cannot add disulfide interactions with fewer than two CYS residues.
INFO:graphein.protein.edges.distance:Found 11848 ionic interactions.
Downloading PDB structure '6rew_copy'...
ERROR:graphein.protein.utils:PDB file 6rew_copy not found and no replacement                       structure found in obsolete lookup.
Desired structure doesn't exists
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Input In [20], in <cell line: 39>()
     37 graph_list = []
     38 y_list = []
---> 39 construct_graph(config=batch_config, pdb_path="/vol/tmp/kormanav/6rew_copy.pdb")
     40 """for idx, pdb_p in enumerate(tqdm(pdb_paths)):
     41     print('pdb:', pdb_p)
     42     try:
   (...)
     46         print(f'PDB #{idx}: processing error!')
     47         pass"""

File ~/repos_dev/graphein/graphein/protein/graphs.py:614, in construct_graph(config, pdb_path, pdb_code, chain_selection, df_processing_funcs, edge_construction_funcs, edge_annotation_funcs, node_annotation_funcs, graph_annotation_funcs)
    612 # Annotate additional graph metadata
    613 if config.graph_metadata_functions is not None:
--> 614     g = annotate_graph_metadata(g, config.graph_metadata_functions)
    616 # Annotate additional edge metadata
    617 if config.edge_metadata_functions is not None:

File ~/repos_dev/graphein/graphein/utils/utils.py:69, in annotate_graph_metadata(G, funcs)
     58 """
     59 Annotates graph with graph-level metadata
     60 
   (...)
     66 :rtype: nx.Graph
     67 """
     68 for func in funcs:
---> 69     func(G)
     70 return G

File ~/repos_dev/graphein/graphein/protein/features/nodes/dssp.py:239, in asa(G)
    230 def asa(G: nx.Graph) -> nx.Graph:
    231     """
    232     Adds ASA of each residue in protein graph as calculated by DSSP.
    233 
   (...)
    237     :rtype: nx.Graph
    238     """
--> 239     return add_dssp_feature(G, "asa")

File ~/repos_dev/graphein/graphein/protein/features/nodes/dssp.py:174, in add_dssp_feature(G, feature)
    144 """
    145 Adds add_dssp_feature specified amino acid feature as calculated
    146 by DSSP to every node in a protein graph
   (...)
    171 :rtype: nx.Graph
    172 """
    173 if "dssp_df" not in G.graph:
--> 174     G = add_dssp_df(G, G.graph["config"].dssp_config)
    176 config = G.graph["config"]
    177 dssp_df = G.graph["dssp_df"]

File ~/repos_dev/graphein/graphein/protein/features/nodes/dssp.py:122, in add_dssp_df(G, dssp_config)
    120 # Check for existence of pdb file. If not, download it.
    121 if not os.path.isfile(config.pdb_dir / pdb_id):
--> 122     pdb_file = download_pdb(config, pdb_id)
    123 else:
    124     pdb_file = config.pdb_dir + pdb_id + ".pdb"

File ~/repos_dev/graphein/graphein/protein/utils.py:100, in download_pdb(config, pdb_code)
     95         log.error(
     96             f"PDB file {pdb_code} not found and no replacement \
     97                   structure found in obsolete lookup."
     98         )
     99 # Rename file to .pdb from .ent
--> 100 os.rename(
    101     config.pdb_dir / f"pdb{pdb_code}.ent",
    102     config.pdb_dir / f"{pdb_code}.pdb",
    103 )
    105 # Assert file has been downloaded
    106 assert any(pdb_code in s for s in os.listdir(config.pdb_dir))

FileNotFoundError: [Errno 2] No such file or directory: '/vol/tmp/kormanav/pdb_dir/pdb6rew_copy.ent' -> '/vol/tmp/kormanav/pdb_dir/6rew_copy.pdb'

The same error occurs even when the file is placed in the pdb_dir:

ls /vol/tmp/kormanav/pdb_dir
3eiy.pdb
6rew_copy.pdb
6rew.pdb

Expected behavior

  • I want to be able to use DSSP features such as rsa on structures that cannot simply be downloaded from the PDB database.
  • No PDB download should be triggered when using paths to local pdb files.
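
The expected behavior above could be sketched roughly as follows. This is only an illustration and not Graphein code: `resolve_structure_file` and its parameters are hypothetical names, and the `<pdb_code>.pdb` cache naming is an assumption based on the directory listing shown earlier.

```python
from pathlib import Path
from typing import Optional


def resolve_structure_file(
    pdb_dir: str,
    pdb_path: Optional[str] = None,
    pdb_code: Optional[str] = None,
) -> Optional[Path]:
    """Return a local structure file if one is available, else None.

    Hypothetical helper: a None result means that downloading by PDB
    code would be the only remaining option.
    """
    # An explicit local path always wins, so no download is attempted.
    if pdb_path is not None:
        path = Path(pdb_path)
        if not path.is_file():
            raise FileNotFoundError(f"Local structure file not found: {path}")
        return path

    # Otherwise look for a cached copy named after the PDB code.
    if pdb_code is not None:
        cached = Path(pdb_dir) / f"{pdb_code}.pdb"
        if cached.is_file():
            return cached

    return None
```

With a helper like this, a file such as 6rew_copy.pdb would be used directly, and its name would never be passed to a downloader as if it were a PDB ID.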

Desktop (please complete the following information):

  • OS: macOS
  • Python Version: 3.8
  • Graphein Version: Graphein on master branch (version 1.4.0), with pip install -e

avivko (Contributor, Author) commented May 14, 2022

@a-r-j do you want to fix it or should I have a go at it?

a-r-j (Owner) commented May 14, 2022

I've penciled in finishing a few pending PRs today. If you want it fast, I'd appreciate a contribution :)

Should be an easy fix: I think it's just missing a .pdb extension in this line:

if not os.path.isfile(config.pdb_dir / pdb_id):
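
To illustrate why the missing extension matters, here is a small standalone sketch using only the standard library. `local_pdb_exists` is a hypothetical helper, not part of Graphein, and the temporary directory merely stands in for pdb_dir.

```python
import tempfile
from pathlib import Path


def local_pdb_exists(pdb_dir: str, pdb_id: str) -> bool:
    """Check for '<pdb_id>.pdb' inside pdb_dir (hypothetical helper)."""
    return (Path(pdb_dir) / f"{pdb_id}.pdb").is_file()


with tempfile.TemporaryDirectory() as tmp:
    # Mimic the directory listing from the report.
    (Path(tmp) / "6rew_copy.pdb").touch()

    # Check without the suffix, as in the line quoted above: the file is missed.
    without_suffix = (Path(tmp) / "6rew_copy").is_file()

    # Check with the suffix appended: the file is found.
    with_suffix = local_pdb_exists(tmp, "6rew_copy")
```

Because the check without the suffix never matches the file on disk, the download branch is taken every time, which fits the behavior described in the report.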

avivko (Contributor, Author) commented May 16, 2022

Was just about to work on the hotfix, but I see @OliverT1 was quicker than I was :)
Hopefully the PR will be merged soon!

a-r-j pushed a commit that referenced this issue May 16, 2022
avivko added a commit to avivko/graphein that referenced this issue May 22, 2022
a-r-j pushed a commit that referenced this issue May 23, 2022
* adds node name to hover

* fixes relative paths (threw errors before)

* refactored protein graphs to always have as params: name, pdb_code, pdb_path. Also fixes #171, which was not properly fixed by #172

* fixed notebook execution failure, ran black, fixed docstring

* adds test for PR #176: dssp with pdb code or local pdb

* ran black, added notebook show_edges visualization, added myself to CONTRIBUTORS.md

* dssp now reconstructs a pdb instead of downloading one if none available. pdb_dir default changed to /tmp

* re-ran black

* fixed tmp security issue. Updated changelog.