Add new helper functions to PDBManager
#322
…with the PDBManager
for more information, see https://pre-commit.ci
LGTM (less changelog)!
assert (
    len(filtered_pdb) > 0
), "Filtered DataFrame must contain atoms."
if "ATOM" in key and len(filtered_pdb) == 0:
I may be a little pedantic here, but I see two possible edge cases with this:
- Sometimes protein atoms are stored as HETATMs (typically modified residues, but this kind of bad practice also happens as an abuse of the PDB format to suit some niche way of storing structure data).
- Similarly, what if the desired selection is actually the HETATM data? Protein-nucleic acid complexes or protein-peptide complexes may store the ligand as a HETATM.
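To make the two edge cases above concrete, here is a minimal sketch using a toy pandas DataFrame. The column names (`record_name`, `residue_name`, `chain_id`) and the helper `count_records` are hypothetical and chosen only for illustration; they are not taken from this PR. Chain A contains a modified residue (MSE, selenomethionine) stored as a HETATM, and chain B is a peptide stored entirely as HETATM records.

```python
import pandas as pd

# Hypothetical miniature of a parsed PDB file: one row per atom record.
atoms = pd.DataFrame(
    {
        "record_name": ["ATOM", "ATOM", "HETATM", "HETATM", "HETATM"],
        "residue_name": ["ALA", "GLY", "MSE", "ALA", "GLY"],
        "chain_id": ["A", "A", "A", "B", "B"],
    }
)


def count_records(df: pd.DataFrame, chain: str) -> dict:
    """Count ATOM and HETATM rows for a single chain (illustrative helper)."""
    subset = df[df["chain_id"] == chain]
    counts = subset["record_name"].value_counts()
    return {
        "ATOM": int(counts.get("ATOM", 0)),
        "HETATM": int(counts.get("HETATM", 0)),
    }


chain_a = count_records(atoms, "A")  # modified residue hidden in HETATM
chain_b = count_records(atoms, "B")  # peptide stored only as HETATM
print(chain_a, chain_b)
```

In this toy data, a check that requires at least one ATOM row would reject chain B even though it holds exactly the records a user selecting the peptide would want.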
No problem. Points 1 and 2 didn't come to my mind when I originally implemented this, and I kept expanding on it until now. In light of these points, I think it makes more sense to avoid skipping such DataFrames altogether. It will then be the user's responsibility to "vet" the exported PDB files for these kinds of edge cases in their selected PDBs. Better yet, we can simply issue a warning to users that no "standard" atoms were found post-filtering. However, we would then still export the PDB complex as requested.
I've modified this logic to only issue a warning in such an edge case now.
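The warn-instead-of-skip behavior described in this thread could look roughly like the sketch below. This is not the code merged in the PR: the function name `check_filtered_records`, the warning text, and the dict-of-DataFrames layout keyed by record type are assumptions made for illustration. Only the condition `"ATOM" in key and len(...) == 0` comes from the diff shown above.

```python
import warnings

import pandas as pd


def check_filtered_records(filtered: dict) -> dict:
    """Warn, rather than fail or skip, when no ATOM rows survive filtering.

    `filtered` is assumed to map record types ("ATOM", "HETATM", ...)
    to DataFrames. The input is returned unchanged so export can proceed.
    """
    for key, df in filtered.items():
        # "ATOM" is not a substring of "HETATM", so this only matches ATOM keys.
        if "ATOM" in key and len(df) == 0:
            warnings.warn(
                "No standard (ATOM) records remain after filtering; "
                "exporting the structure anyway.",
                UserWarning,
            )
    return filtered


# Usage: an empty ATOM table next to a non-empty HETATM table.
filtered = {
    "ATOM": pd.DataFrame(columns=["atom_name"]),
    "HETATM": pd.DataFrame({"atom_name": ["SE"]}),
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = check_filtered_records(filtered)
```

Compared with the original `assert`, this leaves the decision to the user: the HETATM-only selection is still exported, and the warning flags that the output should be inspected.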
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #322      +/-   ##
==========================================
+ Coverage   40.27%   43.74%   +3.47%
==========================================
  Files          48      113      +65
  Lines        2811     7829    +5018
==========================================
+ Hits         1132     3425    +2293
- Misses       1679     4404    +2725
☔ View full report in Codecov by Sentry.
Feel free to merge this PR if and when you are ready to.
Kudos, SonarCloud Quality Gate passed!
What does this implement/fix? Explain your changes
Adds some new helper functions to the PDBManager to allow one to work more easily with heterogeneous protein complexes.
What testing did you do to verify the changes in this PR?
I evaluated these functions in a local copy of this repository.
Pull Request Checklist
- ./CHANGELOG.md file (if applicable)
- ./graphein/tests/* directories (if applicable)
- ./notebooks/ (if applicable)
- python -m py.test tests/ and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
- black . and isort .