Add Protein FEP Functionality #294
mapping for large molecules. This allows protein FEP calculations to be set up (for example, point mutations and covalent modifications). The merge code has also been modified to allow multiple perturbable regions of interest, so that FEP calculations with multiple mutations can be run at the same time.
…osimspace into feature_protein_FEP
…osimspace into feature_protein_FEP
This functionality is needed in special cases where the default RDKit MCS algorithm fails to provide suitable mappings.
which allows for more fine-grained matching for protein residues
This code works by breaking the two proteins into per-residue-parts and aligning each residue individually. The coordinates of the aligned residues are then used to update the coordinates of the input protein to be aligned.
Instead the function takes in the ROI residue indices as inputs now, which is consistent with roiMatch and roiAlign functions
previously for debugging
…biosimspace into feature_protein_FEP_2
Thanks, @akalpokas. This looks great. I'll try to find a block of time to go through and review.
TODO before merging:
They will need to be updated once the test input files are moved online.
Suggested refactoring if we wish to preserve separate matching and alignment functions for the regular and ROI implementations... First move the existing functions to private, module-only implementations (not exposed to the user), e.g.:

    def matchAtoms(...): --> def _match_atoms(...):
    def roiMatch(...): --> def _match_roi(...):
    def rmsdAlign(...): --> def _rmsd_align(...):
    def roiAlign(...): --> def _roi_align(...):

Then you just need to create wrapper functions that match the original APIs of matchAtoms and rmsdAlign:

    # Same as before, but with extra roi kwarg.
    def matchAtoms(..., roi=None):
        """Same docstring as before, plus roi kwarg."""
        # Use regular backend.
        if roi is None:
            return _match_atoms(...)
        # Use ROI backend.
        else:
            return _match_roi(..., roi=roi)

    # Same thing for rmsdAlign.
    def rmsdAlign(..., roi=None):
        """Same docstring as before, plus roi kwarg."""
        # Use regular backend.
        if roi is None:
            return _rmsd_align(...)
        # Use ROI backend.
        else:
            return _roi_align(..., roi=roi)
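A runnable toy version of this wrapper pattern might look as follows. The two backends are deliberately trivial stand-ins that operate on lists of unique atom names (an assumption made purely for illustration); they are not the BioSimSpace implementations, and only the dispatch on `roi` mirrors the suggestion.

```python
def _match_atoms(mol0, mol1):
    """Toy regular backend (hypothetical): pair atoms that share a name."""
    return {i: mol1.index(name) for i, name in enumerate(mol0) if name in mol1}


def _match_roi(mol0, mol1, roi):
    """Toy ROI backend (hypothetical): keep only pairs whose mol0 index is in roi."""
    return {i: j for i, j in _match_atoms(mol0, mol1).items() if i in roi}


def matchAtoms(mol0, mol1, roi=None):
    """Public wrapper: original call signature plus an optional roi kwarg."""
    # Use regular backend, so existing callers see unchanged behaviour.
    if roi is None:
        return _match_atoms(mol0, mol1)
    # Use ROI backend.
    return _match_roi(mol0, mol1, roi=roi)
```

Because `roi` defaults to `None`, any caller that never passes it goes through exactly the same code path as before the refactor.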
I've just merged some fixes into |
I've triggered the CI and all tests are passing, other than an unrelated IO error on Windows which I'm re-running now. (This is a periodic problem we see with the Windows runners and is nothing to do with your code.) Just a couple of things to check before merging:
Great. I'll just re-run the CI as a sanity check, then merge tomorrow. Many thanks for this, it's a really nice piece of work. For the next release it would be nice if there was a blog post or tutorial section to highlight this new feature. I'll try to remember closer to the time.
No worries, thanks for the review! I'm very happy to work on a tutorial for this, as the implementation has some peculiarities (atom ordering needs to match exactly between the two proteins for the code to work properly), and I feel these need to be clearly highlighted with examples so as not to confuse people.
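To illustrate why atom ordering matters for a region-of-interest mapping, here is a minimal sketch. It assumes (this is an assumption for illustration, not a description of the BioSimSpace code) that atoms outside the ROI are paired purely by position, while atoms inside ROI residues are paired by a matching step, represented here by a simple name comparison standing in for an MCS backend. The function name `roi_mapping_sketch` and the list-of-atom-names data model are hypothetical.

```python
def roi_mapping_sketch(residues0, residues1, roi):
    """Build an absolute atom-index mapping between two residue lists.

    residues0 / residues1: list of residues, each a list of unique atom names.
    roi: indices of residues that differ between the two molecules.
    """
    mapping = {}
    offset0 = offset1 = 0  # absolute index of the first atom of the current residue
    for idx, (res0, res1) in enumerate(zip(residues0, residues1)):
        if idx in roi:
            # Stand-in for an MCS backend: pair atoms that share a name.
            for i, name in enumerate(res0):
                if name in res1:
                    mapping[offset0 + i] = offset1 + res1.index(name)
        else:
            # Outside the ROI atoms are paired by position, so ordering must agree.
            if res0 != res1:
                raise ValueError(f"atom ordering differs in residue {idx}")
            for i in range(len(res0)):
                mapping[offset0 + i] = offset1 + i
        offset0 += len(res0)
        offset1 += len(res1)
    return mapping


# Heavy atoms only: GLY-ALA-GLY versus GLY-SER-GLY, with residue 1 as the ROI.
wild_type = [["N", "CA", "C", "O"], ["N", "CA", "CB", "C", "O"], ["N", "CA", "C", "O"]]
mutant = [["N", "CA", "C", "O"], ["N", "CA", "CB", "OG", "C", "O"], ["N", "CA", "C", "O"]]
mapping = roi_mapping_sketch(wild_type, mutant, roi=[1])
```

In this sketch the extra atom in the mutated residue shifts every later atom index by one, which the running offsets account for. If a residue outside the ROI had its atoms in a different order, positional pairing would silently map the wrong atoms, which is why the sketch raises an error instead.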
This PR introduces protein FEP functionality to BioSimSpace and allows the creation of hybrid protein/peptide systems with single or multiple simultaneous modifications, which can include point mutations to canonical or non-canonical amino acids, as well as covalent modifications.
The PR is essentially based around the region of interest (ROI) idea and, code-wise, does the following:
- The `merge` function is moved from the Exscientia sandpit into the standard merge code location. I have tested this code with an ethane to methanol perturbation, and it gives exactly the same perturbation as the non-sandpit merge code (I looked at the GROMACS .top files specifically), although it is probably a good idea to test it with more complex ligand perturbations.
- A `roiMatch` function is added to the Align module. This function rapidly computes mappings between two large input molecules based on the region of interest idea. It supports the standard BioSimSpace.Align.matchAtoms as well as Kartograf as mapping backends. An internal `_kartograf_map` function is therefore also added to the Align module. In my experience, Kartograf in the protein FEP context is really only needed for niche cases, such as computing a mapping between two enantiomers of two covalently modified residues, so it is not critical to have it as a mapping backend for most use cases.
- The `matchAtoms` function is also modified to expose some of the lower-level RDKit mapping functionality, which I sometimes found useful when trying to force a specific mapping for covalently modified residues. It does not affect the default behaviour of the mapping function.
- A `roiAlign` function is also added to the Align module. This function aligns selected residues from molecule0 to molecule1, which allows us to drop the assumption that the two input proteins need to be in the same conformation in order to create a merged protein. In most cases this assumption is satisfied (if you just create a carbon copy of the wild-type protein and mutate one or more residues); however, with this function we can execute more complex workflows such as non-equilibrium switching protein FEP (where you would generate wild-type and mutant trajectories separately without alchemical code, and then create hybrid snapshots that have to deal with two input protein structures which are not fully aligned). This function can use either the `rmsdAlign` or the `flexAlign` function internally, depending on how precise an alignment is needed. In my experience, `rmsdAlign` works perfectly after some minimisation.

Overall, this means that hybrid proteins can be created with just a few lines of code that are nearly identical to the normal input creation for an FEP workflow:
A minimal code example with input files is also provided: minimal_pfep_example.tar.gz
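The per-residue alignment idea behind `roiAlign` can be sketched independently of BioSimSpace. The following is a minimal NumPy illustration, not the PR's implementation: it uses a Kabsch (rigid RMSD) fit in place of `rmsdAlign`, and it assumes that each ROI residue has the same atoms at the same indices in both structures, which a real mutation would not satisfy without first selecting the matched atoms. The names `kabsch_fit` and `align_roi_residues` are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Rigidly superimpose `mobile` onto `target` (both (N, 3) arrays, paired rows)."""
    mobile_centre = mobile.mean(axis=0)
    target_centre = target.mean(axis=0)
    # Covariance of the centred coordinate sets.
    h = (mobile - mobile_centre).T @ (target - target_centre)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return (mobile - mobile_centre) @ rotation.T + target_centre


def align_roi_residues(coords0, coords1, residue_atoms, roi):
    """Return a copy of coords0 in which each ROI residue is fitted onto coords1.

    residue_atoms: list of atom-index lists, one per residue.
    roi: indices of the residues to align; all other atoms are left untouched.
    """
    aligned = coords0.copy()
    for res_idx in roi:
        atoms = residue_atoms[res_idx]
        aligned[atoms] = kabsch_fit(coords0[atoms], coords1[atoms])
    return aligned
```

Each residue is fitted on its own, so the updated coordinates only change inside the region of interest and the rest of the input structure keeps its original conformation.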
While in theory the `roiMatch` and `roiAlign` functions could be incorporated into the existing `matchAtoms` and `rmsdAlign` functions to give them ROI functionality (in the way that the `merge` function has ROI functionality rather than there being a separate `roiMerge` function), I opted to keep them as separate functions, essentially to eliminate any risk of inadvertently modifying the behaviour of the existing functions and introducing bugs down the line. I think this approach is safer overall, even if it adds more functions to the Align module.

Another thing to note is that the `viewMapping` function will be of little use with the current ROI implementation, since the mappings provided to it will usually be quite large. In the future I would like to modify this function to allow mappings to be viewed between ROI residues. It's also important to note that I haven't tested this implementation with every possible canonical amino acid mutation, so it should still be treated as somewhat experimental, although I do believe the overall workflow will work correctly and any potential issues would arise from how the residues are mapped by the backend MCS function. Proline mutations are of course still not supported.

`devel` into this branch before issuing this pull request (e.g. by running `git pull origin devel`): ✅

Suggested reviewers: @lohedges,