Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Protein FEP Functionality #294

Merged
merged 41 commits into from
Jun 18, 2024

Conversation

akalpokas
Copy link
Contributor

This PR introduces protein FEP functionality to BioSimSpace and allows for creating of hybrid protein/peptide systems with single or multiple simultaneous modifications, which can include point mutations to canonical or non-canonical amino acids, as well as covalent modifications.

The PR is essentially based around region of interest (ROI) idea, and code wise does the following:

  • Moves the merge function from Exscientia sandpit out of the sandpit into the standard merge code location. I have tested this code with ethane to methanol perturbation, and it gives the same exact perturbation as the non-sandpit merge code, (I looked at gromacs .top files specifically) although it is probably a good idea to test this with more complex ligand perturbations.
  • This merge code is then modified to remove the need to provide ROI atom indices as inputs (this was a requirement in original Exscientia's implementation). Instead these are reconstructed inside the merge function which simplifies the overall workflow for the user, as now they only need to pass the ROI residue indices. The merge function is then modified to allow for multiple simultaneous merges to be performed between two molecules. This was tested with MDM2 T16G-I19G double mutant system and behaves as intended. Note that these modifications are only invoked when using ROI code only, it does not affect the standard merge code.
  • The roiMatch function is then added to the Align module. This function rapidly computes mappings between two large input molecules based on the idea of region of interest. It supports standard BioSimSpace.Align.matchAtoms as well as Kartograf as mapping backends. Therefore an internal _kartograf_map function is also added to the Align module. In my experience kartograf in the protein FEP context is really only needed for niche cases such when trying to compute a mapping between two enantiomers of two covalently modified residues, so it's not critical to have it as a mapping backend for most use cases. The matchAtoms function is also modified in order to expose some of more lower-level rdKit mapping functionality which I found useful sometimes when trying to force a specific mapping for covalently modified residues. It does not affect the default behaviour of the mapping function.
  • The roiAlign function is also added to the Align module. This function aligns selected residues from molecule0 to molecule1, which allows us to drop the assumption that two input proteins need to be in the same conformation in order to create a merged protein. In most cases this assumption is satisfied (if you just create a carbon copy of the wild-type protein and mutate 1 or more residues), however with this function we can execute more complex workflows such as non-equilibrium switching protein FEP (where you would generate wild-type and mutant trajectories separately without alchemical code, and then start to create hybrid snapshots that now have to deal with two input protein structures which are not fully aligned). This function can either use rmsdAlign or flexAlign functions internally, depending on how precise alignment is needed. In my experience rmsdAlign works perfectly after some minimisation.

Overall, this means that a hybrid proteins can be created with just few lines of code that are nearly identical to the normal input creation for FEP workflow:

import BioSimSpace as BSS
mdm2_wt = BSS.IO.readMolecules(f"mdm2_wt.pdb")[0]
mdm2_mut = BSS.IO.readMolecules(f"mdm2_v14g.pdb")[0]

mdm2_wt = BSS.Parameters.ff14SB(mdm2_wt, ensure_compatible=False).getMolecule()
mdm2_mut = BSS.Parameters.ff14SB(mdm2_mut, ensure_compatible=False).getMolecule()

mapping = BSS.Align.roiMatch(molecule0=mdm2_wt, molecule1=mdm2_mut, roi=[9])
aligned_wt = BSS.Align.roiAlign(molecule0=mdm2_wt, molecule1=mdm2_mut, roi=[9])
merged = BSS.Align.merge(molecule0=aligned_wt, molecule1=mdm2_mut, mapping=mapping, roi=[9])

A minimal code example with input files is also provided: minimal_pfep_example.tar.gz

While in theory the roiMatch and roiAlign functions could be incorporated to already existing matchAtoms and rmsdAlign functions to have ROI functionality (in the way that merge function has ROI functionality and not a separate roiMerge function), I opted to have them as separate functions essentially to eliminate any risk of inadvertently modifying their behaviour and introducing bugs down the line. I think that this approach is safer overall, even if it introduces more functions to the Align module.

Another thing to note is that viewMapping function will be quite useless with this current ROI implementation since the mappings provided to the function will usually be quite large, in the future I would like to modify this function to allow mapping viewing between ROI residues. It's also important to note that I haven't tested this implementation with every possible canonical amino acid mutation, therefore it should be still treated as somewhat experimental, although I do believe the overall workflow will work correctly and any potential issues would arise from how the residues are mapped with the backend MCS function. Proline mutations are of course still not supported.


  • I confirm that I have merged the latest version of devel into this branch before issuing this pull request (e.g. by running git pull origin devel): ✅
  • I confirm that I have added a test for any new functionality in this pull request: ❌ - I still need to write automated tests for the new code, although the code does contain some sanity checks to test whether same residues are being mapped, etc.
  • I confirm that I have added documentation (e.g. a new tutorial page or detailed guide) for any new functionality in this pull request: ❌ - This still needs to be done too.
  • I confirm that I have permission to release this code under the GPL3 license: ✅

Suggested reviewers:

@lohedges,

mapping for large molecules. This allows for the set up of protein FEP calculations (for example point mutations and covalent modifications).  The merge code has also been modified to allow for multiple perturbable region of interests, which allows for multiple mutations FEP calculations to be run at the same time.
This functionality is needed in special cases where default rdKit MCS algorithm fails to provide suitable mappings.
which allows for more fine-grained matching for protein residues
This code works by breaking the two proteins into per-residue-parts and aligning each residue individually. The coordinates of the aligned residues are then used to update the coordinates of the input protein to be aligned.
Instead the function takes in the ROI residue indices as inputs now, which is consistent with roiMatch and roiAlign functions
@akalpokas akalpokas marked this pull request as ready for review May 21, 2024 16:00
@lohedges
Copy link
Contributor

Thanks, @akalpokas. This looks great. I'll try to find a block of time to go through and review.

@akalpokas
Copy link
Contributor Author

akalpokas commented May 30, 2024

TODO before merging:

  • Write tests for the newly implemented ROI functions - Completed 11/06/24
  • Consolidate ROI functions into standard BSS.Align functions (i.e. roiMatch(roi=x) --> matchAtoms(roi=x)) - Completed 11/06/24
  • Update BSS.Align.viewMapping to support viewing ROIs - Completed 11/06/24

They will need to be updated once the test input files are moved online.
@lohedges
Copy link
Contributor

lohedges commented Jun 7, 2024

Suggested refactoring if we wish to preserve separate matching and alignment functions for regular and roi implementions...

First move the existing functions to private module only implementations (not exposed to the user), e.g.:

def matchAtoms(...): --> def _match_atoms(...):

def roiMatch(...): --> def _match_roi(...):

def rmsdAlign(...): --> def _rmsd_align(...):

def roi_align(...): --> def _roi_align(...):

Then you just need to create wrapper functions that match the original APIs of matchAtoms and rmsdAlign with the addition of the extra roi kwarg. Inside these functions we check if roi is None, then call the appropriate backend, i.e.:

# Same as before, but with extra roi kwarg.
def matchAtoms(..., roi=None):
    """Same docstring as before, plus roi kwarg."""
   
    # Use regular backend.
    if roi is None:
        return _match_atoms(...)
    # Use ROI backend.
    else:
        return _match_roi(..., roi=roi)

# Same thing for rsmdAlign.
def rmsdAlign(..., roi=None):
    """Same docstring as before, plus roi kwarg."""
   
    # Use regular backend.
    if roi is None:
        return _rmsd_align(...)
    # Use ROI backend.
    else:
        return _roi_align(..., roi=roi)

@lohedges
Copy link
Contributor

I've just merged some fixes into develso you'll need to merge across to your branch and resolve any conflicts. The only one that could cause an issue is this. If conflicts do arise, you should be able to look at the block of code added in the PR and just paste it at the right place in your edited file. Let me know if there are any problems.

@lohedges lohedges added the enhancement New feature or request label Jun 17, 2024
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 10:34 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 10:34 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 10:34 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 10:34 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 11:24 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 11:24 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 11:24 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 11:24 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 11:24 — with GitHub Actions Inactive
@lohedges
Copy link
Contributor

I've triggered the CI and all tests are passing, other than an unreleated IO error on Windows which I'm re-running now. (This is a periodic problem we see with the WIndows runners and is nothing to do with your code.)

Just a couple of things to check before merging:

  1. Did you manage to fix the circular import issue that you mentioned above, i.e. importing _validate_roi directly from the module file.
  2. Are you happy with the updated visualisation code, or does it require more polish before merging? As long as it works things should be okay. It can always be refactored at a later date. (Assuming the API doesn't need to change.)

@akalpokas
Copy link
Contributor Author

  1. I just tried it again now, and it seems to work for some reason? I'm not sure if the changes I merged from devel fixed it somehow? Regardless, it works now and doesn't trigger circular import.
  2. I think it should probably be polished for the next BioSimSpace release, as I basically just copied the code and changed it a bit to work for our inputs. However, for the development version I think it's good to go as it is. I tested it with simple ligand and protein mappings and it all looks good to me.

@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 15:09 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 15:09 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 15:09 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 15:09 — with GitHub Actions Inactive
@akalpokas akalpokas temporarily deployed to biosimspace-build June 17, 2024 15:09 — with GitHub Actions Inactive
@lohedges
Copy link
Contributor

Great. I'll just re-run the CI as a sanity check, then merge tomorrow. Many thanks for this, it's a really nice piece of work. For the next release it would be nice if there was a blog post or tutorial section to highlight this new feature. I'll try to remember closer to the time.

@akalpokas
Copy link
Contributor Author

No worries, thanks for the review! I'm very happy to work on a tutorial for this as there are some peculiarities with the implementation (the way atom ordering needs to be exact between two proteins) for the code to work properly, and I feel like these need to be clearly highlighted with examples in order not to confuse people.

@lohedges lohedges merged commit 6b23473 into OpenBioSim:devel Jun 18, 2024
5 checks passed
@akalpokas akalpokas deleted the feature_protein_FEP_2 branch June 18, 2024 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants