[WIP] Feature branch: Topology Biopolymer Refactor #951
* improve performance of smarts matching on proteins
* add cachetools to deps
Create T4 SDF using OpenEye
* Initial implementation of atom metadata + tests
* add docstrings and update releasehistory
* update CI to run on topology biopolymer refactor branch

* fix metadata dict serialization
* Only cache connection table in to_openeye and to_rdkit
* have find_smarts_matches only convert connection table to destination format
* black
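The idea behind caching only the connection table can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: it uses a plain dict rather than `cachetools`, and the names `connection_table_key`, `cached_convert`, and `fake_to_rdkit` are hypothetical. The key covers atoms and bonds only, so two inputs that differ just in the order their bonds are listed share one cached conversion.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)

# Hypothetical cache: (target format, connection-table key) -> converted object.
_conversion_cache: Dict[Tuple[str, Hashable], object] = {}


def connection_table_key(elements: Iterable[str], bonds: Iterable[Bond]) -> Hashable:
    """Hashable summary of atoms and bonds only, ignoring bond listing order."""
    canonical_bonds = sorted((min(i, j), max(i, j), order) for i, j, order in bonds)
    return (tuple(elements), tuple(canonical_bonds))


def cached_convert(target: str, elements, bonds, convert: Callable) -> object:
    """Run `convert` once per distinct connection table and target format."""
    elements, bonds = list(elements), list(bonds)
    key = (target, connection_table_key(elements, bonds))
    if key not in _conversion_cache:
        _conversion_cache[key] = convert(elements, bonds)
    return _conversion_cache[key]


calls = []


def fake_to_rdkit(elements, bonds):
    """Stand-in for an expensive toolkit conversion (assumption for this sketch)."""
    calls.append(1)
    return {"n_atoms": len(elements), "n_bonds": len(bonds)}


water = (["O", "H", "H"], [(0, 1, 1), (0, 2, 1)])
same_water = (["O", "H", "H"], [(2, 0, 1), (1, 0, 1)])  # bonds listed differently

first = cached_convert("rdkit", *water, fake_to_rdkit)
second = cached_convert("rdkit", *same_water, fake_to_rdkit)  # cache hit
```

Because the cached object is shared between callers, a real implementation would likely need to hand out copies before attaching per-molecule data such as coordinates or names.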
* Initial implementation of atom metadata + tests
* black
* black
* add docstrings and update releasehistory
* update CI to run on topology biopolymer refactor branch
* isort
* Adding serialized substructure dictionary file
* Molecule perceive residues capability prototype.
* Initial tests for perceive residue substructures
* After working session with JW. Not working.
* Correct atom names in serialized substructure file.
* Residue perception with atom type data implementation.
* Tests for residue perception.
* Changing test for conflicting histidine HIP
* Discard match if it is smaller.
* Adding test sdf file for HIP
* Making Atom with metadata serializable again.
* Adding substructure library file with explicit bond orders
Co-authored-by: j-wags <jwagnerjpl@gmail.com>

* Class for creating substructure library from CCD/cif files.
* Override some entries from aa variants. Clean before filling data.
* Better naming, documentation and cleaning up a bit.
* Now with atom name information in substructure dictionary.
* Dealing with atom names correctly when no leaving atoms.
* Support for optional explicit bond orders.
This pull request introduces 2 alerts when merging 519f69c into b2b76bb - view on LGTM.com new alerts:
This pull request introduces 2 alerts when merging 7fb46c7 into b2b76bb - view on LGTM.com new alerts:
openff/toolkit/topology/molecule.py
import networkx as nx
import numpy as np
from cached_property import cached_property
- from cached_property import cached_property
+ from functools import cached_property
?
This may be related to `functools.cached_property` only being available in Python 3.8+.
Looks like you're using https://pypi.org/project/cached-property/. I'm not sure whether https://pypi.org/project/backports.cached-property/ would be better; the answer depends on how each works with 3.8+. Either way, a solution in the standard library should be preferred over a third-party backport. I have not looked into this, but it should be a quick check.
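One way to reconcile the two imports is a guarded import that prefers the standard library. The snippet below is a minimal sketch, assuming the third-party `cached-property` package exposes the same `cached_property` name (as the import under review suggests); the `Demo` class is hypothetical and exists only to show the caching behaviour.

```python
try:
    # Standard library, available from Python 3.8 onward.
    from functools import cached_property
except ImportError:
    # Fallback for older interpreters; assumes the third-party
    # `cached-property` package is installed.
    from cached_property import cached_property


class Demo:
    """Hypothetical class, used only to illustrate the decorator."""

    def __init__(self, values):
        self.values = list(values)
        self.n_computations = 0

    @cached_property
    def total(self):
        # Runs once per instance; the result is then stored on the instance.
        self.n_computations += 1
        return sum(self.values)


demo = Demo([1, 2, 3])
first = demo.total
second = demo.total  # served from the cache, no recomputation

# Deleting the attribute drops the cached value, so the next access recomputes.
del demo.total
third = demo.total
```

With `functools.cached_property` the value lives in the instance `__dict__`, so deleting the attribute is enough to invalidate it.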
With the T4 lysozyme, I'm getting a single atom with missing metadata, causing issues with export:

In:

import re
from collections import Counter

from openff.toolkit.topology import Molecule, Topology
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.utils import get_data_file_path
from openff.units import unit
from simtk import unit as simtk_unit
from simtk.openmm.app import PDBFile

from openff.interchange.components.interchange import Interchange
from openff.interchange.drivers import get_gromacs_energies, get_openmm_energies
from openff.interchange.utils import get_test_file_path


def _fix_openff_impropers(force_field: ForceField):
    impropers = force_field.get_parameter_handler("ImproperTorsions")
    for imp in impropers.parameters:
        imp.smirks = re.sub("(:[23])(.+)(:[23])", r"\3\2\1", imp.smirks)
    return force_field


protein = Molecule.from_file(get_data_file_path("proteins/T4-protein.sdf"))
protein.perceive_residues()
top = protein.to_topology()

print("Counter(Length of metadata dict : number of atoms)")
print(Counter([len(a.atom.metadata) for a in top.topology_atoms]))

print("The atom with no metadata:")
for top_atom in top.topology_atoms:
    if len(top_atom.atom.metadata) == 0:
        print(top_atom)

Out:
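For readers unfamiliar with the substitution inside `_fix_openff_impropers` in the snippet above, here is a small standalone illustration of what that regular expression does. The SMIRKS string is a made-up example, not taken from any particular force field, and the wrapper name `swap_labels_2_and_3` is hypothetical. The pattern captures the first `:2`/`:3` label, everything up to the last `:2`/`:3` label, and that last label, then writes the two labels back in swapped order.

```python
import re


def swap_labels_2_and_3(smirks: str) -> str:
    """Swap the first and last ':2'/':3' atom-map labels in a SMIRKS string."""
    return re.sub("(:[23])(.+)(:[23])", r"\3\2\1", smirks)


# Illustrative improper-like pattern (an assumption for this sketch).
example = "[*:1]~[#6X3:2](~[*:3])~[*:4]"
swapped = swap_labels_2_and_3(example)
print(swapped)  # [*:1]~[#6X3:3](~[*:2])~[*:4]
```

Labels `:1` and `:4` are untouched, and a string with fewer than two `:2`/`:3` labels is returned unchanged because the pattern does not match.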
Resolve conflicts, folding in changes from #1026
# Conflicts:
# .github/workflows/CI.yml
# devtools/conda-envs/test_env.yaml
# docs/releasehistory.md
# openff/toolkit/topology/molecule.py
# openff/toolkit/utils/toolkits.py
This pull request introduces 2 alerts when merging 58c4284 into d78912b - view on LGTM.com new alerts:
…to topology-biopolymer-refactor
Common caps substructures perception and strict chirality option
* Deprecate particle methods from `Topology` API
* Deprecate particles methods from `Molecule`
…fault in from_polymer_pdb, add tests (#1305)
) * Remove `use_interchange` argument Remove `ParameterHandler.create_force` and associated tools Add back `ParameterHandler_check_all_valence_terms_assigned` Also add back `ParameterHandler._assert_correct_connectivity` Update developer docs Remove allow_nonintegral_charges Fix typos Typo Remove unused exception Skip charge increment tests for now Fix JSON serialization Add to docstring of LibraryChargeType.from_molecule Cleanup Fix version serialization test Update CI Cleanup, add test for create_force Turn on some charge increment tests again * Turn on duplicate partial bond order molecule test * Re-implement `ForceField.get_partial_charges` * Re-implement `allow_nonintegral_charges` * Switch to Interchange's main branch * Revert dev-facing CI simplifications * Update openff/toolkit/tests/utils.py Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> * Turn back on a 1-4 test * Add back missing kwarg exception and test * Update release history Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>
* [WIP] Biopolymer rdkit smarts (#1301) * first pass at doing structure matching with rdkit instead of networkx * current best attempt at fuzzy rdkit query Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix LEU hitting max matches, merge conflict errors * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * porting rdkit-dependent logic to RDKitToolkitWrapper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * get rdkit implementation working again. add check for unassigned atoms and bonds. get dipeptide+disulfide assignment going again. Remove unneeded code. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make mypy happy Co-authored-by: Richard Gowers <richardjgowers+github@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add FAQ section on `allow_undefined_stereo=True` * Fix typos in FAQ * Fix link to cookbook using MyST magic Co-authored-by: Josh A. Mitchell <yoshanuikabundi@gmail.com> * Fix typo * Apply suggestions from code review Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> * Apply suggestions from code review Co-authored-by: Josh A. Mitchell <yoshanuikabundi@gmail.com> Co-authored-by: Josh A. Mitchell <yoshanuikabundi@gmail.com> Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>
* Fix more calls to deprecated methods * Revert escapes on docstrings * `from_pdb` -> `from_polymer_pdb` in tests * Apply suggestions from code review Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove particle stuff from `_mm_molecule`, run `pre-commit` * Clean up BuiltInToolkitWrapper charge assignment * Require RDKit on another test with `from_polymber_pdb` Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1f8c09f to 0133414
* Assorted fixes to increase test coverage * Fix test * Move some functions to Interchange * Remove un-used code * Skip coverage on non-public utilities * Use `MissingOptionalDependency` from openff-utilities * Use MissingOptionalDependencyError
* Add test reproducing #1319 with T4 lysozyme * Force Molecule deepcopies to run through dicts * Fix test * Fix tests * Fix tests * Tinker with environment * Fix installs in examples workflow * Update openff/toolkit/tests/test_molecule.py Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>
* fix 1308 and add test for default hierarchy scheme correctness * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * `chain` -> `chain_id` in some tests * Update openff/toolkit/topology/molecule.py Co-authored-by: Matt Thompson <mattwthompson@protonmail.com> * Update openff/toolkit/topology/molecule.py Co-authored-by: Matt Thompson <mattwthompson@protonmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Matthew W. Thompson <mattwthompson@protonmail.com>
* [WIP] Biopolymer rdkit smarts (#1301) * first pass at doing structure matching with rdkit instead of networkx * current best attempt at fuzzy rdkit query Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix LEU hitting max matches, merge conflict errors * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * porting rdkit-dependent logic to RDKitToolkitWrapper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * get rdkit implementation working again. add check for unassigned atoms and bonds. get dipeptide+disulfide assignment going again. Remove unneeded code. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Docs updates for hierarchy schemes * Fix sphinx warnings * Factor default chain and residue schemes to own methods * Add perceive_hierarchy to add_hierarchy_scheme and remove now-duplicated calls to it * Add residue hierarchy scheme when perceiving residues * Improve docs of hierarchy scheme stuff * Improve docs of hierarchy-adjacent objects * Changes to tests to account for moving perceive calls around * Re-export Hierarchy types from openff.toolkit.topology * Streamline links in docs * perceive_hierarchy -> update_hierarchy_schemes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Avoid overwriting existing methods with hierarchy iterators * Update tests to give correct output - Return correct second triplet of residues to test (CYX is two ACE-CYS-NME chains with a disulfide bond) - Add second chain (each `Molecule` in the `Topology` has its own HierarchyScheme, and elements within them are not consolidated) * Add new hierarchy scheme directly in _initialize_from_dict * Work around case where self._hierarchy_schemes is undefined in __getattr__ 
* Add type hint required by mypy * atom.properties -> atom.metadata in add_hierarchy_scheme docstring Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> * Document reason behind manual addition of hierarchy scheme Add comment describing why we don't call `add_hierarchy_scheme` from `_initialize_from_dict` Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Richard Gowers <richardjgowers+github@gmail.com> Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Warn when using AM1-BCC or similar methods on large molecules
* Fix typo
* Point to `stable` tag in docs

* Use `pytest.approx` to safeguard against WBO flakiness
* Fix interaction between `openmm.unit.Quantity` and `pytest.approx`
* Drop ParmEd from protein-ligand workflows * Do not uninstall openff-toolkit-base when not installed * Turn off broken cells, update calls to deprecated methods * Update environment, fix typo * Cleanup and fixes * Require RDKit for loading protein PDBs * Move stuff around with /examples/external/ * Update action * Update action * Re-run notebook without exceptions * Move notebook * Add back old BRD4 example * Update workflow for new notebook path
* Improve error description when combining bond handlers (#719) * Add reproducing test
* Correct warning about allow_undefined_stereo argument in cookbook * Revise remapping and from topology sections * Increase cell execution timeout so QCArchive cell completes * Increase timeout further * Increase minimum myst-nb version * Add details about force field stereochemistry dependence to FAQ * Pare down undefined stereochemistry warning and link to FAQ
* progress toward oe implementation * initial implementation of OE graph matching * Note why code doesn't use OEQMol * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * final cleanups * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Start `Topology.to_openmm` vs `Interchange.to_openmm_topology` * Add note to self
This can probably be closed now that all of these changes are in the default branch.
I think we're forced to "close" this but it has been implicitly merged in - the default branch is now Of course, please re-open if I'm wrong! But I don't see any other ending here.
Feature branch plans: https://openforcefield.atlassian.net/wiki/spaces/IN/pages/1579122724/2021+Topology+Refactor
Do not edit this branch directly -- All changes should go through PR review!
Notes and to-dos inherited from #881
* Adds `cachetools` as a requirement
* `FrozenMolecule.ordered_connection_table_hash`)
* `to_openeye` and `to_rdkit` using `cachelib`
* `Atom.molecule_atom_index`, since the data structure makes self-index lookup an O(n_atoms_in_molecule) operation. `Molecule._invalidate_cached_properties` now recursively deletes this cached property on the molecule's Atoms.
* `@classmethod` designation from `OpenEyeToolkitWrapper.to_openeye` and `RDKitToolkitWrapper.to_rdkit` (I don't see a need for them to be classmethods, this was probably from before when I knew what I was doing)
* `Molecule.to_openeye` and `to_rdkit` accept a toolkit registry kwarg. It's unclear why they didn't before (a user could reasonably want to divert `Molecule.to_rdkit` through a custom ToolkitWrapper)
* `to_openeye`/`to_rdkit` actually detects when an aromaticity model is requested

Notes and to-dos inherited from #973
Remaining things that this doesn't do are:
* Figure out what's up with the proline substructure that gets an `N-` in it. I'm pretty sure this is a problem with `components.cif`
Notes and to-dos inherited from #982
* `pycifrw` as dependency to some recipes

Notes and to-dos from #1072
* Better way to deal with strict chirality. Check code dev notes. We won't do this - A partnering company has asked us to remove stereomarks from protein FFs.

Notes and to-dos from #1085
* `Molecule`-level `HierarchySchemes` and `HierarchyElements`, providing ways to generate a static index where atoms are grouped according to their common metadata.
* `Molecule`s, deprecating `TopologyMolecule`, `TopologyAtom`, `TopologyVirtualSite`, etc
* `HierarchySchemes` from the Topology level.
* `to`/`from_dict` methods for Topology
* `protein_param_test.tar.gz`
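The "static index where atoms are grouped according to their common metadata" in the #1085 notes can be pictured with a short plain-Python sketch. This is not the toolkit's `HierarchyScheme` implementation: the function `group_atoms_by_metadata` and its `uniqueness_criteria` argument are hypothetical names for this illustration, and the metadata dicts are invented sample data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_atoms_by_metadata(
    atom_metadata: Sequence[dict], uniqueness_criteria: Sequence[str]
) -> Dict[Tuple, List[int]]:
    """Map each distinct tuple of metadata values to the atom indices sharing it."""
    groups = defaultdict(list)
    for atom_index, metadata in enumerate(atom_metadata):
        # Missing keys become None, so atoms without metadata form their own group.
        key = tuple(metadata.get(name) for name in uniqueness_criteria)
        groups[key].append(atom_index)
    return dict(groups)


atoms = [
    {"chain_id": "A", "residue_number": 1, "residue_name": "ACE"},
    {"chain_id": "A", "residue_number": 1, "residue_name": "ACE"},
    {"chain_id": "A", "residue_number": 2, "residue_name": "CYS"},
    {},  # an atom with no metadata
]

residues = group_atoms_by_metadata(atoms, ("chain_id", "residue_number", "residue_name"))
chains = group_atoms_by_metadata(atoms, ("chain_id",))
```

The index is static in the sense that it is computed once from the metadata at that moment; if atom metadata changes afterwards, the grouping has to be regenerated.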
Notes and to-dos from #1097
* `Molecule`s, removing `TopologyMolecule`, `TopologyAtom`, `TopologyVirtualSite`, etc
* `test_topology.py` to use basic pytest features
* `HierarchySchemes` from the Topology level, using `offtop.hierarchy_iterator` (I'm happy to change the name of this in a future PR)
* `to`/`from_dict` methods for Topology
* `Molecule.particle_index(particle)`, `atom(index)`, `atom_index(atom)`, `virtual_site(index)`, `virtual_site_index(vsite)`, `virtual_site_particle_start_index(vsite)`, and `bond(index)`.
* `Topology.atoms`, `atom_index(atom)`, `particle_index(particle)`, `virtual_site_particle_start_index(virtual_site)`, `molecule_index(molecule)`, `molecule_atom_start_index(molecule)`, `molecule_virtual_particle_start_index(molecule)`
* `master`)
* `ToolkitAM1BCCHandler` (done in Improve topology refactor performance, offer API point to group molecules #1140)
* `Topology` class and implement heuristics to recover original performance
* `ForceField` aromaticity models
* `{Molecule,Topology}.{atom,particle,virtual_site}{,s,_index}` and counters like `n_atoms` are exposed and tested.
* `Topology.{to,from}_openmm`
Notes and to-dos from #1105
* `aa_variants.cif` (and some manually-added caps)
* `perceive_residues` still works with newly-generated substructure files
* `perceive_residues`) AND bond order/formal charge assignment (`from_pdb`)
* `molecule.py`
* Handle loading PDBs with multiple components (even if they're a mix of polymer and small molecules) (do we really want to support this? It's easy to split PDBs) This is out-of-scope for the 0.11.0 release
* Offer API point for residue substructure perception when loading OpenMM topologies. This is out of scope for the 0.11.0 release
* `residue_name`, `residue_number` metadata
* `atom_name` metadata
* `perceive_residues` assigns appropriate residue numbers at disulfide bridges (numbering should increment at each peptide bond, NOT jump disulfide bonds)
* `aa-variants` file

Notes and to-dos from #1140
* `Topology.identical_molecule_groups` docstring so actual humans can read it

Notes and to-dos from #1195
* `to`/`from`) (`openmm`, `rdkit`, `openeye`) to handle hierarchy metadata

Notes and to-dos from #1285
* `test_parameters` highlighted at this link get ported to Interchange