Add "desire paths" to API #1192

Yoshanuikabundi · 2022-02-09T05:07:56Z

This PR re-exports select classes and functions to improve their accessibility:

openff.toolkit.ForceField -> openff.toolkit.typing.engines.smirnoff.forcefield.ForceField (Should this be renamed SmirnoffFF or something to account for future forcefield types? I think having a unified ForceField type is very valuable)
openff.toolkit.get_available_force_fields -> openff.toolkit.typing.engines.smirnoff.forcefield.get_available_force_fields
openff.toolkit.Molecule -> openff.toolkit.topology.molecule.Molecule
openff.toolkit.Topology -> openff.toolkit.topology.topology.Topology
A new openff.toolkit.typing.engines.smirnoff.parametertypes module that re-exports the ParameterType classes from the class attributes in openff.toolkit.typing.engines.smirnoff.parameters (and the docs have been redirected to point to these re-exports rather than the class attributes, which fixes a series of sphinx warnings)

It also fixes some documentation warnings, such as by removing API references to the removed TopologyAtom, TopologyBond, TopologyVirtualSite, TopologyVirtualParticle and TopologyMolecule classes, and adds the ConstraintType and ConstraintHandler classes to the API reference.

Lint codebase
Update changelog


codecov · 2022-02-09T05:25:03Z

Codecov Report

❗ No coverage uploaded for pull request base (topology-biopolymer-refactor@1115d25).
The diff coverage is n/a.

mattwthompson · 2022-02-09T15:22:23Z

I have to raise a point of concern that exposing everything at the top level like this will drastically slow import times for all imports:

$ git checkout upstream/desire_paths
Previous HEAD position was a7a9f186 Bump actions/setup-python from 2.3.1 to 2.3.2 (#1189)
HEAD is now at 2c36a0cf Update changelog
$ time python -c "from openff.toolkit import __file__"
python -c "from openff.toolkit import __file__"  3.03s user 0.94s system 78% cpu 5.025 total
$ git checkout upstream/topology-biopolymer-refactor
Previous HEAD position was 2c36a0cf Update changelog
HEAD is now at 0beef754 Add `use_interchange` argument to `ForceField.create_openmm_system` (#1165)
$ time python -c "from openff.toolkit import __file__"
python -c "from openff.toolkit import __file__"  0.06s user 0.06s system 83% cpu 0.136 total
$ git checkout upstream/master
Previous HEAD position was 0beef754 Add `use_interchange` argument to `ForceField.create_openmm_system` (#1165)
HEAD is now at a7a9f186 Bump actions/setup-python from 2.3.1 to 2.3.2 (#1189)
$ time python -c "from openff.toolkit import __file__"
python -c "from openff.toolkit import __file__"  0.07s user 0.06s system 79% cpu 0.168 total

My hardware is aging, so it may be closer to 2 seconds on a faster CPU & disk. This isn't so impactful for workflows that are already importing the major classes in the toolkit, but unfortunately it does mean that any downstream library built off of any individual component of the toolkit will need to import all of it, i.e. something just wanting to use the Molecule class for file parsing will need to pull in all of the typing machinery.

Waiting 2-3 seconds for an interpreter or script to start up isn't the end of the world, but these can easily add up (hgrecco/pint#1460) and grow to 5+ seconds, which IMO is not a good user experience.

j-wags · 2022-02-10T00:59:19Z

Hm, darn. This would be a really convenient thing for users. Are we sure that @mattwthompson's tests reflect real use cases? I don't know if we actually expose anything at the openff.toolkit level, so I'm not sure that people would be importing from there in the first place.

Yoshanuikabundi · 2022-02-10T02:18:13Z

Thanks for pointing that out Matt, I hadn't considered it! I agree that this might be a net loss if it dramatically increases import times for typical users. I think we're in luck though.

Without this PR, import openff.toolkit is more or less useless. The only useful objects you get out of it are openff.toolkit.__version__ and the builtin stuff. Since it doesn't import other modules, you can't even use it to get deeper into the toolkit:

>>> import openff.toolkit
>>> openff.toolkit.topology.molecule.Molecule
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'openff.toolkit' has no attribute 'topology'

This is changed by this PR, which I think is more in line with people's expectations. It takes more time because it actually imports the whole toolkit (or at least all of the toolkit that is used by Molecule or ForceField, which seems to be most of it; I have learnt a lot about the minutiae of Python importing today!).

I didn't realise this, but Python runs all of the __init__.py files for parent packages when it loads a module, so this PR does mean that the toolkit's entire public API is loaded when any module is imported.
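Both behaviours described above can be reproduced with a throwaway package: importing the package alone does not bind its subpackages as attributes (hence the AttributeError), while importing a submodule first imports every parent package, running each parent's __init__.py. A minimal sketch; demo_pkg_init and its layout are hypothetical stand-ins for openff.toolkit.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in layout mirroring openff/toolkit/topology/molecule.py.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg_init"
(pkg / "topology").mkdir(parents=True)
(pkg / "__init__.py").write_text("")  # like the pre-PR, near-empty __init__.py
(pkg / "topology" / "__init__.py").write_text("LOADED = True\n")
(pkg / "topology" / "molecule.py").write_text("class Molecule:\n    pass\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg_init

# Importing the top-level package runs only its own __init__.py ...
has_topology_before = hasattr(demo_pkg_init, "topology")
print(has_topology_before)  # False

from demo_pkg_init.topology.molecule import Molecule

# ... while importing a submodule imports every parent package first (running
# each __init__.py) and binds each child as an attribute of its parent.
topology_init_ran = sys.modules["demo_pkg_init.topology"].LOADED
has_topology_after = hasattr(demo_pkg_init, "topology")
print(topology_init_ran, has_topology_after)  # True True
```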

However, this PR barely affects the time taken to import the main classes.

I think this is because all the main modules are complex enough to incidentally import most of the rest of the toolkit. I've done some benchmarking with hyperfine, which runs the provided command repeatedly to get stats on the time it takes. These times are probably close to the fastest modern hardware can go, and they're definitely long enough to be problematic, but I don't think they support holding back this PR. Here are my benchmarks:

$ git checkout topology-biopolymer-refactor
Already on 'topology-biopolymer-refactor'
Your branch is up to date with 'origin/topology-biopolymer-refactor'.

$ hyperfine 'python -c "from openff.toolkit.topology.molecule import Molecule"'
Benchmark 1: python -c "from openff.toolkit.topology.molecule import Molecule"
  Time (mean ± σ):     814.2 ms ±  12.7 ms    [User: 1068.9 ms, System: 1388.2 ms]
  Range (min … max):   789.0 ms … 832.1 ms    10 runs
 
$ hyperfine 'python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"'
Benchmark 1: python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
  Time (mean ± σ):     825.3 ms ±  13.2 ms    [User: 1062.0 ms, System: 1408.6 ms]
  Range (min … max):   800.3 ms … 843.7 ms    10 runs
 
$ git checkout desire_paths                                                                       
Switched to branch 'desire_paths'

$ hyperfine 'python -c "from openff.toolkit import ForceField"' 
Benchmark 1: python -c "from openff.toolkit import ForceField"
  Time (mean ± σ):     839.0 ms ±   6.1 ms    [User: 1091.1 ms, System: 1391.4 ms]
  Range (min … max):   829.5 ms … 847.2 ms    10 runs
 
$ hyperfine 'python -c "from openff.toolkit import Molecule"'  
Benchmark 1: python -c "from openff.toolkit import Molecule"
  Time (mean ± σ):     839.0 ms ±   9.9 ms    [User: 1074.8 ms, System: 1402.9 ms]
  Range (min … max):   823.5 ms … 854.8 ms    10 runs

So the new desire paths are, like, a few per cent slower. Probably an imperceptible amount of time for any computer recent enough to run Python 3.7. At these levels, there can be variance between runs even if you repeat them due to caching and clock boost and CPU/disk temps and stuff like that, so even this 2 or 3 per cent might not be real (I can't reliably reproduce it).
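For anyone without hyperfine installed, a rough Python approximation of what it does (repeat a command in fresh interpreters and report mean ± standard deviation, min and max) might look like the sketch below. time_cold_import is a hypothetical helper; it skips hyperfine's refinements such as warmup runs and outlier warnings, so its numbers will not match hyperfine's exactly.

```python
import statistics
import subprocess
import sys
import time


def time_cold_import(statement: str, runs: int = 10) -> dict:
    """Time `python -c <statement>` in fresh interpreters (rough hyperfine stand-in)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # A new interpreter each run, so nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", statement], check=True)
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if runs > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Swap in e.g. "from openff.toolkit import Molecule" where the toolkit is installed.
    stats = time_cold_import("import json", runs=5)
    print(f"{stats['mean'] * 1000:.1f} ms ± {stats['stdev'] * 1000:.1f} ms "
          f"(min {stats['min'] * 1000:.1f}, max {stats['max'] * 1000:.1f})")
```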

In particular, importing Molecule apparently imports so much of the typing machinery that importing ForceField straight after is basically free:

$ git checkout topology-biopolymer-refactor                                                       
Switched to branch 'topology-biopolymer-refactor'
Your branch is up to date with 'origin/topology-biopolymer-refactor'.
 
$ hyperfine 'python -c "from openff.toolkit.topology.molecule import Molecule"'                   
Benchmark 1: python -c "from openff.toolkit.topology.molecule import Molecule"
  Time (mean ± σ):     822.1 ms ±  11.3 ms    [User: 1059.9 ms, System: 1400.8 ms]
  Range (min … max):   802.0 ms … 837.2 ms    10 runs
 
$ hyperfine 'python -c "from openff.toolkit.topology.molecule import Molecule; from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"'
Benchmark 1: python -c "from openff.toolkit.topology.molecule import Molecule; from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
  Time (mean ± σ):     826.0 ms ±  15.3 ms    [User: 1080.0 ms, System: 1389.2 ms]
  Range (min … max):   789.3 ms … 841.2 ms    10 runs

Some cases that do get slower with this PR are checking the version:

$ git checkout topology-biopolymer-refactor
Already on 'topology-biopolymer-refactor'
Your branch is up to date with 'origin/topology-biopolymer-refactor'.

$ hyperfine 'python -c "from openff.toolkit import __version__"'
Benchmark 1: python -c "from openff.toolkit import __version__"
  Time (mean ± σ):      30.7 ms ±   4.7 ms    [User: 24.1 ms, System: 7.1 ms]
  Range (min … max):    25.2 ms …  42.5 ms    105 runs
 
$ git checkout desire_paths                                                    
Switched to branch 'desire_paths'

$ hyperfine 'python -c "from openff.toolkit import __version__"'
Benchmark 1: python -c "from openff.toolkit import __version__"
  Time (mean ± σ):     826.5 ms ±  14.4 ms    [User: 1070.6 ms, System: 1390.3 ms]
  Range (min … max):   808.9 ms … 855.4 ms    10 runs

and importing small modules that don't depend on other modules:

$  git checkout topology-biopolymer-refactor                                    

$ hyperfine 'python -c "import openff.toolkit.utils.constants"'
Benchmark 1: python -c "import openff.toolkit.utils.constants"
  Time (mean ± σ):     452.5 ms ±   7.1 ms    [User: 470.4 ms, System: 468.0 ms]
  Range (min … max):   439.5 ms … 461.7 ms    10 runs

$  git checkout desire_paths         

$ hyperfine 'python -c "import openff.toolkit.utils.constants"'        
Benchmark 1: python -c "import openff.toolkit.utils.constants"
  Time (mean ± σ):     843.5 ms ±  10.2 ms    [User: 1081.1 ms, System: 1402.5 ms]
  Range (min … max):   827.8 ms … 862.7 ms    10 runs

TL;DR Importing anything in the toolkit now imports most of the toolkit. For the main classes, this doesn't change the time it takes because they're complex enough that the whole toolkit gets imported anyway. For small modules with few imports, this can increase import times. I think this is acceptable for the usability wins with IPython and in notebooks, where tab completion can now take you through the entire toolkit, and for saving users from memorizing some pretty obscure paths. It also lets us explicitly declare our public API, which is useful for any future automated documentation.

Yoshanuikabundi · 2022-02-10T02:44:30Z

Don't think this is useful but Hyperfine is cool

hyperfine -L branch desire_paths,topology-biopolymer-refactor -L script "from openff.toolkit.topology.molecule import Molecule","from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField" -p "git checkout {branch}" -n '{branch}:{script}' 'python -c "{script}"' --export-markdown desire_path_benchmarks.md

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `desire_paths:from openff.toolkit.topology.molecule import Molecule` | 844.6 ± 14.5 | 824.5 | 873.6 | 1.02 ± 0.02 |
| `topology-biopolymer-refactor:from openff.toolkit.topology.molecule import Molecule` | 838.6 ± 23.8 | 796.2 | 877.3 | 1.01 ± 0.03 |
| `desire_paths:from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField` | 839.7 ± 17.8 | 814.3 | 870.7 | 1.01 ± 0.03 |
| `topology-biopolymer-refactor:from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField` | 827.8 ± 13.3 | 805.5 | 849.3 | 1.00 |
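The Relative column is each mean divided by the fastest mean, with the relative uncertainties of both combined in quadrature. The sketch below reproduces the values in this table; that hyperfine uses exactly this propagation is an assumption based on the numbers matching, and relative_speed is a hypothetical helper.

```python
import math


def relative_speed(mean: float, stdev: float, fastest_mean: float, fastest_stdev: float):
    """Ratio to the fastest benchmark, with relative errors added in quadrature."""
    ratio = mean / fastest_mean
    ratio_stdev = ratio * math.sqrt((stdev / mean) ** 2 + (fastest_stdev / fastest_mean) ** 2)
    return ratio, ratio_stdev


# Fastest row: topology-biopolymer-refactor ForceField import, 827.8 ± 13.3 ms.
ratio, err = relative_speed(844.6, 14.5, 827.8, 13.3)
print(f"{ratio:.2f} ± {err:.2f}")  # 1.02 ± 0.02
```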

j-wags · 2022-02-10T18:14:55Z

Wow, hyperfine is really cool - Thanks for sharing your timing commands, @Yoshanuikabundi!

I ran the same on my computer (with a billion tabs open, pycharm, slack, etc) and I saw the same relative timings as you, though another important result is that your computer is 2-3x faster than my 2018 MBP, and my computer is probably faster than many of our users'. I've also included a few "lighter" imports that users might reasonably call and saw a significant performance degradation there.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `desire_paths:from openff.toolkit.topology.molecule import Molecule` | 2.722 ± 0.200 | 2.596 | 3.237 | 1.73 ± 0.13 |
| `topology-biopolymer-refactor:from openff.toolkit.topology.molecule import Molecule` | 2.654 ± 0.143 | 2.561 | 3.023 | 1.69 ± 0.09 |
| `desire_paths:from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField` | 2.737 ± 0.223 | 2.555 | 3.272 | 1.74 ± 0.14 |
| `topology-biopolymer-refactor:from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField` | 2.675 ± 0.081 | 2.584 | 2.788 | 1.70 ± 0.05 |
| `desire_paths:from openff.toolkit.utils import constants` | 2.658 ± 0.106 | 2.580 | 2.931 | 1.69 ± 0.07 |
| `topology-biopolymer-refactor:from openff.toolkit.utils import constants` | 1.597 ± 0.028 | 1.554 | 1.661 | 1.02 ± 0.02 |
| `desire_paths:from openff.toolkit.utils import RDKitToolkitWrapper` | 2.693 ± 0.183 | 2.541 | 3.175 | 1.71 ± 0.12 |
| `topology-biopolymer-refactor:from openff.toolkit.utils import RDKitToolkitWrapper` | 1.573 ± 0.017 | 1.555 | 1.602 | 1.00 |

There's a fine balance to this tradeoff -

The negatives are that
- we significantly slow down some use cases
- we increase the API's surface area
- we make it impossible to undo this slowdown without breaking the API
The positives are that
- most users get to use simpler import paths

I think the negatives here outweigh the positives.

So, to move forward, let's sink these quick imports behind something like openff.toolkit.app. That would gain most of the benefit of these changes and shake off most of the negatives. So, imports like

from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.utils import RDKitToolkitWrapper

become

from openff.toolkit.app import (Molecule, ForceField, RDKitToolkitWrapper)

I'm happy to discuss names other than app; that's just the first one that comes to mind since openmm uses it.
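The reasoning behind sinking the re-exports into a separate module can be illustrated with a throwaway package: as long as the package's __init__.py stays light, importing a small leaf module does not pull in the heavy code, and only code that opts into the convenience module pays for it. A minimal sketch; demo_pkg_app, heavy.py and constants.py are hypothetical stand-ins for the toolkit layout, not the module that was actually written.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: "heavy" plays the typing machinery, "constants" a small
# leaf module, and "app" the proposed convenience re-export module.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg_app"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # stays light: no re-exports here
(pkg / "constants.py").write_text("ANSWER = 42\n")
(pkg / "heavy.py").write_text("class ForceField:\n    pass\n")
(pkg / "app.py").write_text(
    "from demo_pkg_app.heavy import ForceField\n"
    "__all__ = ['ForceField']\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_pkg_app.constants import ANSWER

# Importing a small leaf module does not drag in the heavy module ...
heavy_loaded_after_constants = "demo_pkg_app.heavy" in sys.modules
print(heavy_loaded_after_constants)  # False

from demo_pkg_app.app import ForceField

# ... only opting into the convenience module does.
heavy_loaded_after_app = "demo_pkg_app.heavy" in sys.modules
print(heavy_loaded_after_app)  # True
```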

mattwthompson · 2022-02-10T21:37:59Z

Without this PR, import openff.toolkit is more or less useless.

Agree, I use it for __version__ and __file__. My understanding, though, is that putting everything in the top-level __init__.py has the effect of importing everything whenever any import is hit, whereas currently importing just one class might not bring in everything. (Though, as you accurately point out, things are intertwined here in ways that make importing one thing already similar to importing everything.) I grant that ~90% of the use cases are going to pull in Molecule and ForceField. I agree with your assessment that import times of the other modules aren't so important. I actually didn't even remember there was a .constants module, but my memory is neither here nor there ...

importing Molecule apparently imports so much of the typing machinery that importing ForceField straight after is basically free

This is counter-intuitive to me, and looking around in the files openff/toolkit/topology/*/*.py gives me no clue as to how this happens. Personally I'd like to find a way to untangle these/lazy-load where possible, but I won't put effort into that if we go with an approach that puts everything in openff/toolkit/__init__.py. If I understand import paths accurately, this would be worth exploring after something like Jeff's openff.toolkit.app idea, if that's what is chosen.

In the interest of fairness, I used tuna to see why the imports are so incredibly slow, and was reminded that it's because of Mendeleev bringing in Pandas and some other things.

This will be fixed when #1182 is merged, though I doubt it will affect the numbers enough to alter any conclusions. The hardware I use most often for work is pretty old, too.
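tuna visualizes the output of CPython's -X importtime flag (available since Python 3.7). For a quick text summary without extra tools, a rough sketch like the one below parses that output directly. slowest_imports is a hypothetical helper, and the parsing assumes the "import time: self [us] | cumulative | imported package" line format that CPython writes to stderr.

```python
import subprocess
import sys


def slowest_imports(statement: str, top: int = 5):
    """Return (cumulative_us, self_us, module) for the slowest imports, slowest first."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = [f.strip() for f in line[len("import time:"):].split("|")]
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # skips the "self [us] | cumulative | imported package" header
        self_us, cumulative_us, name = int(fields[0]), int(fields[1]), fields[2]
        rows.append((cumulative_us, self_us, name))
    rows.sort(reverse=True)
    return rows[:top]


if __name__ == "__main__":
    # Swap in e.g. "from openff.toolkit import Molecule" where the toolkit is installed.
    for cumulative_us, self_us, name in slowest_imports("import json"):
        print(f"{cumulative_us / 1000:8.1f} ms  {name}")
```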

I'm not the ultimate reviewer of changes to the toolkit. I did, however, want to voice the general concern of import times here given that

This toolkit is likely to increase in size and complexity (LOC, functionality, dependencies, ...) over time
Most of the OpenFF stack will import something from the toolkit
Most workflows are going to import multiple other libraries, some of which are probably heavy themselves
Many users do star imports

As a matter of preference and sharing my biases, I'll add that

I prefer, all other things being equal, the ability to import a portion of a library without pulling in everything
I hate star imports
I think really long import times (i.e. 5+ seconds) are unsightly and I'm concerned projects like Interchange will have a hard time avoiding this on mid-range hardware
I am impatient and prone to context switching when a command like interpreter startup takes several seconds to run (okay, this is maybe a wetware issue)

j-wags · 2022-02-10T23:38:32Z

@mattwthompson Would the openff.toolkit.app solution remove all the negatives from your point of view?

mattwthompson · 2022-02-10T23:59:23Z

It does seem worth it. Provided we want a single, short-ish path that provides the key classes and provided we avoid using it internally, I don't see substantial new negatives introduced, and it leaves the door open to speeding up imports whenever we do tackle that.

I have no preference on app naming; I don't think it's a common pattern in Python libraries, or at least it's not something I see commonly enough to have other good points of reference.

I did some profiling after merging #1182, which shows some new issues. But they're out of scope for this PR and I split them out here: openforcefield/openff-units#17

lgtm-com · 2022-02-11T04:15:25Z

This pull request introduces 4 alerts when merging 4f4844f into 1115d25 - view on LGTM.com

new alerts:

4 for Explicit export is not defined

Yoshanuikabundi · 2022-02-11T04:18:42Z

Cool! I've moved the new imports to the new openff.toolkit.app module. I think this is a good compromise. Should I add any toolkit wrappers to it?

I've also implemented a lazy importing system for openff.toolkit. It delays the actual import until an object is asked for, but still shows it in tab completion and so forth. It uses a new mechanism added in Python 3.7 (PEP 562; note that lazy imports are one of the suggested use cases). I don't think it's worth the added complexity and magic, but it only took a few minutes to whip up, so I thought I'd point it out as a possibility. I'm expecting to remove it before merging (leaving openff/toolkit/__init__.py in the same state it was in before this PR).
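For readers unfamiliar with PEP 562, the general shape of such a lazy-loading __init__.py is a module-level __getattr__ that imports on first access plus a module-level __dir__ that advertises the lazy names. The sketch below is written against a throwaway package; demo_pkg_lazy, the _lazy_imports mapping and the caching step are illustrative assumptions, not the code in this PR.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a hypothetical package __init__.py using PEP 562 hooks.
INIT_SOURCE = '''
import importlib

_lazy_imports = {"Molecule": "demo_pkg_lazy.molecule"}

__all__ = ["Molecule"]


def __getattr__(name):
    """Import the submodule providing `name` the first time it is requested."""
    module_name = _lazy_imports.get(name)
    if module_name is None:
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
    obj = getattr(importlib.import_module(module_name), name)
    globals()[name] = obj  # cache it so later lookups skip __getattr__
    return obj


def __dir__():
    """Advertise lazy names so tab completion can see them before import."""
    return sorted(set(globals()) | set(_lazy_imports))
'''

root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg_lazy"
pkg.mkdir()
(pkg / "__init__.py").write_text(INIT_SOURCE)
(pkg / "molecule.py").write_text("class Molecule:\n    pass\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg_lazy

loaded_before = "demo_pkg_lazy.molecule" in sys.modules  # False: nothing imported yet
listed_in_dir = "Molecule" in dir(demo_pkg_lazy)  # True: __dir__ advertises it
Molecule = demo_pkg_lazy.Molecule  # triggers __getattr__ and the real import
loaded_after = "demo_pkg_lazy.molecule" in sys.modules  # True
```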

lilyminium · 2022-02-11T05:07:33Z

Should I add any toolkit wrappers to it?

If it doesn't make anything harder, I'd (as a user) love that. Thanks for this PR, it'll take away one of the serious pain points of doing anything with a ForceField!

I've also implemented a lazy importing system for openff.toolkit.

This is cool!

Yoshanuikabundi · 2022-02-15T02:53:06Z

I've added all 4 toolkit wrappers, the ToolkitRegistry class, and the GLOBAL_TOOLKIT_REGISTRY constant to the app module, as requested by @lilyminium

lgtm-com · 2022-02-15T02:59:11Z

This pull request introduces 4 alerts when merging ed96e05 into 1115d25 - view on LGTM.com

new alerts:

4 for Explicit export is not defined

j-wags

Wow! I REALLY like the design for lazy loading in __init__.py, and it gives us everything that the openff.toolkit.app module would, with only a moderate tradeoff in terms of complexity.

@mattwthompson and @lilyminium - I've never futzed with import paths before like this. Does the __init__.py design look legit to you? @Yoshanuikabundi's code follows closely from the linked PEP so I have some confidence that it's pretty mainstream, but I'd like to get another set of eyes on it. If either of you can vouch for the correct-ish-ness of it, could you give an approving review?

j-wags · 2022-02-17T01:06:29Z

openff/toolkit/typing/engines/smirnoff/parametertypes.py

+"""Re-exports of concrete ParameterTypes
+
+openff.toolkit.typing.engines.smirnoff.parameters defines a number of parameter
+types within class definitions of the corresponding ParameterHandler. This module
+re-exports them for discoverability and ease of use."""


(blocking) Per our discussion today, let's

delete this file

move the exporting of the ParameterTypes to parameters.py itself

include the exported ParameterTypes in the __all__ for parameters.py

Add a unit test to test_parameters that ensures that all ParameterType subclasses owned by ParameterHandler subclasses are exposed via re-export and are in __all__
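A minimal sketch of the requested check, written against hypothetical stand-ins rather than the real parameters.py: it walks every ParameterHandler subclass in a module, collects the ParameterType subclasses defined as class attributes, and reports any that are not both a module attribute and listed in __all__. The class names, the fake_parameters module and missing_reexports are assumptions for illustration.

```python
import inspect
import types


# Hypothetical stand-ins: handlers define their ParameterTypes as nested classes.
class ParameterType:
    pass


class ParameterHandler:
    pass


class BondHandler(ParameterHandler):
    class BondType(ParameterType):
        pass


class AngleHandler(ParameterHandler):
    class AngleType(ParameterType):
        pass


# A fake module with module-level re-exports, as proposed for parameters.py.
fake_parameters = types.ModuleType("fake_parameters")
for obj in (ParameterType, ParameterHandler, BondHandler, AngleHandler):
    setattr(fake_parameters, obj.__name__, obj)
fake_parameters.BondType = BondHandler.BondType
fake_parameters.AngleType = AngleHandler.AngleType
fake_parameters.__all__ = [
    "ParameterType", "ParameterHandler", "BondHandler", "AngleHandler",
    "BondType", "AngleType",
]


def missing_reexports(module: types.ModuleType) -> list:
    """Names of nested ParameterTypes not re-exported by `module` and its __all__."""
    missing = []
    for _, handler in inspect.getmembers(module, inspect.isclass):
        if not issubclass(handler, ParameterHandler):
            continue
        for attr_name, attr in vars(handler).items():
            if inspect.isclass(attr) and issubclass(attr, ParameterType):
                exported = getattr(module, attr_name, None) is attr
                if not (exported and attr_name in module.__all__):
                    missing.append(attr_name)
    return missing


before = missing_reexports(fake_parameters)
print(before)  # []
del fake_parameters.AngleType  # simulate forgetting a re-export
after = missing_reexports(fake_parameters)
print(after)  # ['AngleType']
```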

j-wags · 2022-02-17T01:28:42Z

docs/typing.rst

-    VirtualSiteHandler.VirtualSiteMonovalentLonePairType
-    VirtualSiteHandler.VirtualSiteDivalentLonePairType
-    VirtualSiteHandler.VirtualSiteTrivalentLonePairType
+    ConstraintType


🤦 Thanks for catching this!

j-wags · 2022-02-17T01:29:03Z

docs/topology.md

-    TopologyAtom
-    TopologyBond
-    TopologyVirtualSite


mattwthompson · 2022-02-17T16:34:00Z

My glances through earlier versions of this PR didn't expose any obvious red flags but I will give this a more thorough review later today.

j-wags · 2022-02-17T16:53:31Z

Thanks, @mattwthompson! Reading this morning, I realized that it may not have been clear what I'm asking about. The path forward I'm considering would be completely deleting openff/toolkit/app.py and instead letting people do from openff.toolkit import Molecule using the "lazy-loading" functionality from __init__.py. I've reviewed the rest of the PR; I'd just like to see if anyone can provide a more certain vote of confidence in the reliability of the new code in __init__.py.

mattwthompson

This gets an approval from me! That assumes app.py is removed as per the proposal, though it's not a file that's imported anywhere, so it should not have an impact on what I tinkered with. I also noticed there's a difference between what's in app.py and what's currently lazy-loaded in openff/toolkit/__init__.py. I have no preference but wanted to point this out in case from openff.toolkit import RDKitToolkitWrapper was expected to work.

The import machinery looks good to me. We did the same thing in #1021 (and removed in #1156) when we wanted to have module-level exceptions but have them only imported when explicitly imported. We didn't call this lazy-loading at the time, but I think that's the definition.

The behavior does what I'd expect:

In [1]: from openff.toolkit import Molecule, Topology

In [2]: locals().keys()
Out[2]: dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__builtin__', '__builtins__', '_ih', '_oh', '_dh', 'In', 'Out', 'get_ipython', 'exit', 'quit', '_', '__', '___', '_i', '_ii', '_iii', '_i1', 'OE_1996539322038294600', 'OE_3449561020318279929', 'OE_3534227101456629665', 'OE_7544655256171905223', 'OE_14481470454631586240', 'OE_2886910160464464284', 'OE_12830519133454273888', 'OE_7146837923505175476', 'OE_1333624159125614004', 'OE_7190599916362654688', 'OE_18194288045002224399', 'OE_2920715238402669782', 'OE_10862347053494700408', 'OE_1152104254218024490', 'OE_12407415621277732493', 'OE_2699086864785953591', 'OE_13975590391927931629', 'Molecule', 'Topology', '_i2'])

In [3]: from openff.toolkit import ForceField

In [4]: locals().keys()
Out[4]: dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__builtin__', '__builtins__', '_ih', '_oh', '_dh', 'In', 'Out', 'get_ipython', 'exit', 'quit', '_', '__', '___', '_i', '_ii', '_iii', '_i1', 'OE_1996539322038294600', 'OE_3449561020318279929', 'OE_3534227101456629665', 'OE_7544655256171905223', 'OE_14481470454631586240', 'OE_2886910160464464284', 'OE_12830519133454273888', 'OE_7146837923505175476', 'OE_1333624159125614004', 'OE_7190599916362654688', 'OE_18194288045002224399', 'OE_2920715238402669782', 'OE_10862347053494700408', 'OE_1152104254218024490', 'OE_12407415621277732493', 'OE_2699086864785953591', 'OE_13975590391927931629', 'Molecule', 'Topology', '_i2', '_2', '_i3', 'ForceField', '_i4'])

In [5]: from openff.toolkit import Foo
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-5-84c3b7d3429c> in <module>
----> 1 from openff.toolkit import Foo

ImportError: cannot import name 'Foo' from 'openff.toolkit' (/Users/mwt/software/openforcefield/openff/toolkit/__init__.py)
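The ImportError for Foo above is standard PEP 562 behaviour: when a module-level __getattr__ raises AttributeError, a from ... import statement reports it as ImportError. A minimal in-memory sketch; the module name demo_lazy_errors is hypothetical, and the returned string is a placeholder for a real lazy import.

```python
import sys
import types

# A hypothetical in-memory module whose PEP 562 __getattr__ knows one name.
demo = types.ModuleType("demo_lazy_errors")


def _module_getattr(name):
    if name == "Molecule":
        return "lazily loaded Molecule"  # placeholder for a real import
    raise AttributeError(f"module 'demo_lazy_errors' has no attribute {name!r}")


demo.__getattr__ = _module_getattr  # stored in the module __dict__, as PEP 562 expects
sys.modules["demo_lazy_errors"] = demo

from demo_lazy_errors import Molecule  # resolved through __getattr__

error_message = None
try:
    from demo_lazy_errors import Foo  # AttributeError surfaces as ImportError
except ImportError as exc:
    error_message = str(exc)

print(Molecule)
print(error_message)
```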

The timings are ultimately not good, but they're not worse, so it's a fair tradeoff. My SSD is in poor shape (will be replaced soon!), so these are slower times than I'd normally expect.

git checkout upstream/topology-biopolymer-refactor
hyperfine --min-runs 20 'python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"'
hyperfine --min-runs 20 'python -c "from openff.toolkit.topology import Molecule"'
hyperfine --min-runs 20 'python -c "from openff.toolkit.topology import Topology"'
git checkout upstream/desire_paths
hyperfine --min-runs 20 'python -c "from openff.toolkit import ForceField"'
hyperfine --min-runs 20 'python -c "from openff.toolkit import Molecule"'
hyperfine --min-runs 20 'python -c "from openff.toolkit import Topology"'

Previous HEAD position was ed96e05c Add toolkits to app module
HEAD is now at 1115d256 Refactor to openff.units.elements (#1182)
Benchmark 1: python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
  Time (mean ± σ):      2.506 s ±  0.197 s    [User: 1.948 s, System: 0.628 s]
  Range (min … max):    2.294 s …  2.938 s    20 runs

Benchmark 1: python -c "from openff.toolkit.topology import Molecule"
  Time (mean ± σ):      2.335 s ±  0.067 s    [User: 1.938 s, System: 0.609 s]
  Range (min … max):    2.237 s …  2.484 s    20 runs

Benchmark 1: python -c "from openff.toolkit.topology import Topology"
  Time (mean ± σ):      2.344 s ±  0.060 s    [User: 1.953 s, System: 0.615 s]
  Range (min … max):    2.257 s …  2.446 s    20 runs

Previous HEAD position was 1115d256 Refactor to openff.units.elements (#1182)
HEAD is now at ed96e05c Add toolkits to app module
Benchmark 1: python -c "from openff.toolkit import ForceField"
  Time (mean ± σ):      2.494 s ±  0.183 s    [User: 2.015 s, System: 0.646 s]
  Range (min … max):    2.288 s …  2.873 s    20 runs

Benchmark 1: python -c "from openff.toolkit import Molecule"
  Time (mean ± σ):      2.543 s ±  0.256 s    [User: 2.013 s, System: 0.645 s]
  Range (min … max):    2.251 s …  3.181 s    20 runs

Benchmark 1: python -c "from openff.toolkit import Topology"
  Time (mean ± σ):      2.637 s ±  0.432 s    [User: 2.017 s, System: 0.658 s]
  Range (min … max):    2.282 s …  3.894 s    20 runs

This is probably due to the Pandas issue I linked, so I removed it from my environment and got slightly better times:

Benchmark 1: python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
  Time (mean ± σ):      2.403 s ±  0.382 s    [User: 1.648 s, System: 0.606 s]
  Range (min … max):    1.907 s …  3.740 s    20 runs

Benchmark 1: python -c "from openff.toolkit.topology import Molecule"
  Time (mean ± σ):      1.992 s ±  0.135 s    [User: 1.552 s, System: 0.520 s]
  Range (min … max):    1.842 s …  2.385 s    20 runs

Benchmark 1: python -c "from openff.toolkit.topology import Topology"
  Time (mean ± σ):      1.904 s ±  0.055 s    [User: 1.543 s, System: 0.514 s]
  Range (min … max):    1.821 s …  1.992 s    20 runs

Previous HEAD position was 1115d256 Refactor to openff.units.elements (#1182)
HEAD is now at ed96e05c Add toolkits to app module
Benchmark 1: python -c "from openff.toolkit import ForceField"
  Time (mean ± σ):      1.979 s ±  0.109 s    [User: 1.549 s, System: 0.515 s]
  Range (min … max):    1.867 s …  2.166 s    20 runs

Benchmark 1: python -c "from openff.toolkit import Molecule"
  Time (mean ± σ):      1.941 s ±  0.110 s    [User: 1.546 s, System: 0.518 s]
  Range (min … max):    1.838 s …  2.203 s    20 runs

Benchmark 1: python -c "from openff.toolkit import Topology"
  Time (mean ± σ):      1.908 s ±  0.077 s    [User: 1.540 s, System: 0.514 s]
  Range (min … max):    1.804 s …  2.128 s    20 runs

Basically, Molecule and Topology take practically identical amounts of time on both branches, but ForceField is faster on desire_paths, which is great. I dug around a bit and wasn't able to figure out why. I think it has something to do with the other things snuck in along the way in the long import paths (star imports and/or __all__ definitions), but I can't say for sure. There's some more work to do here, but I'm getting way off track and digging into non-blockers.

lgtm-com · 2022-02-21T06:54:21Z

This pull request introduces 10 alerts when merging 88bbb7e into 0e18422 - view on LGTM.com

new alerts:

10 for Explicit export is not defined

lgtm-com · 2022-02-21T07:08:54Z

This pull request introduces 10 alerts when merging 9080c6f into 0e18422 - view on LGTM.com

new alerts:

10 for Explicit export is not defined

Yoshanuikabundi · 2022-02-21T07:51:58Z

I have removed the app module and added the toolkit re-exports to openff.toolkit (thanks for the reminder Matt!)

I've also moved the ParameterType re-exports to the bottom of parameters.py and written a test that any new ParameterTypes are re-exported in the same way. This test would fail, as there is an undocumented and un-re-exported ParameterType called VirtualSiteType. This class is marked as an abstract base class (though it has no abstract methods) and has a number of subclasses (VirtualSiteMonovalentLonePairType, VirtualSiteDivalentLonePairType etc), all of which are re-exported.

I didn't re-export this class because I think it's essentially an implementation detail, but it's not clear to me how to exclude it from the test. I see a number of options:

1. Have the test exclude abstract base classes, and make it an ABC by adding an abstract method (I've implemented this one)
2. Make it private by renaming it _VirtualSiteType (the test must exclude private classes to avoid complaining about _INFOTYPE a lot)
3. Re-export it
4. Add a specific exception to the test (like if paramtype is VirtualSiteType: continue)

I chose 1 because the class is currently in a weird place: it is marked as an ABC but has no abstract methods. Without any abstract methods, Python considers it to be a concrete class, so inheriting from abc.ABC doesn't really do anything productive. It looked as though VirtualSiteType._add_virtual_site might once have been an abstract method called add_virtual_site that was later converted to a private method, but the git blame shows this is not the case. All the subclasses implement add_virtual_site with the same signature and docstring, which seems like a clear case of a missing abstract method declaration, so I've added the declaration.

Is VirtualSiteType intended to be used by users for implementing their own VirtualSites? If so (3) is probably the best solution. If not, (1) or (2) probably is. In any case, I would like to keep the abstract method (and possibly remove the exception for ABCs from the test), assuming it doesn't break anything in tests.

Background: VirtualSiteType was made an ABC here, but no methods were marked as abstract. This happened after the base class _?add_virtual_site was made private.
ABCs are supposed to allow base classes to be declared that cannot be instantiated on their own, but that can be used for checking that subclasses inherit from them. An ABC can then declare abstract methods that subclasses must implement, providing a way of specifying a generic interface as an alternative to duck typing. I love this because I love type systems, but Python's implementation is a mess. The docs for the abc.ABC class say:

A helper class that has ABCMeta as its metaclass. With this class, an abstract base class can be created by simply deriving from ABC avoiding sometimes confusing metaclass usage, for example

And then gives an example that declares no abstract methods. However, the PEP says

Implementation: The @abstractmethod decorator sets the function attribute __isabstractmethod__ to the value True. The ABCMeta.__new__ method computes the type attribute __abstractmethods__ as the set of all method names that have an __isabstractmethod__ attribute whose value is true. It does this by combining the __abstractmethods__ attributes of the base classes, adding the names of all methods in the new class dict that have a true __isabstractmethod__ attribute, and removing the names of all methods in the new class dict that don't have a true __isabstractmethod__ attribute. If the resulting __abstractmethods__ set is non-empty, the class is considered abstract, and attempts to instantiate it will raise TypeError.

In other words, simply inheriting from ABC or using the ABCMeta metaclass is not sufficient to make a class abstract; you must also declare at least one abstract method. I think it's possible for someone to follow the Python docs, add the ABC parent class without any abstract methods, and expect this alone to make the class abstract.
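A small sketch of that pitfall, following the pattern of the abc.ABC docs example (class names are made up). The class is created by ABCMeta and passes isinstance checks against ABC, yet its __abstractmethods__ set is empty, so it instantiates without complaint; inspect.isabstract, which follows the same rule, also reports it as not abstract.

```python
import inspect
from abc import ABC, ABCMeta, abstractmethod


class MyABC(ABC):
    # Derives from ABC but declares nothing abstract, as in the docs example.
    pass


class WithAbstractMethod(ABC):
    @abstractmethod
    def method(self):
        ...


print(type(MyABC) is ABCMeta)  # True: the class is built by ABCMeta
print(MyABC.__abstractmethods__)  # frozenset()
print(inspect.isabstract(MyABC))  # False
instance = MyABC()  # no TypeError is raised
print(isinstance(instance, ABC))  # True

print(inspect.isabstract(WithAbstractMethod))  # True
```

Checking inspect.isabstract rather than "derives from ABC" is one way to tell the two cases apart.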

lgtm-com · 2022-02-21T08:03:52Z

This pull request introduces 10 alerts when merging 58cde36 into 0e18422 - view on LGTM.com

new alerts:

10 for Explicit export is not defined

mattwthompson · 2022-02-21T21:45:28Z

Is VirtualSiteType intended to be used by users for implementing their own VirtualSites?

Not really, the unspoken recommendation here is for users to specify their parameters in OFFXML and let the toolkit handle everything. Unless you're asking about implementing an entire new type (as in variety/flavor/version) of virtual sites with a new handler. That would be more or less uncharted territory and I'm personally fine with the user experience there being ambiguous and not well-supported.

I think the answer is "no" and making them proper abstract classes is the way to go. To be honest, I also think that this corner of the codebase is so rough that any changes not making it substantially worse are good, but fixing it might be best split out into another PR so the original focus of this PR can be implemented without needing to make decisions on how the virtual site classes should be improved.

Yoshanuikabundi · 2022-02-22T04:58:23Z

Sounds good. Shall I merge this as-is?

mattwthompson · 2022-02-22T05:09:18Z

Earlier I didn't see that Jeff's comment defers to me as a sufficient reviewer; now I do, and yes, go ahead and merge.

I know I put up some resistance here, but I am looking forward to typing fewer characters at the top of every script!

Yoshanuikabundi · 2022-02-22T08:42:17Z

No worries, your input definitely made this PR better!!

Yoshanuikabundi added 6 commits February 9, 2022 15:16

Re-export fundamental types from openff.toolkit

513f998

Re-export parametertypes in new module

911fd71

Add ConstraintHandler and ConstraintType to API docs

d02406c

Point API docs at re-exported parameter types

ec621c8

Remove data and tests from __all__

eb46a4a

Fix sphinx warnings (including removing docs references to TopologyAtom etc)

66f0c31

Yoshanuikabundi changed the base branch from master to topology-biopolymer-refactor February 9, 2022 05:08

Isort

81ee77f

Update changelog

2c36a0c

Yoshanuikabundi added 3 commits February 11, 2022 15:05

Move top level re-exports to new app module

be42146

Lazy imports for openff.toolkit

f00d6f9

Merge branch 'topology-biopolymer-refactor' into desire_paths

4f4844f

Add toolkits to app module

ed96e05

j-wags reviewed Feb 17, 2022


mattwthompson approved these changes Feb 18, 2022


Yoshanuikabundi added 3 commits February 21, 2022 13:27

Remove app.py and move toolkit imports to openff.toolkit

50ed154

Move ParameterType re-exports to parameters module

82e4af4

Add test checking that all ParameterTypes have been re-exported

88bbb7e

Add abstract method to VirtualSiteType base class

9080c6f

isort

58cde36

Yoshanuikabundi merged commit c95642b into topology-biopolymer-refactor Feb 22, 2022

Yoshanuikabundi deleted the desire_paths branch February 22, 2022 08:42

mattwthompson mentioned this pull request Mar 25, 2022

Use quick imports for core classes, remove simtk imports #1230

Merged


mattwthompson mentioned this pull request Nov 22, 2023

Re-export unit and Quantity from toolkit? #1774

Closed


