-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "desire paths" to API #1192
Conversation
Codecov Report
|
I have to raise a point of concern that exposing everything at the top level like this will drastically slow import times for all imports:
My hardware is aging so it may be closer to 2 seconds on a faster CPU & disk. This isn't so impactful for workflows that are already importing the major classes in the toolkit, but it unfortunately it does mean that any downstream library built off of any individual component of the toolkit will need to import all of it, i.e. something just wanting to use the Waiting 2-3 seconds for an interpreter or script to start up isn't the end of the world, but these can easily add up (hgrecco/pint#1460) and grow to 5+ seconds, which IMO is not a good user experience. |
Hm, darn. This would be a really convenient thing for users. Are we sure that @mattwthompson's tests reflect real use cases? I don't know if we actually expose anything at the |
Thanks for pointing that out Matt, I hadn't considered it! I agree that this might be a net loss if it dramatically increases import times for typical users. I think we're in luck though. Without this PR, >>> import openff.toolkit
>>> openff.toolkit.topology.molecule.Molecule
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'openff.toolkit' has no attribute 'topology' This is changed by this PR, which I think is more in line with people's expectation. It takes more time because it actually imports the whole toolkit. (or at least, all of the toolkit that is used by I didn't realise this, but Python runs all of the However, this PR barely affects the time taken to import the main classes. I think this is because all the main modules are complex enough to incidentally import most of the rest of the toolkit. I've done some benchmarking with $ git checkout topology-biopolymer-refactor
Already on 'topology-biopolymer-refactor'
Your branch is up to date with 'origin/topology-biopolymer-refactor'.
$ hyperfine 'python -c "from openff.toolkit.topology.molecule import Molecule"'
Benchmark 1: python -c "from openff.toolkit.topology.molecule import Molecule"
Time (mean ± σ): 814.2 ms ± 12.7 ms [User: 1068.9 ms, System: 1388.2 ms]
Range (min … max): 789.0 ms … 832.1 ms 10 runs
$ hyperfine 'python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"'
Benchmark 1: python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
Time (mean ± σ): 825.3 ms ± 13.2 ms [User: 1062.0 ms, System: 1408.6 ms]
Range (min … max): 800.3 ms … 843.7 ms 10 runs
$ git checkout desire_paths
Switched to branch 'desire_paths'
$ hyperfine 'python -c "from openff.toolkit import ForceField"'
Benchmark 1: python -c "from openff.toolkit import ForceField"
Time (mean ± σ): 839.0 ms ± 6.1 ms [User: 1091.1 ms, System: 1391.4 ms]
Range (min … max): 829.5 ms … 847.2 ms 10 runs
$ hyperfine 'python -c "from openff.toolkit import Molecule"'
Benchmark 1: python -c "from openff.toolkit import Molecule"
Time (mean ± σ): 839.0 ms ± 9.9 ms [User: 1074.8 ms, System: 1402.9 ms]
Range (min … max): 823.5 ms … 854.8 ms 10 runs So the new desire paths are, like, a few per cent slower. Probably an imperceptible amount of time for any computer recent enough to run Python 3.7. At these levels, there can be variance between runs even if you repeat them due to caching and clock boost and CPU/disk temps and stuff like that, so even this 2 or 3 per cent might not be real (I can't reliably reproduce it). In particular, importing $ git checkout topology-biopolymer-refactor
Switched to branch 'topology-biopolymer-refactor'
Your branch is up to date with 'origin/topology-biopolymer-refactor'.
$ hyperfine 'python -c "from openff.toolkit.topology.molecule import Molecule"'
Benchmark 1: python -c "from openff.toolkit.topology.molecule import Molecule"
Time (mean ± σ): 822.1 ms ± 11.3 ms [User: 1059.9 ms, System: 1400.8 ms]
Range (min … max): 802.0 ms … 837.2 ms 10 runs
$ hyperfine 'python -c "from openff.toolkit.topology.molecule import Molecule; from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"'
Benchmark 1: python -c "from openff.toolkit.topology.molecule import Molecule; from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
Time (mean ± σ): 826.0 ms ± 15.3 ms [User: 1080.0 ms, System: 1389.2 ms]
Range (min … max): 789.3 ms … 841.2 ms 10 runs Some cases that do get slower with this PR are checking the version:
and importing small modules that don't depend on other modules: $ git checkout topology-biopolymer-refactor
$ hyperfine 'python -c "import openff.toolkit.utils.constants"'
Benchmark 1: python -c "import openff.toolkit.utils.constants"
Time (mean ± σ): 452.5 ms ± 7.1 ms [User: 470.4 ms, System: 468.0 ms]
Range (min … max): 439.5 ms … 461.7 ms 10 runs
$ git checkout desire_paths
$ hyperfine 'python -c "import openff.toolkit.utils.constants"'
Benchmark 1: python -c "import openff.toolkit.utils.constants"
Time (mean ± σ): 843.5 ms ± 10.2 ms [User: 1081.1 ms, System: 1402.5 ms]
Range (min … max): 827.8 ms … 862.7 ms 10 runs TL;DR Importing anything in the toolkit now imports most of the toolkit. For the main classes, this doesn't change the time it takes because they're complex enough that the whole toolkit gets imported anyway. For small modules with few imports, this can increase import times. I think this is acceptable for the usability wins with IPython and in notebooks, where tab completion can now take you through the entire toolkit, and for saving users from memorizing some pretty obscure paths. It also lets us explicitly declare our public API, which is useful for any future automated documentation. |
Don't think this is useful but Hyperfine is cool hyperfine -L branch desire_paths,topology-biopolymer-refactor -L script "from openff.toolkit.topology.molecule import Molecule","from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField" -p "git checkout {branch}" -n '{branch}:{script}' 'python -c "{script}"' --export-markdown desire_path_benchmarks.md
|
Wow, I ran the same on my computer (with a billion tabs open, pycharm, slack, etc) and I saw the same relative timings as you, though another important result is that your computer is 2-3x faster than my 2018 MBP, and my computer is probably faster than many of our users'. I've also included a few "lighter" imports that users might reasonably call and saw a significant performance degradation there.
There's a fine balance to this tradeoff -
I think the negatives here outweigh the positives. So, to move forward, let's sink these quick imports behind something like from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.utils import RDKitToolkitWrapper become from openff.toolkit.app import (Molecule, ForceFIeld, RDKitToolkitWrapper) I'm happy to discuss alternative names than |
Agree, I use it for
This is counter-intuitive to me, and looking around in the files In the interest of fairness, I used This will be fixed when #1182 is merged, though I doubt it will affect the numbers enough to alter any conclusions. The hardware I use most often work is pretty old, also: I'm not the ultimate reviewer of changes to the toolkit. I did, however, want to voice the general concern of import times here given that
As a matter of preference and sharing my biases, I'll add that
|
@mattwthompson Would the |
It does seem worth it. Provided we want a single, short-ish path that provides a the key classes and provided we avoid using it internally, I don't see substantial new negatives introduced and it leaves the door open to speeding up imports whenever we do tackle that. I have no preference on I did some profiling after merging #1182, which shows some new issues. But they're out of scope for this PR and I split them out here: openforcefield/openff-units#17 |
This pull request introduces 4 alerts when merging 4f4844f into 1115d25 - view on LGTM.com new alerts:
|
Cool! I've moved the new imports to the new I've also implemented a lazy importing system for openff.toolkit. It delays the actual import to when an object is asked for, but still shows it in tab completion and so forth. It uses a new mechanism added in Python 3.7 (PEP 562, note that lazy imports are one of the suggested use cases). I don't think its worth the added complexity and magic, but it only took a few minutes to whip up so I thought I'd point it out as a possibility. I'm expecting to remove it before merging (leaving |
If it doesn't make anything harder, I'd (as a user) love that. Thanks for this PR, it'll take away one of the serious pain points of doing anything with a ForceField!
This is cool! |
I've added all 4 toolkit wrappers, the |
This pull request introduces 4 alerts when merging ed96e05 into 1115d25 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! I REALLY like the design for lazy loading in __init__.py
, and it gives us everything that the openff.toolkit.app
module would, with only a moderate tradeoff in terms of complexity.
@mattwthompson and @lilyminium - I've never futzed with import paths before like this. Does the __init__.py
design look legit to you? @Yoshanuikabundi's code follows closely from the linked PEP so I have some confidence that it's pretty mainstream, but I'd like to get another set of eyes on it. If either of you can vouch for the correct-ish-ness of it, could you give an approving review?
"""Re-exports of concrete ParameterTypes | ||
|
||
openff.toolkit.typing.engines.smirnoff.parameters defines a number of parameter | ||
types within class definitions of the corresponding ParameterHandler. This module | ||
re-exports them for discoverability and ease of use.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) Per our discussion today, let's
- delete this file
- move the exporting of the
ParameterTypes
toparameters.py
itself - include in exported
ParameterTypes
in the__all__
forparameters.py
- Add a unit test to
test_parameters
that ensures that all ParameterType subclasses owned by ParameterHandler subclasses are exposed via re-export and are in__all__
VirtualSiteHandler.VirtualSiteMonovalentLonePairType | ||
VirtualSiteHandler.VirtualSiteDivalentLonePairType | ||
VirtualSiteHandler.VirtualSiteTrivalentLonePairType | ||
ConstraintType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 Thanks for catching this!
TopologyAtom | ||
TopologyBond | ||
TopologyVirtualSite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
My glances through earlier versions of this PR didn't expose any obvious red flags but I will give this a more thorough review later today. |
Thanks, @mattwthompson! And reading this morning, I realized that it may not have been clear what I'm asking about - The path forward I'm considering would be completely deleting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets an approval from me! Assuming app.py
is removed as per the proposal, though it's not a file that's imported anywhere so it should not have an impact on what I tinkered with. I also noticed there's a difference between what's in app.py
and what's currently lazy-loaded in openff/toolkit/__init__.py
. I have no preferences but wanted to point this out in case from openff.toolkit import RDKitToolkitWrapper
was expected to work.
The import machinery looks good to me. We did the same thing in #1021 (and removed in #1156) when we wanted to have module-level exceptions but have them only imported when explicitly imported. We didn't call this lazy-loading at the time, but I think that's the definition.
The behavior does what I'd expect:
In [1]: from openff.toolkit import Molecule, Topology
In [2]: locals().keys()
Out[2]: dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__builtin__', '__builtins__', '_ih', '_oh', '_dh', 'In', 'Out', 'get_ipython', 'exit', 'quit', '_', '__', '___', '_i', '_ii', '_iii', '_i1', 'OE_1996539322038294600', 'OE_3449561020318279929', 'OE_3534227101456629665', 'OE_7544655256171905223', 'OE_14481470454631586240', 'OE_2886910160464464284', 'OE_12830519133454273888', 'OE_7146837923505175476', 'OE_1333624159125614004', 'OE_7190599916362654688', 'OE_18194288045002224399', 'OE_2920715238402669782', 'OE_10862347053494700408', 'OE_1152104254218024490', 'OE_12407415621277732493', 'OE_2699086864785953591', 'OE_13975590391927931629', 'Molecule', 'Topology', '_i2'])
In [3]: from openff.toolkit import ForceField
In [4]: locals().keys()
Out[4]: dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__builtin__', '__builtins__', '_ih', '_oh', '_dh', 'In', 'Out', 'get_ipython', 'exit', 'quit', '_', '__', '___', '_i', '_ii', '_iii', '_i1', 'OE_1996539322038294600', 'OE_3449561020318279929', 'OE_3534227101456629665', 'OE_7544655256171905223', 'OE_14481470454631586240', 'OE_2886910160464464284', 'OE_12830519133454273888', 'OE_7146837923505175476', 'OE_1333624159125614004', 'OE_7190599916362654688', 'OE_18194288045002224399', 'OE_2920715238402669782', 'OE_10862347053494700408', 'OE_1152104254218024490', 'OE_12407415621277732493', 'OE_2699086864785953591', 'OE_13975590391927931629', 'Molecule', 'Topology', '_i2', '_2', '_i3', 'ForceField', '_i4'])
In [5]: from openff.toolkit import Foo
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-5-84c3b7d3429c> in <module>
----> 1 from openff.toolkit import Foo
ImportError: cannot import name 'Foo' from 'openff.toolkit' (/Users/mwt/software/openforcefield/openff/toolkit/__init__.py)
The timings are ultimately not good, but they're not worse, so it's a fair tradeoff. My SSD is in poor shape (will be replaced soon!), so these are slower times than I'd normally expect.
git checkout upstream/topology-biopolymer-refactor
hyperfine --min-runs 20 'python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"'
hyperfine --min-runs 20 'python -c "from openff.toolkit.topology import Molecule"'
hyperfine --min-runs 20 'python -c "from openff.toolkit.topology import Topology"'
git checkout upstream/desire_paths
hyperfine --min-runs 20 'python -c "from openff.toolkit import ForceField"'
hyperfine --min-runs 20 'python -c "from openff.toolkit import Molecule"'
hyperfine --min-runs 20 'python -c "from openff.toolkit import Topology"'
Previous HEAD position was ed96e05c Add toolkits to app module
HEAD is now at 1115d256 Refactor to openff.units.elements (#1182)
Benchmark 1: python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
Time (mean ± σ): 2.506 s ± 0.197 s [User: 1.948 s, System: 0.628 s]
Range (min … max): 2.294 s … 2.938 s 20 runs
Benchmark 1: python -c "from openff.toolkit.topology import Molecule"
Time (mean ± σ): 2.335 s ± 0.067 s [User: 1.938 s, System: 0.609 s]
Range (min … max): 2.237 s … 2.484 s 20 runs
Benchmark 1: python -c "from openff.toolkit.topology import Topology"
Time (mean ± σ): 2.344 s ± 0.060 s [User: 1.953 s, System: 0.615 s]
Range (min … max): 2.257 s … 2.446 s 20 runs
Previous HEAD position was 1115d256 Refactor to openff.units.elements (#1182)
HEAD is now at ed96e05c Add toolkits to app module
Benchmark 1: python -c "from openff.toolkit import ForceField"
Time (mean ± σ): 2.494 s ± 0.183 s [User: 2.015 s, System: 0.646 s]
Range (min … max): 2.288 s … 2.873 s 20 runs
Benchmark 1: python -c "from openff.toolkit import Molecule"
Time (mean ± σ): 2.543 s ± 0.256 s [User: 2.013 s, System: 0.645 s]
Range (min … max): 2.251 s … 3.181 s 20 runs
Benchmark 1: python -c "from openff.toolkit import Topology"
Time (mean ± σ): 2.637 s ± 0.432 s [User: 2.017 s, System: 0.658 s]
Range (min … max): 2.282 s … 3.894 s 20 runs
This is probably due to the Pandas issue I linked, so I removed it from my environment and got slightly better tiimes:
Benchmark 1: python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
Time (mean ± σ): 2.403 s ± 0.382 s [User: 1.648 s, System: 0.606 s]
Range (min … max): 1.907 s … 3.740 s 20 runs
Benchmark 1: python -c "from openff.toolkit.topology import Molecule"
Time (mean ± σ): 1.992 s ± 0.135 s [User: 1.552 s, System: 0.520 s]
Range (min … max): 1.842 s … 2.385 s 20 runs
Benchmark 1: python -c "from openff.toolkit.topology import Topology"
Time (mean ± σ): 1.904 s ± 0.055 s [User: 1.543 s, System: 0.514 s]
Range (min … max): 1.821 s … 1.992 s 20 runs
Previous HEAD position was 1115d256 Refactor to openff.units.elements (#1182)
HEAD is now at ed96e05c Add toolkits to app module
Benchmark 1: python -c "from openff.toolkit import ForceField"
Time (mean ± σ): 1.979 s ± 0.109 s [User: 1.549 s, System: 0.515 s]
Range (min … max): 1.867 s … 2.166 s 20 runs
Benchmark 1: python -c "from openff.toolkit import Molecule"
Time (mean ± σ): 1.941 s ± 0.110 s [User: 1.546 s, System: 0.518 s]
Range (min … max): 1.838 s … 2.203 s 20 runs
Benchmark 1: python -c "from openff.toolkit import Topology"
Time (mean ± σ): 1.908 s ± 0.077 s [User: 1.540 s, System: 0.514 s]
Range (min … max): 1.804 s … 2.128 s 20 runs
Basically Molecule
and Topology
take identical amounts of time but ForceField
is faster, which is great. I dug around a bit and wasn't able to figure out why. I think it has something to do with the other things snuck along the way in the long import paths (star imports and/or __all__
definitions) but I can't say for sure. There's some more work to do here but I'm getting way off track here and digging into non-blockers.
This pull request introduces 10 alerts when merging 88bbb7e into 0e18422 - view on LGTM.com new alerts:
|
This pull request introduces 10 alerts when merging 9080c6f into 0e18422 - view on LGTM.com new alerts:
|
I have removed the app module and added the toolkit re-exports to I've also moved the I didn't re-export this class because I think it's essentially an implementation detail, but it's not clear to me how to exclude it from the test. I see a number of options:
I chose 1 because the class is currently in a weird place, being marked as an ABC but not having any abstract methods. Without any abstract methods, Python considers it to be a concrete class, so inheriting from Is
|
This pull request introduces 10 alerts when merging 58cde36 into 0e18422 - view on LGTM.com new alerts:
|
I think the answer is "no" and making them proper abstract classes is the way to go. To be honest I also think that this corner of the codebase is so rough that any changes not making it substantially worse is good, but fixing it might be best split out into another PR so the original focus of this PR can be implemented without needed to make decisions on how the virtual site classes should be improved. |
Sounds good. Shall I merge this as-is? |
Earlier I didn't see that Jeff's comment earlier defers to me as a sufficient reviewer - now I do, and yes, go ahead and merge. I know I put up some resistance here but I am looking forward to typing fewer characters at the top of every script! |
No worries, your input definitely made this PR better!! |
This PR re-exports select classes and functions to improve their accessibility:
openff.toolkit.ForceField
->openff.toolkit.typing.engines.smirnoff.forcefield.ForceField
(Should this be renamedSmirnoffFF
or something to account for future forcefield types? I think having a unifiedForceField
type is very valuable)openff.toolkit.get_available_force_fields
->openff.toolkit.typing.engines.smirnoff.forcefield.get_available_force_fields
openff.toolkit.Molecule
->openff.toolkit.topology.molecule.Molecule
openff.toolkit.Topology
->openff.toolkit.topology.topology.Topology
openff.toolkit.typing.engines.smirnoff.parametertypes
module that re-exports theParameterType
classes from the class attributes inopenff.toolkit.typing.engines.smirnoff.parameters
(and the docs have been redirected to point to these re-exports rather than the class attributes, which fixes a series of sphinx warnings)It also fixes some documentation warnings, such as by removing API references to the removed
TopologyAtom
,TopologyBond
,TopologyVirtualSite
,TopologyVirtualParticle
andTopologyMolecule
classes, and adds theConstraintType
andConstraintHandler
classes to the API reference.