Improve load time #85
I lied. This doesn't do anything after merging with |
@prisae, some of the logic with |
Thanks @banesullivan for jumping quick on this! Yes, now that distutils is gone the picture looks very different. I pushed to also lazy-import Overall, this is a massive difference. The big pieces are now:
These changes would already be very small compared to the changes achieved so far. I wonder if we should do them (hide all the imports in the functions), or leave it as is for now. It might be worth holding on for a while, there are "things in the works" I think: |
Codecov Report
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   87.15%   87.23%   +0.08%
==========================================
  Files           4        4
  Lines         366      384      +18
==========================================
+ Hits          319      335      +16
- Misses         47       49       +2 |
This raises an interesting question. Should we add a benchmark testing load time, so we do not accidentally introduce a slow load at a later point? Not sure how to do that exactly. Here is a very ugly way, probably very error-prone, but at least an idea ;-) It asserts that the import time is less than 0.1 seconds (maybe we could aim at 0.05 s).
Maybe a bit better (although not sure if cross-platform)
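A minimal sketch of such a load-time check, not the snippet originally posted here. It assumes that timing the import in a fresh interpreter (so nothing is cached in `sys.modules`) is an acceptable measure; the helper name `import_seconds` and the `repeats` parameter are made up for illustration.

```python
import subprocess
import sys


def import_seconds(module, repeats=3):
    """Best wall-clock time (in seconds) to import `module` in a fresh interpreter."""
    # `time` is imported before the clock starts, so its own cost is excluded.
    code = (
        "import time; t0 = time.perf_counter(); "
        f"import {module}; "
        "print(time.perf_counter() - t0)"
    )
    timings = []
    for _ in range(repeats):
        result = subprocess.run(
            [sys.executable, "-c", code],
            check=True,
            capture_output=True,
            text=True,
        )
        # Use the last printed line in case the module prints while importing.
        timings.append(float(result.stdout.strip().splitlines()[-1]))
    # The minimum is the least noisy estimate on a busy machine.
    return min(timings)
```

A test could then assert something like `import_seconds("scooby") < 0.1`, with the limit chosen to suit the machine that runs it.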
|
It is fantastic, thanks @banesullivan - in my application the load time spent on scooby went from 0.077s (5%!) to 0.001s (irrelevant)! (Note the difference from the 0.288s to 0.015s reported here; I assume that when importing a package together with many others there are some shared benefits, so importing scooby alone is different from importing scooby within another package.) |
I just realized that your solution here @banesullivan will run into a merge conflict with the solution of @akaszynski in #83 - can we get #83 in first? |
Give me a minute. |
@prisae, I'll leave it to you to resolve the conflicts. Consider this approved. |
Enforcement of execution times is a bit hard, especially on CI where we have no control over the hardware. I honestly wish we could record the total number of CPU instructions rather than execution time. That's (probably) less variable. Regardless, I see no reason why you couldn't warn for execution times. Just use CI/CD as the worst case. |
Good point @akaszynski - but this was indeed my main idea. Just to have a simple test that would warn you if you accidentally add a new dependency that takes, say, 1s to load. A flag should go up somewhere. |
1s is worse than even the original implementation. I'd say tack on 50% to the import time on our CI/CD with this implementation and then warn if we exceed. |
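The "baseline plus 50%, then warn" idea could look roughly like the sketch below. The function name `warn_if_slow` and its parameters are hypothetical, and the baseline is assumed to be a number measured once on CI and stored by hand.

```python
import warnings


def warn_if_slow(measured, baseline, margin=0.5):
    """Warn (rather than fail) when `measured` exceeds `baseline` by more than `margin`.

    Returns True if a warning was issued, False otherwise.
    """
    limit = baseline * (1.0 + margin)
    if measured > limit:
        warnings.warn(
            f"import took {measured:.3f} s, above the {limit:.3f} s limit",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```

Warning instead of asserting keeps a noisy CI runner from turning the build red, at the cost of the flag being easier to overlook.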
Add more lazy imports; down to 3 ms / ~7%. This is basically 1/100 of before.
I will now add a small test and double-check everything before merging. |
OK, CI/CD is quite a bit slower than local. Imports took between 0.08s and 0.11s; I set the test for now to 0.15s. That should hopefully work as a red flag. If it starts to fail we have to either increase the limit or search for the culprit. |
Thanks both for the feedback! Would anyone mind making a release (@banesullivan, @akaszynski) - or tell me, what I would have to do for a release? I don't think I have ever done one for scooby. |
I can do it. We don't actually document our release approach, but you can glance over https://github.com/pyvista/pyvista/blob/main/CONTRIBUTING.rst to see how we do it for PyVista. I'm just going to implement that. @banesullivan, I hope you're fine with this. I never really liked having a non-dev version on main. |
FWIW because this is Python 3.7+ Using module level |
Thanks @Erotemic! Would you mind giving an example how one could use |
Perhaps something like this? https://anvil.works/blog/lazy-modules If so, (I also did not read that in detail) I don't see why you'd do this... having |
def __getattr__(key):
    if key == 'platform':
        import platform
        return platform
    else:
        raise AttributeError(key)

Including the above code in the module would allow you to just access There is good discussion of how this can be used more broadly in a scikit-image PR. @banesullivan I think the current way is also sufficient. It just looks weird to me to include the |
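To see the module-level `__getattr__` hook (PEP 562, Python 3.7+) in action without creating a file on disk, here is a small self-contained sketch. The module name `lazy_demo` is invented for the illustration; in a real package the function would simply be defined at the top level of the module.

```python
import importlib
import types

# Stand-in for a real package module; "lazy_demo" is a hypothetical name.
demo = types.ModuleType("lazy_demo")


def _module_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name == "platform":
        module = importlib.import_module("platform")
        # Cache the result so later lookups bypass this hook entirely.
        setattr(demo, name, module)
        return module
    raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")


# Placing the function in the module namespace under this name enables the hook.
demo.__getattr__ = _module_getattr
```

The first access to `demo.platform` triggers the import; afterwards the attribute lives in the module namespace like any other.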
That is interesting @Erotemic and good to know - a lot seems to be going on at the moment with regard to lazy loading of modules in the scientific Python stack. I also like the new way SciPy loads all submodules only "on demand", so that they no longer have to be imported explicitly. |
FYI: I have a package mkinit that makes it really easy to expose your entire top-level API via lazy or explicit imports. It works by statically parsing your code and then autogenerating boilerplate for

def lazy_import(module_name, submodules, submod_attrs):
    """
    Boilerplate to define PEP 562 __getattr__ for lazy import
    https://www.python.org/dev/peps/pep-0562/
    """
    import importlib
    import os
    name_to_submod = {
        func: mod for mod, funcs in submod_attrs.items()
        for func in funcs
    }

    def __getattr__(name):
        if name in submodules:
            attr = importlib.import_module(
                '{module_name}.{name}'.format(
                    module_name=module_name, name=name)
            )
        elif name in name_to_submod:
            submodname = name_to_submod[name]
            module = importlib.import_module(
                '{module_name}.{submodname}'.format(
                    module_name=module_name, submodname=submodname)
            )
            attr = getattr(module, name)
        else:
            raise AttributeError(
                'No {module_name} attribute {name}'.format(
                    module_name=module_name, name=name))
        globals()[name] = attr
        return attr

    if os.environ.get('EAGER_IMPORT', ''):
        for name in submodules:
            __getattr__(name)

        for attrs in submod_attrs.values():
            for attr in attrs:
                __getattr__(attr)
    return __getattr__


__getattr__ = lazy_import(
    __name__,
    submodules={
        'submod',
        'subpkg',
    },
    submod_attrs={
        'submod': [
            'submod_func',
        ],
        'subpkg': [
            'nested',
            'nested_func',
        ],
    },
)


def __dir__():
    return __all__


__all__ = ['nested', 'nested_func', 'submod', 'submod_func', 'subpkg']

Which is the lazy version of:

from mkinit_demo_pkg import submod
from mkinit_demo_pkg import subpkg
from mkinit_demo_pkg.submod import (submod_func,)
from mkinit_demo_pkg.subpkg import (nested, nested_func,)
__all__ = ['nested', 'nested_func', 'submod', 'submod_func', 'subpkg']

Now you could maintain this yourself. Or you could just run |
Helps #79 to speed up scooby import time, since distutils is imported for a lesser-used method, get_standard_lib_modules().
We can remove/deprecate it in a follow-up PR. This change speeds up the import time from 0.378s to 0.044s for me.
cc @prisae
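The general technique, moving a rarely needed import from module level into the one function that uses it, can be sketched as follows. This is an illustration only, not scooby's actual code: `stdlib_dir` is a hypothetical function, and `sysconfig` stands in for the expensive dependency.

```python
def stdlib_dir():
    """Return the standard-library directory of the running interpreter."""
    # Deferred import: the cost is paid on the first call of this function,
    # not when the enclosing package is imported.
    import sysconfig

    return sysconfig.get_paths()["stdlib"]
```

Repeated calls stay cheap because Python caches imported modules in `sys.modules`, so only the first call pays for the import.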