Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy imports based on __import__ #22752

Closed
jdemeyer opened this issue Apr 4, 2017 · 67 comments
Closed

Lazy imports based on __import__ #22752

jdemeyer opened this issue Apr 4, 2017 · 67 comments

Comments

@jdemeyer
Copy link

jdemeyer commented Apr 4, 2017

Implement lazy imports by overriding import.

To be used as

from sage.misc.lazy_import import _lazyimport_

with _lazyimport_:
    from sage.all import ZZ

Unlike the existing lazy import implementation, this supports both Python and Cython.

Depends on #22755

Component: misc

Author: Jeroen Demeyer

Branch/Commit: u/jdemeyer/ticket/22752 @ 9e1208b

Issue created by migration from https://trac.sagemath.org/ticket/22752

@jdemeyer jdemeyer added this to the sage-8.0 milestone Apr 4, 2017
@jdemeyer

This comment has been minimized.

@jdemeyer jdemeyer changed the title Lazy imports based on PEP 302 Lazy imports based on __import__ Apr 4, 2017
@jdemeyer
Copy link
Author

jdemeyer commented Apr 4, 2017

Dependencies: #22755

@jdemeyer
Copy link
Author

jdemeyer commented Apr 4, 2017

Branch: u/jdemeyer/ticket/22752

@jdemeyer
Copy link
Author

jdemeyer commented Apr 5, 2017

Upstream: Reported upstream. No feedback yet.

@jdemeyer
Copy link
Author

jdemeyer commented Apr 5, 2017

Commit: 06c4d0f

@jdemeyer

This comment has been minimized.

@jdemeyer
Copy link
Author

jdemeyer commented Apr 5, 2017

New commits:

94302caDo not declare functions/methods as "cdef inline"
9e5715eVarious improvements to lazy imports
d2bb491Whitespace fix
06c4d0fLazy imports based on __import__

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 5, 2017

Changed commit from 06c4d0f to 8ec9f5d

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 5, 2017

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

8aac18aVarious improvements to lazy imports
8ec9f5dLazy import context based on __import__

@jdemeyer
Copy link
Author

jdemeyer commented Apr 5, 2017

Changed upstream from Reported upstream. No feedback yet. to Fixed upstream, but not in a stable release.

@jdemeyer

This comment has been minimized.

@nbruin
Copy link
Contributor

nbruin commented Apr 5, 2017

comment:9

I don't think using the context manager is thread safe, and python as a whole doesn't forbid using threads. I'm afraid we're stuck with other, less attractive syntax.

I haven't looked into this in detail, but python does have an impressive hooks etc. in the import system (hooks that have changed quite a bit in later versions of 3.* as well ...). If we want to co-opt the import statement syntax, perhaps we can do something using
meta_path so that we can do something like

import lazy.sage.calculus.calculus

@nbruin
Copy link
Contributor

nbruin commented Apr 5, 2017

comment:10

Different note on lazy imports: Currently we have hundreds of "lazy-import" objects hanging around coming from
in module A:

lazy_import('C',XXX}

and then in module B:

from A import XXX

The problem is that the lazy import object that gets bound to B.XXX still has a pointer to the namespace of A, so every time B.XXX gets called, it will looking at A.XXX. See #16522.

If we're going to put hooks into the import system, perhaps we can filter out these issues as well.

@kwankyu
Copy link
Collaborator

kwankyu commented Apr 5, 2017

comment:11

I don't like two similar-looking names lazy_import and lazyimport used together. This would be nicer

from sage.misc.lazy_import import lazy_import

with lazy_import():
   from ...

Looking at your code, it seems that you chose lazyimport because of the existing function of the same name. To solve this unfortunate situation, it could be possible to hack the __call__ method so that your lazyimport(or lazy_import) also support the old function of the same name. Like if the first argument starts with a string, then roll back to the old function...

@nbruin
Copy link
Contributor

nbruin commented Apr 6, 2017

comment:12

Hm. Replacing __import__ within a context manager isn't thread safe. But we COULD replace __import__ wholesale to do what it normally does, and in extended form allow things like

import lazy.sage.calculus.calculus
from lazy.sage.interfaces.maxima_lib import maxima

what's more, we could also wrap

from A import XXX

to check if XXX is a LazyImport object and if so, rewrap it referencing the appropriate namespace.

The only offence is that we'd be incompatible with a top-level module "lazy" existing.

@jdemeyer
Copy link
Author

jdemeyer commented Apr 6, 2017

comment:13

Replying to @nbruin:

I don't think using the context manager is thread safe, and python as a whole doesn't forbid using threads. I'm afraid we're stuck with other, less attractive syntax.

On the other hard, I would argue that lazy imports are mostly useful during startup, where multi-threading is not likely.

I haven't looked into this in detail, but python does have an impressive hooks etc. in the import system (hooks that have changed quite a bit in later versions of 3.* as well ...). If we want to co-opt the import statement syntax, perhaps we can do something using
meta_path

I guess you refer to PEP 302? I did look into that, but it doesn't quite work. It's the wrong level of abstraction. These PEP 302 hooks assume that you actually want to import something (i.e. add a new entry to sys.modules). For lazy imports, we don't import anything, so I think that __import__ is the right hook. Besides, sys.meta_path is also a global variable, so it wouldn't solve the "multi-threading" problem.

import lazy.sage.calculus.calculus

Interesting syntax. I guess that would work (not for relative imports, but that is a minor thing). The only problem is that lazy would become a reserved name, so we really should pick something that no package will ever use. Say

import __lazy__.sage.calculus.calculus

@jdemeyer
Copy link
Author

jdemeyer commented Apr 6, 2017

comment:14

Replying to @nbruin:

Hm. Replacing __import__ within a context manager isn't thread safe. But we COULD replace __import__ wholesale to do what it normally does, and in extended form allow things like

import lazy.sage.calculus.calculus
from lazy.sage.interfaces.maxima_lib import maxima

what's more, we could also wrap

from A import XXX

to check if XXX is a LazyImport object and if so, rewrap it referencing the appropriate namespace.

Funny. I was thinking exactly the same thing after reading your comment [comment:10]

@jdemeyer
Copy link
Author

jdemeyer commented Apr 6, 2017

comment:15

Thinking more about it, I prefer

from sage.calculus.calculus.__lazy__ import something

Only the latter would support relative imports like from .calculus.__lazy__ import something.

@jdemeyer

This comment has been minimized.

@jdemeyer
Copy link
Author

jdemeyer commented Apr 6, 2017

comment:17

Nils, you idea doesn't allow passing extra options to LazyImport. You cannot do

from sage.calculus.calculus.__lazy__(deprecation=12345) import something

@jdemeyer

This comment has been minimized.

@jdemeyer jdemeyer removed this from the sage-8.0 milestone Apr 6, 2017
@jdemeyer jdemeyer closed this as completed Apr 6, 2017
@jdemeyer
Copy link
Author

jdemeyer commented Apr 6, 2017

comment:41

Once you implement lazy imports with your hypothetical fix using PEP 302, remember to fix #15648 and #20626 too.

@jdemeyer
Copy link
Author

jdemeyer commented Apr 6, 2017

comment:42

And maybe #21636 too, while you're at it.

@jdemeyer
Copy link
Author

jdemeyer commented Apr 6, 2017

comment:43

...which should eventually lead to a fix for #22747 which I need for OpenDreamKit/OpenDreamKit#87

@jdemeyer
Copy link
Author

jdemeyer commented Apr 7, 2017

comment:44

Just to know where we stand, let me try to summarize the pros and cons of my approach. I'll try to do this objectively. It will at least allow to compare different approaches.

New features compared to the old lazy_import:

  1. Support Python and Cython (imports in Cython work the same way as plain Python, so this is true for anything which hooks the import system).

  2. Simple syntax: an ordinary import statement inside a with _lazyimport_ block.

  3. Relative and absolute imports are supported.

Existing features from lazy_import which are kept:

  1. Support for PEP 302 (a module which needs a PEP 302 loader can be lazy imported).

  2. Support for optional arguments like deprecation.

  3. Support for from ... import *.

  4. Automatic replacement of LazyImport in a global namespace by the actual imported object.

  5. Existing lazy_imports can be easily converted to the new with _lazyimport_, preserving all features.

Cons:

  1. Only from ... import ... is supported, not import ...

  2. One should be careful not to run arbitrary code inside a with _lazyimport_ block.

  3. It's not implemented using PEP 302 (personally, I don't think that's an issue, but you guys seem to think so).

Finally, let me mention that #21636 is the main reason to work on this, for which in particular proper Cython support for lazy imports is needed.

@jdemeyer
Copy link
Author

jdemeyer commented Apr 7, 2017

comment:45

I also remind you about #22755, which makes some fixes to the existing lazy imports. If you do work on lazy imports, please do so on top of #22755.

@embray
Copy link
Contributor

embray commented Apr 7, 2017

comment:46

PEP 302 wouldn't make a difference for this because it also uses global variables like sys.meta_path

You wouldn't be manipulating sys.meta_path in general, except very early on in sage.__init__ for example.

@embray
Copy link
Contributor

embray commented Apr 7, 2017

comment:47

I think I understand better now how this is really only trying to override the from a import b syntax, which is why you are targeting __import__ itself. Though really, the problem with circular imports is the same whether one is using the import foo syntax or the from foo import ... syntax.

So that takes me back to thinking that some modules should always use some kind of lazy-import mechanism. It's better overall to design things to avoid circular imports in the first place, but it's admittedly tricky for some fundamental modules (like rationals are defined in terms of integers, but operations on integers can return rationals, or need to be able to work with rationals, etc.). That's just one example it it seems to be one of the most pervasive.

Now, supporting from foo import ... is still tricky since it needs to be supported via a proxy object, but already exists and can be reused / improved on. The issues you mentioned like #20626 seem like issues with the implementation of the proxy itself. (I also wonder if, rather than using our own proxy implementation, it would be useful to use something from the community like https://github.com/ionelmc/python-lazy-object-proxy; not sure how well that works with Sage's builtin types though).


On a slightly different topic, I don't quite understand why the "deprecation" feature of the existing lazy_import needs to be implemented directly through the import syntax or via a wrapper around it. Similarly for the "startup guard" functionality. Instead, these features could be implemented if all modules in sage were wrapped in a custom Module subclass/wrapper. Although I haven't tested as such, I think there'd be little noticeable overhead to doing so. Then we can add simple syntax at the module level for specifying this behavior. For example, sage.combinat.crystals.all contains a fair number of deprecations. Instead of:

lazy_import('sage.combinat.crystals.letters',
            'CrystalOfLetters',
            deprecation=(15882, "this is being removed from the global namespace. Use crystals.Letters instead"))

lazy_import('sage.combinat.crystals.fast_crystals',
            'FastCrystal',
            deprecation=(15882, "this is being removed from the global namespace. Use crystals.FastRankTwo instead"))

lazy_import('sage.combinat.crystals.highest_weight_crystals',
            'HighestWeightCrystal',
            deprecation=(15882, "this is being removed from the global namespace. Use crystals.HighestWeight instead"))

this could look something like:

__sage_deprecations__ = [
    ('sage.combinat.crystals.' + mod_name, cls_name, 15822, "this is being removed from the global namespace. Use crystals. %s instead" % cls_name) for mod_name, cls_name in [
        ('letters', 'CrystalOfLetters'),
        ('fast_crystals', 'FastCrystal'),
        ('highest_weight_crystals', 'HeighestWeightCrystals'),
        ...
    ]
]

A custom module wrapper would be able handle imports of objects listed in __sage_deprecations__ (as I'm calling it) through __getattribute__ and return proxies for those objects, whereas the sage.combinat.crystals.catalog module would not have __sage_deprecations__ and thus would supply the normal, non-proxied objects.

I feel like this is a clearer and more explicit declarative approach to how some attributes of some modules should be handled, rather than mucking with the import system every single time this feature is needed.

@embray
Copy link
Contributor

embray commented Apr 7, 2017

comment:48

Just out of curiosity, I'm going to take a stab at replacing our proxy with lazy_object_proxy (without any other changes to lazy_import) and see how far I can get, or if it can resolve any of the known bugs.

@nbruin
Copy link
Contributor

nbruin commented Apr 7, 2017

comment:49

Replying to @embray:

I think I understand better now how this is really only trying to override the from a import b syntax, which is why you are targeting __import__ itself. Though really, the problem with circular imports is the same whether one is using the import foo syntax or the from foo import ... syntax.

No, it's not. Python is completely fine with circular imports. The problem arises when there is a circular access of module attributes during load time. The problem with "from foo import ..." is that it automatically leads to a module attribute access during load time.

Without "from foo import ..." you'd have to write something like

import A

c=A.b+1

which is very unlikely to happen, since usually at load time only declarations are executed.

For that kind of symbols we would basically just need

#instead of from A import b
import A
forward_binding(target="b",namespace=globals(),from=A,from_name="b")

where forward_binding could be resolved as soon as A is initialized. The proxy doesn't have to hang around until first used, as with lazy-import. I'm not sure we have the right hooks to do this, but if there are I think we could change __import__ to allow one more level of circularity (namely, circularity for "from A import b" provided that A.b doesn't get accessed during load time).

@jdemeyer
Copy link
Author

jdemeyer commented Apr 7, 2017

comment:50

Replying to @embray:

The issues you mentioned like #20626 seem like issues with the implementation of the proxy itself.

Yes and I had a fix for #15648 and for #20626 (based on this ticket, so those fixes are no longer valid).

@nbruin
Copy link
Contributor

nbruin commented Apr 7, 2017

comment:51

Replying to @jdemeyer:

and even there, with the hooks available in the python import system, do you have full control over what code gets executed?

Yes because those other hooks won't be called.

I think there might be a more reliable way: When __import__ gets called, it does get a "globals" parameter with it as the scope on which to work. We can park our state whether inputs are to be lazy there:

builtin.__import__=new_import
lazy_import=True
from A import a
lazy_import=False
import B

i.e., the new import uses a binding in the relevant scope to see if it needs to intercept imports and make them lazy or whether they should be processed as normal.

That way, the new import routine would be transparent for scopes where lazy_import isn't being used.

Based on that infrastructure, a context would be just fine. It would be about toggling a flag in the relevant scope rather than changing a global variable.

It does mean we'd need to have our "import" wrapper in place all the time, because swapping it out would still be sensitive to scoping and threading issues.

@embray
Copy link
Contributor

embray commented Apr 7, 2017

comment:52

I still think that messing so much with imports for what you want to accomplish is overkill. My point in my last comment was that whether you use the import foo or from import foo ... syntax the first thing that happens is foo is imported, possibly through import hooks, and those hooks can always wrap the Module object for foo in a proxy that can handle lazy imports.

You could even make the module wrapper itself a context manager for handling lazy imports. I don't know that this is the clearest syntax but it's an idea:

import foo
with foo.__lazy_import__:
    from foo import a
    from foo import b

In this case a and b are LazyImport proxies.

I think the 'deprecation warning' aspect of lazy_import can be handled through a separate mechanism as explained above.

My experiment in rewriting LazyImport worked too--there's just one minor detail with regard to how __class__ is handled that I need to fix but it's 11:00 p.m. on Friday night here so I'm gonna call it quits until Monday :)

@embray
Copy link
Contributor

embray commented Apr 11, 2017

comment:53

Can we maybe discuss exactly what the use cases are for lazy-import?

To me the primary use case--the one that really matters--is just breaking circular imports at startup time. I've never needed something like lazy imports to get around that before, but I admit some of the circular definitions in Sage are vexing and hard to avoid, especially in Cygwin modules where we have less control over the order in which module-level statements are executed with respect to each other. So that's one case.

The other two, as far as I can tell, are:

  1. Issuing deprecation warnings for imports of some names from modules they shouldn't be directly imported from.

  2. Issuing a warning (or error as in Various fixes to lazy imports #22755) if some proxied lazy import gets resolved during initial startup.

I don't really much see the case for the latter of the two. If the main use case for lazy imports (and note: I'm not necessarily claiming that's the case, that's just my understanding) then who cares if gets resolved at startup time? If it does, any nothing breaks, then it doesn't need to be a lazy import in the first place (having this feature optionally for debugging purposes is nice though).

@embray
Copy link
Contributor

embray commented Apr 11, 2017

comment:54

Or are there also cases where lazy import is used for performance reasons, and if so what are some examples of that?

@jdemeyer
Copy link
Author

comment:55

I would say there are two important use cases:

  1. as you mentioned, breaking circular imports.

  2. importing less stuff at startup to decrease startup time. For example, if you don't care about differential geometry, then it would be a waste of time to import all differential-geometry-related modules when Sage starts up. However, we do want the relevant classes to appear in the global namespace, so we use a lazy import. The at_startup argument is very relevant here, since a lazy import resolved at startup wouldn't help with this.

@embray
Copy link
Contributor

embray commented Apr 11, 2017

comment:56

Got it--thanks for clarifying.

@jdemeyer
Copy link
Author

comment:57

TODO: discuss lazy imports next week too with @embray.

@jdemeyer
Copy link
Author

comment:58

Good news for myself: I think I can get #22747 to work without requiring this.

@embray
Copy link
Contributor

embray commented Apr 21, 2017

comment:59

Is that the issue you refer to when you mean "works with Cython"?

@jdemeyer
Copy link
Author

Changed upstream from Fixed upstream, but not in a stable release. to none

@jdemeyer

This comment has been minimized.

@jdemeyer

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants