Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 690: Lazy Imports #2569

Merged
merged 18 commits into from
May 3, 2022
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
Expand Up @@ -570,6 +570,7 @@ pep-0686.rst @methane
pep-0687.rst @encukou
pep-0688.rst @jellezijlstra
pep-0689.rst @encukou
pep-0690.rst @warsaw
# ...
# pep-0754.txt
# ...
Expand Down
1 change: 1 addition & 0 deletions AUTHOR_OVERRIDES.csv
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,4 @@ Just van Rossum,"van Rossum, Just (JvR)",JvR
Martin v. Löwis,"von Löwis, Martin",von Löwis
Nathaniel Smith,"Smith, Nathaniel J.",Smith
P.J. Eby,"Eby, Phillip J.",Eby
Germán Méndez Bravo,"Méndez Bravo, Germán",Méndez Bravo
350 changes: 350 additions & 0 deletions pep-0690.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,350 @@
PEP: 690
Title: Lazy Imports
Author: Germán Méndez Bravo <german.mb@gmail.com>, Carl Meyer <carl@oddbird.net>
Sponsor: Barry Warsaw <barry@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Apr-2022
Python-Version: 3.12


Abstract
========

This PEP proposes a feature to transparently defer the execution of imported
modules until the moment when an imported object is used. Since Python
programs commonly import many more modules than a single invocation of the
program is likely to use in practice, lazy imports can greatly reduce the
overall number of modules loaded, improving startup time and memory usage. Lazy
imports also mostly eliminate the risk of import cycles.


Motivation
==========

Common Python code style :pep:`prefers <8#imports>` imports at module
level, so they don't have to be repeated within each scope the imported object
is used in, and to avoid the inefficiency of repeated execution of the import
system at runtime. This means that importing the main module of a program
typically results in an immediate cascade of imports of most or all of the
modules that may ever be needed by the program.

Consider the example of a Python command line program with a number of
subcommands. Each subcommand may perform different tasks, requiring the import
of different dependencies. But a given invocation of the program will only
execute a single subcommand, or possibly none (i.e. if just ``--help`` usage
info is requested). Top-level eager imports in such a program will result in
the import of many modules that will never be used at all; the time spent
(possibly compiling and) executing these modules is pure waste.

In an effort to improve startup time, some large Python CLIs tools make imports
lazy by manually placing imports inline into functions to delay imports of
expensive subsystems. This manual approach is labor-intensive and fragile; one
misplaced import or refactor can easily undo painstaking optimization work.

Existing import-hook-based solutions such as `demandimport
<https://github.com/bwesterb/py-demandimport/>`_ are limited in are limited
in that only certain styles of import can be made truly lazy (imports such as
``from foo import a, b`` will still eagerly import the module ``foo``) and they
impose additional runtime overhead on every module attribute access.

This PEP proposes a more comprehensive solution for lazy imports that does not
impose detectable overhead in real-world use. The implementation in this PEP
has already `demonstrated
<https://github.com/facebookincubator/cinder/blob/cinder/3.8/CinderDoc/lazy_imports.rst>`_
startup time improvements up to 70% and memory-use reductions up to
40% on real-world Python CLIs.

Lazy imports also eliminate most import cycles. With eager imports, "false
cycles" can easily occur which are fixed by simply moving an import to the
bottom of a module or inline into a function, or switching from ``from foo
import bar`` to ``import foo``. With lazy imports, these "cycles" just work.
The only cycles which will remain are those where two modules actually each use
a name from the other at module level; these "true" cycles are only fixable by
refactoring the classes or functions involved.


Rationale
=========

The aim of this feature is to make imports transparently lazy. "Lazy" means
that the import of a module (execution of the module body and addition of the
module object to ``sys.modules``) should not occur until the module (or a name
imported from it) is actually referenced during execution. "Transparent" means
that besides the delayed import (and necessarily observable effects of that,
such as delayed import side effects and changes to ``sys.modules``), there is
no other observable change in behavior: the imported object is present in the
module namespace as normal and is transparently loaded whenever first used: its
status as a "lazy imported object" is not directly observable from Python or
from C extension code.

The requirement that the imported object be present in the module namespace as
usual, even before the import has actually occurred, means that we need some
kind of "lazy object" placeholder to represent the not-yet-imported object.
The transparency requirement dictates that this placeholder must never be
visible to Python code; any reference to it must trigger the import and replace
it with the real imported object.

Given the possibility that Python (or C extension) code may pull objects
directly out of a module ``__dict__``, the only way to reliably prevent
accidental leakage of lazy objects is to have the dictionary itself be
responsible to ensure resolution of lazy objects on lookup.

To avoid a performance penalty on the vast majority of dictionaries which never
contain any lazy objects, we install a specialized lookup function
(``lookdict_unicode_lazy``) for module namespace dictionaries when they first
gain a lazy-object value. When this lookup function finds that the key
references a lazy object, it resolves the lazy object immediately before
returning it.

This implementation comprehensively prevents leakage of lazy objects, ensuring
they are always resolved to the real imported object before anyone can get hold
of them for any use, while avoiding any noticeable performance impact on
dictionaries in general.


Specification
=============

Lazy imports are opt-in, and globally enabled via a new ``-L`` flag to the
Python interpreter, or a ``PYTHONLAZYIMPORTS`` environment variable.

When enabled, the loading and execution of all (and only) top level imports is
deferred until the imported name is used. This could happen immediately (e.g.
on the very next line after the import statement) or much later (e.g. while
using the name inside a function being called by some other code at some later
time.)

For these top level imports, there are two exceptions which will make them
eager (not lazy): imports inside ``try``/``except``/``finally`` or ``with``
blocks, and star imports (``from foo import *``.) Imports inside
exception-handling blocks (this includes ``with`` blocks, since those can also
"catch" and handle exceptions) remain eager so that any exceptions arising from
the import can be handled. Star imports must remain eager since performing the
import is the only way to know which names should be added to the namespace.

Imports inside class definitions or inside functions/methods are not "top
level" and are never lazy.

Dynamic imports using ``__import__()`` or ``importlib.import_module()`` are
also never lazy.


Per-module opt out
------------------

Due to the backwards compatibility issues mentioned below, it may be necessary
to force some imports to be eager. In first-party code, this can be easily
accomplished via any module-level reference to the name, e.g. even re-assigning
the name to itself will trigger the import::

import foo

# ensure 'foo' is eagerly imported
foo = foo

Another option that scales better to making multiple imports eager is to place
them inside a ``try/finally``::

try: # force these imports to be eager
import foo
import bar
finally:
pass

The more difficult case can occur if an import in third-party code that can't
easily be modified must be forced to be eager. For this purpose, we propose to
add an API to ``importlib`` that can be called early in the process to specify
a list of module names within which all imports will be eager::

from importlib import set_eager_imports

set_eager_imports(["one.mod", "another"])


Backwards Compatibility
=======================

This proposal preserves full backwards compatibility when the feature is
disabled, which is the default.

When enabled, lazy imports could produce currently unexpected results and
behaviors in existing codebases. The problems that we may see when enabling
lazy imports in an existing codebase are related to:


Import Side Effects
-------------------

Import side effects that would otherwise be produced by the execution of
imported modules during the execution of import statements will be deferred at
least until the imported objects are used.

These import side effects may include:

* code executing any side-effecting logic during import;
* relying on imported submodules being set as attributes in the parent module.


Dynamic Paths
-------------

There could be issues related to dynamic Python import paths; particularly,
adding (and then removing after the import) paths from ``sys.path``::

sys.path.insert(0, "/path/to/foo/module")
import foo
del sys.path[0]
foo.Bar()

In this case, with lazy imports enabled, the import of ``foo`` will not
actually occur while the addition to ``sys.path`` is present.


Deferred Exceptions
-------------------

All exceptions arising from import (including ``ModuleNotFoundError``) are
deferred from import time to first-use time, which might complicate debugging.
Accessing an object in the middle of any code could trigger a deferred import
and produce ``ImportError`` or any other exception resulting from the
resolution of the deferred object, while loading and executing the related
imported module.


Security Implications
=====================

Deferred execution of code could produce security concerns if process owner,
path, ``sys.path``, or other sensitive environment or contextual states change
between the time the ``import`` statement is executed and the time where the
imported object is used.


Performance Impact
==================

The reference implementation has shown that the feature has negligible
performance impact on existing real-world codebases (Instagram Server and other
several CLI programs at Meta), while providing substantial improvements to
startup time and memory usage.

The reference implementation shows small performance regressions in a few
pyperformance benchmarks, but improvements in others. (TODO update with
detailed data from 3.11 port of implementation.)


How to Teach This
=================

In most cases, lazy imports should just work transparently and no teaching of
the feature should be necessary.

The implementation will ensure that errors resulting from a deferred import
have metadata attached pointing the user to the original import statement, to
ease debuggability of errors from lazy imports.

Some best practices to deal with some of the issues that could arise and to
better take advantage of lazy imports are:

* Avoid relying on import side effects. Perhaps the most common reliance on
import side effects is the registry pattern, where population of some
external registry happens implicitly during the importing of modules, often
via decorators. Instead, the registry should be built via an explicit call
that perhaps does a discovery process to find decorated functions or classes.

* Always import needed submodules explicitly, don't rely on some other import
to ensure a module has its submodules as attributes. That is, do ``import
foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only
works (unreliably) because the attribute ``foo.bar`` is added as a side
effect of ``foo.bar`` being imported somewhere else. With lazy imports this
may not always happen on time.

* Avoid using star imports, as those are always eager.

* When possible, do not import whole submodules. Import specific names instead;
i.e.: do ``from foo.bar import Baz``, not ``import foo.bar`` and then
``foo.bar.Baz``. If you import submodules (such as ``foo.qux`` and
``foo.fred``), with lazy imports enabled, when you access the parent module's
name (``foo`` in this case), that will trigger loading all of the sibling
submodules of the parent module (``foo.bar``, ``foo.qux`` and ``foo.fred``),
not only the one being accessed, because the parent module ``foo`` is the
actual deferred object name.

* Don't use inline imports, unless absolutely necessary. Import cycles should
no longer be a problem with lazy imports enabled, so there’s no need to add
complexity or more opcodes in a potentially hot path.


Reference Implementation
========================

The current reference implementation is available as part of
`Cinder <https://github.com/facebookincubator/cinder>`_.
Reference implementation is in use within Meta Platforms and has proven to
achieve improvements in startup time (and total runtime for some applications)
in the range of 40%-70%, as well as significant reduction in memory footprint
(up to 40%), thanks to not needing to execute imports that end up being unused
in the common flow.


Rejected Ideas
==============

Explicit syntax for lazy imports
--------------------------------

If the primary objective of lazy imports were solely to work around import
cycles and forward references, an explicitly-marked syntax for particular
targeted imports to be lazy would make a lot of sense. But in practice it would
be very hard to get robust startup time or memory use benefits from this
approach, since it would require converting most imports within your code base
(and in third-party dependencies) to use the lazy import syntax.

It would be possible to aim for a "shallow" laziness where only the top-level
imports of subsystems from the main module are made explicitly lazy, but then
imports within the subsystems are all eager. This is extremely fragile, though
-- it only takes one mis-placed import to undo the carefully constructed
shallow laziness. Globally enabling lazy imports, on the other hand, provides
in-depth robust laziness where you always pay only for the imports you use.


Half-lazy imports
-----------------

It would be possible to eagerly run the import loader to the point of finding
the module source, but then defer the actual execution of the module and
creation of the module object. The advantage of this would be that certain
classes of import errors (e.g. a simple typo in the module name) would be
caught eagerly instead of being deferred to the use of an imported name.

The disadvantage would be that the startup time benefits of lazy imports would
be significantly reduced, since unused imports would still require a filesystem
``stat()`` call, at least. It would also introduce a possibly non-obvious split
between *which* import errors are raised eagerly and which are delayed, when
lazy imports are enabled.

This idea is rejected for now on the basis that in practice, confusion about
import typos has not been an observed problem with the reference
implementation. Generally delayed imports are not delayed forever, and errors
show up soon enough to be caught and fixed (unless the import is truly unused.)


Lazy dynamic imports
--------------------

It would be possible to add a ``lazy=True`` or similar option to ``__import__``
and/or ``importlib.import_module()``, to enable them to perform lazy imports.
That idea is rejected in this PEP for lack of a clear use case. Dynamic imports
are already far outside the :pep:`8` code style recommendations for imports,
and can easily be made precisely as lazy as desired by placing them at the
desired point in the code flow. These aren't commonly used at module top level,
which is where lazy imports applies.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.