Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved import hook #1748

Closed
wants to merge 63 commits into from
Closed

Improved import hook #1748

wants to merge 63 commits into from

Conversation

mbway
Copy link
Contributor

@mbway mbway commented Aug 27, 2023

I am working towards integrating rust into a python codebase and one of the main blockers is the lack of a comprehensive
import hook for automatically recompiling rust projects that have been installed in editable / unpacked mode.

To that end I have created a new import hook for maturin that I hope should cover most reasonable use cases and be
robust to many edge cases. I have written a fairly comprehensive test suite to ensure that the import hook is working
in many situations. I hope to upstream this new hook if possible.

Overview

The new import hook is mostly compatible with the old one (see the comparison section below) but is also usable in
many more situations:

  • automatically rebuild projects that are installed in editable mode
    • supports both pure and mixed projects
    • supports all types of bindings
    • independent of the current directory
    • supports path dependencies
  • automatically create projects for stand-alone .rs files and import them
    • improves on the existing hook by removing arbitrary limitations (see below)
    • unlike the old hook, doesn't install the generated projects. Uses a build cache similar to pyximport for cython.
  • when nothing has changed importing is near instantaneous (a few milliseconds)
    • change detection can be overridden with the force_rebuild option
  • automatically install a previously not installed package when imported from a python script inside that project
    • this behaviour is configurable with the install_new_packages option
  • does not rely on the current working directory
    • the old import hook uses the CWD but this makes it behave differently to regular python imports which can be
      very confusing as well as being quite limiting
  • import multiple projects at a time
  • safely import concurrently across multiple processes
    • concurrent importing can often happen unintentionally for example when using pytest-xdist
  • work with multiple virtual environments and interpreter versions without clashing
  • very configurable
    • can use the package importer and the .rs importer separately
    • can configure all maturin settings that can be overriden by cli flags
      • settings/flags can be set individually for specific projects using user-defined logic
    • uses the standard logging framework so the user can choose how noisy or silent they want the hook to be
      • very useful for debugging

Limitations and Future Improvements

I wrote this new hook in python because that is what was used previously but implementing in rust may have some
advantages. Namely, re-using the project resolving code instead of re-implementing it in python and probably being
able to detect changed projects more accurately and quickly. Although I find the performance to be reasonable in
the 'unchanged' case already.

The import hook works around an issue where I am using project_name-0.0.0.dist-info/direct_url.json to
follow editable installed packages back to the project source code in the 'pure' package case. When projects are
installed with pip install -e a file:// URL is written to direct_url.json that points to the project directory
but when installed with maturin develop the direct_url.json file points to the temporary wheel that was created
during installation. I would be willing to look into fixing this issue by having maturin write the required values
directly instead of applying a fix afterward. This also means existing pure packages installed with maturin develop
won't rebuild with the import hook until this is fixed.

edit: this has now been fixed in this PR and maturin develop now creates editable installs.

The project change detection probably isn't perfect. A potential improvement might be to load .gitignore and exclude
ignored files when deciding whether to rebuild. There are probably also edge cases of packages relying on data outside
the project directory which I have not considered yet.

The single .rs file import is currently missing some features offered by alternatives such as
rustimport but similar features could be added in future if there is demand.

If maturin was able to create an 'unpacked' wheel or just write the extension module binary somewhere that may be
more efficient than creating a wheel then unzipping it.

Background and comparison

The old import hook supports the following scenarios:

  • Old Scenario 1:
    • script running with CWD at the root of a maturin project
    • script does top-level import matching the name of the project
    • outcome: import hook runs maturin develop to build and install or re-install the project
      into the active environment
  • Old Scenario 2:
    • script running with CWD in a directory containing a maturin project directory named with the same
      name as the project it contains
    • script does a top-level import matching the name of the project
    • outcome: import hook runs maturin develop to build and install or re-install the project
      into the active environment
  • Old Scenario 3:
    • script import resolves to a .rs file
    • outcome: import hook generates a project at $PWD/build and installs or re-installs the
      project into the active environment

The old import hook is much more limited in the situations it can be used for. It isn't a fair comparison but the old
import hook only passes 4/84 of the tests (and not usually because they fail on the assertions that are looking for log
outputs specific to the new hook)

Old scenario 1 is expanded upon with the new import hook because it detects the project from any subdirectory within
the project instead of just at the root and works regardless of what the current working directory is (which fits
with how imports of regular packages takes place).

Old Scenario 2 is not supported by the new hook because I decided the added complexity of being able to support
this feature well outweighs the usefulness. I would suggest that installing the project in editable mode before
using the hook is a reasonable compromise.

Old scenario 3 is supported by the new rust file import hook but again is improved to not rely on the current
working directory. The old hook also had limitations like not being able to support multiple .rs files with the same
name but in different packages.

@netlify
Copy link

netlify bot commented Aug 27, 2023

Deploy Preview for maturin-guide ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 18b03d1
🔍 Latest deploy log https://app.netlify.com/sites/maturin-guide/deploys/659737c0415b120008a2b84e
😎 Deploy Preview https://deploy-preview-1748--maturin-guide.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@mbway mbway force-pushed the improved_import_hook branch 2 times, most recently from 35872a1 to 0770262 Compare August 27, 2023 20:01
@mbway mbway force-pushed the improved_import_hook branch from 0770262 to e4cc108 Compare August 27, 2023 20:03
@mbway mbway force-pushed the improved_import_hook branch from e4cc108 to ca62147 Compare August 27, 2023 21:01

let (venv_dir, python) = create_virtualenv(virtualenv_name, Some(python)).unwrap();

println!("installing maturin binary into virtualenv");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the best way of making the maturin binary available to the python tests on the CI but currently they fail without this.
I tried installing with pip install . but that seems to install a release binary from pypi or something because maturin new was not available

@messense
Copy link
Member

Thanks for the PR and detailed write-up, this is a large PR so review may take a while.

@mbway mbway force-pushed the improved_import_hook branch from 369ab21 to 1fa2b38 Compare August 28, 2023 17:23
@mbway mbway force-pushed the improved_import_hook branch from 1fa2b38 to 1be2370 Compare August 28, 2023 17:25
@mbway
Copy link
Contributor Author

mbway commented Aug 29, 2023

I have fixed the CI tests that fail because of issues with the import hook and I now clear up the temporary
test workspaces by default to help prevent running out of space. However integration_pyo3_mixed_py_subdir
is still failing (although this seems unrelated to the import hook?) and the FreeBSD machine still runs out of space.
I may need help with fixing the remaining tests.

I have reviewed my own PR and made some changes based on that. There are more options
for customizing now. all relevant maturin flags can now be passed and the user can subclass the importers
to add their own custom logic. For example a user could create a custom .rs file importer that reads some
configuration out of a toml file adjacent to the file being imported or read the file itself to parse some
configuration from comments. By making generate_project_for_single_rust_file overridable the user also
has full control over the project which could be used to add dependencies to the generated project.

@messense
Copy link
Member

messense commented Sep 2, 2023

I'm worried the new test case takes too much time to run.

SLOW [>360.000s] maturin::run import_hook_project_importer

@mbway
Copy link
Contributor Author

mbway commented Sep 2, 2023

I knew the tests would take a while but each one was very necessary during development to get the logic just right. I think they will be very useful during maintenance to catch any regressions but I agree that they aren't fast.

These are the options I can think of:

  • keep the tests but only run a simple sanity check of 'does the simple case work' on the CI and run the rest manually only when a problem shows up
    • or maybe the CI could be configured to run all the tests only when releasing a new version or something?
  • have the import hook as a separate project/repo with it's own CI. If you want to go down this route we can discuss the details
  • run all the tests but try to speed them up by mocking maturin
    • this is probably a bad idea because the mock would have to be very accurate and verifying that would require tests which defeats the point of the mock

@messense
Copy link
Member

messense commented Sep 2, 2023

  • Can we split it so we can run them in parallel?
  • Instead of cargo test, maybe just use pytest for this if it's easier to speed up?

@mbway
Copy link
Contributor Author

mbway commented Jan 4, 2024

I fixed the main test failure and discovered and fixed some more. Setting up github actions on my fork was useful as it allowed me to iterate faster with this.

The issue that neither of us could replicate on our machines turned out to simply be the lock timing out if compilation of the project takes too long. This only happened on the CI when running many tests in parallel and without a populated sccache. The old timeout was 2 minutes. I kept this default (with a more informative message so the user can raise it if they need to) but raised the timeout to 10 minutes for the tests that are supposed to wait on the lock (so the parallel import tests).

Perhaps the default timeout should be raised to something like 10 minutes? I'm not sure what the best default would be but I don't think an infinite timeout as the default is a good idea.

The Cirrus CI tests are all failing and I think the machine is just getting overloaded so I'm skipping the import hook tests on that CI for now.

The tests on my fork now all pass (including on MacOS): https://github.com/mbway/maturin/actions/runs/7402918125
edit: went too soon. There seems to be a few remaining issues. I will fix them soon (next few days)

Summary of other things fixed/changed

  • the log outputs when tests fail should be more informative now
  • fixed windows issues
    • conversion between file:// URI and paths
    • issues with platform detection
    • account for the fact that virtual envs store executables in <venv>/Scripts instead of <venv>/bin
    • normalize line-endings when comparing with expected output
  • fixed pypy issues
    • slightly different error messages when loading an invalid module (some tests compare with expected output)
  • improvements
    • moved the fixing of direct_url.json from a post-processing step in the import hook into maturin
      • now packages are editable when installed with pip install -e and maturin develop

@mbway
Copy link
Contributor Author

mbway commented Jan 4, 2024

The remaining issues have been fixed. The remaining CI failures seem to just be overloaded machines or something else unrelated to the import hook.

So I suppose the PR is ready for review now :)

The outstanding improvements/changes I can think of are:

  • perhaps some of the logic like resolving project details should be exposed through the maturin binary, so the import hook could call maturin info <package_dir> or similar and maturin returns the information as json or some other simple format to stdout. This would simplify things a bit instead of maintaining two implementations (although I don't expect it to need changing very often)
  • perhaps the import hook would be better packaged and distributed as its own thing if the recommendation is to install maturin system-wide with pipx (Recommend install via pipx? #1857) instead of having a maturin binary in each environment?
    • ideally I would still like to have the hook associated with maturin/PyO3 so that people know about it though.
    • either way, I can help maintain the import hook so long as I have time to do so.

@messense
Copy link
Member

messense commented Jan 10, 2024

  • perhaps the import hook would be better packaged and distributed as its own thing

I'm fine with extracting the import hook out to a separate package, maturin can still provide an optional dependency on it and re-export it in maturin.import_hook for compatibility in the short term.

I'd also want to put it in a separate repository so the slow CI run time won't be an issue for maturin.

@mbway
Copy link
Contributor Author

mbway commented Jan 10, 2024

Ok, so what do you think are the next steps?

It sounds like you are happy to have the import hook associated with maturin, so would the pypi package and repo be called maturin_import_hook or something.
What is the process of setting up a repo under the Pyo3 organisation? Are there other members of the organisation I should speak to to see whether they approve?

My email is on my profile if it would be better to continue the conversation there

@messense
Copy link
Member

Sorry for the late reply, I've created https://github.com/PyO3/maturin-import-hook, let's move the import hook code there.

@mbway
Copy link
Contributor Author

mbway commented Feb 29, 2024

superseded by PyO3/maturin-import-hook#1

@mbway mbway closed this Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI-no-fail-fast If one job fails, allow the rest to keep testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants