Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bzlmod): support bazel downloader when downloading wheels #1827

Closed
wants to merge 73 commits into from

Conversation

aignas
Copy link
Collaborator

@aignas aignas commented Mar 29, 2024

This introduces 3 attributes and the minimal code to be able to download wheels
using the bazel downloader for the host platform. This is not yet adding
support for targeting a different platform but just allows us to get the wheels
for the host platform instead of using pip.

All of this is achieved by calling the PyPI's SimpleAPI (Artifactory should work
as well) and getting the all URLs for packages from there. Then we use the sha256
information within the requirements files to match the entries found on SimpleAPI
and then pass the url, sha256 and the filename to whl_library, which uses
repository_ctx.download.

If we cannot find any suitable artifact to use, we fallback to legacy pip behaviour.

Testing notes:

  • Most of the code has unit tests, but the pypi_index.bzl extension could have more.
  • You can see the lock file for what the output of all of this code would be on your
    platform.
  • Thanks to @dougthor42 for testing this using the credentials helper against a private
    registry that needs authentication to be accessed.

Work towards #1357

This is a variant of bazelbuild#1625 and was inspired by bazelbuild#1788. In bazelbuild#1625, we
attempt to parse the simple API HTML files in the same `pip.parse`
extension and it brings the follownig challenges:

* The `pip.parse` cannot be easily use in `isolated` mode and it may
  be difficult to implement the isolation if bazelbuild/bazel#20186
  moves forward.
* Splitting the `pypi_index` out of the `pip.parse` allows us to accept
  the location of the parsed simple API artifacts encoded as a bazel
  label.
* Separation of the logic allows us to very easily implement usage of
  the downloader for cross-platform wheels.
* The `whl` `METADATA` might not be exposed through older versions of
  Artifactory, so having the complexity hidden in this single extension
  allows us to not increase the complexity and scope of `pip.parse` too
  much.
* The repository structure can be reused for `pypi_install` extension
  from bazelbuild#1728.

TODO:
- [ ] Add unit tests for functions in `pypi_index.bzl` bzlmod extension if
  the design looks good.
- [ ] Changelog.

Out of scope of this PR:
- Further usage of the downloaded artifacts to implement something
  similar to bazelbuild#1625 or bazelbuild#1744. This needs bazelbuild#1750 and bazelbuild#1764.
- Making the lock file the same on all platforms - We would need
  to fully parse the requirements file.
- Support for different dependency versions in the `pip.parse` hub repos
  based on each platform - we would need to be able to interpret
  platform markers in some way, but `pypi_index` should be good already.
- Implementing the parsing of METADATA to detect dependency cycles.
- Support for `requirements` files that are not created via
  `pip-compile`.
- Support for other lock formats, though that would be reasonably
  trivial to add.

Open questions:
- Support for VCS dependencies in requirements files - We should
  probably handle them as `overrides` in the `pypi_index` extension and
  treat them in `pip.parse` just as an `sdist`, but I am not sure it
  would work without any issues.
@aignas aignas requested a review from rickeylev as a code owner March 29, 2024 04:01
@aignas
Copy link
Collaborator Author

aignas commented Apr 2, 2024

I was normalizing the names in the requirements to use underscores. That works with PyPI (and Artifactory if I remember correctly), but it seems that it does not work with every index. I have changed the code to use exactly the same package name as is in the supplied requirements file.

I will be adding various unit tests from now on, so expect a little bit more activity here.

Open questions:

  • Should we just have a single array for the index_urls?
  • What kind of docs should we have for documenting that we support credential helper. @dougthor42 , maybe you could help me with a paragraph on how you got it working? (Assuming the normalization was the actual issue)

@dougthor42
Copy link
Contributor

What kind of docs should we have for documenting that we support credential helper. @dougthor42 , maybe you could help me with a paragraph on how you got it working?

I'd be more than happy to! I think I'll be able to get something whipped up within a day or three. I'll just submit a PR onto aignas:feat/pip-simpleapi.

I have changed the code to use exactly the same package name as is in the supplied requirements file.

Thanks! That part works wonderfully 😁.

Other Issues/Comments

  1. I'm now having issues because some of the ~440 packages we use don't have wheels but that's out of scope of this PR.
  2. I get spammed with 404 warnings when the downloader tries to find things like numpy in our private index.
    • Perhaps it can be downgraded to INFO or DEBUG? That said, it's not a big issue and we can resolve it in other ways (like the virtual index mentioned elsewhere).
  3. ValueErrors in whl_target_platforms.bzl. Some packages have nonstandard / invalid / just old platform descriptors.
    • grpcio v1.62.1 has linux_armv7l
    • freetype_py v2.1.0 has macosx_10_6_intel
    • libclang v13.0.0 has manylinux_2_17_armv7l
    • watchdog v3.0.0 has win_ia64
    • Those are all the ones I found in our project.
    • I had to add a debug statement to parse_whl_name.bzl so that it would print the package that was being processed when the error occured - it might be nice to pass file down to whl_target_platforms._cpu_from_tag so that the error message has the package name in it.

Here's an example of the error in (3):

ERROR: Traceback (most recent call last):
        File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~/python/private/bzlmod/pip.bzl", line 447, column 30, in _pip_impl
                _create_whl_repos(module_ctx, pip_attr, hub_whl_map, whl_overrides, simpleapi_cache)
        File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~/python/private/bzlmod/pip.bzl", line 249, column 29, in _create_whl_repos
                whl = select_whl(
        File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~/python/private/whl_target_platforms.bzl", line 199, column 41, in select_whl
                platforms = whl_target_platforms(platform_tag)
        File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~/python/private/whl_target_platforms.bzl", line 234, column 25, in whl_target_platforms
                cpus = _cpu_from_tag(platform_tag)
        File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~/python/private/whl_target_platforms.bzl", line 268, column 13, in _cpu_from_tag
                fail("Unrecognized tag: '{}': cannot determine CPU".format(tag))
Error in fail: Unrecognized tag: 'macosx_10_6_intel': cannot determine CPU
ERROR: error evaluating module extension pip in @@rules_python~//python/extensions:pip.bzl

python/private/bzlmod/pip.bzl Show resolved Hide resolved
python/private/bzlmod/pip.bzl Outdated Show resolved Hide resolved
python/private/bzlmod/pip.bzl Show resolved Hide resolved
Copy link
Contributor

@rickeylev rickeylev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It sounds like dougthor42 identified a couple edge cases, but I think the main part of this is good now, so approving so you can later merge

@dougthor42
Copy link
Contributor

dougthor42 commented Apr 5, 2024

After 5643866 it looks like everything's working very well! 404s are still spammy but I can live with that. From what I can tell, both wheels and sdists were correctly installed, it didn't fail out on unknown platforms, and it even fell back to standard pip for one of our more annoying packages klayout1.

This is so awesome, thank you very much!

Footnotes

  1. Turns out this is just because the klayout version we were using was yanked from pypi like 3 days ago and my branch wasn't aware that we updated to 0.29.0...

@aignas
Copy link
Collaborator Author

aignas commented Apr 5, 2024

I'll merge this and start looking at supporting multi-platform whl selects which is what #1744 tried to achieve.

@aignas aignas added this pull request to the merge queue Apr 5, 2024
github-merge-queue bot pushed a commit that referenced this pull request Apr 5, 2024
This introduces 3 attributes and the minimal code to be able to download
wheels
using the bazel downloader for the host platform. This is not yet adding
support for targeting a different platform but just allows us to get the
wheels
for the host platform instead of using `pip`.

All of this is achieved by calling the PyPI's SimpleAPI (Artifactory
should work
as well) and getting the all URLs for packages from there. Then we use
the `sha256`
information within the requirements files to match the entries found on
SimpleAPI
and then pass the `url`, `sha256` and the `filename` to `whl_library`,
which uses
`repository_ctx.download`.

If we cannot find any suitable artifact to use, we fallback to legacy
`pip` behaviour.

Testing notes:
* Most of the code has unit tests, but the `pypi_index.bzl` extension
could have more.
* You can see the lock file for what the output of all of this code
would be on your
  platform.
* Thanks to @dougthor42 for testing this using the credentials helper
against a private
  registry that needs authentication to be accessed.

Work towards #1357
@aignas
Copy link
Collaborator Author

aignas commented Apr 5, 2024

This is the first time I see a non closed PR after merging. I'll just close this one manually.

@aignas aignas closed this Apr 5, 2024
github-merge-queue bot pushed a commit that referenced this pull request Apr 6, 2024
…r of python packages (#1834)

Add credential helper docs that were requested in
#1827 (comment)
github-merge-queue bot pushed a commit that referenced this pull request Apr 22, 2024
…ed (#1854)

I have found this to be useful when debugging auth issues when using a
private
repo and I thought that having it configurable from the user's
MODULE.bazel
is a better user experience.

Ammending #1827.
@@ -58,6 +58,7 @@ register_toolchains("@pythons_hub//:all")

pip = use_extension("//python/extensions:pip.bzl", "pip")
pip.parse(
experimental_index_url = "https://pypi.org/simple",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional not to use PIP_INDEX_URL env here, and why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to override this value you can use bazel downloader config. In the general sense it is not a given that people will have their PIP_INDEX_URL configured to contain the index that works for fetching these dependencies.

Hence to avoid unintended breakage, this is hardcoded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants