Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pip.parse does not accept lock files with duplicated dependencies #1615

Closed
AndrewGuenther opened this issue Dec 15, 2023 · 8 comments · Fixed by #1620
Closed

pip.parse does not accept lock files with duplicated dependencies #1615

AndrewGuenther opened this issue Dec 15, 2023 · 8 comments · Fixed by #1620
Labels

Comments

@AndrewGuenther
Copy link

AndrewGuenther commented Dec 15, 2023

🐞 bug report

Affected Rule

pip.parse

Is this a regression?

Yes, this behavior is only present when using bzlmod

Description

When using pip.parse, if a dependency is repeated in your requirements lock, you'll get an error. This is handled just fine by pip and is necessary for tools like poetry to export this way because it's possible for the same dependency to have different constraints in different parts of the dependency tree.

Ref: python-poetry/poetry#5688

🔬 Minimal Reproduction

Repo with repro here: https://github.com/AndrewGuenther/rules_python_1615_repro

The main branch uses bzlmod and workspace uses WORKSPACE. The workspace branch builds fine, but main does not. Same requirements on both branches, the only difference is bzlmod.

🔥 Exception or Error

If you try and build the example above, you'll get an error to the following effect:


ERROR: Traceback (most recent call last):
        File "/home/andrew/.cache/bazel/_bazel_andrew/9ca33225fba41f1be5682708dd655aad/external/rules_python~0.27.1/python/private/bzlmod/pip.bzl", line 298, column 30, in _pip_impl
                _create_whl_repos(module_ctx, pip_attr, hub_whl_map, whl_overrides)
        File "/home/andrew/.cache/bazel/_bazel_andrew/9ca33225fba41f1be5682708dd655aad/external/rules_python~0.27.1/python/private/bzlmod/pip.bzl", line 129, column 20, in _create_whl_repos
                whl_library(
Error in repository_rule: A repo named pip_38_pyjwt is already generated by this module extension at /home/andrew/.cache/bazel/_bazel_andrew/9ca33225fba41f1be5682708dd655aad/external/rules_python~0.27.1/python/private/bzlmod/pip.bzl:129:20
ERROR: error evaluating module extension pip in @@rules_python~0.27.1//python/extensions:pip.bzl
INFO: Elapsed time: 1.071s
INFO: 0 processes.
ERROR: Build did NOT complete successfully
FAILED: 
    Fetching module extension pip in @@rules_python~0.27.1//python/extensions:pip.bzl; starting

🌍 Your Environment

Operating System:

  
Ubuntu
  

Output of bazel version:

  
6.4.0
  

Rules_python version:

  
0.27.1
  

Anything else relevant?

@AndrewGuenther
Copy link
Author

So this appears to be a bzlmod exclusive behavior? We can see this issue with bzlmod starting from 0.15.0, but if we use pip_parse in our WORKSPACE pointing to the same version it works just fine. Are these implementations different between bzlmod and non-bzlmod?

@AndrewGuenther
Copy link
Author

Alright, I've now created a repo which shows this is exclusively an issue with bzlmod: https://github.com/AndrewGuenther/rules_python_1615_repro

The main branch uses bzlmod and the workspace branch is the equivalent. main fails to build, but workspace builds no problem.

@rickeylev
Copy link
Collaborator

That behavior differs between bzlmod and workspace here is a bit strange. The two implementation re-use a lot of the lower level pieces.

The thing that sticks out to me is both use the same function to parse the requirements. That parsing ignores the [extras] part of a requirements line, e.g. foo ... and foo[bar] ... gets returned as [foo, foo] in the parsed result. So, something in the workspace code must be doing...something...to prevent a duplicate call to whl_library(name="foo"), but I don't see where that is happening.

@rickeylev
Copy link
Collaborator

Uhhh, huh.

So the workspace code path isn't de-duplicating the names. It is calling whl_library(name="pip_pyjwt") twice, but the repository rule implementation is only invoked once. I have no idea how this isn't an error. I don't see any maybe wrappers being used.

In this case, the last one wins. The last one is the one whose requirement line sorts last (there's an earlier sorted() call on the list of requirement lines).

In practice, I think this works out OK? Because the requirements file expands out extras into their actual packages? I'm not too familiar with how extras work. If so, then an earlier line being ignored doesn't matter.

It should be easy to mimic this behavior in bzlmod. We just have to dedupe the names, giving preference to the last one.

@AndrewGuenther
Copy link
Author

@rickeylev I'm happy to submit a PR if you can point me in a direction. Unless this is one of those "it'd take the same time to explain as it would to implement" sort of things.

@AndrewGuenther AndrewGuenther changed the title pip_parse does not accept lock files with duplicated dependencies pip.parse does not accept lock files with duplicated dependencies Dec 15, 2023
rickeylev added a commit to rickeylev/rules_python that referenced this issue Dec 15, 2023
When a package has extras, the requirements parsing discards the extra
portion of the name and just returns the base name. When a repository
is created for each entry, this means the same name is used for multiple
repositories.

Under WORKSPACE builds, duplicate repository names aren't an error. It
appears the last defined repo takes affect. Under bzlmod, duplicate repo
names are an error.

To fix, mimic the last-defined-wins behavior in bzlmod by using a map to
dedupe the package names.

Fixes bazelbuild#1615
@rickeylev
Copy link
Collaborator

I've got a fix, so now its just about constructing a test case.

Is there a trick to generating a requirements file with both pyjwt and pyjwt[crypto] entries? When I do it myself, it just has pyjwt[crypto].

rickeylev added a commit to rickeylev/rules_python that referenced this issue Dec 16, 2023
When a package has extras, the requirements parsing discards the extra
portion of the name and just returns the base name. When a repository
is created for each entry, this means the same name is used for multiple
repositories.

Under WORKSPACE builds, duplicate repository names aren't an error. It
appears the last defined repo takes affect. Under bzlmod, duplicate repo
names are an error.

To fix, mimic the last-defined-wins behavior in bzlmod by using a map to
dedupe the package names.

Fixes bazelbuild#1615
rickeylev added a commit to rickeylev/rules_python that referenced this issue Dec 16, 2023
When a package has extras, the requirements parsing discards the extra
portion of the name and just returns the base name. When a repository
is created for each entry, this means the same name is used for multiple
repositories.

Under WORKSPACE builds, duplicate repository names aren't an error. It
appears the last defined repo takes affect. Under bzlmod, duplicate repo
names are an error.

To fix, mimic the last-defined-wins behavior in bzlmod by using a map to
dedupe the package names.

Fixes bazelbuild#1615
rickeylev added a commit to rickeylev/rules_python that referenced this issue Dec 16, 2023
When a package has extras, the requirements parsing discards the extra
portion of the name and just returns the base name. When a repository
is created for each entry, this means the same name is used for multiple
repositories.

Under WORKSPACE builds, duplicate repository names aren't an error. It
appears the last defined repo takes affect. Under bzlmod, duplicate repo
names are an error.

To fix, mimic the last-defined-wins behavior in bzlmod by using a map to
dedupe the package names.

Fixes bazelbuild#1615
@AndrewGuenther
Copy link
Author

It's a quirk with poetry and it's export function. If you have a pyproject.toml like this:

[tool.poetry]
name = "example"
version = "0.1.0"
description = "Example"

[tool.poetry.dependencies]
python = "~3.8"
djangorestframework-simplejwt = "^5.2.2"
PyJWT = {extras = ["crypto"], version = "^2.1.0"}

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

and run poetry export you'll get two instances of pyjwt in the requirements output.

@AndrewGuenther
Copy link
Author

The relevant poetry behavior is discussed a bit more in this issue: python-poetry/poetry#5688

It's not so much an issue with extras as it is with constraints in general.

rickeylev added a commit to rickeylev/rules_python that referenced this issue Dec 18, 2023
When a package has extras, the requirements parsing discards the extra
portion of the name and just returns the base name. When a repository
is created for each entry, this means the same name is used for multiple
repositories.

Under WORKSPACE builds, duplicate repository names aren't an error. It
appears the last defined repo takes affect. Under bzlmod, duplicate repo
names are an error.

To fix, mimic the last-defined-wins behavior in bzlmod by using a map to
dedupe the package names.

Fixes bazelbuild#1615
github-merge-queue bot pushed a commit that referenced this issue Dec 19, 2023
#1620)

Requirements files are permitted to have duplicate lines for the same
package. An example
of this is having separate lines for a package and its extras. When we
parse requirements,
the parser discards the "extra" portion of the entry and returns a list
of all the packages
as-is. When a repository is created for each entry, this means the same
name is used for
multiple repositories.

Under WORKSPACE builds, duplicate repository names aren't an error. It
appears that last
defined repo takes affect. Under bzlmod, duplicate repo names are an
error.

To fix, mimic the last-defined-wins behavior in bzlmod by using a map to
dedupe the package names.

Fixes #1615
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants