
Support a default hub repo concept in bzlmod #2094

Open

aignas opened this issue Jul 26, 2024 · 5 comments

@aignas
Collaborator

aignas commented Jul 26, 2024

🚀 feature request

Relevant Rules

pip.parse

Description

Currently pip.parse requires the hub_name parameter, which in practice is quite often set to pypi or pip in the root module. Non-root modules should also be able to use the pip or pypi names, and pypi could be the default hub_name value.

Describe the solution you'd like

Minimum (for 1.0.0):

  • Set the hub_name default value to pypi (it should not be pip, because we are not using pip here and pypi is the more generic term); see the MODULE.bazel sketch after this list.
  • Support isolation of pip.parse extension usage via the isolate parameter.
  • We could force users to use the isolate parameter if hub_name is not set, or if it is set to pip or pypi.
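
For illustration, a minimal sketch of what a non-root module's MODULE.bazel could look like under this proposal; the omitted hub_name defaulting to pypi is exactly the behaviour being requested here, not something that works today:

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    # hub_name omitted: under this proposal it would default to "pypi".
    python_version = "3.11",
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "pypi")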

Extra (post 1.0.0):

  • Support passing pyproject.toml and/or constraint files to pip.parse instead of the lock file itself. Enable users to lock the dependency tree via an @pypi//:lock target that generates a lock file in the user's workspace (see the sketch after this list). An alternative would be to merge the requirements in some way, but that could be harder. This greatly depends on Support uv as part of rules_python #1975, because uv has many flags for overriding dependencies in constraint files and may enable us to have a working solution. Merging lock files by blindly taking the lowest version of everything could work, but may not be reliable, so a proper re-locking solution may be required.
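
A hedged sketch of that post-1.0.0 idea; the requirements_in attribute and the @pypi//:lock target are hypothetical names used only to illustrate the shape of the API:

pip.parse(
    hub_name = "pypi",
    python_version = "3.11",
    # Hypothetical: pass the input specification instead of the lock file.
    requirements_in = "//:pyproject.toml",
)
use_repo(pip, "pypi")

# Hypothetical: regenerate the lock file in the user's workspace with
#   bazel run @pypi//:lock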

Lock file merging is used in the Go ecosystem, where requiring users to either name the hub repo that collects the spoke repos or to use a default name is the rules_go/gazelle convention; see bazel-contrib/bazel-gazelle#1584. Also see bazelbuild/bazel#20186, which discusses stabilizing the extension isolation feature. Having rules_python support it may be a good way to give feedback to the Bazel developers.

See #2088 for a slightly related topic.

Describe alternatives you've considered

The main alternative would be to not support this and keep things as they are, but that could make it harder to implement isolation later down the line. What is more, supporting isolation of the extension could make the pip code simpler, because we could simply ask the user to use isolate whenever we see clashes.

aignas added the type: bzlmod label Jul 26, 2024
aignas added this to the v1.0.0 milestone Jul 26, 2024
@groodt
Collaborator

groodt commented Jul 27, 2024

  • Set the hub_name default value to pypi (it should not be pip because we are not using pip here and pypi is a more generic term).

+1 For pypi and not pip

  • Support passing pyproject.toml and/or constraint files to pip.parse instead of the lock file itself. Enable users to lock the dependency tree via @pypi//:lock target to generate a lock file in the users workspace.

This is one additional reason why I've been keen to move dependency specifier lists into Starlark.

@rickeylev
Contributor

I'm generally +1 on this idea.

However, I don't see how this can reliably interact with lock files, i.e., how the specific versions of multiple transitive closures can be merged, respected, and made to work reliably. The issue is the inclusion of arbitrary requirements from elsewhere in the module graph.

use isolate=True if there are clashes

I don't see how this would scale? Consider a triangle dependency situation:

  • X depends on A and B, directly or indirectly,
  • A requires foo >= 2.0
  • B requires foo <= 1.0

A and B work fine on their own. It's only when they both end up in the same transitive module graph that issues occur; see the sketch after the questions below.

  • Who is supposed to set isolate=True in this case? I don't think there's a good answer to this question.
  • How can the root module fix things? Can the root module override a submodule to use isolate=True, or cause a submodule to remap @pypi to some alternative hub_name without the clash?
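
To make the triangle concrete, a sketch with made-up module and requirement-file names, assuming the proposed default pypi hub is in play:

# Each of A and B declares the extension as usual:
#     pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")

# In module A's MODULE.bazel; its lock file pins foo==2.1 (satisfies foo >= 2.0):
pip.parse(requirements_lock = "//:requirements_a.txt")  # defaulted hub: "pypi"

# In module B's MODULE.bazel; its lock file pins foo==0.9 (satisfies foo <= 1.0):
pip.parse(requirements_lock = "//:requirements_b.txt")  # defaulted hub: "pypi"

# In X's MODULE.bazel; X only depends on A and B, so the clash over the shared
# "pypi" hub surfaces here, yet neither A nor B knows to set isolate = True.
bazel_dep(name = "module_a", version = "1.0")
bazel_dep(name = "module_b", version = "1.0")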

@groodt
Collaborator

groodt commented Jul 29, 2024

Yes, I should clarify as well. I'm only firmly +1 on the hub_name defaults.

For the rest, I'm -1: I'm not convinced by any lock file merging, for similar reasons as @rickeylev. I don't believe it is generally solvable, so it will lead to more issues than it solves, and more maintenance burden.

I can be convinced of a solution where we let the root module re-lock against direct dependencies from transitive repositories, but even then, I'm a little confused by the problems this is believed to solve. If somebody needs to compose code and dependencies at the Python language level (not the Bazel extension level), then I think the most standard path forward would be to publish packages (to PyPI or elsewhere) or to use git submodules or similar code-sharing mechanisms to merge source code repositories.

I don't think languages (outside Go) can safely compose x_library rules across repositories in the presence of third-party dependencies. For example, in Java you'd share a jar by publishing it to Maven Central. For Rust, you'd share a crate by publishing it to crates.io. Similarly, for Python, you'd share an sdist/wheel by publishing it to PyPI. Private artifact registries are also possible for all of these languages.

@rickeylev
Contributor

I can imagine an API like:

pip.package(name = "foo")

And this basically acts like:

# requirements.in
foo

# requirements.txt
foo == blabla --hash blabla
dep-of-foo == etc etc

# module.bazel
pip.parse(
  hub_name = "pypi",
  requirements_lock = "requirements.txt"
)

And I guess it would run requirements.in -> requirements.txt conversion during the bzlmod extension phase. This also necessitates that the resulting requirements.txt isn't implicitly platform specific.

Adding version constraints to the pip.package() call seems problematic, though. It seems more like a "best effort, but no guarantee" type of thing.
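
For illustration only, the hypothetical pip.package() call with a constraint might look like the following; neither the call nor its attributes exist in rules_python today, and the version argument is exactly the "best effort, but no guarantee" part:

pip.package(name = "foo")                     # roughly: add "foo" to requirements.in
pip.package(name = "bar", version = ">=2.0")  # best effort: the merged solve may not honour it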

@groodt
Collaborator

groodt commented Jul 30, 2024

Yes, rules_rust has something similar for crates in cargo. You can load crates from a cargo lockfile, but you can also load them from specs in Starlark: https://github.com/bazelbuild/rules_rust/blob/b5ecaea4b5996912693135ea14447d22271e9cd5/examples/bzlmod/proto/MODULE.bazel#L77

crate = use_extension("@rules_rust//crate_universe:extension.bzl", "crate")

crate.spec(
    package = "prost",
    version = "0.12",
)
crate.spec(
    package = "prost-types",
    version = "0.12",
)

...

crate.from_specs()
use_repo(crate, "crates")

There's also a variation that looks like this:

crates_repository(
    name = "crate_index",
    cargo_lockfile = "//:Cargo.lock",
    lockfile = "//:Cargo.Bazel.lock",
    packages = {
        "async-trait": crate.spec(
            version = "0.1.51",
        ),
        "mockall": crate.spec(
            version = "0.10.2",
        ),
        "tokio": crate.spec(
            version = "1.12.0",
        ),
    },
)

And the following command to relock: CARGO_BAZEL_REPIN=1 bazel sync --only=crate_index

I'd think it would be possible to do something similar in Python. I think they have things a bit easier, because they don't need to deal with an sdist vs wheel situation. But yeah, the crate.from_specs() call would be solving dependencies behind the scenes and failing if there is no solution.
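
A sketch of what a Python-flavoured analogue of the crate_universe spec API might look like; pip.spec() and pip.from_specs() are hypothetical names that only mirror the rules_rust shape above:

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.spec(
    package = "requests",
    version = ">=2.31",
)
pip.spec(
    package = "numpy",
    version = "<2",
)
# Hypothetical: resolve the specs (sdists and wheels) behind the scenes,
# failing if there is no solution, much like crate.from_specs().
pip.from_specs(hub_name = "pypi")
use_repo(pip, "pypi")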

aignas removed this from the v1.0.0 milestone Nov 14, 2024