Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for envsubst in extra_pip_args #1673

Merged
merged 10 commits into from
Jan 31, 2024
30 changes: 29 additions & 1 deletion python/private/bzlmod/pip.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,21 @@ You cannot use both the additive_build_content and additive_build_content_file a
whl_mods = whl_mods,
)

def _envsubst(module_ctx, template_string, varnames):
for varname in varnames:
value = module_ctx.os.environ.get(varname, "")
template_string = template_string.replace("$%s" % varname, value)
template_string = template_string.replace("${%s}" % varname, value)
segments = template_string.split("${%s:-" % varname)
template_string = segments.pop(0)
for segment in segments:
default_value, separator, rest = segment.rpartition("}")
if not separator:
fail("Environment substitution expression " +
"\"${%s:-\" is missing the final \"}\"" % varname)
template_string += (value if value else default_value) + rest
return template_string

def _create_whl_repos(module_ctx, pip_attr, whl_map, whl_overrides):
python_interpreter_target = pip_attr.python_interpreter_target

Expand Down Expand Up @@ -122,7 +137,10 @@ def _create_whl_repos(module_ctx, pip_attr, whl_map, whl_overrides):
for entry in sorted(parse_result.requirements)
}.values()

extra_pip_args = pip_attr.extra_pip_args + parse_result.options
extra_pip_args = [
_envsubst(module_ctx, arg, pip_attr.envsubst)
for arg in pip_attr.extra_pip_args
] + parse_result.options
vonschultz marked this conversation as resolved.
Show resolved Hide resolved

if hub_name not in whl_map:
whl_map[hub_name] = {}
Expand Down Expand Up @@ -344,6 +362,16 @@ def _pip_impl(module_ctx):

def _pip_parse_ext_attrs():
attrs = dict({
"envsubst": attr.string_list(
mandatory = False,
doc = """\
A list of environment variables to substitute (e.g. `["PIP_INDEX_URL",
"PIP_RETRIES"]`). The corresponding variables are expanded in `extra_pip_args`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vonschultz, could you elaborate a little on what is the usecase here? Is it only for the requirements files that your project owns? What should we do for the PIP_INDEX_URL in a non-root module? On one hand I think that the root module should be able to override the index url + extra index urls for everyone, but if that is what we want to achieve with this PR, then maybe having support for this in:

pip.override(
    index_url = "foo",
    extra_index_urls = ["bar", "baz"],
)

might be maybe cleaner?

We could append the index_url and prepend the extra_index_urls to the extra pip args and those would take precedence (I think).

Are there other env var names that you want to expand here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The task I need it for is converting a project from WORKSPACE.bazel to MODULE.bazel in a case where the WORKSPACE.bazel uses environment variables (by writing them to a file using a repo rule and loading that file). Since loading isn't supported in MODULE.bazel I need the support for environment variables here instead. I have only been thinking in terms of the root module, so I'm not entirely sure what the desired behavior would be in a non-root module.

In principle, there are as many potential use cases as there are flags to pip, but to focus on the ones explicitly mentioned in the patch:

  • PIP_RETRIES, and also PIP_TIMEOUT for that matter, are useful when the user has an unstable Internet connection. Rather like Bazel's own --http_timeout_scaling, which a user might put in their ~/.bazelrc, it is a property of the environment we run in, rather than the repo itself.
  • PIP_INDEX_URL is given to us by our CI system, and is pointing to an AWS CodeArtifact URL. The PIP_INDEX_URL contains a short-lived token for authentication, so we need to read it from the environment rather than hardcoding it in the MODULE.bazel file.

I haven't been thinking of pip.override. Isn't that specific for a given Python package (distribution)?
https://rules-python.readthedocs.io/en/latest/api/extensions/pip.html#pip-override
It doesn't sound like it's something you'd use to override the index url.

I'm not convinced overriding the index url for everyone (including non-root modules) is the desired behavior. If someone uses the default https://pypi.org/simple certainly, we'd like to be able to redirect that, but if a non-root module very explicitly and purposefully uses a different index url containing a small set of packages not found on the public PyPI, then it's not a given that we should override that.

(Incidentally, I think extra_index_urls is a huge security issue, and should never be used. That's how we got hit by https://pytorch.org/blog/compromised-nightly-dependency/, and I'm sure there are many other cases like it.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the environment variables I mentioned, I can also imagine PIP_PROXY being useful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for elaborating on the extra index url usage.

I am wondering if the right thing here would be to have an attr for each thing pip supports - the downside would be that we would need to have stuff in sync with the pip version we support/use, but the upside would be that it would be more structured and we could more easily ensure that sensitive info does not leak into the lock file.

What do you think? How many extra attributes would you need? @rickeylev, do you have any thoughts on this?

using the syntax `$VARNAME` or `${VARNAME}` (expanding to empty string if unset)
or `${VARNAME:-default}` (expanding to default if the variable is unset or empty
in the environment).
""",
),
"hub_name": attr.string(
mandatory = True,
doc = """
Expand Down
Loading