internal: repos to create a toolchain from a locally installed Python #2000
Alright, I mentioned this in chat, but I'll post a more full description of some of the problems a local auto configuring toolchain encounters. Ideally, we want something like this:
Unfortunately, I don't see a way to do that without triggering the evaluation of the

In order to know the TCW (target_compatible_with) values, we have to...IDK. I think manually associate it with the local_runtime() call? But that also seems wrong -- /usr/bin/python3 is valid for both linux and mac. So I guess transform rctx.os to a constraint string (but we want to ignore rctx.os on linux if it's a windows path)? Maybe the host constraint is the other option?

For the target settings, we have to run python in order to determine the Python value. But a linux host can't run the windows path and vice versa. So now...idk. Omit target_settings? Consider the whole toolchain incompatible?

But let's say it's the happy path -- we can run the given path, get the version, figure out the os, etc. This is all computed within the runtime repo rule. If a different repo rule generates the toolchain() calls, that info needs to be passed in -- which means in order to access that info, it has to trigger evaluation of the runtime repo. This is true even if we do something like

The only paths I see to deal with all this are:
I like approach 1 out of the ones you listed. It should be cheap to just have:
def _linux_impl(rctx):
    if not _is_linux(rctx):
        # Not on linux: emit a toolchain that can never be selected.
        rctx.file("BUILD.bazel", """\
toolchain(
    name = "linux_toolchain",
    toolchain = "@linux_runtime//:runtime",
    target_compatible_with = ["@platforms//:incompatible"],
    target_settings = [],  # maybe we have to add an incompatible target setting here.
)""")
        return
    ...
Because the repository rule is not doing any network IO and the failure is
Talking about python versions, do you want this toolchain to be used when
I think that if we could affect the default value of
Yes. We end up having to run python no matter what, so we have the version. Integrating it with the version-aware parts just requires adding the target_settings value. So it should be easy.
By default, the local ones, because they're opt-in. It's technically up to the user, though, since they ultimately can futz with the toolchain registration order.
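Since the interpreter has to be run anyway, one probe could report everything the toolchain definition needs. A minimal sketch, assuming a small script is executed with the local interpreter and its JSON output is read back by the repo rule; the function name and the JSON keys are illustrative and are not the ones used in this PR:

```python
import json
import platform
import sys


def get_interpreter_info():
    """Collect facts about the interpreter running this script.

    The key names are hypothetical; they only illustrate what a repo rule
    could read back after running the local Python.
    """
    v = sys.version_info
    return {
        "major": v.major,
        "minor": v.minor,
        "micro": v.micro,
        # e.g. "Linux", "Darwin", "Windows"
        "system": platform.system(),
    }


if __name__ == "__main__":
    # Print a single JSON document so the caller can parse stdout.
    print(json.dumps(get_interpreter_info()))
```

A probe like this only works on the happy path described earlier: the host has to be able to execute the given interpreter path.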
b42834c to 8b115f8
* make logging accept string
* add which_unchecked
* add text_util.str
make watch/watch_tree calls optional for earlier bazel support
only basic bzlmod support is implemented, and 6.4 lacks some necessary features to test it
8b115f8 to c46f3aa
OK, ready for review. I've been sitting on this a while, so I pulled back the scope to just the repository rules that can set up the runtime and toolchains. It works, but you can see that there's a decent amount of configuration needed in MODULE.bazel. Integrating it into the python bzlmod extension can clean that up, but it got surprisingly messy. I'll post in chat about some of the design/behavior things we need to decide on. Using it in WORKSPACE is probably similar, but I haven't tried it with workspace builds yet.
I really like how clean your repository_rule code is. Really nice structure and I like how self-contained the code is. Thank you!
native.config_setting(
    name = "_is_major_minor",
    flag_values = {
        _PYTHON_VERSION_FLAG: major_minor,
    },
)
native.config_setting(
    name = "_is_major_minor_micro",
    flag_values = {
        _PYTHON_VERSION_FLAG: major_minor_micro,
    },
)
As discussed over a call, it could be nice to add another flag to allow the user to disable the toolchain with a line in the .bazelrc:
build --@rules_python//python/config_settings:local_toolchain=disable
This could be done in a followup PR to keep this more scoped.
Yeah, this is an interesting idea. I think I like it? Agree out of scope for this PR.
Hm. Thinking out loud...
It would make for an easy way to use local instead of the hermetic runtimes. This sounds especially appealing for early adoption and testing -- users don't have to put a bunch of experimental config API in their MODULE files.
The basic thing we could do is:
- Have bzlmod generate two repos per version (one local, one hermetic)
- Use select() on the toolchain attribute, switching based on the flag.
And I think this would work for version-aware toolchains, too. The bzlmod config is given the version so no need to load the local_runtime repo to determine it.
And then we can punt on the problems of mix-and-matching local and hermetic runtimes for now.
repo_utils = struct(
    # keep sorted
    debug_print = _debug_print,
    execute_checked = _execute_checked,
    execute_checked_stdout = _execute_checked_stdout,
    execute_unchecked = _execute_unchecked,
    get_platforms_os_name = _get_platforms_os_name,
    getenv = _getenv,
    is_repo_debug_enabled = _is_repo_debug_enabled,
    logger = _logger,
    watch = _watch,
    watch_tree = _watch_tree,
    which_checked = _which_checked,
    which_unchecked = _which_unchecked,
)
It feels like this is getting big, but it's good to leave this as is for now. All of these are low-level repository_ctx related things, so it's fine.
Yeah, I've grown to really like our little repo_utils library here. The logger, error reporting, compat functions -- it's actually pretty nice. Part of me wants to rewrite it in an object-oriented fashion so we can do e.g. repo = repo_utils.create(rctx); repo.which_checked(...); repo.log(...); repo.watch(...); etc., just to have a cleaner API.
I like how Go has the ctx and passes it around everywhere. It is kind of a sign that IO is being done here or there. That makes pure Starlark functions really obvious.
So I like the current design a lot. :)
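For comparison, the object-style API floated in this thread could look roughly like the following. This is a Python analogy only, not Starlark and not the rules_python code: the `Repo` class is hypothetical, and `shutil.which` stands in for the repository context's own lookup.

```python
import shutil


class Repo:
    """Hypothetical wrapper bundling helpers around one context object."""

    def __init__(self, name):
        self.name = name
        self.log_lines = []

    def log(self, message):
        # Prefix every message with the repo name so output is easy to follow.
        self.log_lines.append("[{}] {}".format(self.name, message))

    def which_unchecked(self, binary):
        # Returns the path, or None when the binary is not on PATH.
        return shutil.which(binary)

    def which_checked(self, binary):
        # Same lookup, but a missing binary is treated as an error.
        path = self.which_unchecked(binary)
        if path is None:
            raise FileNotFoundError(
                "{}: '{}' not found on PATH".format(self.name, binary)
            )
        self.log("found {} at {}".format(binary, path))
        return path
```

The trade-off raised above still applies: with plain functions that take the context explicitly, every call site that can do IO is visible at a glance.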
@@ -0,0 +1,5 @@
common --action_env=RULES_PYTHON_BZLMOD_DEBUG=1
Oh, using this in tests is really useful. Should we have this in CI for the examples jobs as well?
If it helps debug, sure. It can be a bit spammy, though. We should document it somewhere, though I'm not sure where. It'd be nice if we had a place we can put our debugging tips we've all picked up along the way.
Hm, maybe we should create repo_utils.fail(), and we can have it always include a message like "set --action_env=blabla for debugging information"
I think having a Troubleshooting section might be good. We could point to Troubleshooting sub-sections within Toolchains, PyPI integration, etc.
…o local.py.toolchains
…les_python into local.py.toolchains
BTW, I realized that the current hermetic toolchain understands if it is Linux GLIBC. Is it possible to detect if the interpreter is built for MUSL vs GLIBC?
After a bit of searching, yes, I think so. The info looks to be in the platform module: https://stackoverflow.com/questions/72272168/how-to-determine-which-libc-implementation-the-host-system-uses
Regarding I know that this is meant for local toolchains, so is low priority IMHO, but it seems that having the
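Ok, I was a bit wrong about how easy detecting musl is.
platform has a libc_ver() function, which can return the libc implementation, but it doesn't know how to indicate that something is musl; it just returns an empty string. Maybe that will have to suffice.
The other option is to use packaging, but that isn't part of the stdlib, and the way it works takes a decent amount of code: a small ELF format parser finds a path to something, then runs it to get some output. There's a PR for Python itself, python/cpython#103784; however, it seems to have stalled out.
Since it seems like the answer is to somehow parse something out of the executable's ELF information, I tried various system utils to do the same, but couldn't figure out how.
Looking through sysconfig, there do appear to be several mentions of musl in some variables, e.g. as part of a command line argument. So that could be a weak signal.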
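A rough sketch of combining the two stdlib signals discussed here: `platform.libc_ver()` first, then a scan of the `sysconfig` variables for the string "musl". The function names and the "unknown" fallback are assumptions for illustration, and the sysconfig scan is a heuristic, not a reliable detector:

```python
import platform
import sysconfig


def classify_libc(libc_name, config_values):
    """Best-effort guess of the libc flavor.

    libc_name: first element of platform.libc_ver(); empty when unknown.
    config_values: iterable of sysconfig values to scan for "musl".
    """
    if libc_name:
        return libc_name
    for value in config_values:
        # Weak signal: "musl" shows up in some build-time settings.
        if isinstance(value, str) and "musl" in value:
            return "musl"
    return "unknown"


def guess_libc():
    """Apply classify_libc() to the interpreter running this code."""
    libc_name, _version = platform.libc_ver()
    return classify_libc(libc_name, sysconfig.get_config_vars().values())
```

Keeping the classification separate from the lookups makes the heuristic easy to exercise with canned values, which matters because its behavior on real musl builds is exactly the uncertain part.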
Thanks for digging.
Regarding use of packaging, it is used in all of the whl metadata parsing (and env marker eval), so using it should be OK.
We could vendor it if needed, just like what pip is doing with its dependencies, or download it like we do for whl_library. I would probably be +1 on vendoring it since it is such a critical piece of code in the bootstrap.
…On 20 July 2024 08:55:21 GMT+09:00, Richard Levasseur ***@***.***> wrote:
Ok, I was a bit wrong about how easy detecting musl is.
`platform` has a `libc_ver()` function, which can return the libc implementation, _but_ it doesn't know how to indicate something is musl. It'll just return empty string. Which maybe that'll have to suffice.
The other option is to use `packaging`, but that isn't part of the stdlib, and the way it works is a decent amount of code. A small ELF format parser to find some path to something, then runs it to get some output. There's a PR for python itself, python/cpython#103784, however, it seems to have stalled out.
Since it seems like the answer is to somehow parse something out of the executable's ELF information, I tried some various system utils to do the same, but couldn't figure out how.
Looking through sysconfig, there do appear to be several mentions of musl in some variables, e.g. as part of command line argument. So that could be a weak signal.
This adds the primitives for defining a toolchain based on a locally installed Python.
Doing this consists of two parts:
The runtime repos create platform runtimes, i.e., they set py_runtime.interpreter_path.
This means the runtime isn't included in the runfiles.
Note that these repo rules are largely implementation details, and are definitely not
stable API-wise. Creating public APIs to use them through WORKSPACE or bzlmod will
be done in a separate change (there are a few design and behavior questions to discuss).
This is definitely experimental quality. In particular, the code that tries
to figure out the C headers/libraries is very finicky. I couldn't find solid docs about
how to do this, and there's a lot of undocumented settings, so what's there is what
I was able to piece together from my laptop's behavior.
Misc other changes:
- pyenv uses $0 to determine what to re-exec. The :current_interpreter_executable target used its own name, which pyenv didn't understand.
- …passing a string causing an error. It's also just a bit more convenient when doing development.
- …makes following logging output easier.
- …repo_utils.execute() report progress.
- repo_utils.getenv, repo_utils.watch, and repo_utils.watch_tree: backwards compatibility functions for their rctx equivalents.
- repo_utils.which_unchecked: calls which, but allows for failure.
- repo_utils.get_platforms_os_name(): Returns the name used in @platforms for the OS reported by rctx.
- …watch() or getenv(), if available. This makes repository rules better respect environmental changes.
- …more involved than other tests, so some docs help.
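As an illustration of what a helper like get_platforms_os_name() has to do, here is a Python sketch of mapping an OS name string to a name under @platforms//os. The input format is assumed to resemble Bazel's rctx.os.name (for example "linux", "mac os x", "windows 10"); the function name and the exact matching rules are assumptions, not the implementation in this PR:

```python
def platforms_os_name(os_name):
    """Map an OS name string to a name used under @platforms//os."""
    name = os_name.lower()
    if name.startswith("mac os"):
        return "osx"
    if name.startswith("windows"):
        return "windows"
    if name.startswith("linux"):
        return "linux"
    # Fall back to the raw name; a caller may treat it as unsupported.
    return name
```

For example, `platforms_os_name("Mac OS X")` yields `"osx"`, which a repo rule could then turn into a constraint label such as `@platforms//os:osx`.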