fix: Fix per-file config interaction with one py_binary per main #1664
Conversation
What would happen if we have `lib_and_main.py`, which has code usable as library code (e.g. in `lib2.py`) but should also have a `py_binary` generated for it because of an `if __name__ == "__main__"` statement?
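For concreteness, a minimal sketch of the kind of file in question (the file and function names are invented for illustration):

```python
# lib_and_main.py -- hypothetical file that is both importable library
# code (e.g. from lib2.py) and a runnable script.

def greet(name):
    """Reusable function that other modules may import."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    # Running the file directly exercises the script path, which is
    # what makes Gazelle want to emit a py_binary for it.
    print(greet("world"))
```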
I've added a new … Let me know your thoughts!
+1 for the need for this change. Without this change, per-file generation mode does not really work for us if we are not strict about having the …
In general I would like this to be merged, but I'm not sure about the usage of `py_binary` as a `py_library`. Maybe @rickeylev could chime in and advise us here? Should we use `py_binary` as a convenient `py_library` implementation when a file supports both uses, as a library and as a script? Are there trade-offs in doing so?
A binary shouldn't be used as a library, insofar as Bazel dependencies are concerned. The main reasons are: …
Thanks @rickeylev. Some basic thoughts that come to mind: with the current behavior, unless the file is named …, … Options I can think of:

1. Generate a single `py_binary` per main file and let other code depend on it directly.
2. Generate a `py_library` per file, plus an additional `py_binary` when the file contains an `if __name__ == "__main__"` block.
I would prefer option 2, as it is the option with the least surprising behaviour.

We should generate a `py_library` per file, with an additional `py_binary` if there is an `if __name__ == "__main__"` block in the file. We could improve things by not generating the `py_library` when the only code in the file is that block, or by adding directives to not generate a library for a particular file.

@adzenith, what do you think?
Unfortunately, it's not so simple. I held that same opinion until a couple of years ago (internally, there was a feature request for exactly this, which I originally argued in favor of, but as time passed, it became clear that keeping the two separate is better architecturally). The issue is that the expected behavior of rules is that deps-like attributes also include the runfiles of a target. And where is the executable file? It's part of the runfiles. Filtering it out of the runfiles can't be done without flattening the runfiles. So then the only other answer is a special provider with a different set of runfiles, and thus we have re-invented the deprecated data_runfiles feature, except now we're calling it deps_runfiles. (The reason data_runfiles is deprecated and discouraged is that it creates surprising behavior: we don't expect that changing the name of the dependency edge changes the set of files received. I've been confused by that behavior several times and seen it bite others.) We're still left with the other major problem of unnecessary analysis-time logic, though, as well as the other parts of points (2) and (3) I posted. It starts to turn into many special cases and special py-rule-specific behavior.

Predominantly, we (within Google) have found that binaries only need libraries to make testing easier: a unit test typically wants to import some of the binary's functions to test them. To serve that end, the popular py_binary-wrapping macros have an opt-in argument to generate a py_library that is testonly=True. If the same source file is supposed to be both a library and a binary, people have to do that purposefully, to consciously face the decisions it entails. I originally argued that this testonly lib be opt-out instead, but data indicated that it wouldn't be used much (it was something unequivocal, like 25%).
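A minimal sketch of the kind of wrapper macro described, with invented names (`py_binary_with_lib`, `generate_test_lib`); this is an illustration of the opt-in testonly library, not Google's actual macro:

```starlark
load("@rules_python//python:defs.bzl", "py_binary", "py_library")

def py_binary_with_lib(name, srcs, deps = [], generate_test_lib = False, **kwargs):
    """Wraps py_binary, optionally emitting a testonly py_library.

    The library exposes the binary's sources so unit tests can import
    its functions without depending on the binary target itself.
    """
    py_binary(
        name = name,
        srcs = srcs,
        deps = deps,
        **kwargs
    )
    if generate_test_lib:
        py_library(
            name = name + "_lib",
            srcs = srcs,
            deps = deps,
            testonly = True,
        )
```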
This is fine. The same source file can be in the srcs of multiple targets. I'd recommend that, if both a binary and a library are generated, the binary depend on the library. This mostly just reduces repetition in the definitions and looks a bit cleaner, so it's not strictly necessary. I don't know how advanced Gazelle's analysis is, but if it can look around at other files, it could look for a test-looking file that imports the binary. Another heuristic might be to check whether a file looks "entry point like", e.g., if the file has …
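A sketch of what that layout could look like in a generated BUILD file, using the hypothetical `lib_and_main.py` from above (target names are illustrative, not necessarily what Gazelle emits):

```starlark
load("@rules_python//python:defs.bzl", "py_binary", "py_library")

# One library per source file, as in per-file generation mode.
py_library(
    name = "lib_and_main",
    srcs = ["lib_and_main.py"],
)

# The binary still lists the source (py_binary requires main to be in
# its srcs), but takes its other dependencies via the library.
py_binary(
    name = "lib_and_main_bin",
    srcs = ["lib_and_main.py"],
    main = "lib_and_main.py",
    deps = [":lib_and_main"],
)
```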
An extension of option 2 could be to give the user some control through an in-file directive that says whether the target type should be library, binary, or both. Hopefully, this would obviate the need for intelligence within Gazelle. Without this directive, the `if __name__ == "__main__"` check can be used to generate both types of targets, or always library targets regardless of the file content (unless the file name is …). The in-file directive could look like:
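The concrete spelling was cut off in the thread; a hypothetical sketch (`python_target_type` is an invented annotation name, not a real Gazelle directive):

```python
# gazelle:python_target_type both  (hypothetical annotation)

def helper():
    """Library code that other modules may import."""
    return 42

if __name__ == "__main__":
    print(helper())
```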
Unfortunately, all of this makes the configurability of Gazelle more complex. But because the Python language spec itself is not strict, and Gazelle is not positioned high enough in the ecosystem to enforce opinions, this is unavoidable.
Sigh. For this reason, for better or worse, I did not create separate …
This turned out to be a much more controversial change than I was expecting! I was trying to fix what I perceived as a bug in the current behavior. Thank you all for the comments; I read through them all, but it looks like there's not yet agreement on the best path forward, so I'll throw my hat in the ring as well.

I personally would prefer option 1. I think option 2 is actually more surprising: why am I ending up with two targets? What are the implications of that? It pushes an implementation detail onto the user, whereas with option 1 everything "just works" from the user's perspective. I'm also not sure how often this will actually come up. Do people frequently have …
I feel like users generally fall into two camps: …

I have the feeling that the first camp doesn't really mind the negative effects of depending on a `py_binary` …
My preference would be simply to document the following: …
Then users who just want things to work can have them just work, and anybody who is looking into performance will find that the best fix is to structure their code slightly differently (arguably better, perhaps), along the lines of the sketch below. Thoughts?
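One way to read "structure their code slightly differently": keep reusable code in a pure library file and make the entry point a thin wrapper, so per-file generation naturally yields one `py_library` and one `py_binary`. File and function names here are invented for illustration:

```python
# greeting.py -- pure library code; per-file generation emits a py_library.
def greet(name):
    return f"Hello, {name}!"
```

```python
# main.py -- thin entry point; per-file generation emits a py_binary
# that depends on the :greeting library.
from greeting import greet

if __name__ == "__main__":
    print(greet("world"))
```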
+1 to all @adzenith said above.
I added documentation about the negative effects of depending on a `py_binary`.
Thanks for the documentation. I think that, as is, it is good enough for now.
Thank you! I appreciate it.
Changes:
- Establish a mechanism for defining external python requirements via a `requirements.in`, generating updates to a lock file with `bazel run //:requirements.update`
- Configure gazelle python and protobufs
- Configure via gazelle directives to use a file-based approach, stick with the BUILD configuration, disable go, default visibility to private, and resolve py imports
- Update dev_setup to install openjdk (java), tree, ranger, ag

New Commands:
```
bazel run //:requirements.update
bazel run //:gazelle_python_manifest.update
bazel run //:gazelle
```

New Tests:
```
//:requirements_test
//:gazelle_python_manifest.test
```

Notes:
- One of the recurring issues I have every time I deal with updating WORKSPACE is slight incompatibilities with various repos that result in incomprehensible output errors.
- stackb/rules_proto is one of the only things that seems to support what I'm looking for, and while it is getting steady contributions, it hasn't published a release in about a year. https://github.com/rules-proto-grpc/rules_proto_grpc. Additionally, it could be handy to look into what it would take to make my own "shitty" version of these rules to learn more about what's going on in gazelle's / bazel's internals.
- bzlmod migration didn't go well for me; most tutorials out there still use WORKSPACE. It might be good to look into in the future and figure out how to migrate piecemeal, but I bypassed this and the 7.0.1 bazel upgrade for now.

Related Issues:
- Noting the issue with requiring a gazelle directive for proto libraries: bazelbuild/rules_python#1703
- bazelbuild/rules_python#1664 fixes the reason I had to "disable" my py_binary; wait for that to release and then update. I could probably patch this in, but don't quite know how 🤷
- Sounds like there's some talk around gazelle c++: bazel-contrib/bazel-gazelle#910
- I'm helping ;) bazelbuild/rules_python#1712 (doc update)

References:
- https://github.com/bazelbuild/bazel-gazelle
- https://github.com/stackb/rules_proto
- https://github.com/bazelbuild/rules_python/tree/main/gazelle

Future Things to Look Into:
- Set up a gazelle test that checks the output of `//:gazelle -- -mode diff` is zero
- Configure rust with gazelle: https://github.com/Calsign/gazelle_rust
- A really cool / general parser / generator (what github uses for its "semantic" tool): https://tree-sitter.github.io/tree-sitter/ — could use it to make a small code generator / modification tool, or maybe more general utilities?
- General compile command completion: https://github.com/hedronvision/bazel-compile-commands-extractor
- Maybe play with https://github.com/rules-proto-grpc/rules_proto_grpc as an alternative to stackb
- If I can't use gazelle for some things, mess around with https://github.com/bazelbuild/buildtools/tree/master/buildozer
This previous PR added the ability to make a `py_binary` target per file if `if __name__ == "__main__"` tokens were found in the file. This works great in the default case, but when `python_generation_mode` is set to `file`, the plugin now attempts to make both a `py_binary` and a `py_library` target for each main file, which results in an error.

This PR modifies the behavior to work properly with per-file target generation, and adds tests for this case.