Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't depend on the system python #42

Closed
ben-ng opened this issue Nov 16, 2017 · 30 comments
Closed

Don't depend on the system python #42

ben-ng opened this issue Nov 16, 2017 · 30 comments

Comments

@ben-ng
Copy link

ben-ng commented Nov 16, 2017

The native rules have py_runtime but it doesn't look like rules_python respects it.

@mattmoor
Copy link
Contributor

Interesting. I've never seen this before :)

@duggelz FYI

@ben-ng ben-ng changed the title Don't depend on the system python, wheel, & pip Don't depend on the system python Nov 16, 2017
@ben-ng
Copy link
Author

ben-ng commented Nov 16, 2017

I edited the title because I may have misunderstood how piptool.py works, it looks like pip and wheel are already shipped in the par.

My question about py_runtime still stands.

@mattmoor
Copy link
Contributor

I'm also unfamiliar with py_runtime, but at a minimum we should replace this naked "python" with something that more portably grabs Python, e.g. this.

@duggelz
Copy link

duggelz commented Nov 16, 2017

Yes, we should respect py_runtime (it's a fairly new addition). I'm not sure it's available to workspace rules, though.

@excavador
Copy link

excavador commented Nov 27, 2017

I personally prefer the rules_go way - where golang downloaded by bazel as external dependency and works without dependency to system golang.

Right now I have number of issues from my team related to small differences between "python" in PATH on OSX machines and small differences between python2.7 versions.

I want to see bazel as ultimate tool which allows to control which python should be used and bring it.

How it looks like with rules_go

git_repository(
  name = "io_bazel_rules_go",
  remote = "https://github.com/bazelbuild/rules_go.git",
  commit = "a960d7e60fa0e3c19f4b5970dd9ab283fb46d367",
)
load("@io_bazel_rules_go//go:def.bzl", "go_rules_dependencies", "go_register_toolchains")
go_rules_dependencies()
go_register_toolchains(go_version="1.9.2")

@hwright
Copy link
Contributor

hwright commented Nov 27, 2017

@excavador That would be useful (and go a long way toward solving issues like #37)

@greggdonovan
Copy link
Member

@excavador I was able to download and compile python in a WORKSPACE rule but unable to use it via py_runtime/--python_top due to bazelbuild/bazel#3568. But it would be ideal to be able to do so.

We worked around the PATH issues with some .bazelrc overrides. E.g.:

build --action_env=PATH=/path/to/project/specific/python:/usr/bin:/bin:/sbin:/usr/sbin
test   --action_env=PATH=/path/to/project/specific/python:/usr/bin:/bin:/sbin:/usr/sbin

See the Environment Variables design for more info on why the local PATH is mapped into the sandbox environment by default.

@mattmoor @duggelz Are py_runtime / --python_path where Python support is heading or should we be working towards using toolchain support via, say, a new py_register_toolchains rule?

@excavador
Copy link

excavador commented Nov 27, 2017

The primary problem with action_env - you must have some fixed global installation path of python for everybody.
It's little better than system python, but problem still - just imagine how to switch to new version in some branch of your repository - for testing new python and switch back you forced to globally update some location two times

@excavador
Copy link

excavador commented Nov 29, 2017

One of the examples, why this ticket is important: #45

@duggelz
Copy link

duggelz commented Dec 15, 2017

I think a good intermediate step is to vendor the dependencies of piptool (pip, setuptools, etc) into this repository in extracted form, and write wrappers around piptool, etc, that do the required PYTHONPATH hacking and mangling to make it work. This doesn't fix the "my Python interpreter is bad" problem, but does fix the "my setuptools is too old", which is probably more common.

Talking to the Bazel team today, it sounds like their preference is the creation of Python toolchains, which replace the more limited py_runtime functionality, since they are making good progress on C++ and Java toolchains already.

@michaelsafyan
Copy link

Can we also have an explicit way to specify the runtime? It is a little weird that "py_binary" never explicitly points to the runtime (I'd expect there to be an optional "runtime" attribute that points to this label). It's also weird that "py_runtime" cannot be specified in "WORKSPACE" to dictate a workspace default.

@raliste
Copy link

raliste commented Feb 22, 2018

Updates?

@abergmeier
Copy link

abergmeier commented Apr 27, 2020

I think we would need to build CPython from scratch by default then!?

@pvcnt
Copy link

pvcnt commented Apr 27, 2020

That's one option, cf. what Dropbox is doing for an example: https://github.com/dropbox/dbx_build_tools

Another option that I recently tried is to download an interpreter (and required libraries) from Anaconda.

@abergmeier
Copy link

With Anaconda I would suspect that it at least is using system libc.

Did Dropbox link glibc into the binary?

@pvcnt
Copy link

pvcnt commented May 3, 2020

@abergmeier
Copy link

It seems that Dropbox is indeed linking glibc into the binary: https://github.com/dropbox/dbx_build_tools/blob/21d3fe06650ce07bcd3dc6acdea573ad3a477aae/build_tools/drte/tools/drte-build.sh#L475

I think there is an opportunity there to split this script into smaller Bazel genrules.
And CPython support basically means being forced to introduce cc toolchain.

@brandjon
Copy link
Member

Can we also have an explicit way to specify the runtime? It is a little weird that "py_binary" never explicitly points to the runtime (I'd expect there to be an optional "runtime" attribute that points to this label). It's also weird that "py_runtime" cannot be specified in "WORKSPACE" to dictate a workspace default.

FYI, py_binary now gets information about the runtime from the Python toolchain (see also toolchain docs). You can register a toolchain in the WORKSPACE with no constraints to make it the default.

@brandjon
Copy link
Member

py_binary depends on the system Python interpreter in its stub script. That issue is tracked in bazelbuild/bazel#8446 and bazelbuild/bazel#8685.

This issue can be used for how rules_python depends on the system Python for the pip integration rules.

@ddlees
Copy link

ddlees commented May 13, 2020

FWIW I ran into this problem and saw it wasn't solved so I just recently published some rules that leverage pyenv to download, compile and register the python toolchain with whatever py_runtime_pair the project calls for.

It's probably a little rough but feedback and PRs are welcome; I just hope what's currently there can help some folks. 😉 🍻

https://github.com/digital-plumbers-union/rules_pyenv

@keith
Copy link
Member

keith commented May 13, 2020

Is there any way to use the python from those rules with pip_install which currently relies on there being one in your PATH?

@ddlees
Copy link

ddlees commented May 14, 2020

@keith I think the way to do it is to modify the pip_install rule to use the py_runtime_pair that's registered as the @bazel_tools//tools/python:toolchain_type (if it's not already). I assume that means updating whatever pip toolchain that's responsible for choosing pip to use the pip that's in that python version's site-packages directory. That said, I'm wrong often enough that you may want to defer to one of the maintainers here for a better answer.

@pvcnt
Copy link

pvcnt commented May 15, 2020

There is unfortunately a quirk here, which is you cannot use a toolchain in a repository rule.

@brandjon
Copy link
Member

Yes. The fundamental problem is that WORKSPACE evaluation time is when non-hermetic things like network access should happen, and this conceptually occurs before the loading and analysis phase where toolchains and configurations are resolved.

You can of course manually set the Python interpreter used by pip_import to match the one used by the toolchain you expect to use.

@abergmeier
Copy link

The fundamental problem is that WORKSPACE evaluation time is when non-hermetic things like network access should happen, and this conceptually occurs before the loading and analysis phase where toolchains and configurations are resolved.

Are there theoretical reasons why actions inside WORKSPACE phase could not be hermetic? I would imagine adding hermetic WORKSPACE actions, which wait for all non-hermetic (implicitly) actions. Probably would require to change phases to something more dynamic.

@aaliddell
Copy link
Contributor

Take a look at https://github.com/soniaai/rules_poetry

As far as I can determine, it defers the fetching of dependency wheels etc until the build phase (after a small bootstrap in repo phase, which also avoids most .par files by also fetching pip itself), which means all of the tooling runs with the toolchain python and each wheel is fetched only when actually depended upon. This seems incredibly elegant, unless I’m missing something... I guess fetching data over network during build phase is frowned upon?

Conceptually, could these rules work similarly? As this would remove at least some of the current points where the system python is being used.

@ddlees
Copy link

ddlees commented May 17, 2020

There is unfortunately a quirk here, which is you cannot use a toolchain in a repository rule.

Ah... good point. You'd have to defer the pip install execution until the build phase which might be OK for some people but philosophically not great. Maybe the pip_install rule could provide that as an option for the time being attached w/ a big fat disclaimer. ¯_(ツ)_/¯ Unfortunately, then you run the risk of that option becoming permanent which seems to always happen when you put in a "temporary" work-around.

Probably the better thing to do, if there is an interim solution, is to setup an offline mirror when loading the WORKSPACE phase and then defer an offline pip install until the build phase. That way at least there aren't any network requests happening during a build. Anyway, just food for thought. Hopefully y'all intelligent folk can come up with a workable solution. 😄

alexeagle pushed a commit to alexeagle/rules_python that referenced this issue Aug 19, 2020
@aaliddell
Copy link
Contributor

An alternative to building Python manually could be to use indygreg/python-build-standalone. This repo provides completely standalone builds of Python for common platforms and somewhat recently added .gz outputs with the aim of Bazel support and I forgot to post back here when that was merged. Since the binary is prebuilt, we can sidestep the 'no toolchains in repo rules' issue, as we can use the exact same python executable for pip as well as the toolchain.

Would there be interest in adding support to rules_python for it to fetch a prebuilt python from there, much like rules_nodejs pulls in node (IIRC)? I guess this'd need to be opt-in, but would be a huge step up from manually compiling python in a repo rule if you want hermetic behaviour.

There's obviously the issue of trusting a 'random' repo with building a core executable, so some collaboration in understanding the build process would be beneficial. As an aside: his pyoxidizer tool (which is what the standalone python builds are for) is also a great thing to look at for the goal of single file python executables to replace .par, but would perhaps be a rules_pyoxidizer rather than part of core rules_python since it needs a rust toolchain too.

@thundergolfer
Copy link
Collaborator

👋 @aaliddell I think we'll actively pursue this direction, and track discussion in #293

@thundergolfer
Copy link
Collaborator

As #293 has shipped, I'll close this issue 🙂 .

The original comment creating this issue is the following:

The native rules have py_runtime but it doesn't look like rules_python respects it.

and the title is "Don't depend on the system python".

Now, considering the stub script shebang it's still strongly arguable that rules_python still depends on the system Python, but I think the main thrust of this issue – apparent from reading the whole thread – has been addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests