Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New mechanism for marking process outputs as platform agnostic #16873

Open
Tracked by #17355
Eric-Arellano opened this issue Sep 14, 2022 · 2 comments
Open
Tracked by #17355

New mechanism for marking process outputs as platform agnostic #16873

Eric-Arellano opened this issue Sep 14, 2022 · 2 comments

Comments

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Sep 14, 2022

Platform-aware vs. agnostic processes

Some processes care about which platform they are run on, such as building a PEX (when --platform is not specified). Whereas others do not matter, like running count-loc: we expect the result to be identical regardless of the platform.

For platform-aware processes, it's crucial we include the Platform in the cache key.

Status quo

Currently, processes default to being platform-agnostic. A plugin author opts in to platform awareness by setting platform on the Process.

Issues

#13682 messes things up: When running processes with Docker, it is important we always use the same --platform every time. Otherwise, Docker gives this warning

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

Generally, I also believe it is unsafe to default to the possibly dangerous thing, marking something as platform-agnostic. Instead, we should default to the safe semantics, and let others opt-in to something being agnostic.

Proposal

  1. Get rid of platform on Process, and instead for now always set it for every Process.
  2. Add a new field on Process like platform_agnostic: bool, which will cause platform to not factor into the cache key.
@stuhood
Copy link
Member

stuhood commented Sep 14, 2022

Generally, I also believe it is unsafe to default to the possibly dangerous thing, marking something as platform-agnostic. Instead, we should default to the safe semantics, and let others opt-in to something being agnostic.

Yea, this should probably be inverted by having the default platform value come from ProcessConfigFromEnvironment.

Marking things platform agnostic would be good, but is probably not critical until after #12203 is implemented, since cross-platform hits are very unlikely right now for other reasons.

Eric-Arellano added a commit that referenced this issue Sep 15, 2022
…6874)

First part of #16873, and necessary for #16852

A concrete impact of this for users is that remote cache users can now never share results between different platforms. This was already very likely though due to #12203.
Eric-Arellano added a commit that referenced this issue Sep 16, 2022
For Remote Execution, we currently hardcode the platform for the runner:

https://github.com/pantsbuild/pants/blob/ba7566515016ab91668c3610666594844dd5e7db/src/rust/engine/src/context.rs#L204-L206

This won't work as we move to remote execution being a per-environment/process mechanism, such as via #16890. We will add remote execution environment targets, which will allow users to set different platform properties per environment, such as using different Docker images. Thus, each environment target will have a `platform` field because it may be different among them. So, the platform must come from the environment target, not a global hardcoded value.

Our remote command runner doesn't like the platform being optional. In practice, it no longer was optional thanks to #16874, but our types did not reflect that.

When we add back platform-agnostic processes, we will use a new field to indicate this, while still setting the platform. See #16873.

[ci skip-build-wheels]
@stuhood
Copy link
Member

stuhood commented Oct 26, 2022

In the context of #17354, this becomes very relevant again.

The key takeaway from that issue is that processes themselves can't really be truly platform independent (they need to run "somewhere"): but their outputs can be. Which is really not that surprising: it's a "host" vs "target" platform distinction.

@stuhood stuhood changed the title New mechanism for marking a process as platform agnostic New mechanism for marking process outputs as platform agnostic Oct 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants