Require goals to explicitly locate/choose an environment, and pin `graph` calculations to `local` #17129

stuhood · 2022-10-05T21:57:15Z

Currently, determine_bootstrap_environment is called to compute a "default" EnvironmentName before the rule graph is entered, such that all @goal_rules have one in scope automatically.

While this was useful for getting things going and proving out our cross-compilation capabilities (demonstrated in #11148 and #13682), it is error prone. The environment design has been trending toward (eventually) giving all targets an environment= field (even if the vast majority of the time it is only set via __defaults__). And in a world where ~all targets have an environment= field, ever accidentally using the default would definitely represent an issue.

Instead, we'd like to put the onus on each @goal_rule to select the relevant environment (usually via targets). Additionally, to resolve the open question in Appendix A in the design doc, we will pin effectively all calculations in graph.py to the __local__ environment (to eventually potentially be made configurable). This will mean that graph-introspection goals won't need to choose environments, and other @goal_rules won't need to select an environment while computing the graph.

The text was updated successfully, but these errors were encountered:

As a first step toward #17129, this change "masks" the `EnvironmentName` for relevant APIs in `graph.py` and `specs_rules.py`, and pins them to the `__local__` environment. The `@rule(.., _mask_types=[..])` argument triggers an assertion if the given type(s) are used in the identity of the `@rule` (i.e.: if the `@rule` "consumes" them from its scope). The impact of masking a type at an API boundary is an assertion that that type is not consumed by that API. As described in #17129: we're pinning graph computations to the `__local__` environment for now, although we may eventually allow graph computations to execute in a configurable environment. [ci skip-build-wheels]

As described in #17129: we would like `@goal_rule`s to eventually make their own decisions about which environments to use, generally by consuming targets to do so, but possibly also by pinning themselves to other environments via configuration. This change builds on #17179, and introduces a migration from `Goal.environment_migrated = {False => True}`. All graph-introspection goals consume only the APIs which are pinned to the local environment by #17179, and so are trivially migrated here. Other goals trigger a deprecation warning like the following: ``` DEPRECATED: Not setting `Goal.environment_migrated=True` for `Goal` `generate-lockfiles` is scheduled to be removed in version 2.17.0.dev0. See https://www.pantsbuild.org/v2.15/docs/plugin-upgrade-guide ``` Before calling #17129 done, we will migrate all internal goals to `Goal.environment_migrated=True`. [ci skip-rust] [ci skip-build-wheels]

stuhood · 2022-10-12T16:48:50Z

.@chrisjrn is currently carrying the torch here while I look at #17181.

chrisjrn · 2022-10-13T18:12:53Z

Current status:

Migrated:

package

Ready to migrate:

export ?
publish
repl (runs interactive process, must be local, see test implementation.)
run (runs interactive process, must be local, see test implementation.)
filedeps (reports on graph calculations; should run locally)
peek (reports on graph calculations; should run locally)

Yes/No questions:

tailor (This only needs to run locally, right?)
update-build-files (This only needs to run locally, right?)
count-loc (This only needs to run locally, right?)

Questions:

check, fmt, lint: Where do we specify the environment with which we run the tool? Should we be looking at the sources in the targets and use the envionments that each source specifies? What if there are multiple environments specified by a target? (This probably needs a small amount of design work to make coherent)
generate-lockfiles: Similar to check/fmt/lint, it's not clear how we should run the resolver if there's ambiguity; it's probably not valid for a resolve to fulfil a set of sources that have disjoint environments.

stuhood · 2022-10-13T18:22:29Z

Yes/No questions:

Yea, I think all of those should be pinned to local for now.

check, fmt, lint:

These should be consuming environment= fields off of targets, many of which do not exist yet.

In order to avoid a huge flurry of added fields which we're not certain will be used yet, I think that for now these goals should partition their roots by environment BUT default to something permissive that avoids the need to mark all targets in your repository transitively with environments... if possible. That will need some exploration.

generate-lockfiles

Ditto.

chrisjrn · 2022-10-13T18:29:17Z

@stuhood These should indeed be consuming environments off targets, there's just an ambiguity that I think we need to grapple with, which I think is worst for generate-lockfiles, because it doesn't make sense to generate the same resolve on multiple environments.

stuhood · 2022-10-13T19:41:01Z

which I think is worst for generate-lockfiles, because it doesn't make sense to generate the same resolve on multiple environments.

Yea, true. In that case, you could potentially add an option to the generate-lockfiles goal (or perhaps to the resolves themselves?) which recognizes/explains that issue, and allows you to declare/override which environment to use per resolve? e.g.

./pants generate-lockfiles --resolve-environments={"this_resolve": "that_environment"} ..

Eventually, I do think that there is a really, really neat feature possible behind the generate-lockfiles goal to resolve this. But in the meantime, users have to choose.

chrisjrn · 2022-10-13T19:57:54Z

I suspect making it so that Pants generates one lockfile per resolve per environment would be extremely cool, in the long term, but yes, we have to make the choice.

Re. fmt, what happens if a target specifies multiple environments?

stuhood · 2022-10-13T20:17:33Z

Re. fmt, what happens if a target specifies multiple environments?

A target cannot specify multiple environments except via parametrize... but yea, as it stands, the formatter would run twice for the two targets. If the formatters are deterministic in the different environments, you'd end up with the same output, which would not result in an error (Snapshots with overlapping-but-equal files are fine: only overlapping-but-not-equal will error).

chrisjrn · 2022-10-13T20:30:28Z

This is a slight problem if e.g. different versions of Black are in separate environment, but for the moment, it sounds like it's not too concerning.

Eric-Arellano · 2022-10-14T15:36:28Z

@chrisjrn thanks for working on this! This is tough to reason about, and I bias towards doing the "safe" thing of pinning most goals to __local__ until we are confident otherwise. For example, fmt, lint, and now fix should probably be __local__ to keep it simple.

Also, it'd likely be good to split this up into many different PRs so that we can properly reason about each goal correctly.

chrisjrn · 2022-10-14T15:44:08Z

I agree on splitting up into many PRs. I suspect I'm going to do the obvious/ready goals first, and then follow up with the remaining categories.

I think fmt/lint/fix are a bit more nuanced than just running them __local__ (does it make sense to run everything locally if no source target specifies a local environment? Unclear.)

Eric-Arellano · 2022-10-14T15:48:56Z

I think fmt/lint/fix are a bit more nuanced than just running them local (does it make sense to run everything locally if no source target specifies a local environment? Unclear.)

The main risk @stuhood and I identified with using __local__ when you said not to is if third-party dependencies won't be installed correctly, e.g you are doing data science and gave up on getting macOS to work for your resolve. Most of fmt, lint, and fix don't consider third-party deps, but Pylint does. So we eventually do need to properly handle environments, but possibly could punt on it till we have users asking for it. The hard part with those is we need to batch, unlike test and package

stuhood · 2022-10-14T16:18:34Z

One other consideration is that if enough goals are going to stay pinned to __local__, maybe we shouldn't deprecate the environment_migrated flag in this release... or even stabilize it in some way.

chrisjrn · 2022-10-14T16:21:40Z

@stuhood I am in favour of migrating it to an enum for greppability: UNMIGRATED, LOCAL_ONLY, and USES_ENVIRONMENTS. UNMIGRATED would trigger the deprecation warning, and LOCAL_ONLY and USES_ENVIRONMENTS would (for the moment) just be values that we can use for future code audits. Thoughts?

stuhood · 2022-10-14T16:32:47Z

Yea, reasonable.

chrisjrn · 2022-10-14T16:33:22Z

Great, I'll add that in as part of the first PR.

Adds support for extracting `environment` names from `package` goal targets. #17129.

chrisjrn · 2022-10-17T21:02:44Z

With #17247, we'll have publish, fmt, lint, check, (fix?), and generate-lockfiles remaining.

I believe it should be possible to get everything except for generate-lockfiles to run in non-local environments. For generate-lockfiles, I propose the following behaviour for 2.15:

Check to see what environments are inherent in the request
If multiple environments are inherent, and the current local environment is amongst them, use the current local environment unless specified in a command line option.
Else, if the current local environment is not amongst the environments, use the environment specified in a command line option, otherwise fail
If an environment is specified in a command line option and that environment is not inherent in the request, fail
If multiple environments are inherent in the request, raise a warning informing the user that the lockfile may not work in environments other than the selected environment.

For 2.16, we should add a new lockfile schema that creates a separate lockfile per resolve per environment.

For `count_loc`, `filedeps`, `peek`, `tailor`, `update-build-files`, the migration is trivial: no environment-specific processes need to be run. For `repl`, `run`, and `export`, a `rule_helper` is added that raises a warning if the user is asking to run against a target that specifies a non-local environment. This alerts the user to possibly unpredictable behavior, but for now, does _not_ inherently cause an explicit error. If we get to the point where we can run an `InteractiveProcess` in a foreign environment, we may wish to update this behaviour. Finally, this resolves a regression from a copy-pasta introduced in a previous PR. Oops. See #17129.

stuhood · 2022-10-18T22:52:06Z

I believe it should be possible to get everything except for generate-lockfiles to run in non-local environments. For generate-lockfiles, I propose the following behaviour for 2.15:

I think that that makes sense, yea? The proof will be in the pudding, but once we have "something" in place we can stabilize it after cutting the 2.15.x branch if need be.

@Eric-Arellano

…ues working on actually addressing them (#17258) As requested by @Eric-Arellano. See #17129 for remaining migrations.

chrisjrn · 2022-10-19T17:50:03Z

OK, now that I'm wading into how lockfiles are actually implemented -- effectively there's a "planning" phase where GenerateLockfile requests are created, and then subsequently those GenerateLockfile requests are executed.

Currently my plan is make those planning rules execute in the ChosenLocalEnvironmentName context, and making the GenerateLockfile actions execute in a specified environment, with something resembling the warning described above.

It looks like there'll probably need to be a redesign of how lockfiles work in light of multiple environments, as the overall act of writing a lockfile depends on the underlying implementation (honestly, probably directories of files per resolve).

Eric-Arellano · 2022-10-19T18:41:36Z

I'm skeptical on the generate-lockfiles plan because lockfiles are generated to work with both Linux and macOS, and you can generate from either OS. This is thanks to Pex's universal mode.

Related is #17276, which would allow you to give platform-awareness to resolves.

chrisjrn · 2022-10-19T18:42:49Z

@Eric-Arellano I think we're going to need to talk about this one in more detail then.

stuhood · 2022-10-19T18:54:44Z

I'm skeptical on the generate-lockfiles plan because lockfiles are generated to work with both Linux and macOS, and you can generate from either OS. This is thanks to Pex's universal mode.

That part is further down the road I think, and probably best discussed on a new issue. I think that @Eric-Arellano is right that --style=universal (which we use today) will include sdists in the lockfile rather than wheels, which means it will successfully generate a cross-platform lockfile using the sdists, and then only actually build wheels when it is being consumed (potentially in a cross-platform environment).

So... that's cool. I think that that means that any sort of lockfile schema changes, or forking of building in multiple environments would only be necessary if we wanted to switch to --strict by default.

chrisjrn · 2022-10-19T18:56:38Z

What about JVM lockfiles? I'm not aware of how maven allows for JNI in packages, but that's likely to be relevant here.

stuhood · 2022-10-19T19:00:00Z

Depends on whether the resolver supports it by default. But because we have our own lockfile format for coursier... we could do something similar to what PEX does, I think? And/or make the whole lockfile contain one copy of the lock per platform, generated by separate per-platform runs of coursier.

chrisjrn · 2022-10-19T19:08:55Z

Re Python: The way I'm writing the environments support is to leave the onus of selecting the environments that belong to a GenerateLockfiles object to the backend implementation, so for Python, we can always return ChosenLocalEnvironmentName.val, and be done with it. If we're convinced that Python is going to work file with just local execution for now, that's definitely an easy change to add.

Note that if, e.g. we allow interpreter constraints to be specified as part of an environment spec (possibly a useful addition at some point), then we can't guarantee that the resolves will be compatible across multiple environments. This may be a thing we need to factor into our thinking.

chrisjrn · 2022-10-21T20:11:20Z

Following #17300, #17313, and #17315, the only remaining goals are the fmt/lint/fix goals. These may be pretty complicated, as the onus is now on the back-end implementation to choose the environment. I'll take a look at this with fresh eyes on Monday, it may be easier than anticipated (who knows 🤷 )

This migrates `publish` to create the `BuiltPackage` asset in a correct environment, and run the `publish` executable in the local environment. Also adds a helper rule, `EnvironmentAwarePackageRequest`, which consolidates the `EnvironmentName` request and `BuiltPackage` request, since `BuiltPackage` is requested all over the place. See #17129.

`check` will now run each `CheckRequest` in each specified environment per the `field_sets` in the `CheckRequest`. It looks like the remainder of the `check` infrastructure already successfully handles dealing with multiple `CheckRequest`s. See #17129

stuhood · 2022-10-26T17:59:17Z

@chrisjrn : In the PR that closes this ticket (sounds like maybe the next one?) could you please add TODOs, and perhaps some tags in help strings referencing #17355 ?

chrisjrn · 2022-10-26T18:46:06Z

Sure.

…onment. (#17359) This adds a warning if a target-based `fmt`/`lint`/`fix` operation runs against a target that specifies a non-local environment. To property use environments, we'll need a way of making the partitioning rules a bit more pluggable, or decide that individual back-end implementations will be responsible for deciding where they run (which is probably also not great). Addresses #17129.

This is a least-effort attempt at getting `generate-lockfiles` into a position where individual backend implementations can select an environment in which to generate the lockfiles. The request generation phase runs in the default local environment, and is responsible for adding an `environments` field to the `GenerateLockfile` request object. If a request does not specify environments, the `GenerateLockfile` request is run in the default local environment. If a single environment is specified, that environment is to run the request. If more than one environment is specified, a warning is displayed, and the request runs in either the default local environment (if present in the request), or an arbitrary environment. This approach will give us an avenue in the future to run the request in _all_ the specified environments, and add behaviour to somehow merge the resulting lockfiles if different, since the eventual act of writing the lockfiles to disk happens in the rule goal code. Addresses #17129

stuhood · 2022-10-28T19:24:22Z

Thanks @chrisjrn!

stuhood self-assigned this Oct 5, 2022

stuhood assigned chrisjrn Oct 12, 2022

stuhood added the estimate: ~2W label Oct 13, 2022

chrisjrn closed this as completed Oct 14, 2022

chrisjrn reopened this Oct 14, 2022

chrisjrn mentioned this issue Oct 14, 2022

Select environment in package goal #17220

Merged

chrisjrn pushed a commit that referenced this issue Oct 14, 2022

Select environment in package goal (#17220)

94b87c3

Adds support for extracting `environment` names from `package` goal targets. #17129.

chrisjrn mentioned this issue Oct 14, 2022

Replaces environment_migrated bool with environment_behavior enum #17228

Merged

chrisjrn mentioned this issue Oct 18, 2022

Silences the environments-related deprecations while @chrisjrn continues working on actually addressing them #17258

Merged

Eric-Arellano pushed a commit that referenced this issue Oct 19, 2022

Silences the environments-related deprecations while @chrisjrn contin…

e97cd16

…ues working on actually addressing them (#17258) As requested by @Eric-Arellano. See #17129 for remaining migrations.

This was referenced Oct 20, 2022

Migrates generate-lockfiles to USES_ENVIRONMENTS. #17300

Merged

Updates publish to USES_ENVIRONMENTS #17313

Merged

chrisjrn mentioned this issue Oct 21, 2022

Makes check a USES_ENVIRONMENTS goal #17315

Merged

stuhood added this to the 2.15.x milestone Oct 25, 2022

This was referenced Oct 26, 2022

Improve safety of using environments during BUILD evaluation time #17354

Open

Stabilization Tracking: Environments #17355

Open

chrisjrn mentioned this issue Oct 26, 2022

Adds warning if lint/fmt/fix runs on a target using a non-local environment. #17359

Merged

stuhood mentioned this issue Oct 27, 2022

Add support for cross-building Python applications to test, package inside-of/for a Docker image #13682

Closed

stuhood closed this as completed Oct 28, 2022

lilatomic mentioned this issue Nov 20, 2022

deprecation warning for python-dump-source-analysis goal #17597

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Require goals to explicitly locate/choose an environment, and pin `graph` calculations to `local` #17129

Require goals to explicitly locate/choose an environment, and pin `graph` calculations to `local` #17129

stuhood commented Oct 5, 2022

stuhood commented Oct 12, 2022

chrisjrn commented Oct 13, 2022

stuhood commented Oct 13, 2022

chrisjrn commented Oct 13, 2022

stuhood commented Oct 13, 2022

chrisjrn commented Oct 13, 2022

stuhood commented Oct 13, 2022

chrisjrn commented Oct 13, 2022

Eric-Arellano commented Oct 14, 2022

chrisjrn commented Oct 14, 2022

Eric-Arellano commented Oct 14, 2022

stuhood commented Oct 14, 2022

chrisjrn commented Oct 14, 2022

stuhood commented Oct 14, 2022

chrisjrn commented Oct 14, 2022

chrisjrn commented Oct 17, 2022

stuhood commented Oct 18, 2022

chrisjrn commented Oct 19, 2022 •

edited

Loading

Eric-Arellano commented Oct 19, 2022

chrisjrn commented Oct 19, 2022

stuhood commented Oct 19, 2022

chrisjrn commented Oct 19, 2022

stuhood commented Oct 19, 2022

chrisjrn commented Oct 19, 2022

chrisjrn commented Oct 21, 2022

stuhood commented Oct 26, 2022

chrisjrn commented Oct 26, 2022

stuhood commented Oct 28, 2022

Require goals to explicitly locate/choose an environment, and pin graph calculations to __local__ #17129

Require goals to explicitly locate/choose an environment, and pin graph calculations to __local__ #17129

Comments

stuhood commented Oct 5, 2022

stuhood commented Oct 12, 2022

chrisjrn commented Oct 13, 2022

stuhood commented Oct 13, 2022

chrisjrn commented Oct 13, 2022

stuhood commented Oct 13, 2022

chrisjrn commented Oct 13, 2022

stuhood commented Oct 13, 2022

chrisjrn commented Oct 13, 2022

Eric-Arellano commented Oct 14, 2022

chrisjrn commented Oct 14, 2022

Eric-Arellano commented Oct 14, 2022

stuhood commented Oct 14, 2022

chrisjrn commented Oct 14, 2022

stuhood commented Oct 14, 2022

chrisjrn commented Oct 14, 2022

chrisjrn commented Oct 17, 2022

stuhood commented Oct 18, 2022

chrisjrn commented Oct 19, 2022 • edited Loading

Eric-Arellano commented Oct 19, 2022

chrisjrn commented Oct 19, 2022

stuhood commented Oct 19, 2022

chrisjrn commented Oct 19, 2022

stuhood commented Oct 19, 2022

chrisjrn commented Oct 19, 2022

chrisjrn commented Oct 21, 2022

stuhood commented Oct 26, 2022

chrisjrn commented Oct 26, 2022

stuhood commented Oct 28, 2022

Require goals to explicitly locate/choose an environment, and pin `graph` calculations to `local` #17129

Require goals to explicitly locate/choose an environment, and pin `graph` calculations to `local` #17129

chrisjrn commented Oct 19, 2022 •

edited

Loading