Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require goals to explicitly locate/choose an environment, and pin graph calculations to __local__ #17129

Closed
Tracked by #17355
stuhood opened this issue Oct 5, 2022 · 29 comments
Assignees
Milestone

Comments

@stuhood
Copy link
Member

stuhood commented Oct 5, 2022

Currently, determine_bootstrap_environment is called to compute a "default" EnvironmentName before the rule graph is entered, such that all @goal_rules have one in scope automatically.

While this was useful for getting things going and proving out our cross-compilation capabilities (demonstrated in #11148 and #13682), it is error prone. The environment design has been trending toward (eventually) giving all targets an environment= field (even if the vast majority of the time it is only set via __defaults__). And in a world where ~all targets have an environment= field, ever accidentally using the default would definitely represent an issue.

Instead, we'd like to put the onus on each @goal_rule to select the relevant environment (usually via targets). Additionally, to resolve the open question in Appendix A in the design doc, we will pin effectively all calculations in graph.py to the __local__ environment (to eventually potentially be made configurable). This will mean that graph-introspection goals won't need to choose environments, and other @goal_rules won't need to select an environment while computing the graph.

@stuhood stuhood self-assigned this Oct 5, 2022
stuhood added a commit that referenced this issue Oct 11, 2022
As a first step toward #17129, this change "masks" the `EnvironmentName` for relevant APIs in `graph.py` and `specs_rules.py`, and pins them to the `__local__` environment.

The `@rule(.., _mask_types=[..])` argument triggers an assertion if the given type(s) are used in the identity of the `@rule` (i.e.: if the `@rule` "consumes" them from its scope). The impact of masking a type at an API boundary is an assertion that that type is not consumed by that API.

As described in #17129: we're pinning graph computations to the `__local__` environment for now, although we may eventually allow graph computations to execute in a configurable environment.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Oct 11, 2022
As described in #17129: we would like `@goal_rule`s to eventually make their own decisions about which environments to use, generally by consuming targets to do so, but possibly also by pinning themselves to other environments via configuration.

This change builds on #17179, and introduces a migration from `Goal.environment_migrated = {False => True}`. All graph-introspection goals consume only the APIs which are pinned to the local environment by #17179, and so are trivially migrated here. Other goals trigger a deprecation warning like the following:
```
DEPRECATED: Not setting `Goal.environment_migrated=True` for `Goal` `generate-lockfiles` is scheduled to be removed in version 2.17.0.dev0.

See https://www.pantsbuild.org/v2.15/docs/plugin-upgrade-guide
```

Before calling #17129 done, we will migrate all internal goals to `Goal.environment_migrated=True`.

[ci skip-rust]
[ci skip-build-wheels]
@stuhood
Copy link
Member Author

stuhood commented Oct 12, 2022

.@chrisjrn is currently carrying the torch here while I look at #17181.

@chrisjrn
Copy link
Contributor

Current status:

Migrated:

  • package

Ready to migrate:

  • export ?
  • publish
  • repl (runs interactive process, must be local, see test implementation.)
  • run (runs interactive process, must be local, see test implementation.)
  • filedeps (reports on graph calculations; should run locally)
  • peek (reports on graph calculations; should run locally)

Yes/No questions:

  • tailor (This only needs to run locally, right?)
  • update-build-files (This only needs to run locally, right?)
  • count-loc (This only needs to run locally, right?)

Questions:

  • check, fmt, lint: Where do we specify the environment with which we run the tool? Should we be looking at the sources in the targets and use the envionments that each source specifies? What if there are multiple environments specified by a target? (This probably needs a small amount of design work to make coherent)
  • generate-lockfiles: Similar to check/fmt/lint, it's not clear how we should run the resolver if there's ambiguity; it's probably not valid for a resolve to fulfil a set of sources that have disjoint environments.

@stuhood
Copy link
Member Author

stuhood commented Oct 13, 2022

Yes/No questions:

Yea, I think all of those should be pinned to local for now.

check, fmt, lint:

These should be consuming environment= fields off of targets, many of which do not exist yet.

In order to avoid a huge flurry of added fields which we're not certain will be used yet, I think that for now these goals should partition their roots by environment BUT default to something permissive that avoids the need to mark all targets in your repository transitively with environments... if possible. That will need some exploration.

generate-lockfiles

Ditto.

@chrisjrn
Copy link
Contributor

@stuhood These should indeed be consuming environments off targets, there's just an ambiguity that I think we need to grapple with, which I think is worst for generate-lockfiles, because it doesn't make sense to generate the same resolve on multiple environments.

@stuhood
Copy link
Member Author

stuhood commented Oct 13, 2022

which I think is worst for generate-lockfiles, because it doesn't make sense to generate the same resolve on multiple environments.

Yea, true. In that case, you could potentially add an option to the generate-lockfiles goal (or perhaps to the resolves themselves?) which recognizes/explains that issue, and allows you to declare/override which environment to use per resolve? e.g.

./pants generate-lockfiles --resolve-environments={"this_resolve": "that_environment"} ..

Eventually, I do think that there is a really, really neat feature possible behind the generate-lockfiles goal to resolve this. But in the meantime, users have to choose.

@chrisjrn
Copy link
Contributor

I suspect making it so that Pants generates one lockfile per resolve per environment would be extremely cool, in the long term, but yes, we have to make the choice.

Re. fmt, what happens if a target specifies multiple environments?

@stuhood
Copy link
Member Author

stuhood commented Oct 13, 2022

Re. fmt, what happens if a target specifies multiple environments?

A target cannot specify multiple environments except via parametrize... but yea, as it stands, the formatter would run twice for the two targets. If the formatters are deterministic in the different environments, you'd end up with the same output, which would not result in an error (Snapshots with overlapping-but-equal files are fine: only overlapping-but-not-equal will error).

@chrisjrn
Copy link
Contributor

This is a slight problem if e.g. different versions of Black are in separate environment, but for the moment, it sounds like it's not too concerning.

@Eric-Arellano
Copy link
Contributor

@chrisjrn thanks for working on this! This is tough to reason about, and I bias towards doing the "safe" thing of pinning most goals to __local__ until we are confident otherwise. For example, fmt, lint, and now fix should probably be __local__ to keep it simple.

Also, it'd likely be good to split this up into many different PRs so that we can properly reason about each goal correctly.

@chrisjrn
Copy link
Contributor

I agree on splitting up into many PRs. I suspect I'm going to do the obvious/ready goals first, and then follow up with the remaining categories.

I think fmt/lint/fix are a bit more nuanced than just running them __local__ (does it make sense to run everything locally if no source target specifies a local environment? Unclear.)

@Eric-Arellano
Copy link
Contributor

I think fmt/lint/fix are a bit more nuanced than just running them local (does it make sense to run everything locally if no source target specifies a local environment? Unclear.)

The main risk @stuhood and I identified with using __local__ when you said not to is if third-party dependencies won't be installed correctly, e.g you are doing data science and gave up on getting macOS to work for your resolve. Most of fmt, lint, and fix don't consider third-party deps, but Pylint does. So we eventually do need to properly handle environments, but possibly could punt on it till we have users asking for it. The hard part with those is we need to batch, unlike test and package

@stuhood
Copy link
Member Author

stuhood commented Oct 14, 2022

One other consideration is that if enough goals are going to stay pinned to __local__, maybe we shouldn't deprecate the environment_migrated flag in this release... or even stabilize it in some way.

@chrisjrn
Copy link
Contributor

@stuhood I am in favour of migrating it to an enum for greppability: UNMIGRATED, LOCAL_ONLY, and USES_ENVIRONMENTS. UNMIGRATED would trigger the deprecation warning, and LOCAL_ONLY and USES_ENVIRONMENTS would (for the moment) just be values that we can use for future code audits. Thoughts?

@stuhood
Copy link
Member Author

stuhood commented Oct 14, 2022

Yea, reasonable.

@chrisjrn
Copy link
Contributor

Great, I'll add that in as part of the first PR.

@chrisjrn chrisjrn reopened this Oct 14, 2022
chrisjrn pushed a commit that referenced this issue Oct 14, 2022
Adds support for extracting `environment` names from `package` goal targets.

#17129.
@chrisjrn
Copy link
Contributor

With #17247, we'll have publish, fmt, lint, check, (fix?), and generate-lockfiles remaining.

I believe it should be possible to get everything except for generate-lockfiles to run in non-local environments. For generate-lockfiles, I propose the following behaviour for 2.15:

  • Check to see what environments are inherent in the request
  • If multiple environments are inherent, and the current local environment is amongst them, use the current local environment unless specified in a command line option.
  • Else, if the current local environment is not amongst the environments, use the environment specified in a command line option, otherwise fail
  • If an environment is specified in a command line option and that environment is not inherent in the request, fail
  • If multiple environments are inherent in the request, raise a warning informing the user that the lockfile may not work in environments other than the selected environment.

For 2.16, we should add a new lockfile schema that creates a separate lockfile per resolve per environment.

chrisjrn pushed a commit that referenced this issue Oct 18, 2022
For `count_loc`, `filedeps`, `peek`, `tailor`, `update-build-files`, the migration is trivial: no environment-specific processes need to be run.

For `repl`, `run`, and `export`, a `rule_helper` is added that raises a warning if the user is asking to run against a target that specifies a non-local environment. This alerts the user to possibly unpredictable behavior, but for now, does _not_ inherently cause an explicit error. If we get to the point where we can run an `InteractiveProcess` in a foreign environment, we may wish to update this behaviour.

Finally, this resolves a regression from a copy-pasta introduced in a previous PR. Oops.

See #17129.
@stuhood
Copy link
Member Author

stuhood commented Oct 18, 2022

I believe it should be possible to get everything except for generate-lockfiles to run in non-local environments. For generate-lockfiles, I propose the following behaviour for 2.15:

I think that that makes sense, yea? The proof will be in the pudding, but once we have "something" in place we can stabilize it after cutting the 2.15.x branch if need be.

Eric-Arellano pushed a commit that referenced this issue Oct 19, 2022
…ues working on actually addressing them (#17258)

As requested by @Eric-Arellano.

See #17129 for remaining migrations.
@chrisjrn
Copy link
Contributor

chrisjrn commented Oct 19, 2022

OK, now that I'm wading into how lockfiles are actually implemented -- effectively there's a "planning" phase where GenerateLockfile requests are created, and then subsequently those GenerateLockfile requests are executed.

Currently my plan is make those planning rules execute in the ChosenLocalEnvironmentName context, and making the GenerateLockfile actions execute in a specified environment, with something resembling the warning described above.

It looks like there'll probably need to be a redesign of how lockfiles work in light of multiple environments, as the overall act of writing a lockfile depends on the underlying implementation (honestly, probably directories of files per resolve).

@Eric-Arellano
Copy link
Contributor

I'm skeptical on the generate-lockfiles plan because lockfiles are generated to work with both Linux and macOS, and you can generate from either OS. This is thanks to Pex's universal mode.

Related is #17276, which would allow you to give platform-awareness to resolves.

@chrisjrn
Copy link
Contributor

@Eric-Arellano I think we're going to need to talk about this one in more detail then.

@stuhood
Copy link
Member Author

stuhood commented Oct 19, 2022

I'm skeptical on the generate-lockfiles plan because lockfiles are generated to work with both Linux and macOS, and you can generate from either OS. This is thanks to Pex's universal mode.

That part is further down the road I think, and probably best discussed on a new issue. I think that @Eric-Arellano is right that --style=universal (which we use today) will include sdists in the lockfile rather than wheels, which means it will successfully generate a cross-platform lockfile using the sdists, and then only actually build wheels when it is being consumed (potentially in a cross-platform environment).

So... that's cool. I think that that means that any sort of lockfile schema changes, or forking of building in multiple environments would only be necessary if we wanted to switch to --strict by default.

@chrisjrn
Copy link
Contributor

What about JVM lockfiles? I'm not aware of how maven allows for JNI in packages, but that's likely to be relevant here.

@stuhood
Copy link
Member Author

stuhood commented Oct 19, 2022

Depends on whether the resolver supports it by default. But because we have our own lockfile format for coursier... we could do something similar to what PEX does, I think? And/or make the whole lockfile contain one copy of the lock per platform, generated by separate per-platform runs of coursier.

@chrisjrn
Copy link
Contributor

Re Python: The way I'm writing the environments support is to leave the onus of selecting the environments that belong to a GenerateLockfiles object to the backend implementation, so for Python, we can always return ChosenLocalEnvironmentName.val, and be done with it. If we're convinced that Python is going to work file with just local execution for now, that's definitely an easy change to add.

Note that if, e.g. we allow interpreter constraints to be specified as part of an environment spec (possibly a useful addition at some point), then we can't guarantee that the resolves will be compatible across multiple environments. This may be a thing we need to factor into our thinking.

@chrisjrn
Copy link
Contributor

Following #17300, #17313, and #17315, the only remaining goals are the fmt/lint/fix goals. These may be pretty complicated, as the onus is now on the back-end implementation to choose the environment. I'll take a look at this with fresh eyes on Monday, it may be easier than anticipated (who knows 🤷 )

chrisjrn pushed a commit that referenced this issue Oct 24, 2022
This migrates `publish` to create the `BuiltPackage` asset in a correct environment, and run the `publish` executable in the local environment.

Also adds a helper rule, `EnvironmentAwarePackageRequest`, which consolidates the `EnvironmentName` request and `BuiltPackage` request, since `BuiltPackage` is requested all over the place.

See #17129.
chrisjrn pushed a commit that referenced this issue Oct 25, 2022
`check` will now run each `CheckRequest` in each specified environment per the `field_sets` in the `CheckRequest`. It looks like the remainder of the `check` infrastructure already successfully handles dealing with multiple `CheckRequest`s.

See #17129
@stuhood stuhood added this to the 2.15.x milestone Oct 25, 2022
@stuhood
Copy link
Member Author

stuhood commented Oct 26, 2022

@chrisjrn : In the PR that closes this ticket (sounds like maybe the next one?) could you please add TODOs, and perhaps some tags in help strings referencing #17355 ?

@chrisjrn
Copy link
Contributor

Sure.

chrisjrn pushed a commit that referenced this issue Oct 28, 2022
…onment. (#17359)

This adds a warning if a target-based `fmt`/`lint`/`fix` operation runs against a target that specifies a non-local environment. 

To property use environments, we'll need a way of making the partitioning rules a bit more pluggable, or decide that individual back-end implementations will be responsible for deciding where they run (which is probably also not great).

Addresses #17129.
chrisjrn pushed a commit that referenced this issue Oct 28, 2022
This is a least-effort attempt at getting `generate-lockfiles` into a position where individual backend implementations can select an environment in which to generate the lockfiles. 

The request generation phase runs in the default local environment, and is responsible for adding an `environments` field to the `GenerateLockfile` request object. 

If a request does not specify environments, the `GenerateLockfile` request is run in the default local environment. If a single environment is specified, that environment is to run the request. If more than one environment is specified, a warning is displayed, and the request runs in either the default local environment (if present in the request), or an arbitrary environment. 

This approach will give us an avenue in the future to run the request in _all_ the specified environments, and add behaviour to somehow merge the resulting lockfiles if different, since the eventual act of writing the lockfiles to disk happens in the rule goal code.

Addresses #17129
@stuhood
Copy link
Member Author

stuhood commented Oct 28, 2022

Thanks @chrisjrn!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants