Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print environment variables accessed by rustc as special comments into depinfo files #71858

Merged
merged 1 commit into from
Jun 26, 2020

Conversation

petrochenkov
Copy link
Contributor

@petrochenkov petrochenkov commented May 3, 2020

So cargo (and perhaps others tools) can use them for linting (at least) or for actually rebuilding crates on env var changes.


I've recently observed one more forgotten environment variable in a build script 8a77d1c and thought it would be nice to provide the list of accessed variables to cargo automatically as a part of depinfo.

Unsurprisingly, I wasn't the first who had this idea - cc #70517 #40364 #44074.

Also, there are dozens of uses of (option_)env! in rustc repo and, like, half of them are not registered in build scripts.


Description:

  • depinfo files are extended with special comments containing info about environment variables accessed during compilation.
  • Comment format for environment variables with successfully retrieved value: # env-dep:KEY=VALUE.
  • Comment format for environment variables without successfully retrieved value: # env-dep:KEY (can happen with option_env!).
  • KEY and VALUE are minimally escaped (\n, \r, \\) so they don't break makefile comments and can be unescaped by anything that can unescape standard escape_default and friends.

FCP report: #71858 (comment)

Closes #70517
Closes #40364
Closes #44074
A new issue in the cargo repo will be needed to track the cargo side of this feature.

r? @ehuss

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 3, 2020
@petrochenkov petrochenkov added T-cargo Relevant to the cargo team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 3, 2020
@petrochenkov
Copy link
Contributor Author

petrochenkov commented May 3, 2020

Note that this method can be somewhat more precise than tracking of environment variables in cargo.

With this approach we can also track values of the variables actually used by rustc instead of ones cached by cargo (there's some window for them to change).
When cargo decides to rerun or not rerun the build it still may use slightly outdated values though.
But that's also true for file dependencies.
And all this probably doesn't matter in practice, env vars don't change this often.

@Mark-Simulacrum
Copy link
Member

I would generally speaking love to have this, syncing with build scripts is a pain. I personally would be on board with landing this PR (I didn't review in detail though) without a -Z flag or so as the behavior here seems fairly well understood.

One thing I'm not sure about is whether we should include other environment variables rustc reads, like RUSTC_BOOTSTRAP.

@ehuss
Copy link
Contributor

ehuss commented May 3, 2020

This looks great! And a remarkably simple change.

I would maybe suggest using more generic text. Instead of cargo:rerun-if-env-changed maybe use something like env-dep as suggested in #40364 (comment)? I think this would be useful for other build tools, it is not necessarily Cargo-specific and wouldn't want to give a misleading impression.

I'm not sure I follow the comment about tracking the value. The environment should be static from the time Cargo starts, right? (Unless it is modified by an evil proc-macro.) Regardless, I don't think that's something to worry about too much. Figuring out how to safely encode the value in a Makefile comment sounds tricky (handling newlines, etc.).

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 4, 2020
@petrochenkov petrochenkov changed the title [experiment] Print environment variables accessed by rustc as special comments into depinfo files Print environment variables accessed by rustc as special comments into depinfo files May 6, 2020
@petrochenkov
Copy link
Contributor Author

Figuring out how to safely encode the value in a Makefile comment sounds tricky (handling newlines, etc.).

FWIW, environment variable names can contain newlines as well.
So I just use some minimal escaping before printing them now.
Makefile comments only need to escape newlines and \ according to the doc.

(Cargo is free to ignore the =VALUE part, but it still may be useful so it is printed.)

I couldn't test this today due to #71936.

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 6, 2020
@rust-highfive

This comment has been minimized.

@alexcrichton
Copy link
Member

Nice! One option to handle escapes here might be to do as @ehuss mentioned and only print the key, not the value. Then keys with weird characters (e.g. newlines) could simply be omitted and not tracked for now. Practically I don't think this will ever matter either way. Personally though I tend to be overly averse to using escape patterns

@RalfJung
Copy link
Member

RalfJung commented May 6, 2020

This is really neat. :D Miri also uses (option_)env! and I think we did indeed forget to record this in build.rs...

Does this mean we can revert 8a77d1c again as it becomes redundant?

@petrochenkov
Copy link
Contributor Author

@alexcrichton
Cargo or other tools can filter out (or plain ignore) variables with weird characters (\) equally well, if they want.
Rustc's responsibility here is correctness, IMO - no keys are omitted and values are provided as actually used by the compiler.

@RalfJung

Does this mean we can revert 8a77d1c again as it becomes redundant?

Yes, all uses of rerun-if-env-changed in rustc that I've seen should become redundant once cargo supports these annotations.

@ehuss
Copy link
Contributor

ehuss commented May 6, 2020

Yes, all uses of rerun-if-env-changed in rustc that I've seen should become redundant once cargo supports these annotations.

I don't think this is universally true. Some scripts use env::var which won't be detected like this.

@petrochenkov
Copy link
Contributor Author

I haven't seen this one :)

That's certainly not universally true.
Proc macros can also use env::var that won't be caught by this mechanism.
Proc macros need APIs for both loading files and retrieving env vars that would put them into dependencies.

@ehuss
Copy link
Contributor

ehuss commented May 13, 2020

This looks good to me code-wise. I probably wouldn't include the the value (Cargo probably won't use it), and there's maybe a remote possibility it may have an extremely large value, or expose a secret. I don't care much either way, though.

I don't feel comfortable to r+ since I am not on the compiler team (and this would be insta-stable). Maybe @Mark-Simulacrum or someone on the team can approve.

@Mark-Simulacrum
Copy link
Member

I think the secrets argument in particular is interesting to me, but I would not personally consider it too notable -- you probably just have to trust that anyone that can read files in your target directory is trustworthy enough to be able to read your environment variables too.

I think largely speaking I agree that including them "makes sense" and ideally Cargo would use them (I get why it doesn't today, and agree it likely won't cause trouble).

I believe it makes sense for this to go through compiler team FCP or at least something similar to the MCP process as a stabilization (especially an instantaneous one). I'm not sure exactly what the right step is though -- cc @nikomatsakis @pnkfelix? With respect to the code, I think it looks good enough that I'd be happy to r+ modulo the instant stabilization.

@RalfJung
Copy link
Member

The values embedded here are just the ones used in env!/option_env!, right? So if they contain secrets, that secret already is in the binary.

I suppose there is the possibility that there is const code which does something irreversible to the secret and then only the result of that computation but not its inputs might end up in the binary... not sure if that is a case to be worried about.

@Mark-Simulacrum
Copy link
Member

Essentially, yeah -- one could imagine though that for e.g. proc macros (once we support an API for them to expose "I used this environment variable") it could be common for you to set e.g. a database URL including a password such that the proc macro can connect to the database and collect schema data, while the URL isn't actually intended to be stored in the binary at all.

But I agree that seems mostly like a niche case; I'd prefer to deal with it in the future.

@RalfJung
Copy link
Member

I guess the concern is if we will remember dealing with it in the future -- or if that will only happen after some actual security incident caused by this.

@petrochenkov
Copy link
Contributor Author

Isn't Cargo already saving these env var values for its rebuild logic?
Or it saves some kind of fingerprint instead?

@Mark-Simulacrum
Copy link
Member

This comment suggests that they are hashed: https://github.com/rust-lang/cargo/blob/2cbe9048efc5c904b33191d799f97dc4698debaa/src/cargo/core/compiler/fingerprint.rs#L520-L524 -- but based on the only constructor -- here (https://github.com/rust-lang/cargo/blob/2cbe9048efc5c904b33191d799f97dc4698debaa/src/cargo/core/compiler/fingerprint.rs#L1348-L1355) it seems like this is not true.

So looks like Cargo presumably also stores them in plaintext.

I don't think we need to stop storing the env values, though it might be nice to make sure that there's some prefix (e.g. text:) that we always add so if we want to hash them in the future we can do so. Alternatively, we can do so today and always run them through the standard library's default hasher and document that as the way we do things; it's results should be reproducible if initialized with zero'd keys (instead of random keys).

@RalfJung
Copy link
Member

One thing we could do is state explicitly in the env!/option_env! docs the concerns about secrets.

@Mark-Simulacrum Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 18, 2020
@bors
Copy link
Contributor

bors commented Jun 25, 2020

📌 Commit 69b2179 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 25, 2020
@Mark-Simulacrum
Copy link
Member

I do see that there are some concerns raised about potentially not being able to opt-out or whatever in the future, and some alternative designs -- my impression is that the current design should work more or less well, even if we might need some additional options. But it seems like we could always just stop emitting these keys and start emitting other keys (or files), and that would be backwards compatible I think.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Jun 25, 2020
Print environment variables accessed by rustc as special comments into depinfo files

So cargo (and perhaps others tools) can use them for linting (at least) or for actually rebuilding crates on env var changes.

---
I've recently observed one more forgotten environment variable in a build script rust-lang@8a77d1c and thought it would be nice to provide the list of accessed variables to cargo automatically as a part of depinfo.

Unsurprisingly, I wasn't the first who had this idea - cc rust-lang#70517 rust-lang#40364 rust-lang#44074.

Also, there are dozens of uses of `(option_)env!` in rustc repo and, like, half of them are not registered in build scripts.

---
Description:
- depinfo files are extended with special comments containing info about environment variables accessed during compilation.
- Comment format for environment variables with successfully retrieved value: `# env-dep:KEY=VALUE`.
- Comment format for environment variables without successfully retrieved value: `# env-dep:KEY` (can happen with `option_env!`).
- `KEY` and `VALUE` are minimally escaped (`\n`, `\r`, `\\`) so they don't break makefile comments and can be unescaped by anything that can unescape standard `escape_default` and friends.

FCP report: rust-lang#71858 (comment)

Closes rust-lang#70517
Closes rust-lang#40364
Closes rust-lang#44074
A new issue in the cargo repo will be needed to track the cargo side of this feature.

r? @ehuss
@bors
Copy link
Contributor

bors commented Jun 25, 2020

⌛ Testing commit 69b2179 with merge 1033351...

@bors
Copy link
Contributor

bors commented Jun 26, 2020

☀️ Test successful - checks-azure
Approved by: Mark-Simulacrum
Pushing 1033351 to master...

@petrochenkov
Copy link
Contributor Author

Made an issue for using this in cargo - rust-lang/cargo#8417.

alexcrichton added a commit to alexcrichton/cargo that referenced this pull request Jun 29, 2020
This commit updates Cargo's parsing of rustc's dep-info files to account
for changes made upstream in rust-lang/rust#71858. This means that if
`env!` or `option_env!` is used in crate files Cargo will correctly
rebuild the crate if the env var changes.

Closes rust-lang#8417
bors added a commit to rust-lang/cargo that referenced this pull request Jun 30, 2020
Parse `# env-dep` directives in dep-info files

This commit updates Cargo's parsing of rustc's dep-info files to account
for changes made upstream in rust-lang/rust#71858. This means that if
`env!` or `option_env!` is used in crate files Cargo will correctly
rebuild the crate if the env var changes.

Closes #8417
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 27, 2020
proc_macro: Add API for tracked access to environment variables

Continuation of rust-lang#71858.

`proc_macro::tracked_env::var` is similar to regular `env::var` called from a proc macro, except that it also adds the accessed variable to depinfo.
@jieyouxu jieyouxu added the A-reproducibility Area: Reproducible / deterministic builds label Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-reproducibility Area: Reproducible / deterministic builds disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet