-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider making the src
cache read-only.
#9455
Comments
@matthiaskrgr would making files read-only give any problems to cargo-cache? |
Oh that's an interesting idea!
BUT I am already happy that it does not panic :D Right now I am just running |
I'm bringing up a new rust user on my team, and I found that they ended up editing some files in $CARGO_HOME (which is actually quite easy to do with VSCode+RA) to add some debugging messages. A read-only src cache might have given them a sufficient hint that what they were doing was not the right approach. |
Just did a PoC and benchmarks with rust-lang/cargo itself. It turns out that there is a slight performance hit (~5%) on macOS. It's in expectation, as there are some extra permission operations on every file for the first time checking out/unpacking a crate (Windows might do it better since file permissions can be inherited, though). However, it does get new problems 🤯. When I removed sources residing in $ ./target/release/cargo b
$ rm -rf ~/.cargo/registry/src
$ ./target/release/cargo b
error: failed to run custom build command for `curl-sys v0.4.55+curl-7.83.1`
Caused by:
process didn't exit successfully: `/Users/myuser/projects/cargo/target/debug/build/curl-sys-f4af01668b6e1680/build-script-build` (exit status: 101)
--- stdout
...
--- stderr
fatal: not a git repository (or any of the parent directories): .git
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', /Users/myuser/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/curl-sys-0.4.55+curl-7.83.1/build.rs:93:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace I guess its curl-sys's build script that uses Personally, I love to see this enhancement landed, but it seems a bit risky to me after the analysis. |
That's really unfortunate, I hadn't thought about that. I'm wondering if a compromise would be to only adjust permissions on packages that don't have a build script? I don't know if it would be easy to detect that (that might require parsing Cargo.toml, which might be too much overhead). I also don't know what % of packages that would cover. It's also unfortunate that the |
Given that most of these cases would be caught by crater, maybe its worthwhile to just make the change and make sure crates are fixed up? Alternatively: maybe only apply this for crates published after a certain timestamp? |
@weihanglo Asked about how to do a crater run to test the impact of changing the readonly status. The rough steps are:
I think doing a "check" run should be sufficient? I'm concerned that by default it will not get dev-dependencies, so maybe add the I still feel that it might be good to ease into the change, such as only marking read-only if there are no build scripts. We discussed some other options to consider:
Doing a crater run can maybe give us a sense of the scale of the problem. |
This could be limited to only |
Cargo.{toml,lock} should probably also be read-only. |
…ources-dir, r=epage Improve CI caching by skipping mtime checks for paths in $CARGO_HOME Skip mtime checks for paths pointing into `$CARGO_HOME` to avoid rebuilds when only caching $CARGO_HOME/registry/{index, cache} and $CARGO_HOME/git/db and some of the dependencies have `rerun-if-changed=directory` in their `build.rs` I considered skipping mtime checking only on `$CARGO_HOME/registry/src` but it looks like other functionality (like downloading a newer version of dependency) is unaffected by this and this way we also cover the same issue with git based dependencies (except the part where `cargo` is forced to re-fetch submodules if `$CARGO_HOME/git/checkouts` is missing) and it is more in line with the discussion in #9455 Fix #11083 Credit `@weihanglo` for the test (I did add a case of checking that dependency update still triggers a rebuild but they are the original author of the rest of the test)
As suggested here, a possible alternative is to compute a hash when downloading a dependency, and to verify that hash whenever Cargo needs to access that dependency. |
Somewhat related, we might consider sanitizing the executable bit as well, if we consider distribution of compiled executables via crate tarballs an unsupported use-case. |
If these files are made readonly, what would the process be for updating dependencies in the cache? i.e. if Will cargo temporarily modify the file permissions before performing an update? On unix systems it might make more sense for cargo to run as its own user. Other users have read-only access, but the cargo user has write access. Thoughts? |
Every crate version is stored in a separate directory. This is necessary to be able to cache multiple crate versions at once. And because of this we never need to modify the directory containing the cached crate after we are done unpacking the downloaded tarball. |
There is no way to do that as an unprivileged user. And requiring root permission to install rust would be a really bad idea. |
Cargo doesn't update dependencies at per-file-level. Cargo basically downloads new tarballs, unpacks them, and put files in separate directories. |
Ahh... I should have seen that. In that case, disregard my comment. I am fully in support of making the registry src files read-only. |
We can in addition or instead of ownership or permission bits, use OS-specific flags for finer-grained semantics, like |
@sanmai-NL could you elaborate more on how the approach resolves the concern in #9455 (comment)? It may still fail people's builds, no? |
The immutable flag on linux can only be set by root as it prevents even root from modifying it. |
@weihanglo Better performance can be attained by setting the right inode metadata along with file creation. Can you test setting a As for the breakage, I think that it's a defect to perform these file manipulations in build scripts. I don't know if it is possible within the Cargo architecture to restrict build scripts to not perform some actions. That would be good in general. |
The umask is process global, but someone may be creating files concurrently with cargo when using cargo as library. Also for directories setting the umask I believe will require using chmod anyway to make them writable again while creating files inside the directory. And on Unix I'm not too worried about a single extra chmod per file and directory. On Windows I would be more concerned, but the Windows kernel doesn't have anything like umask afaik. The CRT supports the function, but I wouldn't be surprised if it internally calls the equivalent of chmod after every file creation. |
This could be opt-in per crate. Cargo currently makes tarballs with |
Just an idea here off the top of my head, but why extract source archives at all? Can't we get their contents as needed, in-memory? That may sound costly but I wonder how often that really happens and if extracting them amortizes well (the extra cost being the filesystem operations that notably result in syscalls). That would reduce the number of files to be created and changed, so even particular file metadata like immutable flag can be set at the cost of |
https://docs.rs/tar/latest/tar/struct.Entry.html#method.set_mask |
Rustc currently doesn't support that. We could add it, but build scripts and proc macros will still need it extracted somewhere on the filesystem. |
Tools communicate by filesystem paths, so how do you propose to send an in-memory "path" to |
The invention of not-a-tarball crate storage system is irrelevant here, because the problem is crates relying on files existing in the file system, and existing with specific permissions. If you succeed in removing the files from |
Does |
What is your opinion on this breakage? Would it preclude moving forward with a more performant design in any case? |
Sccache doesn't cache source code. It reads the |
Let's also discuss differentation between crates. Crates without build scripts were mentioned. But only-binary or only-library is also relevant. Binary-only crates's source as some specific version needs to be only read very infrequently compared to libraries, especially popular libraries. So for binary-only, non-buildscript crates you'd most want to avoid extracting the tar archive (even though they may be smaller, in comparison, perhaps) and perhaps doing other filesystem operations such as setting permissions. |
Yes, and the latter would probably require much more significant architecture reengineering than supporting tar-like archives directly. |
And this assuming that no incremental compilation cache is available. In my experience one sometimes cleans this cache. But not too often I suppose. |
Can this be done for 2024 edition crates?
|
Independent of the question of whether this can be tied to an Edition, they asked for Edition 2024 changes to have their PRs posted by May 1st. There is a slim chance something small can be rushed in late but members of the Cargo team have already been exhausting themselves to meet the May 1st date with what was already committed. |
Registry dependencies get extracted into cargo home (in the
src
directory) with whatever metadata is in the tar file. One issue with this is that the files are usually writeable. This can cause a problem if the user accidentally modifies these files, which breaks cargo's assumption that they are immutable and reusable. One way this can happen is that in some editors, when there is an error or warning, they may open those files to display the error/warning (particularly with macros). The user may not realize that this is from a remote location, and may not understand the consequence of making changes.We may want to consider making those files read-only when extracting them. This would help with confusing situations where the
src
cache is inadvertently corrupted.This would not protect from general filesystem corruption, which is also an issue. This is also an issue with
git
dependencies, which may be more difficult to adjust permissions on.There is some risk that this would introduce new problems. For example, if people are using tools to clean the src directory, and those tools have trouble with read-only files.
The text was updated successfully, but these errors were encountered: