Cache usage meta tracking issue #7150

Open
ehuss opened this issue Jul 19, 2019 · 4 comments
Labels
A-caching Area: caching of dependencies, repositories, and build artifacts
C-tracking-issue Category: A tracking issue for something unstable.
E-hard Experience: Hard
S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.
Z-mtime-on-use Nightly: mtime-on-use

Comments

@ehuss
Contributor

ehuss commented Jul 19, 2019

This issue is to help provide an overview of the different issues around Cargo's excessive disk usage, and tangentially, reducing compile time by reusing artifacts in a shared cache.

Cleaning outdated artifacts

Cargo's target directory can grow substantially over time, and cargo clean offers only limited ways to prune it. cargo clean also has a fair number of bugs and is generally underwhelming.

Various issues and links of interest:

I think a way forward here is to experiment and investigate different ways for tracking artifacts and last-use timestamps. mtime-on-use has an issue with cached files in Docker. The filename hash is opaque and doesn't provide any insight into the metadata which would inform whether or not an artifact could be removed.

Cargo currently tracks a variety of things in different ways. It has a .json fingerprint file which is generally unused (only for debug logging). It also has an invoked.timestamp file used for some change tracking. And mtime information is used in a few different ways. It might be interesting to experiment with a different way to coordinate all this information. Perhaps a single, unified file tracking all artifacts, or changing the way the per-artifact .json file works. The key point is that it must be fast and reliable, and should work well in Docker.
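To make the "single, unified file tracking all artifacts" idea more concrete, here is a minimal in-memory sketch of last-use tracking. It is not how Cargo works today: `UsageTracker`, `touch`, and `stale` are hypothetical names, the artifact names are made up, and a real design would also need to persist the map and cope with Docker layers.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical unified record of when each artifact was last used.
/// The type and method names are illustrative, not part of Cargo.
#[derive(Default)]
pub struct UsageTracker {
    last_used: HashMap<String, SystemTime>,
}

impl UsageTracker {
    /// Record that `artifact` was used at time `now`.
    pub fn touch(&mut self, artifact: &str, now: SystemTime) {
        self.last_used.insert(artifact.to_string(), now);
    }

    /// Return the artifacts not used within `max_age` of `now`, sorted by name.
    pub fn stale(&self, now: SystemTime, max_age: Duration) -> Vec<String> {
        let mut out: Vec<String> = self
            .last_used
            .iter()
            .filter(|(_, used)| match now.duration_since(**used) {
                Ok(age) => age > max_age,
                // Timestamp lies in the future (clock skew): keep the artifact.
                Err(_) => false,
            })
            .map(|(name, _)| name.clone())
            .collect();
        out.sort();
        out
    }
}
```

Because the timestamps live in one explicit record rather than in file mtimes, a cleaner could decide what to remove without relying on filesystem metadata, which is the part that reportedly misbehaves with cached files in Docker.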

Cleaning cargo's home

Cargo's home directory ~/.cargo grows without bounds. There is currently no built-in way to shrink it.

The cargo-cache package is the foremost way to manage it currently (besides rm -rf). Ideally some of this would be a built-in capability of Cargo.

The main issue tracking this is #3289 — cargo clean ~/.cargo.

There has not been much discussion about this. Ideally cargo would have this capability built in, perhaps with some of the easier/safer tasks automated on a periodic basis.

Reusing shared dependencies

sccache is the primary way to share artifacts across projects. It is also possible to share a target directory by setting the CARGO_TARGET_DIR environment variable.
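As a rough illustration of the CARGO_TARGET_DIR approach, the sketch below prepares a `cargo build` invocation that points at a shared directory. The helper name `shared_build_command` and the paths in the usage note are hypothetical; only the environment variable itself comes from the text above.

```rust
use std::path::Path;
use std::process::Command;

/// Build a `cargo build` command that writes artifacts to `shared_dir`
/// instead of the project's own `target/` directory.
/// (Hypothetical helper; the caller decides when to run the command.)
pub fn shared_build_command(project: &Path, shared_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build")
        .current_dir(project)
        .env("CARGO_TARGET_DIR", shared_dir);
    cmd
}
```

Calling `shared_build_command(Path::new("project-a"), Path::new("/tmp/shared-target")).status()` for several projects would make them all write into the same directory. Exporting the variable in a shell profile has the same effect without any code.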

Issues:

Since this has the potential to use a substantial amount of disk space, it would be desirable to have better support for pruning as listed above.

There are a fairly large number of tools which dig into the target directory. They would all be broken by this change, so we would need to figure out a strategy for migration before doing this. I began this in #6668, but I have not finished. Ideally #6668 and #6577 would be finished before making this change.

@bugproof

bugproof commented Jun 19, 2022

> sccache is the primary way to share artifacts across projects

I don't think it works. I used sccache and artifacts are still stored per project instead of globally. Only setting CARGO_TARGET_DIR to something like ../.cargo-build-artifacts actually changed where artifacts are stored.

If you have ever used any C++ package managers like vcpkg and/or conan they also store dependencies globally. I think that behavior should be default for cargo as well. If I have 5 projects using tokio runtime I don't want duplicated tokio artifacts in all of them eating my disk space. That's a major oversight in cargo design.

@bjorn3
Member

bjorn3 commented Jun 19, 2022

> I used sccache and artifacts are still stored per project instead of globally.

Sccache has a global cache, but when actually using a dependency, it is copied to the local target dir. Sccache reduces build time, but doesn't help with disk usage.

> If you have ever used any C++ package managers like vcpkg and/or conan they also store dependencies globally.

That design doesn't allow different projects to use different versions or configurations of a dependency. It also requires that multiple compiler versions are ABI compatible with each other, which is not the case for rustc. See https://cor3ntin.github.io/posts/abi/ for all the pain C++'s stable ABI causes. Furthermore, it hurts reproducibility by introducing global state. With cargo, only Cargo.lock could be considered state, and this file is per project and meant to be checked in. The target dir is just a cache, unlike with vcpkg and conan. There are tradeoffs between the designs of cargo and vcpkg/conan. Neither is strictly better than the other.

> If I have 5 projects using tokio runtime I don't want duplicated tokio artifacts in all of them eating my disk space.

If those are different versions of tokio or different configurations, you have to duplicate them one way or another.
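One way to picture this is a cache key built from everything that affects the compiled output. The sketch below is only an illustration: `ArtifactKey` and `cache_key` are hypothetical names, the field set is an assumption, and this is not the hash Cargo actually uses for its filenames.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical description of one compiled dependency.
#[derive(Hash)]
pub struct ArtifactKey<'a> {
    pub name: &'a str,
    pub version: &'a str,
    /// Enabled features, expected in sorted order so the key is stable.
    pub features: &'a [&'a str],
    pub rustc_version: &'a str,
}

/// Hash the key; two builds could share an artifact only if their keys match.
pub fn cache_key(key: &ArtifactKey<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}
```

Two projects that build the same tokio version with the same features and compiler get the same key and could share one artifact. Change any field, for example the feature list, and the key differs, so a second copy has to be stored somewhere.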

@bugproof

bugproof commented Jun 19, 2022

> That design doesn't allow different projects to use different versions or configurations of a dependency.

I think it does. There's just a directory per version in a single shared location. If two crates use the same version, those artifacts should be reused. I currently use CARGO_TARGET_DIR to save some space, but I don't know if there are any drawbacks to that approach.

@nyabinary

What's the status of this?
