How to effectively clean target folder for CI caching #5885

Open
thomaseizinger opened this issue Aug 13, 2018 · 14 comments
Labels
A-caching Area: caching of dependencies, repositories, and build artifacts C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@thomaseizinger

thomaseizinger commented Aug 13, 2018

I have a workspace repository with several crates in it (> 10). A clean build of all of them takes about 10 minutes on Travis, which is why I want to cache the target/ folder.

The problem is that the target folder gets quite big (~1.7 GB), so it also takes about 3 minutes to upload the cache to S3 after the build.

The question is: how can I clean the target folder of all artifacts generated by my own code?

If I can achieve that, the target folder would contain only the artifacts of the dependencies. As long as they don't change, Travis would not have to re-upload the cache. At the same time, rebuilding only the workspace crates takes just 30 seconds.

I have already tried several things:

  • cargo clean -p for every workspace package
  • Delete all files in target that mention a workspace crate's name

None of these was enough to get the target folder into a state where NOTHING changes between two builds.

I couldn't really understand the layout of the target folder: the artifacts of dependencies seem to be mixed in with those of the workspace crates. Is there any documentation on how the target folder is structured?
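One way to find out what actually changes between two builds is to record a manifest of every file under target/ with its checksum, and diff the manifests taken after two consecutive build-and-clean cycles. A minimal sketch, assuming GNU coreutils/findutils (`sha256sum`, `xargs -r`); the function name `snapshot_target` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a sorted manifest of "<sha256>  <relative path>"
# for every regular file under the given directory (default: ./target).
snapshot_target() {
  local dir="${1:-target}"
  # Run inside the directory so the manifest contains relative paths,
  # and sort NUL-separated names so the order is stable between runs.
  (cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
}
```

Usage: run `snapshot_target target > before.txt` after one build and cleanup, repeat the build and cleanup, run `snapshot_target target > after.txt`, then `diff before.txt after.txt` lists exactly the files that would make the CI cache re-upload.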

@alexcrichton
Member

Thanks for the report! Right now there's no great answer here: Cargo doesn't implement anything like this, nor does it internally support knowing which artifacts are super old.

For now I'd recommend using something like sccache on CI instead of caching the entire target folder, but work on this would definitely be appreciated!
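For reference, Cargo runs rustc through a wrapper when the `RUSTC_WRAPPER` environment variable is set, which is the usual way to plug in sccache. A minimal sketch of a CI step that enables it only when the binary is installed; the function name `enable_sccache` is hypothetical, and the default cache path is an assumption (sccache reads its local disk cache location from `SCCACHE_DIR`):

```shell
#!/usr/bin/env bash
# Hypothetical helper: route rustc through sccache if it is on PATH.
# Point the CI cache at $SCCACHE_DIR instead of at target/.
enable_sccache() {
  if command -v sccache >/dev/null 2>&1; then
    export RUSTC_WRAPPER=sccache
    # Keep an explicitly configured SCCACHE_DIR; otherwise pick a default.
    export SCCACHE_DIR="${SCCACHE_DIR:-$HOME/.cache/sccache}"
    echo "sccache enabled (cache dir: $SCCACHE_DIR)"
  else
    echo "sccache not found; building without it" >&2
    return 1
  fi
}
```

In a script running with `set -e`, call it as `enable_sccache || true` so a missing sccache falls back to a plain build instead of failing the job.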

@alexcrichton alexcrichton added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Aug 14, 2018
@thomaseizinger
Author

Thanks for the quick answer!
We are using sccache now, which already helps quite a bit!

@matklad
Member

matklad commented Oct 24, 2018

For posterity: in my .travis.yml, in before_cache, I explicitly delete the artifacts for the current workspace while keeping the artifacts from dependencies intact:

https://github.com/rust-analyzer/rust-analyzer/blob/9a7db8fa009c612168ef16f6ed72315b5406ed09/.travis.yml#L2-L4

This seems to greatly speed up CI builds. I wonder if we can make caching easier by adding more structure to the target dir? If, for example, we grouped artifacts into folders based on source_id, it would be easy to purge anything not from crates.io.

@thomaseizinger
Copy link
Author

thomaseizinger commented Oct 24, 2018 via email

@matklad
Member

matklad commented Dec 4, 2018

So, I've been using the rm ./target/debug/deps/changing-stuff trick with rust-analyzer for the past couple of months, and it really makes a huge difference. Here's what a typical build looks like:

https://travis-ci.org/rust-analyzer/rust-analyzer/jobs/463540847

One interesting bit is compilation:

$ cargo test
   Compiling test_utils v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/test_utils)
   Compiling tools v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/tools)
   Compiling ra_syntax v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_syntax)
   Compiling gen_lsp_server v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/gen_lsp_server)
   Compiling ra_editor v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_editor)
   Compiling ra_db v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_db)
   Compiling ra_cli v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_cli)
   Compiling ra_hir v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_hir)
   Compiling ra_analysis v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_analysis)
   Compiling ra_lsp_server v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_lsp_server)
warning: unused `#[macro_use]` import
 --> crates/ra_lsp_server/src/main.rs:5:1
  |
5 | #[macro_use]
  | ^^^^^^^^^^^^
  |
  = note: #[warn(unused_imports)] on by default
                                                                                
warning: unused `#[macro_use]` import
 --> crates/ra_lsp_server/src/main.rs:5:1
  |
5 | #[macro_use]
  | ^^^^^^^^^^^^
  |
  = note: #[warn(unused_imports)] on by default
                                                                                
    Finished dev [unoptimized + debuginfo] target(s) in 1m 32s

This is awesome, considering that rust-analyzer has 145 dependencies, two of which are versions of syn.

Another interesting bit is cache uploading:

before_cache.1
$ find ./target/debug -type f -maxdepth 1 -delete
find: warning: you have specified the -maxdepth option after a non-option argument -type, but options are not positional (-maxdepth affects tests specified before it as well as those specified after it).  Please specify options before other arguments.
before_cache.2
$ rm -fr ./target/debug/{deps,.fingerprint}/{*ra_*,*test*,*tools*,*gen_lsp*}
before_cache.3
$ rm -f  ./target/.rustc_info.json
cache.2
store build cache
nothing changed, not updating cache (yay!)

I think this caching strategy can make a huge difference for projects that build/gate on stable (nightly is trickier, because the cache is invalidated by each new nightly).

In terms of cost/benefit, I think a little help from Cargo's side (putting all deps from the crates.io source into a separate cacheable dir) would be very welcome here.

cc @rust-lang/cargo: perhaps I am overly enthusiastic here, but it does look like a low-hanging watermelon :)
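The `find` warning in the before_cache log above comes from placing `-maxdepth` after `-type`: GNU find treats `-maxdepth` as a global option and wants it before any tests. A minimal sketch of the same three cleanup steps with the options reordered, wrapped in a function so the workspace crate patterns become arguments; the function name `prune_workspace_artifacts` and its parameter order are hypothetical (the rust-analyzer script hard-codes its own patterns):

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the before_cache steps above.
# Usage: prune_workspace_artifacts <target-dir> <pattern>...
prune_workspace_artifacts() {
  local target="$1"; shift
  # 1. Top-level files in target/debug (e.g. workspace binaries).
  #    -maxdepth comes first, which silences the find warning.
  find "$target/debug" -maxdepth 1 -type f -delete
  # 2. deps/ and .fingerprint/ entries whose names contain a workspace pattern.
  #    The pattern is quoted, so only the surrounding * act as wildcards.
  local pat
  for pat in "$@"; do
    rm -rf "$target/debug/deps/"*"$pat"* "$target/debug/.fingerprint/"*"$pat"*
  done
  # 3. The cached rustc info file, removed by the original script as well.
  rm -f "$target/.rustc_info.json"
}
```

The rust-analyzer step above corresponds roughly to `prune_workspace_artifacts ./target ra_ test tools gen_lsp`. A pattern that is too broad (such as `test`) can also match dependency artifacts; those are simply rebuilt, at the cost of some CI time.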

@alexcrichton
Member

I don't think there's any disagreement that Cargo can do a lot here; it just needs someone to help push through a design!

@matklad
Member

matklad commented Dec 6, 2018

Made a rough POC here: matklad@b3566b1

It was harder than anticipated, mainly because there are three different dirs to account for: .fingerprint, deps and build. The POC deals only with the first two, introducing .fingerprint/crates-io and deps/crates-io for dependencies that never change. It successfully avoids rebuilding crates.io dependencies that do not have build scripts.

The real implementation should probably transpose the order of directories:

target/
  debug/
    .fingerprint/
    deps/
    build/
    cacheable/
      .fingerprint/
      deps/
      build/

so that you can point the CI cache at target/debug/cacheable and be done with it. The cacheable dir should probably even come before profile/target_triple.

However, with this design, if we then allow overriding the location of the cacheable dir via an env var, we pretty much get the "share common dependencies across projects" behavior, which is also a sought-after feature.
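Since the POC only avoids rebuilds for crates without build scripts, it can help to check which packages in a given build actually have one. In current builds, build-script artifacts live under target/&lt;profile&gt;/build/ in directories named `<package>-<hash>` (typically one for the compiled script and one for its output). A minimal sketch, assuming that naming with a lowercase hex hash; the function name `list_build_script_packages` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list package names that have build-script
# directories under <profile-dir>/build (default: target/debug).
list_build_script_packages() {
  local build_dir="${1:-target/debug}/build"
  # No build/ dir means no build scripts ran; print nothing.
  [ -d "$build_dir" ] || return 0
  # Strip the trailing "-<hex hash>" and de-duplicate, because a package
  # usually has more than one directory here.
  ls -1 "$build_dir" | sed -E 's/-[0-9a-f]+$//' | sort -u
}
```

Running `list_build_script_packages target/debug` on a CI cache gives a quick list of the dependencies that a deps-only cacheable dir would still rebuild.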

@thomaseizinger
Copy link
Author

thomaseizinger commented Dec 7, 2018

Does this also work for workspaces? We have had the biggest problems with workspaces, because local crates depend on each other, which leads to a weird structure inside the target folder.

Nice that this is being worked on! :)

@Eh2406
Contributor

Eh2406 commented Dec 7, 2018

I don't know how the files are currently laid out, so my opinion should come with a big helping of salt, but it would be lovely to have the Rust version in the path for artifacts that can only be used by that version of Rust. That would make it straightforward to write a GC that deletes artifacts for versions of Rust that are no longer installed. This is not mutually exclusive with other information being encoded in the path, or stored in some other format (like a timestamp file).
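Until Cargo encodes the compiler version in artifact paths, a CI-side approximation is to put the compiler version into the cache key, so artifacts built by an old toolchain are never restored in the first place. A minimal sketch, assuming `sha256sum` is available; the function name `cache_key` and the choice to also hash Cargo.lock (so dependency changes start a fresh cache) are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a short cache key from the compiler version
# string ($1, e.g. the output of `rustc -vV`) and a lockfile ($2).
cache_key() {
  local rustc_version="$1" lockfile="${2:-Cargo.lock}"
  # Hash the version string and the lockfile contents together and keep
  # the first 16 hex characters.
  { printf '%s\n' "$rustc_version"; cat "$lockfile"; } \
    | sha256sum | cut -c1-16
}
```

Usage: `key="target-$(cache_key "$(rustc -vV)")"`. With a new nightly or stable release the key changes, so stale artifacts age out of the CI cache instead of being carried forward.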

@Eh2406
Contributor

Eh2406 commented Dec 14, 2018

cc crates.io having trouble with this.

@thomaseizinger
Author

Found something that might be useful in this regard:

https://github.com/ustulation/cargo-prune

petr-tik added a commit to petr-tik/tantivy that referenced this issue Apr 19, 2019
fulmicoton pushed a commit to quickwit-oss/tantivy that referenced this issue Apr 20, 2019
* Delete files from target/ dir to avoid caching them on CI

idea from here rust-lang/cargo#5885 (comment)

* Delete examples
@pksunkara

So, we at clap tried to optimize our CI cache, since it kept growing every time we ran on Travis. We took inspiration from rust-analyzer and ended up with the following: https://github.com/clap-rs/clap/pull/1658/files#diff-354f30a63fb0907d4ad57269548329e3R5-R16

It has multiple packages and also has trybuild tests. I am not sure whether I was a bit overzealous, but I thought you might want to know.

@ehuss ehuss added the A-caching Area: caching of dependencies, repositories, and build artifacts label Apr 6, 2020
bors bot added a commit to rtic-rs/rtic that referenced this issue Aug 25, 2020
341: Enable caching for Github Actions r=korken89 a=AfoHT

Using [GHA caching](https://docs.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows) to store key parts of the testing environment.

One example is downloading `arm-none-eabi-gcc` (which takes roughly 1 minute) for each checkexamples job.

The rustup setup is roughly 200MB and restores nicely.

Rust examples can be found [here](https://github.com/actions/cache/blob/master/examples.md#rust---cargo)

**Something to discuss:**

Several notable projects remove some problematic files in order to keep the cache size reasonable:

[rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/blob/9a7db8fa009c612168ef16f6ed72315b5406ed09/.travis.yml#L2-L4)

[cargo discussion](rust-lang/cargo#5885)

[tantivy-search](https://github.com/tantivy-search/tantivy/pull/531/files)

[clap-rs](https://github.com/clap-rs/clap/pull/1658/files#diff-354f30a63fb0907d4ad57269548329e3R5-R16)


Co-authored-by: Henrik Tjäder <henrik@tjaders.com>
@epage
Contributor

epage commented Oct 23, 2023

FYI there is now https://crates.io/crates/cargo-cache though #12633 might be able to help with this.

@epage epage added the S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. label Oct 23, 2023
@sjackman

I'm using the GitHub Action swatinem/rust-cache for caching CI builds.
https://github.com/swatinem/rust-cache#cache-effectiveness


8 participants