Attempt to cache git modules #40780

Merged: 2 commits merged into rust-lang:master on Mar 30, 2017

aidanhs
Member

@aidanhs aidanhs commented Mar 24, 2017

Partial resolution of #40772, appveyor remains to be done once travis looks like it's working ok.

The approach in this PR is based on the --reference flag to git-clone/git-submodule update and is a compromise dictated by the current limitations of the tools we're using.

The ideal would be:

  1. have a cached pristine copy of rust-lang/rust master in $HOME/rustsrc with all submodules initialised
  2. clone the PR branch with git clone --recurse-submodules --reference $HOME/rustsrc git@github.com:rust-lang/rust.git

This would (in the nonexistent ideal world) use the pristine copy as an object cache for the top level repo and all submodules, transferring over the network only the changes on the branch. Unfortunately, a) there is no way to manually control the initial clone with travis and b) even if there was, cloned submodules don't use the submodules of the reference as an object cache. So the steps we end up with are:

  1. have a cached pristine copy of rust-lang/rust master in $HOME/rustsrc with all submodules initialised
  2. have a cloned PR branch
  3. extract the path of each submodule, and explicitly git submodule update --init --reference $HOME/rustsrc/$module $module (i.e. point directly to the location of the pristine submodule repo) for each one
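A minimal shell sketch of step 3, for illustration only. The function name `update_submodules_with_reference` is hypothetical, and it assumes the pristine cache is the directory passed as `$1` (e.g. `$HOME/rustsrc`) and that no submodule path contains whitespace. A submodule missing from the cache falls back to a normal clone, and a submodule removed on the branch never appears in that branch's `.gitmodules`, which is one way to get the forward compatibility described next.

```shell
#!/bin/sh
# Hypothetical helper: initialise every submodule of the current checkout,
# borrowing objects from the matching submodule of a pristine cache ($1).
# Assumes no submodule path contains whitespace.
update_submodules_with_reference() {
    cache="$1"
    # Paths come from this branch's .gitmodules, so a submodule removed on
    # the branch is simply never visited.
    paths=$(git config --file .gitmodules --get-regexp '^submodule\..*\.path$' |
        cut -d' ' -f2-)
    for module in $paths; do
        if [ -e "$cache/$module/.git" ]; then
            # Point --reference directly at the pristine submodule repo.
            git submodule update --init --reference "$cache/$module" -- "$module" ||
                return 1
        else
            # Submodule added on this branch and not cached yet:
            # fall back to an ordinary network clone.
            git submodule update --init -- "$module" || return 1
        fi
    done
}
```

Usage would be along the lines of running `update_submodules_with_reference "$HOME/rustsrc"` from the root of the PR checkout.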

I've also taken some care to make this forward compatible, both for adding and removing submodules.

r? @alexcrichton

@aidanhs aidanhs force-pushed the aphs-cache-git-modules branch 4 times, most recently from 6f54e0d to dba2a07 on March 24, 2017 00:22
@aidanhs aidanhs force-pushed the aphs-cache-git-modules branch 5 times, most recently from 876e6e2 to 30272e2 on March 24, 2017 01:42
@aidanhs
Member Author

aidanhs commented Mar 24, 2017

While muddling through the bits and pieces that broke, I was able to observe the new cache in action. The logs below show step 1 (of 3) as described above (note that steps 2 and 3 are effectively constant time, so they aren't relevant).

Cache creation (https://travis-ci.org/rust-lang/rust/jobs/214478030#L308):

[00:00:00] Cloning into '/home/travis/rustsrc'...
[00:00:00] HEAD is now at e703b33e3e Auto merge of #40759 - alexcrichton:appveyor-retry, r=brson
[00:00:00] Already up-to-date.
[00:00:00] Cleared directory 'cargo'
[00:00:00] Cleared directory 'src/compiler-rt'
[00:00:00] Cleared directory 'src/doc/book'
[00:00:00] Cleared directory 'src/doc/nomicon'
[00:00:00] Cleared directory 'src/doc/reference'
[00:00:00] Cleared directory 'src/jemalloc'
[00:00:00] Cleared directory 'src/liblibc'
[00:00:00] Cleared directory 'src/llvm'
[00:00:00] Cleared directory 'src/rt/hoedown'
[00:00:00] Cleared directory 'src/rust-installer'
[00:00:00] Cloning into '/home/travis/rustsrc/cargo'...
[00:00:01] Cloning into '/home/travis/rustsrc/src/compiler-rt'...
[00:00:04] Cloning into '/home/travis/rustsrc/src/doc/book'...
[00:00:06] Cloning into '/home/travis/rustsrc/src/doc/nomicon'...
[00:00:07] Cloning into '/home/travis/rustsrc/src/doc/reference'...
[00:00:07] Cloning into '/home/travis/rustsrc/src/jemalloc'...
[00:00:08] Cloning into '/home/travis/rustsrc/src/liblibc'...
[00:00:10] Cloning into '/home/travis/rustsrc/src/llvm'...
[00:01:09] Cloning into '/home/travis/rustsrc/src/rt/hoedown'...
[00:01:10] Cloning into '/home/travis/rustsrc/src/rust-installer'...
[00:01:10] Submodule path 'cargo': checked out 'c995e9eb5acf3976ae8674a0dc6d9e958053d9fd'
[00:01:11] Submodule path 'src/compiler-rt': checked out 'd30da544a8afc5d78391dee270bdf40e74a215d3'
[00:01:11] Submodule path 'src/doc/book': checked out '9bd223ca406b1170a24942d6474f9e8a56f4a420'
[00:01:11] Submodule path 'src/doc/nomicon': checked out 'd08fe97d12b41c1ed8cc7701e545864132783941'
[00:01:11] Submodule path 'src/doc/reference': checked out '516549972d61c8946542d1a34afeae97167ff77b'
[00:01:11] Submodule path 'src/jemalloc': checked out '11bfb0dcf85f7aa92abd30524bb1e42e18d108c6'
[00:01:11] Submodule path 'src/liblibc': checked out '64d954c6a76e896fbf7ed5c17e77c40e388abe84'
[00:01:13] Submodule path 'src/llvm': checked out 'd5ef27a79661d4f0d57d7b7d2cdbe9204f790a4a'
[00:01:13] Submodule path 'src/rt/hoedown': checked out 'da282f1bb7277b4d30fa1599ee29ad8eb4dd2a92'
[00:01:13] Submodule path 'src/rust-installer': checked out '4f994850808a572e2cc8d43f968893c8e942e9bf'

Cache reuse (https://travis-ci.org/rust-lang/rust/jobs/214478030#L308):

[00:00:00] Cleared directory 'cargo'
[00:00:00] Submodule 'src/tools/cargo' (https://github.com/rust-lang/cargo.git) unregistered for path 'cargo'
[00:00:02] Cleared directory 'src/compiler-rt'
[00:00:02] Submodule 'src/compiler-rt' (https://github.com/rust-lang/compiler-rt.git) unregistered for path 'src/compiler-rt'
[00:00:02] Cleared directory 'src/doc/book'
[00:00:02] Submodule 'book' (https://github.com/rust-lang/book) unregistered for path 'src/doc/book'
[00:00:02] Cleared directory 'src/doc/nomicon'
[00:00:02] Submodule 'src/doc/nomicon' (https://github.com/rust-lang-nursery/nomicon.git) unregistered for path 'src/doc/nomicon'
[00:00:02] Cleared directory 'src/doc/reference'
[00:00:03] Submodule 'reference' (https://github.com/rust-lang-nursery/reference.git) unregistered for path 'src/doc/reference'
[00:00:03] Cleared directory 'src/jemalloc'
[00:00:03] Submodule 'src/jemalloc' (https://github.com/rust-lang/jemalloc.git) unregistered for path 'src/jemalloc'
[00:00:03] Cleared directory 'src/liblibc'
[00:00:03] Submodule 'src/liblibc' (https://github.com/rust-lang/libc.git) unregistered for path 'src/liblibc'
[00:00:03] Cleared directory 'src/llvm'
[00:00:03] Submodule 'src/llvm' (https://github.com/rust-lang/llvm.git) unregistered for path 'src/llvm'
[00:00:03] Cleared directory 'src/rt/hoedown'
[00:00:03] Submodule 'src/rt/hoedown' (https://github.com/rust-lang/hoedown.git) unregistered for path 'src/rt/hoedown'
[00:00:03] Cleared directory 'src/rust-installer'
[00:00:03] Submodule 'src/rust-installer' (https://github.com/rust-lang/rust-installer.git) unregistered for path 'src/rust-installer'
[00:00:03] Submodule 'src/tools/cargo' (https://github.com/rust-lang/cargo.git) registered for path 'cargo'
[00:00:03] Submodule 'src/compiler-rt' (https://github.com/rust-lang/compiler-rt.git) registered for path 'src/compiler-rt'
[00:00:03] Submodule 'book' (https://github.com/rust-lang/book) registered for path 'src/doc/book'
[00:00:03] Submodule 'src/doc/nomicon' (https://github.com/rust-lang-nursery/nomicon.git) registered for path 'src/doc/nomicon'
[00:00:03] Submodule 'reference' (https://github.com/rust-lang-nursery/reference.git) registered for path 'src/doc/reference'
[00:00:03] Submodule 'src/jemalloc' (https://github.com/rust-lang/jemalloc.git) registered for path 'src/jemalloc'
[00:00:03] Submodule 'src/liblibc' (https://github.com/rust-lang/libc.git) registered for path 'src/liblibc'
[00:00:03] Submodule 'src/llvm' (https://github.com/rust-lang/llvm.git) registered for path 'src/llvm'
[00:00:03] Submodule 'src/rt/hoedown' (https://github.com/rust-lang/hoedown.git) registered for path 'src/rt/hoedown'
[00:00:03] Submodule 'src/rust-installer' (https://github.com/rust-lang/rust-installer.git) registered for path 'src/rust-installer'
[00:00:03] Submodule path 'cargo': checked out 'c995e9eb5acf3976ae8674a0dc6d9e958053d9fd'
[00:00:04] Submodule path 'src/compiler-rt': checked out 'd30da544a8afc5d78391dee270bdf40e74a215d3'
[00:00:04] Submodule path 'src/doc/book': checked out '9bd223ca406b1170a24942d6474f9e8a56f4a420'
[00:00:04] Submodule path 'src/doc/nomicon': checked out 'd08fe97d12b41c1ed8cc7701e545864132783941'
[00:00:04] Submodule path 'src/doc/reference': checked out '516549972d61c8946542d1a34afeae97167ff77b'
[00:00:04] Submodule path 'src/jemalloc': checked out '11bfb0dcf85f7aa92abd30524bb1e42e18d108c6'
[00:00:04] Submodule path 'src/liblibc': checked out '64d954c6a76e896fbf7ed5c17e77c40e388abe84'
[00:00:05] Submodule path 'src/llvm': checked out 'd5ef27a79661d4f0d57d7b7d2cdbe9204f790a4a'
[00:00:05] Submodule path 'src/rt/hoedown': checked out 'da282f1bb7277b4d30fa1599ee29ad8eb4dd2a92'
[00:00:05] Submodule path 'src/rust-installer': checked out '4f994850808a572e2cc8d43f968893c8e942e9bf'
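For reference, here is a sketch of a cache refresh that follows the same sequence as the logs above (clone, clear submodule directories, register, check out). It is a reconstruction under assumptions, not the PR's script: `refresh_cache` is a hypothetical name, and only the `git submodule deinit -f .` / `git submodule update --init` pair comes from the `.travis.yml` snippet later in this thread. Because `deinit` clears the submodule working trees but leaves their repositories in `.git/modules`, the reuse run can re-register and check out every submodule without any `Cloning into` step.

```shell
#!/bin/sh
# Hypothetical helper: create or refresh the pristine cache.
# $1 = upstream URL, $2 = cache directory.
refresh_cache() {
    url="$1" cache="$2"
    # First run only: plain clone of the top-level repository.
    [ -d "$cache/.git" ] || git clone -q "$url" "$cache" || return 1
    # Bring the top level up to date (fast-forward only: the cache is pristine).
    git -C "$cache" pull -q --ff-only || return 1
    # Wipe submodule working trees ("Cleared directory ...") while keeping
    # their repositories under .git/modules, then check them out again.
    # Only objects missing from those repositories are fetched.
    git -C "$cache" submodule --quiet deinit -f . &&
        git -C "$cache" submodule --quiet update --init
}
```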

I've finished messing with this PR now; I think the next build will be successful on travis.

@alexcrichton
Member

Thanks again for the PR! Exciting to see improvements already!

Some thoughts on this from me would be:

  • Could this maybe be extracted to a shell script to work on Windows as well? We have the biggest gains to earn on AppVeyor so it'd be awesome to make sure we can leverage this there.
  • I wonder if this could use git submodule foreach to iterate over submodules?
  • I'm not personally very familiar with --reference, but just to clarify this handles updates correctly? In the sense that we'll often want to fetch remote objects that aren't present in the local repository, but this will fall back to updating the local repository's cache from the remote location?

@aidanhs
Member Author

aidanhs commented Mar 24, 2017

Could this maybe be extracted to a shell script to work on Windows as well? We have the biggest gains to earn on AppVeyor so it'd be awesome to make sure we can leverage this there.

Absolutely (I actually considered asking if this somewhat unwieldy inline shell script should become a separate file!), but I wanted to give it a chance to work out any minor kinks on Linux first as that's where I develop and can investigate issues. If you're keen to get it on appveyor asap then I can do this now.

I wonder if this could use git submodule foreach to iterate over submodules?

Sadly not :( http://stackoverflow.com/questions/12641469/list-submodules-in-a-git-repository:

The git submodule foreach command could echo the names of the submodule, but that only works once they have been checked out which has not happened after the init step.
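One way around this is to read `.gitmodules` directly. The sketch below uses a hypothetical helper name, `list_submodule_paths`. `git config --file` parses `.gitmodules` like any other git config file, so it works on a fresh clone before any init or checkout. It matches on the `.path` keys because submodule names and paths can differ: in the logs above, the submodule named `book` lives at `src/doc/book`.

```shell
#!/bin/sh
# List submodule paths without initialising or checking anything out.
# Prints one path per line, in the order they appear in .gitmodules.
list_submodule_paths() {
    git config --file .gitmodules --get-regexp '^submodule\..*\.path$' |
        cut -d' ' -f2-
}
```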

I'm not personally very familiar with --reference, but just to clarify this handles updates correctly? In the sense that we'll often want to fetch remote objects that aren't present in the local repository, but this will fall back to updating the local repository's cache from the remote location?

Things are done in two steps:

  1. pristine copy (cache) is explicitly updated with the latest of everything from master
  2. branch checkout is updated with the pristine copy as a reference. --reference means that objects are 'borrowed' from alternate locations on disk (i.e. git will look in those directories for an object it has a reference for), and if it isn't present during pull/clone/update it will be fetched from the remote as normal. Note that this fetching does not update the cache, hence step 1.
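To make the two steps concrete for a single repository, here is a sketch with hypothetical helper names. `git clone --reference` records the cache's object directory in `.git/objects/info/alternates`. Anything the cache lacks is fetched from the remote and stored in the new clone only, which is why step 1 has to refresh the cache explicitly.

```shell
#!/bin/sh
# Step 2 for one repository (hypothetical helper).
# $1 = pristine cache checkout, $2 = upstream URL, $3 = target directory.
clone_with_reference() {
    git clone -q --reference "$1" "$2" "$3"
}

# Print the object directories a repository borrows from (nothing if none).
borrowed_object_dirs() {
    alt="$(git -C "$1" rev-parse --absolute-git-dir)/objects/info/alternates"
    if [ -f "$alt" ]; then
        cat "$alt"
    fi
}
```

After `clone_with_reference`, commits that exist upstream but not in the cache are present in the new clone, while the cache itself is left exactly as it was.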

Hopefully this answers your question?

@alexcrichton
Member

Sadly not :( http://stackoverflow.com/questions/12641469/list-submodules-in-a-git-repository:

Good lord.

Hopefully this answers your question?

Does indeed, thanks! Sounds good to me :)

@aidanhs
Member Author

aidanhs commented Mar 24, 2017

Started to look at appveyor and encountered a potential problem related to cache size - https://www.appveyor.com/docs/build-cache/#cache-size-beta.

All the git repos combined come to ~1.1GB when compressed in the way they describe, so depending on the plan rust is on (and the size of the objects, which I'm not sure how to inspect), this could blow the cache. @alexcrichton is this possibly an issue, or should I just assume it's fine and continue?
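On inspecting object sizes, here is a rough helper for reporting how much a cache checkout holds. The name `report_cache_size` is hypothetical and not part of the PR. `du` gives the total for the `.git` directory, which also contains the submodule repositories under `.git/modules`, and `git count-objects -vH` breaks each repository down into loose (`size`) and packed (`size-pack`) objects. These are uncompressed on-disk numbers, so they only approximate the size of AppVeyor's 7z-compressed cache archive.

```shell
#!/bin/sh
# Hypothetical size report for a cache checkout ($1, e.g. $HOME/rustsrc).
report_cache_size() {
    # Everything git stores, including submodule repositories in .git/modules.
    du -sh "$1/.git"
    # Loose and packed object totals for the top-level repository.
    git -C "$1" count-objects -vH
    # foreach works here because the cache has every submodule checked out.
    git -C "$1" submodule --quiet foreach 'echo "== $sm_path"; git count-objects -vH'
}
```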

@alexcrichton
Member

Yeah that should be ok, I believe we have a larger cache than normal. Right now we're caching entire builds of LLVM and that's probably larger than the git repo!

@aidanhs aidanhs force-pushed the aphs-cache-git-modules branch 10 times, most recently from 47326eb to 9afae97 on March 27, 2017 17:30
@aidanhs
Member Author

aidanhs commented Mar 27, 2017

Ok, I think this is ready to try. It now implements corruption paranoia, only caching if the pristine repo is known to be in a good state. All the logic is in a standalone script and should also work on appveyor, though I'm not really able to test it. The travis cache seems to work fine (population and usage).
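One possible shape for this kind of corruption paranoia, sketched under assumptions: the PR's actual mechanism isn't shown in this thread, and `CACHE_DIR`, the `.cache_ok` marker name and both function names are hypothetical. The idea is to mark the cache as good only after a complete, successful update, and to discard anything unmarked rather than trust it.

```shell
#!/bin/sh
# Hypothetical marker file name; CACHE_DIR must be set by the caller.
CACHE_OK_MARKER=.cache_ok

# Run the update command "$@". The marker is removed first, so an update that
# fails (or a job killed halfway through) leaves the cache unmarked.
update_cache() {
    rm -f "$CACHE_DIR/$CACHE_OK_MARKER"
    "$@" && touch "$CACHE_DIR/$CACHE_OK_MARKER"
}

# Call before using the cache (or before the CI saves it): anything not
# marked good is thrown away and replaced with an empty directory.
discard_untrusted_cache() {
    if [ ! -f "$CACHE_DIR/$CACHE_OK_MARKER" ]; then
        rm -rf "$CACHE_DIR"
        mkdir -p "$CACHE_DIR"
    fi
}
```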

@alexcrichton assuming the changes look ok, is it possible to do two tries with bors (one for populating cache, one for cache usage if the first is successful)? Just one try will only test appveyor cache population. Ideally there'd be a way to just test one appveyor build combo (like travis ALLOW_PR) but I don't know if that's possible?

Member

@alexcrichton alexcrichton left a comment

Looks great! Unfortunately I don't know of a way to bounce, save caches, then try again. I think caching for this PR's Travis builds should be working?

.travis.yml Outdated
@@ -134,11 +134,14 @@ script:
- >
if [ "$ALLOW_PR" = "" ] && [ "$TRAVIS_BRANCH" != "auto" ]; then
echo skipping, not a full build;
elif [ "$TRAVIS_OS_NAME" = "osx" ]; then
travis_retry stamp sh -c 'git submodule deinit -f . && git submodule update --init' &&
exit 0;
Member

I think this wreaks havoc with Travis's script, no?

Member Author

I've not seen any issues with it, but I'll rewrite to use &&.

.travis.yml Outdated
travis_retry stamp sh -c 'git submodule deinit -f . && git submodule update --init' &&
exit 0;
fi;
set -o errexit;
Member

Maybe we could just use && below?
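As far as I can tell, the concern is that Travis runs every `script` entry in the same shell session, so a `set -o errexit` inside one entry would also change how Travis's own generated commands behave afterwards. The toy sketch below uses stand-in step functions (all hypothetical) to show that an `&&` chain gives the same stop-at-first-failure behaviour while staying local to one command list.

```shell
#!/bin/sh
# Stand-in steps (hypothetical); step_two simulates a failing command.
log=""
step_one()   { log="$log one"; }
step_two()   { log="$log two"; return 1; }
step_three() { log="$log three"; }

# The && chain stops at the first failure and passes its exit status to the
# caller without enabling a shell-wide option such as errexit.
run_steps() {
    step_one && step_two && step_three
}

rc=0
run_steps || rc=$?
echo "ran:$log (status $rc)"   # prints "ran: one two (status 1)"
```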

@aidanhs aidanhs force-pushed the aphs-cache-git-modules branch from 9afae97 to 0347ff5 on March 29, 2017 01:52
@aidanhs
Member Author

aidanhs commented Mar 29, 2017

Looks great! Unfortunately I don't know of a way to bounce, save caches, then try again. I think caching for this PR's Travis builds should be working?

Ah of course, I now recall that appveyor does mention not saving caches on PRs. I've squinted hard at the changes exclusive to appveyor and I can't think of any issues outside of possible 'permission denied' errors (which will be caught by bors, e.g. when creating C:\cache\rustsrc), but I'm not intimately familiar with mingw/msys. Worst case, the cache can be cleared and a revert merged, but I very much doubt it will come to that.

Yes, the travis cache works fine. https://travis-ci.org/rust-lang/rust/jobs/216280916#L446-L450 - this is the time taken for a checkout of the LLVM submodule, using the pristine repo as a reference (and not retrieving anything from the network). In fact, you can see the total time for the init_repo.sh script is 55s, including the incremental cache update (previously ~1min5s, which hits the network much more heavily).

I've changed .travis.yml to use shell continuations rather than errexit (sorry, I amended the commit before I remembered that the preferred method is to add a new commit) and made some very small changes to the retry utility in a new commit.
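The PR's retry utility itself isn't reproduced in this thread. As a hypothetical illustration of the general idea, loosely modelled on `travis_retry` (which retries a failing command up to three times), a POSIX sh version might look like this; the `retry` name and the `RETRY_MAX`/`RETRY_DELAY` variables are assumptions.

```shell
#!/bin/sh
# Hypothetical retry helper: run "$@" up to RETRY_MAX times (default 3),
# sleeping RETRY_DELAY seconds (default 1) between attempts. Returns 0 on the
# first success, otherwise the exit status of the last attempt.
retry() {
    max=${RETRY_MAX:-3}
    attempt=1
    while :; do
        "$@" && return 0
        rc=$?
        if [ "$attempt" -ge "$max" ]; then
            echo "retry: '$*' failed after $attempt attempts" >&2
            return "$rc"
        fi
        echo "retry: '$*' failed with status $rc (attempt $attempt/$max)" >&2
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
}
```

Typical use would be wrapping a network-bound step, e.g. `retry git submodule update --init`.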

@alexcrichton
Member

@bors: r+

Looks great! Let's see how this fares on appveyor

@bors
Contributor

bors commented Mar 29, 2017

📌 Commit 96e174f has been approved by alexcrichton

@alexcrichton alexcrichton reopened this Mar 29, 2017
@alexcrichton
Member

@bors: r+

@bors
Contributor

bors commented Mar 29, 2017

💡 This pull request was already approved, no need to approve it again.

@bors
Contributor

bors commented Mar 29, 2017

📌 Commit 96e174f has been approved by alexcrichton

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Mar 29, 2017
bors added a commit that referenced this pull request Mar 29, 2017
Rollup of 6 pull requests

- Successful merges: #40780, #40814, #40816, #40832, #40901, #40907
- Failed merges:
@bors
Contributor

bors commented Mar 30, 2017

⌛ Testing commit 96e174f with merge c82f132...

@bors bors merged commit 96e174f into rust-lang:master Mar 30, 2017
@aidanhs
Member Author

aidanhs commented Mar 30, 2017

It merged. Next build: https://ci.appveyor.com/project/rust-lang/rust/build/1.0.2621

Build started
git clone -q --depth=1 --branch=auto https://github.com/rust-lang/rust.git C:\projects\rust
git checkout -qf d9d7931152af5ac7d2fa208af85c3262af694247
Restoring build cache
Cache 'C:\cache\rustsrc' - Unzipping...Error uncompressing cache item: 7z.exe process has exited with code 2. Check C:\Users\appveyor\AppData\Local\Temp\1\build-cache-logs\611919de52bbda249e7468a0750b9e479f441c99.zip.001.log for details.

Uh oh. I'm going to prepare a PR to disable this on appveyor.

@aidanhs aidanhs mentioned this pull request Mar 30, 2017
bors added a commit that referenced this pull request Mar 30, 2017
Disable appveyor cache

Reverts just appveyor part of #40780. r? @aturon
@aidanhs aidanhs deleted the aphs-cache-git-modules branch March 30, 2017 11:51
@aidanhs aidanhs mentioned this pull request Mar 30, 2017
3 participants