Don't clone submodules for tools that aren't being built #76653
Comments
Many of these must be cloned with the current global workspace, because otherwise cargo will fail to resolve the workspace. I think that may be true for all of them - or at least all cargo projects. Once downloadable LLVM lands we can skip that though.
@ehuss @Eh2406 -- I think you've both been working on Cargo's resolver recently. How feasible would it be to explicitly tell Cargo "this crate, and its dependencies, should be presumed unchanged" when re-resolving a lockfile? My guess is it's quite hard. Currently we get something like this whenever a submodule is just not yet initialized:
One "solution" to this is perhaps to duplicate the submodules' Cargo.toml files in-tree, stubbed out and pointing at empty crates (basically just so the resolver works). But I'd really prefer to avoid doing that, and it does seem like Cargo should have all the information already available. We could plausibly arrange for x.py to fetch just the Cargo.toml files for these submodules, I guess, though that also seems painful.
I think it will be quite difficult, since there are a few other things in Cargo that expect the manifests and files to be there.

Do the cargo-based projects really matter performance-wise? It seems like submodule stuff is completely dominated by llvm. On my system, it looks like it is 3.5 times larger than all the others added together.

Also, rust-by-example is unnecessarily huge. I think squashing the gh-pages branch in that repo would shrink it dramatically.

The next largest is rust-analyzer. That's more feasible to ignore, since it is in its own little world. Unfortunately it has node-modules checked in, which is quite huge.

What is the status of using
AFAIK patch to
I agree LLVM is the main issue - it would be nice to have the rest downloaded on-demand, but it's not dramatically hurting setup times I think.
👍 sounds easy enough to do
Do you have suggestions for how to do this? I tried https://stackoverflow.com/a/1661283/7669110 locally, which worked in the sense that the history is now one commit, but didn't decrease the size of .git at all even after running
You probably still have the remote branch. Try removing
I think the issue you created looks correct. It needs someone with write permissions to that repo to do it directly. As with anything git-related, there are many ways to accomplish the same task. If it is wrong, worst case we can rebuild the branch. I don't think there is any history there that is important, and at least I have a local backup copy.

I did this before with some repo (I can't remember which). I think GitHub will gc pretty quick (probably within 24 hours), though I imagine they have heuristics on how often that happens.
Don't clone LLVM submodule when download-ci-llvm is set

Previously, `downloading_llvm` would check `self.build` while it was still an empty string, and so always concluded that LLVM was not being downloaded. This fixes the check.

This addresses the worst part of rust-lang#76653. There are still some large submodules being downloaded (in particular, `rust-by-example` is 146 MB, and all the submodules combined are 311 MB), but this is a lot better than the whopping 1.4 GB before.
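To illustrate the ordering bug described in that commit message, here is a minimal sketch. It is not the real `bootstrap.py`: the `Build` class, the `SUPPORTED` set, and the shape of `config` are assumptions made up for this example. Only the names `downloading_llvm`, `self.build`, and `download-ci-llvm` come from the thread.

```python
# Hypothetical stand-in for the bootstrap state; NOT the real bootstrap.py.
# Assumption for this sketch: prebuilt LLVM exists only for these triples.
SUPPORTED = {"x86_64-unknown-linux-gnu"}


class Build:
    def __init__(self, config, build_triple=""):
        self.config = config        # parsed config.toml as a dict (assumed shape)
        self.build = build_triple   # host triple; empty until it is detected

    def downloading_llvm(self):
        # The option must be set AND the host triple must be a supported one.
        opt = self.config.get("llvm", {}).get("download-ci-llvm", False)
        return bool(opt) and self.build in SUPPORTED


config = {"llvm": {"download-ci-llvm": True}}

# Buggy order: the check runs while `build` is still the empty string.
early = Build(config)
checked_too_early = early.downloading_llvm()

# Fixed order: fill in the triple first, then ask.
late = Build(config)
late.build = "x86_64-unknown-linux-gnu"
checked_after_init = late.downloading_llvm()
```

With the empty triple the check can never succeed, so the LLVM submodule would be cloned even though the option is set.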
Hmm, #81520 doesn't help with this as much as I thought because you have to make a config.toml before ever running a command with x.py. I wonder if we could clone LLVM lazily instead (only when it needs to be built).
@Mark-Simulacrum what do you think about doing this in
That seems largely fine to me, presuming it doesn't duplicate too much code or so (I wouldn't expect it to).
Move llvm submodule updates to rustbuild

This enables better caching, since LLVM is only updated when needed, not whenever x.py is run. Before, bootstrap.py had to use heuristics to guess if LLVM would be needed, and updated the module more often than necessary as a result.

This syncs the LLVM submodule only just before building the compiler, so people working on the standard library never have to worry about it. Example output:

```
Copying stage0 std from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Updating submodule src/llvm-project
Submodule 'src/llvm-project' (https://github.com/rust-lang/llvm-project.git) registered for path 'src/llvm-project'
Submodule path 'src/llvm-project': checked out 'f9a8d70b6e0365ac2172ca6b7f1de0341297458d'
```

Implements rust-lang#76653 (comment). This could be easily extended to other submodules, like `rust-by-example` and `rustc-dev-guide`, which aren't needed for cargo's workspace resolution.
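The "update only just before it is needed" idea can be sketched roughly as follows. This is an illustration in Python, not the actual rustbuild code (which is written in Rust): `LazySubmodules`, `build_std`, and `build_compiler` are hypothetical names, and the command runner is injectable so the sketch can be exercised without touching git.

```python
import subprocess


class LazySubmodules:
    """Hypothetical helper: updates a submodule the first time it is requested."""

    def __init__(self, run=subprocess.check_call):
        self._run = run          # callable that executes an argv list
        self._updated = set()    # paths already synced during this invocation

    def ensure(self, path):
        """Return True if an update was issued, False if `path` was already synced."""
        if path in self._updated:
            return False
        self._run(["git", "submodule", "update", "--init", "--", path])
        self._updated.add(path)
        return True


def build_std(subs):
    # In this sketch, building the standard library never asks for LLVM.
    return "std"


def build_compiler(subs):
    # The compiler step is the first place that needs the LLVM sources.
    subs.ensure("src/llvm-project")
    return "rustc"


calls = []
subs = LazySubmodules(run=calls.append)  # record commands instead of running git
build_std(subs)
after_std = list(calls)                  # nothing should have been fetched yet
build_compiler(subs)
build_compiler(subs)                     # second call must not fetch again
```

Keeping the "already updated" set inside one object means each build step can simply declare what it needs, and no heuristic has to guess up front whether LLVM will be required.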
…imulacrum

Update all submodules that rustbuild doesn't depend on lazily

This only updates the submodules the first time they're needed, instead of unconditionally the first time you run x.py.

Ideally, this would move *all* submodules to rustbuild and not exclude some tools and backtrace. Unfortunately, cargo requires all `Cargo.toml` files in the whole workspace to be present to build any crate.

On my machine, this reduces the time for an initial submodule clone (for `x.py --help`) from 55.70 to 15.87 seconds.

Helps with rust-lang#76653. Builds on rust-lang#86015 and should not be merged before it (only the last commit is relevant).
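The split between submodules that must be present up front and those that can wait might be sketched like this. The function name and both lists are illustrative assumptions, not the real contents of the repository: the only rule taken from the thread is that anything whose `Cargo.toml` is part of the cargo workspace has to be cloned eagerly.

```python
def split_submodules(all_submodules, workspace_members):
    """Partition submodules into (eager, lazy).

    Eager: cargo needs their Cargo.toml to resolve the workspace at all.
    Lazy: can be fetched the first time a build step asks for them.
    """
    members = set(workspace_members)
    eager = [s for s in all_submodules if s in members]
    lazy = [s for s in all_submodules if s not in members]
    return eager, lazy


# Illustrative names only; the real lists live in the repository itself.
all_submodules = ["cargo", "miri", "llvm-project", "rust-by-example", "rustc-dev-guide"]
workspace_members = ["cargo", "miri"]

eager, lazy = split_submodules(all_submodules, workspace_members)
```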
Update: since #82653, only the following submodules are cloned:
Those take up a total of 67 MB, most of it in miri, RLS, and cargo:
Given that .git is 795 MB overall, I'm going to go ahead and close this issue in favor of #63978.
Right now, this is the first thing you see on a fresh clone:
This can go on for many minutes, especially on a slow connection. Instead, `x.py` should only clone the submodules when they're actually needed.

I think `cargo build` requires having the `Cargo.toml` of the submodules to work, which might be tricky ... maybe `x.py` could default to a shallow clone? The main thing I'd love to have is for `x.py check` to not have to download LLVM.

Thanks to @Lokathor and @thomcc for encouraging me to throw away my clone and start from scratch ;)
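For the shallow-clone suggestion, a rough sketch of how the update command could be assembled is shown below. The helper name `submodule_update_cmd` is hypothetical and this is not how `x.py` is known to build its command. It relies only on the `--init` and `--depth` options of `git submodule update`.

```python
def submodule_update_cmd(path, shallow=True):
    """Build the argv for updating one submodule, optionally as a shallow clone."""
    cmd = ["git", "submodule", "update", "--init"]
    if shallow:
        # Fetch only the most recent commit instead of the full history.
        cmd += ["--depth", "1"]
    return cmd + ["--", path]


shallow_cmd = submodule_update_cmd("src/llvm-project")
full_cmd = submodule_update_cmd("src/llvm-project", shallow=False)
```

The resulting list can be passed to `subprocess.check_call`. A shallow clone reduces the download size but leaves the submodule without history, which may matter to people who work on that submodule directly.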