fetch publisher data from crates.io, start build artifact caching #2079
base: master
Conversation
I'm not sure how much of the permission related parts of #1757 we would need here?
The main thing we're missing is that we're not marking the cache directory as read-only when we restore it. I would prefer to do that, but since we're only using builds by the same owner, it can only be a correctness issue, not a security issue, and in practice I doubt anything will go wrong.
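A restore-time read-only pass could be a small recursive helper like the sketch below. It only strips write permission from files (directories stay writable so cargo can still add new artifacts); the function name is made up for illustration, not the PR's actual code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively mark every file under `dir` read-only, so a build can read
/// cached artifacts but not modify files created by a previous build.
/// Directories are left writable so new artifacts can still be added.
pub fn make_read_only(dir: &Path) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            make_read_only(&path)?;
        } else {
            let mut perms = fs::metadata(&path)?.permissions();
            perms.set_readonly(true);
            fs::set_permissions(&path, perms)?;
        }
    }
    Ok(())
}
```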
monitor free disk space & delete old artifacts if it gets too full
We need to at least clear the cache every 24 hours when we update the toolchain, or we will run out of disk space fairly soon. (Whoops, I see you already did this.) IMO we should also clean the cache after every ~10 builds or so, but I feel less strongly about that.
I left comments on your other questions inline.
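The periodic cleanup idea (pruning the cache when it grows too large) could look roughly like this count-based, oldest-first eviction over per-entry cache directories. The layout and function name are my assumptions, not the PR's actual implementation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Keep at most `max_entries` cache directories under `cache_root`,
/// deleting the least recently modified ones first. A simple stand-in
/// for a free-disk-space based safeguard.
pub fn evict_oldest(cache_root: &Path, max_entries: usize) -> io::Result<()> {
    let mut entries: Vec<(SystemTime, PathBuf)> = fs::read_dir(cache_root)?
        .filter_map(|e| e.ok())
        .filter(|e| e.path().is_dir())
        .filter_map(|e| {
            let mtime = e.metadata().ok()?.modified().ok()?;
            Some((mtime, e.path()))
        })
        .collect();
    // Oldest first, so eviction starts with the stalest cache entry.
    entries.sort_by_key(|(mtime, _)| *mtime);
    while entries.len() > max_entries {
        let (_, path) = entries.remove(0);
        fs::remove_dir_all(&path)?;
    }
    Ok(())
}
```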
Can you expand on the rationale for going by same owner rather than by same crate? It seems to me that any cross-crate sharing is very iffy, since crates typically expect to build in a distinct workspace rather than in the same target directories as other crates. Crater has encountered a long tail of problems here that we're gradually fixing or hiding under the rug, and I wouldn't want to repeat that for something user-facing like docs.rs.

One particular thing to call out is to make sure to completely get rid of the cache on each nightly bump; Crater used to share target directories for that but stopped as we were hitting too many "wrong rustc version" errors in a typical run.

Given that lack of sharing opportunity, it seems like this would only help if crates are published more than once a day, which seems like a rare use case compared to the complexity of getting this right.

Long-term (or really hopefully quite soon) I hope that the build environment will be an entirely ephemeral EC2 instance, which may eventually get replaced after every build. That'll make caching here even harder.
@Mark-Simulacrum #1757 has a detailed rationale. The tldr is that crates in workspaces often have very large dependency trees that are the same between all the crates in the workspace.
Crates in a workspace are often published all at once, one after another.
Thank you for taking the time to look at it!
We have some publishers that always release batches of crates (sometimes 50-100 releases at once), also visible in the current queue. Recent examples would be
Could you elaborate on this? Were these build failures? In which cases did Crater share the target directory?
We're already doing this in this PR. When the nightly version switches, we purge the whole cache.
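A purge-on-toolchain-change step along those lines might look like this sketch. The version marker file name and cache layout are my assumptions for illustration, not the PR's actual code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Drop the whole artifact cache whenever the active rustc version
/// changes, then record the new version in a marker file so subsequent
/// builds with the same toolchain keep their cache.
pub fn purge_cache_if_toolchain_changed(cache_dir: &Path, rustc_version: &str) -> io::Result<()> {
    let marker = cache_dir.join(".rustc-version");
    let cached = fs::read_to_string(&marker).unwrap_or_default();
    if cached.trim() != rustc_version {
        if cache_dir.exists() {
            fs::remove_dir_all(cache_dir)?;
        }
        fs::create_dir_all(cache_dir)?;
        fs::write(&marker, rustc_version)?;
    }
    Ok(())
}
```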
See the rationale, this is mainly for bulk-publishes.
I believe one-instance-per-build is more mid- to long-term.
@jyn514 could you explain more? You mean making it readonly when it's in the cache? Or when it's restored for the build? To me a correctness issue with security impact is worth digging into, if we can protect ourselves against ourselves ;)
When it's restored. The idea is to prevent cargo (or build scripts) from modifying files it's already created in a previous build.
This is a correctness issue that does not have a security impact.
So the build will only ever create files? And never update or delete them?
👍
that's what I was imagining, yeah.
Crater currently shares target directories across ~all crates it builds for a single run and compiler version (up to disk space limits, so there is no per-crate property). Historically Crater used a single target directory for both the baseline and new compiler versions, which regularly exposed us to errors. I didn't spend a lot of time tracking those errors down to a root cause, but they were typically of the shape "found artifact X compiled by different version of rustc" or something along those lines. I think I have seen cases where a crate won't build or will fail tests due to stale files, e.g., because its build script assumes that it is only running once (but with different features etc. that may not be the case), but I don't have links to those off hand.

I think my advice here comes down to: be prepared for trouble if you do this, and make sure you have a way to disable it. Consider only enabling it for known well-behaved crates, especially if you already have logic to automatically de-prioritize them in the queue.

This would also make me feel better because relying on crate owners as a security boundary doesn't feel right to me. For example, I have publishing rights to some rust-lang-owned crates, but I wouldn't want that to mean a security boundary is breached if I have a side-project auto-published crate from some random repository or whatever.

It seems to me that the primary benefit is to already large projects, and ones we already treat specially in terms of queues etc., so we could readily mark those as sharing a build directory without relying on any crate property.
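A kill switch plus allowlist, as that advice suggests, can be a very small gate in front of the cache restore. Everything here (the parameter names, the allowlist shape) is hypothetical; the real config would likely come from docs.rs settings or an environment variable.

```rust
/// Decide whether artifact caching is enabled for a given crate.
/// `kill_switch` would be read from config or an env var by the caller;
/// `allowlist` holds the known well-behaved crates that may share a cache.
pub fn caching_enabled_for(kill_switch: bool, crate_name: &str, allowlist: &[&str]) -> bool {
    // The kill switch wins unconditionally, so caching can be disabled
    // at any time without a deploy.
    !kill_switch && allowlist.iter().any(|&c| c == crate_name)
}
```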
👍 -- this seems like it would prevent most of the issues Crater saw.
Yes, that seems right in terms of current trajectory. I would like to move us eventually to having no or minimal sandboxing ourselves (in Crater, docs.rs, playground) but rather rely on boundaries made by others (e.g., EC2 instance isolation).
You can't set We could use
I'll keep that in mind, thanks! The deprioritization is currently only based on name patterns, which means there are false positives where crates from other orgs might also be deprioritized only because they share the same prefix. So we would have to keep in mind that crates from different owners might be mixed up under the same prefix.
The current idea is that the security boundary is not the owner, but the publisher.
So this sounds like it boils down to a decision between a manual opt-in for caching artifacts, and automated caching based on some criteria (publisher, ... )
I'll check out if / how that would work. I assume just setting
Hm, this sounds like you would prefer the current approach over using sccache?
I definitely wanted to run some tests before starting to implement this.
Compare: 4c3cc3c to 5e9a80e
@jyn514 @Nemo157 I think this is ready for another round of checks. It would be awesome to have your feedback on what might be missing; from my side it's relatively complete. I'm not 100% certain about the disk-space safeguard: typically cache size limits are configured as a cache size, but that would have meant I have to
In the meantime we got a much bigger EC2 machine for the docs.rs server, which greatly improved build speed. For now I would park this and focus on other topics that I see as more critical, and revisit it when it becomes a problem again.
This is the first draft of my idea for the artifact caching between builds, to solve #1757.
The basic idea is:
also,
I intentionally didn't compress the cache so I can easily just rename / move the folder.
My first short tests worked, but this needs validating by more people.
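The uncompressed rename/move approach could be sketched like the helper below: try a cheap same-filesystem `fs::rename` first, and fall back to a recursive copy plus delete when the rename fails (e.g. across filesystem or docker mount boundaries). The function names are made up for illustration, not the PR's code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Move a directory, preferring an atomic `fs::rename`; on failure
/// (such as EXDEV when source and destination are on different
/// filesystems) fall back to copy-then-delete.
pub fn move_dir(from: &Path, to: &Path) -> io::Result<()> {
    match fs::rename(from, to) {
        Ok(()) => Ok(()),
        Err(_) => {
            // Cross-device fallback: copy everything, then remove the
            // source. Much slower than a rename, which is exactly the
            // concern raised in the open questions below.
            copy_dir(from, to)?;
            fs::remove_dir_all(from)
        }
    }
}

fn copy_dir(from: &Path, to: &Path) -> io::Result<()> {
    fs::create_dir_all(to)?;
    for entry in fs::read_dir(from)? {
        let entry = entry?;
        let src = entry.path();
        let dst = to.join(entry.file_name());
        if src.is_dir() {
            copy_dir(&src, &dst)?;
        } else {
            fs::copy(&src, &dst)?;
        }
    }
    Ok(())
}
```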
open questions
Does `fs::rename` work here? Someone with a linux machine should test if it would also fall back to the copy mechanism, which could be too slow for caching. There is also a chance I'm missing some docker specifics that make the move / symlink impossible.

TODO