GCC builds are taking 30 min in PR CI #122546

Open
RalfJung opened this issue Mar 15, 2024 · 13 comments
Labels
T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@RalfJung
Member

RalfJung commented Mar 15, 2024

Looking at the logs for the "PR - x86_64-gnu-llvm-16" and "PR - x86_64-gnu-tools" CI jobs, both of them seem to be spending 30 minutes building GCC. Excerpt from the llvm-16 logs:

2024-03-14T20:31:31.0313582Z #13 [8/8] RUN sh /scripts/build-gccjit.sh /scripts
2024-03-14T20:31:31.1420705Z #13 0.262 + cd /scripts
2024-03-14T20:31:31.1421386Z #13 0.262 + git clone https://github.com/antoyo/gcc gcc-src
[...]
2024-03-14T21:02:52.4850388Z #13 1881.6 + ln -s /scripts/gcc-install/lib/libgccjit.so /usr/lib/x86_64-linux-gnu/libgccjit.so
2024-03-14T21:02:52.6362274Z #13 1881.6 + ln -s /scripts/gcc-install/lib/libgccjit.so /usr/lib/x86_64-linux-gnu/libgccjit.so.0
2024-03-14T21:02:53.0817083Z #13 DONE 1882.2s

Notice the timestamps and the time printed at the end (that's from Docker, I think).
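As a quick sanity check on the excerpt above, the elapsed time can be recomputed from the first and last log timestamps. This is only a sketch: the helper name `parse_ci_timestamp` is made up for illustration, and it assumes the timestamps are UTC with seven fractional digits, as they appear in the excerpt.

```python
from datetime import datetime, timezone

def parse_ci_timestamp(stamp: str) -> datetime:
    """Parse a log timestamp such as 2024-03-14T20:31:31.0313582Z (hypothetical helper)."""
    # The excerpt shows 7 fractional digits; strptime's %f accepts at most 6,
    # so the fraction is truncated to microseconds.
    base, frac = stamp.rstrip("Z").split(".")
    parsed = datetime.strptime(f"{base}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)

# First and last timestamps of the build-gccjit.sh step in the excerpt.
start = parse_ci_timestamp("2024-03-14T20:31:31.0313582Z")
end = parse_ci_timestamp("2024-03-14T21:02:53.0817083Z")

elapsed = (end - start).total_seconds()
print(f"{elapsed:.1f} s, i.e. about {elapsed / 60:.1f} min")
```

The result comes out at roughly 1882 seconds (a little over 31 minutes), which agrees with the `DONE 1882.2s` line reported at the end of the step.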

Not sure if these should be cached, but something seems to be going wrong here. Note that this happens before rustc tests are run, so it increases the latency for ui test failures by 30 min (about 50% above the baseline of not building GCC).

This probably regressed in #122042. (The build-gccjit.sh script did not exist before that PR.)

Cc @antoyo @GuillaumeGomez @Mark-Simulacrum

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 15, 2024
@RalfJung
Member Author

@GuillaumeGomez wrote

It is run in 3 CI jobs: gnu-tools, llvm-16 and llvm-17.

What is the value of running it in more than one job? The LLVM version doesn't affect the GCC backend, does it?

But also, this really should be cached somehow.

@antoyo
Contributor

antoyo commented Mar 15, 2024

@Mark-Simulacrum mentioned that the caching should happen automatically since the Docker builds are cached.

@jieyouxu jieyouxu added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. and removed T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Mar 15, 2024
@Kobzol
Contributor

Kobzol commented Mar 15, 2024

Docker builds aren't actually cached on PR CI, just on auto/try builds.

@Mark-Simulacrum
Member

Did that regress recently? I feel like the S3-based caching did work on PR CI. I don't quite understand why we can't fetch the cache from PR builds (without ever putting anything into it).

@Kobzol
Contributor

Kobzol commented Mar 15, 2024

Hmm, I thought that this had never worked, because there was nothing to fetch, since nothing was uploaded. But now I realize that this was probably only the case for mingw-check and, in general, for PR workflows that were not also executed on auto/try builds. For the other workflows, it probably was possible to cache the images, since they were shared between PR and auto/try builds.

In that case, the regression most likely came from the Docker caching change, where we stopped using S3 and switched to a Docker registry instead. I'll check whether we can use caching on PR workflows where the Docker image is shared.
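The reasoning in this comment can be sketched as a toy model: a PR job can only get a cache hit for an image that some cache-writing workflow (auto or try) also builds. Everything here is hypothetical illustration, not the actual CI configuration: the `IMAGE_BUILT_BY` mapping is an assumption based on the comments in this thread, and `pr_can_hit_cache` is a made-up name.

```python
# Toy model, NOT the real rust-lang/rust CI configuration.
# Assumption: only auto/try builds upload to the cache; PR builds can at most read.
CACHE_WRITERS = {"auto", "try"}

# Assumption based on this thread: which workflow kinds build each Docker image.
IMAGE_BUILT_BY = {
    "x86_64-gnu-llvm-16": {"pr", "auto"},
    "x86_64-gnu-tools": {"pr", "auto"},
    "mingw-check": {"pr"},  # PR-only, per the comment above
}

def pr_can_hit_cache(image: str) -> bool:
    """A PR job can find a cached image only if a cache-writing workflow builds it too."""
    return bool(IMAGE_BUILT_BY[image] & CACHE_WRITERS)

for image in IMAGE_BUILT_BY:
    status = "cacheable on PR" if pr_can_hit_cache(image) else "always rebuilt on PR"
    print(f"{image}: {status}")
```

Under these assumptions, the two jobs from the issue description could reuse a cached image, while a PR-only image would be rebuilt every time.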

@GuillaumeGomez
Member

Should be greatly improved once #122496 is merged.

@RalfJung
Member Author

Yeah, that speeds it up to around 7.5 minutes.

This question remains open, though:

What is the value of running it [the GCC backend] in more than one job? The LLVM version doesn't affect the GCC backend, does it?

@GuillaumeGomez
Member

It doesn't. It's mostly about config I think.

@RalfJung
Member Author

RalfJung commented Mar 15, 2024

I don't know what you mean by that. We should only be running GCC tests (and building GCC) in one job, unless doing it in multiple jobs gives us more coverage. (Probably one PR job and one bors job, as AFAIK those are defined independently. But maybe they share base images or so, I don't know how that is set up.)

@GuillaumeGomez
Member

I mean that we test backends if certain config options are turned on. We just need to change them. I don't think running it in multiple CI jobs brings any advantage. However, I'm worried that the tests for the GCC backend could get dropped again, just like when we changed the LLVM version. IIRC, that's when we decided to run the GCC backend tests in all LLVM CI jobs.

@RalfJung
Member Author

RalfJung commented Mar 15, 2024 via email

@GuillaumeGomez
Member

I'm not advising against doing it or anything. Just that the chances, even if low, are not zero. We need to find a solution so that this regression does not happen again. It would indeed be interesting to see how cranelift is tested.

@Kobzol
Contributor

Kobzol commented Mar 15, 2024

Btw: #122563 should fix the caching problem; once it lands, PR CI should hopefully be fast again.

@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 16, 2024