Miserable reliability with macos-14 runners #10680
Comments
Hey @Galkon! Runners on the latest image version will have more free disk space for your builds, and, if I understand correctly how chunk rendering works, that should resolve your issue. I don't see a clear reason for the crash in this log, but most likely the process didn't have enough disk space, or (less likely) other resources, to complete.
Thanks for the swift response! Happy to be your guinea pig. Thursday and Friday we are typically doing releases anyway.
I know adding a +1 to an issue is typically not great practice, but this also seriously impacts our private builds. I had filed a ticket with GitHub support: https://support.github.com/ticket/personal/0/2996091. This had affected one of our four macOS workflows and now affects two, to the point that they never pass. As part of our debugging, we confirmed that when they do fail, there are two possible outcomes.
In both cases, the workflow time breakdown for the macOS runner always caps out just over 30 minutes, even though the step execution times (when available) don't align with that. The instability, and the amount of back-and-forth with support, has gotten to the point where we are considering a different vendor for macOS-based CI builds. If possible, please also include me on the test runs on Thursday/Friday. Happy to grant support access to our private repo as well to help with debugging, or to hop on a call and debug further.
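To narrow down where that extra time goes, here's roughly what we're planning to add (a sketch only; the timeout value and step layout are illustrative, not our real workflow):

```yaml
jobs:
  e2e:
    runs-on: macos-14
    timeout-minutes: 45      # fail fast instead of hanging past the ~30m mark
    steps:
      - name: Mark start time
        run: date -u +"%Y-%m-%dT%H:%M:%SZ"
      # ... checkout, build, and test steps ...
      - name: Mark end time
        if: always()         # log the timestamp even when an earlier step fails
        run: date -u +"%Y-%m-%dT%H:%M:%SZ"
```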
Hey @Galkon!
Please check on the state of the workflows you mentioned and come back to us with updates. 🙇
Hey @nsheaps! I'm not sure that your problem and the original problem stated in this issue are related. It's hard to rule out the possibility without logs, examples, and a test project, yes, but the description doesn't really match up. If you think the problem was related to disk space, then the last completed update should have solved it for you. Based on the description, though, it looks more like your workflow exhausted the runner's resources. I can't say for sure without a workflow example and links to successful and unsuccessful runs; with those I can at least track the version of the image used and the connection between updates to the preinstalled software and the crashes.
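For reference, a small step like the one below (assuming the `ImageOS`/`ImageVersion` environment variables that GitHub-hosted runners set) would record the exact image version in the logs of both good and bad runs:

```yaml
# Add near the top of the steps list of the affected job.
- name: Print runner image info
  run: |
    echo "ImageOS=$ImageOS"            # e.g. macos14
    echo "ImageVersion=$ImageVersion"  # e.g. 20240918.8
    sw_vers                            # macOS product version and build
```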
Thanks for the update. We will be running several builds throughout today and tomorrow; I will let you know if I run into any issues.
I think it is related in some way: we had stable performance before, then some change was made, and our workflows started failing in weird ways.
We did note the disk space usage. It's also somewhat important to note that this affected multiple workflows, not just the ones with high disk usage, so we'll give those a shot too. We've also documented internally that we should look for other approaches, since the disk usage comes from running our entire stack on the runner itself so we can run e2e tests with Playwright, which isn't a lightweight task; unfortunately that move is not a quick one to make. I'll report back later today with the results of our testing. If it still ends up not working, I'm happy to send any info your way, either here or in a new ticket. Appreciate the updates and attention here, @erik-bershel!
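As an illustration only (not the exact steps from our workflows), this is the kind of before/after snapshot we're capturing around the heavy steps:

```yaml
# Steps wrapped around the expensive part of the job.
- name: Disk usage before the e2e stack
  run: df -h /
# ... bring up the stack and run the Playwright suite ...
- name: Disk usage after the e2e stack
  if: always()   # capture the numbers even when a previous step fails
  run: df -h /
```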
@nsheaps 👋
@erik-bershel it looks like we are still having issues. It's the chunking issue mentioned in prior posts.
Hey @Galkon! Sorry to hear it. So it wasn't the disk after all. 😞
@erik-bershel it looks like some of our failures on one workflow might have been related to our code, but one of our workflows (an e2e test workflow) is still not working with the latest image. Based on what you shared, it sounds like the visionOS components were included in all available macOS images but were only removed from macos-14, so I'm not sure if it will help, but I know a past run of this commit succeeded. On a known-good commit, we added some debug info too: before and after snapshots of disk usage.
Let me know what other info I can provide here.
EDIT: After looking at the disk usage, there is definitely disk exhaustion. The docs say you have 14 GB available, and this shows 20 GB being used, though it's worth noting that the space consumed can also include space used by the runner and logs. What I don't understand is why the same commit could behave so drastically differently (from working to absolutely broken), with the one possible exception being LocalStack and how they recently changed the container to download all its dependencies at runtime. Now that I mention it, I just took a look and our Dockerfile for LocalStack doesn't pin the base image; I'm going to try pinning it to an older version and see if that helps.
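For what it's worth, the pinning I have in mind looks roughly like this, whether it ends up in the `FROM` line or in a Compose file; the tag below is a placeholder, not a version we've validated:

```yaml
services:
  localstack:
    # Pin an explicit tag instead of "latest" so image updates can't
    # silently change startup behaviour or on-runner download size.
    image: localstack/localstack:3.0   # placeholder tag; pick a known-good one
    ports:
      - "4566:4566"                    # LocalStack edge port
```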
Not sure if this is the proper place, but on macOS 14 arm64, running `source ~/.bashrc` reverts the Node.js version selected by `actions/setup-node`. Consider the following job:
```yaml
jobs:
  test:
    runs-on: macos-latest # macOS 14 Arm64 as of now
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Setup Node.js 22
        uses: actions/setup-node@v3
        with:
          node-version: 22
          cache: "yarn"
      - name: This is where things happen
        run: |
          node --version # 22
          source ~/.bashrc
          node --version # 20 !!!
```
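A quick way to see which binary wins after sourcing the profile (just a debugging sketch; I'm not assuming anything about what `~/.bashrc` contains on the image) is to list every `node` on the `PATH` before and after:

```yaml
- name: Inspect node resolution
  run: |
    which -a node   # every node on PATH before sourcing, in resolution order
    echo "$PATH"
    source ~/.bashrc
    which -a node   # does a different install now shadow setup-node's node?
    echo "$PATH"
```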
Description
Over the last month or so, the reliability of the macos-14 runners has seriously degraded. We have used them since March and used to have virtually no issues. We have absolutely no issues with our self-hosted macOS Intel runner, so I am about to set up the same for Apple silicon.
At this point they are borderline unusable; having to manually delete Xcode and iOS simulators just so the drive has free space is a bit insane. Even with that, random failures that are not addressable by code or config changes occur frequently.
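For context, the kind of manual cleanup I mean looks roughly like this (a sketch, not my exact step; the Xcode path glob is illustrative and depends on which versions the image ships):

```yaml
- name: Free disk space (best effort)
  run: |
    df -h /
    # Drop simulators whose runtimes are unavailable on this image.
    xcrun simctl delete unavailable || true
    # Delete Xcode versions we never invoke (glob is illustrative).
    sudo rm -rf /Applications/Xcode_15.*.app || true
    df -h /
```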
Platforms affected
GitHub Actions - Standard Runners
Runner images affected
macOS 14 Arm64
Image version and build link
Image: macos-14-arm64
Version: 20240918.8
Included Software: https://github.com/actions/runner-images/blob/macos-14-arm64/20240918.8/images/macos/macos-14-arm64-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/macos-14-arm64%2F20240918.8
Not public, but here is a link:
https://github.com/highlight-ing/highlight/actions/runs/11022480898/job/30613014291
Is it regression?
Yes.
https://github.com/highlight-ing/highlight/actions/runs/11021711293/job/30609399929
Image: macos-14-arm64
Version: 20240918.8
Included Software: https://github.com/actions/runner-images/blob/macos-14-arm64/20240918.8/images/macos/macos-14-arm64-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/macos-14-arm64%2F20240918.8
Expected behavior
Builds should reliably work and not randomly throw errors that go away when you re-run them half a dozen times.
Actual behavior
I play Russian roulette with GitHub Actions to see if my builds work or if I need to spend four hours re-running them until they do.
Repro steps
No reliable reproduction. It just happens intermittently (but regularly) while building an Electron app for macOS arm64 using electron-builder.
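The closest thing I have to extra diagnostics is enabling electron-builder's debug logging via its documented `DEBUG=electron-builder` environment variable on a re-run; the build command below is a placeholder for whatever your project actually invokes:

```yaml
- name: Build macOS app (verbose electron-builder logs)
  # DEBUG=electron-builder turns on electron-builder's debug output.
  run: yarn electron-builder --mac --arm64
  env:
    DEBUG: electron-builder
```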