
CI: Move non-coverage FreeBSD Cirrus CI jobs to GitHub Actions #15618

Merged (4 commits) Oct 28, 2023

Conversation

@kinke (Contributor) commented Sep 19, 2023

By running them in a qemu VM on Linux runners.
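For context, a minimal sketch of such a job using cross-platform-actions/action (the runner label, OS version, and step contents are illustrative, not the actual workflow in this PR):

```yaml
# Hypothetical sketch only - job name, FreeBSD version, and script are placeholders.
jobs:
  freebsd:
    runs-on: ubuntu-22.04   # qemu VM on a Linux runner (this PR's initial approach)
    steps:
      - uses: actions/checkout@v3
      - name: Run CI inside a FreeBSD VM
        uses: cross-platform-actions/action@v0.19.0
        with:
          operating_system: freebsd
          version: '13.2'
          run: |
            ./ci/run.sh
```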

@dlang-bot (Contributor)

Thanks for your pull request and interest in making D better, @kinke! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#15618"

@kinke force-pushed the gha_freebsd branch 4 times, most recently from a04e218 to d61933c on September 19, 2023 01:34
@kinke (Contributor Author) commented Sep 19, 2023

This primarily shows

  • the ugly required YAML duplication / no way to reuse the native job steps (YAML anchors support wouldn't help here either),
  • the poor visual separation of substeps in the log, and
  • the surprisingly bad performance, at least with qemu on Linux here - just managing to build DMD and start the DMD testsuite within an hour, which hints at something like a 10x slow-down vs. a native job. I've never seen such a slowdown with qemu on my local Linux box, where performance is near-native.

@kinke (Contributor Author) commented Sep 19, 2023

Okay, the performance on Mac runners is much better - the ci/run.sh build step takes 5-6 minutes in the FreeBSD VM (derivable by enabling log timestamps in the UI; this took something like 35 minutes in the qemu VM on Linux), vs. 2-3 minutes for the native Linux/Mac jobs. The 'unit' tests fail (they also did in the qemu VMs on Linux); otherwise the DMD testsuite at least passes.

@ibuclaw (Member) commented Sep 20, 2023

2023-09-19T16:49:41.2233380Z 1) global.deinitialize
2023-09-19T16:49:41.2235820Z core.exception.AssertError@test/unit/deinitialization.d(152): unittest failure

This appeared 7x - once for each deinitialize member function.

assert(objc is null);

Does FreeBSD even have objc support in dmd?

@kinke (Contributor Author) commented Sep 20, 2023

Right, interesting - I didn't realize it's complaining about a single assertion. I find it bizarre that this unittest is apparently invoked multiple times; that makes absolutely no sense (Objc.deinitialize, the unittest with that assertion, should be the only failure). The failing assertion hints that the frontend state hasn't been deinitialized at the time the test is run - dmd.objc.objc is null after dmd.objc.Objc.deinitialize(), and set in dmd.objc.Objc._init() (to a class instance of either dmd.objc.{Supported,Unsupported}).

So my gut feeling is that we're only seeing follow-up errors, maybe from a missing dmd.frontend.deinitializeDMD() after a genuinely failing test - there are lots of

@beforeEach initializeFrontend()
{
    import dmd.frontend : initDMD;
    initDMD();
}

@afterEach deinitializeFrontend()
{
    import dmd.frontend : deinitializeDMD;
    deinitializeDMD();
}

for these tests.

@ibuclaw (Member) commented Oct 26, 2023

Only just got round to running these in my own FBSD VMs, all unit tests pass. Rebase and re-run?

@kinke (Contributor Author) commented Oct 26, 2023

Well, IIRC the tests were just fine in the Cirrus FreeBSD VMs before too, so I very much doubt a rebase changes anything, but I'll try. Edit: Ah okay, you already did, but apparently no changes.

@ibuclaw (Member) commented Oct 26, 2023

Shame - I've run the GitHub steps as a script directly on my VM, and it doesn't trigger.

@ibuclaw force-pushed the gha_freebsd branch 4 times, most recently from f2ff760 to b2af2a0 on October 26, 2023 22:12
@ibuclaw (Member) commented Oct 26, 2023

@kinke looks like all I can do is fudge the test/run.d file to skip the unittest target on FreeBSD.

However, it looks like the std.concurrency unittest has hung. 🤷

@kinke (Contributor Author) commented Oct 26, 2023

Oh well, thanks a lot for trying though! 👍 - I've just seen that the Action has been updated to v0.21.0 very recently; bumping it here (currently using v0.19.0) might be worth a shot: https://github.com/cross-platform-actions/action/releases

@ibuclaw (Member) commented Oct 27, 2023

Oh well, thanks a lot for trying though! 👍 - I've just seen that the Action has been updated to v0.21.0 very recently; bumping it here (currently using v0.19.0) might be worth a shot: https://github.com/cross-platform-actions/action/releases

Unless most of the changes in the VMs available are made behind the scenes, I doubt it.

Having a look at the alternative, vmactions/freebsd-vm - it looks like that VM hasn't booted for the last two weeks.

@ibuclaw (Member) commented Oct 27, 2023

Updated to v0.21.0 - looks like both have hung again. The logs aren't rendering for me atm, but this time it appears to be in different places than before. So maybe the VMs themselves are too unstable to be used for 15 minutes at a time? GitHub says its hosted OSX runners have 1 CPU and 14G memory. The cross-platform action says it gives its FreeBSD VMs 6GB memory by default. I'm not convinced we're hitting some resource limit, but I have seen hangs before due to lack of memory.

@kinke (Contributor Author) commented Oct 27, 2023

GItHub says its hosted OSX runners have 1 CPU, and 14G memory. The cross-platform action says it gives its FreeBSD VMs 6GB memory by default.

The hosted Mac runners have 3 cores (https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources). And cross-platform-actions uses these 3 by default, plus 13G of memory (https://github.com/cross-platform-actions/action#inputs). Maybe the 1GB left for the host is a tad too low? We surely don't need 13G in the VM; 8G should be more than enough.
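If the defaults do turn out to be too aggressive, they can be overridden explicitly. A hedged sketch using the `cpu_count` and `memory` inputs from the action's README (OS version and script are placeholders; the values are simply the ones floated above):

```yaml
- uses: cross-platform-actions/action@v0.21.0
  with:
    operating_system: freebsd
    version: '13.2'
    cpu_count: 3    # matches the 3 cores of the hosted macOS runners
    memory: 8G      # leave more headroom for the macOS host than the 13G default
    run: ./ci/run.sh
```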

@ibuclaw (Member) commented Oct 27, 2023

Spoken to Jacob; the freezing is probably related to cross-platform-actions/action#29 and/or cross-platform-actions/action#61.

Going to try MacOS+QEMU combination.

Last test results: FBSD13 passed, FBSD12 hung.
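Switching hypervisors is a one-line change, assuming the `hypervisor` input documented for the action (OS version and script below are placeholders):

```yaml
- uses: cross-platform-actions/action@v0.21.0
  with:
    operating_system: freebsd
    version: '13.2'
    hypervisor: qemu   # override the macOS default (xhyve)
    run: ./ci/run.sh
```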

@ibuclaw (Member) commented Oct 27, 2023

@kinke looks like both succeed with the qemu hypervisor? Just going to undo some of the other stuff to see if that trend continues...

https://github.com/dlang/dmd/actions/runs/6671677680/job/18134093839?pr=15618

@kinke (Contributor Author) commented Oct 27, 2023

Oh wow - and the performance seems to at least match the other (default) hypervisor's, way better than on the Linux host! I must admit I didn't even consider giving the Mac+qemu combo a try after the first results.

I guess the 'unit' tests still fail - they did with qemu on Linux in the first experiments.

@ibuclaw (Member) commented Oct 27, 2023

With N=3 it's down to 20 minutes.
https://github.com/dlang/dmd/actions/runs/6672127402/job/18135449280?pr=15618

@kinke (Contributor Author) commented Oct 27, 2023

Looks like there's room to shave off an additional ~1m20s (from https://github.com/dlang/dmd/actions/runs/6672331301/job/18136041945?pr=15618):

Fri, 27 Oct 2023 21:46:01 GMT Tearing down VM
Fri, 27 Oct 2023 21:46:01 GMT   Syncing back files
Fri, 27 Oct 2023 21:46:01 GMT   /usr/bin/rsync -uzrtopg runner@localhost:work/ /Users/runner/work
Fri, 27 Oct 2023 21:46:03 GMT   skipping non-regular file "dmd/dmd/dmd"
Fri, 27 Oct 2023 21:46:03 GMT   skipping non-regular file "dmd/phobos/generated/freebsd/release/64/libphobos2.so"
Fri, 27 Oct 2023 21:46:03 GMT   skipping non-regular file "dmd/phobos/generated/freebsd/release/64/libphobos2.so.0.105"
Fri, 27 Oct 2023 21:47:20 GMT   Shuting down VM

We don't need any syncing, we don't produce any artifacts etc.

Edit: Should be sync_files: runner-to-vm AFAICT (copy the DMD repo into the VM, but don't copy any build artifacts back to the host).
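The corresponding change would be along these lines, assuming the `sync_files` input documented for the action (OS version and script are placeholders):

```yaml
- uses: cross-platform-actions/action@v0.21.0
  with:
    operating_system: freebsd
    version: '13.2'
    sync_files: runner-to-vm   # copy the repo in; skip the rsync back on teardown
    run: ./ci/run.sh
```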

@ibuclaw (Member) commented Oct 27, 2023

Using the default memory (13G), it's back up to 27-30 minutes again. I guess that's still within reasonable times given it's a VM on a shared runner. Reducing memory back down to 8G anyway.

https://github.com/dlang/dmd/actions/runs/6672331301/job/18136041945?pr=15618

I also noticed it takes about a minute and half to tear down the VM, looks to be consumed by rsync'ing the files in the VM back to the runner. Trying sync_files: runner-to-vm to shave that time off.

@ibuclaw (Member) commented Oct 27, 2023

We don't need any syncing, we don't produce any artifacts etc.

Edit: Should be sync_files: runner-to-vm AFAICT (copy the DMD repo into the VM, but don't copy any build artifacts back to the host).

Ah! I see we're talking over each other. I already had the change primed, and was just waiting for the run to finish to count exactly how long it takes. :-)

@kinke (Contributor Author) commented Oct 27, 2023

Using the default memory (13G) and it's back up to 27-30 minutes again.

Oh, if that's consistent, we should probably tell Jacob to change the default value, at least on macOS, giving the host more room to breathe.

@kinke kinke marked this pull request as ready for review October 27, 2023 23:15
@kinke (Contributor Author) commented Oct 27, 2023

LVGTM.

Comment on lines +430 to +432
version (FreeBSD) { /* ??? unittest runner fails for no good reason on GHA. */ }
else
newTargets ~= createUnitTestTarget();
@ibuclaw (Member) commented Oct 28, 2023


I say this is "for no good reason" - however, when commenting out deinitialization.d:152 explicitly, the same module then just failed at deinitialization.d:131.

This could suggest that all globals have an initial value other than null. Which begs the question: why? But actually also: why not? After all, as a security feature, you can tell a compiler to default-initialize all variables with a bit-mask value (as opposed to zeros or garbage). Though I am struggling to justify this for global data. But equally, surely you're testing it wrong if you're checking the value before it's been initialized. 🤷

@ibuclaw ibuclaw merged commit a8fa992 into dlang:master Oct 28, 2023
42 of 43 checks passed
@ibuclaw (Member) commented Oct 28, 2023

We really need to get all pipelines green again, and this is progress towards that.

@kinke kinke deleted the gha_freebsd branch October 28, 2023 16:54
@jacob-carlborg (Contributor)

@kinke FYI, the Linux runners don't support hardware-accelerated nested virtualization (KVM), while the macOS runners do (Hypervisor framework). That's the reason for the big performance difference.

@jacob-carlborg (Contributor)

no way to reuse the native job steps

It depends on what the steps do. I try to do as little as possible inside the VM, both for performance and to be able to use regular GitHub Action steps. That's also why files are synced in both directions. For example, you can do a checkout before the VM step and then use a step to create a GitHub release after the VM step.

@kinke (Contributor Author) commented Oct 28, 2023

I try to do as little as possible inside the VM.

I fully understand that it'd be total overkill to be able to run arbitrary actions in the VM, with all that node.js crap etc. I was just comparing this to the Cirrus CI approach, where the VM setup is done as built-in job properties (e.g., https://github.com/ldc-developers/ldc/blob/9928895c548af3789c7fcc7947d5e9b3ac152ae9/.cirrus.yml#L307-L314), and the steps are basically limited to bash steps AFAIK, so the native and VM steps can be trivially shared via YAML anchors (https://github.com/ldc-developers/ldc/blob/9928895c548af3789c7fcc7947d5e9b3ac152ae9/.cirrus.yml#L376 - another feature that GHA doesn't support).
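For readers unfamiliar with the Cirrus feature being referenced: YAML anchors let a block be defined once and merged into several tasks, which GitHub Actions' YAML parser does not support. An illustrative sketch in Cirrus-style YAML (task and script names are made up, not the actual .cirrus.yml):

```yaml
# Define the shared steps once via an anchor...
common_steps_template: &COMMON_STEPS
  build_script: ./ci/build.sh
  test_script: ./ci/test.sh

# ...then merge them into both the native and the VM task.
native_task:
  << : *COMMON_STEPS

freebsd_vm_task:
  freebsd_instance:
    image_family: freebsd-13-2
  << : *COMMON_STEPS
```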

Considering the GHA limitations/flexibility, I can't think of anything your action could do better to improve our use case here. :) Thanks a lot for your hard work!
