-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: port CircleCI jobs to GH Actions and improve sharness reporting #9355
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@galargh thank you for investing time into this.
I'd like us to merge this, and run both for a while – ideally a full release cycle – to compare speed, developer experience, and cost.
Small asks/suggestions below, but other than that lgm. let me know when this is ready for merge (resolve your own TODOs below), and I'll make final look and merge.
if (`${context.repo.owner}/${context.repo.repo}` === 'ipfs/kubo') { | ||
return { | ||
labels: ['self-hosted', 'linux', 'x64', 'kubo'], | ||
parallel: 10, | ||
aws: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭
@galargh @BigLep
After we merge this PR, how long we need to run old setup on circleci and github actions with self-hosted worker on AWS in parallel to be able to compare costs? 1-2 months?
(Cost is only one of factors, we also need to account DX/UX, speed and security, so even if this is bit extra, we will still want to switch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I can get a ballpark estimate on the cost that will help me determine whether this is even worth paying attention. Making our setup easier to support/maintain and let external developers more easily fork and run CI is worth a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really rough estimate here and I totally might have missed something but I think it's gonna work out to something like this:
- GitHub Actions: ~$99 / month (30 days * $2 for ALB, Gateway, Lambda, etc. + 261 runs in the past 30 days * $0.15 for EC2, EBS, S3, etc.)
- CircleCI: ~$468 / month (780427 credits used in the past 30 days * $0.0006 per credit)
@lidel Thanks for the review :) I think I incorporated all of the feedback now. I'll deal with updating actions references in a subsequent PR (eg. #9355 (comment)). |
I've fixed some flaky tests and bumped this PR to have changes from master (waiting for CI to finish). |
makes it look similar to circleci + less visual noise
Unifies behavior by removing skip when upstream is broken. If this turns out to be too noisy, we can change it later.
Static assets do not depend on Kubo binary, so we cna run build once, and then run only e2e against Kubo binary to save some time.
User may be confused and only see failing step with GREP. This makes it more friendly.
makes easier to compare later
.github/workflows/sharness.yml
Outdated
files: kubo/coverage/sharness_tests.coverprofile | ||
- name: Aggregate results | ||
run: find kubo/test/sharness/test-results -name 't*-*.sh.*.counts' | kubo/test/sharness/lib/sharness/aggregate-results.sh > kubo/test/sharness/test-results/summary.txt | ||
- name: 👉️ If this step failed, go to «Summary» (top left), open the «HTML Report», and inspect the «Failures» column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
test/sharness/lib/sharness/aggregate-results.sh will include header with failing jobs, for example: failed test(s): t0172-content-routing-over-http.sh Printing the summary to stdout will make it possible to quickly eyeball what failed, without having to look at details in HTML Reports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @galargh, HUGE quality of life improvement 🙌 ❤️
Pushed small tweaks and have one concern below, but nothing blocking this from being merged.
There is a lot of fixups, so I'm going to squash merge.
Let's see how this behaves during 0.18 and 0.19 releases,
and decide after 0.19 if we are ok with removal of CircleCI.
suites: | ||
- 'exchange-files' | ||
- 'files pin circuit ipns cid-version-agnostic ipns-pubsub pubsub' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 a potential concern: our CI will ignore any new suites added to https://github.com/ipfs/interop
@galargh I don't want to block merge on fixing this, but do you mind opening a separate PR that refactors it in a way that ensures we always run all suites (incl. ones added in future)?
Resolves #8991
Prepares for resolution of #8804 and #9084
In this PR I port CircleCI jobs to GitHub Actions. The newly added GitHub Actions are marked as
ci/gh-experiment
for now. We want to allow a transition period where both CircleCI and GitHub Actions jobs are running. After the transition period we're going to evaluate correctness (were the results the same?) and performance (are runtimes comparable?).2 out of the new workflows are running on self-hosted runners:
gobuild
andsharness
. The self-hosted runners were set up using: https://github.com/pl-strflt/tf-aws-gh-runner. Thekubo
runner configuration can be found here: https://github.com/pl-strflt/tf-aws-gh-runner/blob/013c56d4bf827c60186a38235a126b5418314461/runners.tf#L47-L71The transition period for the new workflows running on self-hosted runners should be ~1 month. The transition period for the rest of the workflows can be shorter.
Feature highlight: easy to know why sharness failed
As part of the migration work, I also improved sharness reporting (to make it usable in GitHub Actions). Sharness runs from GitHub Actions will output HTML reports (example) where both unknown breakages (a.k.a failures) and known breakages (a.k.a errors) are clearly marked, and stdout/stderr can be accesses per testcase/testsuite.
Testing