Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: port CircleCI jobs to GH Actions and improve sharness reporting #9355

Merged
merged 66 commits into from
Jan 13, 2023

Conversation

galargh
Copy link
Contributor

@galargh galargh commented Oct 19, 2022

Resolves #8991
Prepares for resolution of #8804 and #9084

In this PR I port CircleCI jobs to GitHub Actions. The newly added GitHub Actions are marked as ci/gh-experiment for now. We want to allow a transition period where both CircleCI and GitHub Actions jobs are running. After the transition period we're going to evaluate correctness (were the results the same?) and performance (are runtimes comparable?).

2 out of the new workflows are running on self-hosted runners: gobuild and sharness. The self-hosted runners were set up using: https://github.com/pl-strflt/tf-aws-gh-runner. The kubo runner configuration can be found here: https://github.com/pl-strflt/tf-aws-gh-runner/blob/013c56d4bf827c60186a38235a126b5418314461/runners.tf#L47-L71

The transition period for the new workflows running on self-hosted runners should be ~1 month. The transition period for the rest of the workflows can be shorter.

Feature highlight: easy to know why sharness failed

As part of the migration work, I also improved sharness reporting (to make it usable in GitHub Actions). Sharness runs from GitHub Actions will output HTML reports (example) where both unknown breakages (a.k.a failures) and known breakages (a.k.a errors) are clearly marked, and stdout/stderr can be accesses per testcase/testsuite.

Screenshot 2022-12-02 at 14-58-51 feat port CircleCI jobs to GH Actions and improve sharness reporting · ipfs_kubo@79f8457

Screenshot 2022-12-02 at 14-58-07 Unit Test Results

Testing
  • CI on this PR passes for GitHub Actions
  • CI on this PR passes for CircleCI

@lidel lidel self-assigned this Nov 15, 2022
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galargh thank you for investing time into this.

I'd like us to merge this, and run both for a while – ideally a full release cycle – to compare speed, developer experience, and cost.

Small asks/suggestions below, but other than that lgm. let me know when this is ready for merge (resolve your own TODOs below), and I'll make final look and merge.

.github/workflows/build.yml Outdated Show resolved Hide resolved
.github/workflows/build.yml Outdated Show resolved Hide resolved
Comment on lines +20 to +24
if (`${context.repo.owner}/${context.repo.repo}` === 'ipfs/kubo') {
return {
labels: ['self-hosted', 'linux', 'x64', 'kubo'],
parallel: 10,
aws: true
Copy link
Member

@lidel lidel Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭
@galargh @BigLep
After we merge this PR, how long we need to run old setup on circleci and github actions with self-hosted worker on AWS in parallel to be able to compare costs? 1-2 months?
(Cost is only one of factors, we also need to account DX/UX, speed and security, so even if this is bit extra, we will still want to switch).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I can get a ballpark estimate on the cost that will help me determine whether this is even worth paying attention. Making our setup easier to support/maintain and let external developers more easily fork and run CI is worth a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really rough estimate here and I totally might have missed something but I think it's gonna work out to something like this:

test/sharness/lib/test-generate-junit-html.sh Show resolved Hide resolved
@galargh galargh requested a review from lidel December 5, 2022 12:52
@galargh
Copy link
Contributor Author

galargh commented Dec 5, 2022

@lidel Thanks for the review :) I think I incorporated all of the feedback now. I'll deal with updating actions references in a subsequent PR (eg. #9355 (comment)).

@lidel
Copy link
Member

lidel commented Jan 5, 2023

I've fixed some flaky tests and bumped this PR to have changes from master (waiting for CI to finish).
Sadly run out of time before long weekend, but will review next week.

lidel and others added 6 commits January 13, 2023 14:23
makes it look similar to circleci + less visual noise
Unifies behavior by removing skip when upstream is broken.
If this turns out to be too noisy, we can change it later.
Static assets do not depend on Kubo binary, so we cna run build once,
and then run only e2e against Kubo binary to save some time.
User may be confused and only see failing step with GREP.
This makes it more friendly.
makes easier to compare later
files: kubo/coverage/sharness_tests.coverprofile
- name: Aggregate results
run: find kubo/test/sharness/test-results -name 't*-*.sh.*.counts' | kubo/test/sharness/lib/sharness/aggregate-results.sh > kubo/test/sharness/test-results/summary.txt
- name: 👉️ If this step failed, go to «Summary» (top left), open the «HTML Report», and inspect the «Failures» column.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

test/sharness/lib/sharness/aggregate-results.sh
will include header with failing jobs, for example:

failed test(s): t0172-content-routing-over-http.sh

Printing the summary to stdout will make it possible to quickly eyeball
what failed, without having to look at details in HTML Reports.
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @galargh, HUGE quality of life improvement 🙌 ❤️

Pushed small tweaks and have one concern below, but nothing blocking this from being merged.

There is a lot of fixups, so I'm going to squash merge.

Let's see how this behaves during 0.18 and 0.19 releases,
and decide after 0.19 if we are ok with removal of CircleCI.

Comment on lines +45 to +47
suites:
- 'exchange-files'
- 'files pin circuit ipns cid-version-agnostic ipns-pubsub pubsub'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 a potential concern: our CI will ignore any new suites added to https://github.com/ipfs/interop

@galargh I don't want to block merge on fixing this, but do you mind opening a separate PR that refactors it in a way that ensures we always run all suites (incl. ones added in future)?

@lidel lidel merged commit d4cd414 into master Jan 13, 2023
@lidel lidel deleted the ci/move-to-github-actions branch January 13, 2023 23:50
@lidel lidel restored the ci/move-to-github-actions branch January 14, 2023 00:05
@lidel lidel deleted the ci/move-to-github-actions branch January 14, 2023 00:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Improve sharness tests failure reports
3 participants