Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace CircleCI with Github Actions #588

Merged
merged 4 commits into from
Mar 24, 2021

Conversation

rleh
Copy link
Member

@rleh rleh commented Mar 21, 2021

Let's see how much time Github Actions takes to run all tests...

grafik

@rleh rleh requested a review from salkinium March 21, 2021 13:58
@rleh
Copy link
Member Author

rleh commented Mar 21, 2021

Maybe I have to apply 15f9b3f again...

@salkinium
Copy link
Member

This is already looking very, very promising though…

@rleh
Copy link
Member Author

rleh commented Mar 21, 2021

"stm32f4-compile-all: No space left on device"

@salkinium
Copy link
Member

"stm32f4-compile-all: No space left on device"

It's the SCons CacheDir. We use to accelerate the compilation, since most generated files are identical. I'm checking if there's a way to cap the cache, since we probably don't need all of it.

@salkinium
Copy link
Member

Ok, there's no built-in method, let me see if I can hack something together…

@salkinium
Copy link
Member

I have a hacky solution I just need to clean it up a little…

@rleh
Copy link
Member Author

rleh commented Mar 21, 2021

Super, thank you 👍🏽

I'm currently trying to setup a bunch of self-hosted runners to check if that would result in further build time reduction.

@salkinium
Copy link
Member

There seems to be no difference between os.cpu_count() and 8.
Since we're not limited by the 4x parallelism I would perhaps start the long >=1h compile-all jobs already at the start: F7 (1h), L4 (>1h), F4 (>1h). Everything else can remain in that order to save a little CPU power (think of the penguins).

@salkinium
Copy link
Member

Meh, 1h20 is still a long running time. The free GH Actions tier has a 20 parallel jobs limit, so I'll check if we can split these jobs up into two 40mins jobs.

@salkinium
Copy link
Member

Ok, the current job splitting yields something like 10mins for the examples, +30mins for the compile all, so about 40mins total. I'd be happy with that. It will however slow down if multiple PRs run at the same time due to the 20 jobs limit.

@rleh
Copy link
Member Author

rleh commented Mar 21, 2021

Reminder: We should also test the website and api-docs jobs (by pushing to a develop-* branch).

@salkinium
Copy link
Member

Pushed a develop-gh-actions branch, but I still cannot find where the Workflow from hal.yml with manual trigger on: workflow_dispatch is in the GitHub UI.

@rleh
Copy link
Member Author

rleh commented Mar 22, 2021

I guess the workflow file has to be merged for the "start workflow" button to appear.

Should look like this:
grafik

@rleh
Copy link
Member Author

rleh commented Mar 22, 2021

Ahhh, but it does work on pull-requests.

But this could be a solution:

jobs:
  build:
    if: github.event.review.state == 'approved'`

https://gh.neting.ccmunity/t/feature-request-trigger-action-on-pull-request-approved/18413/3

@salkinium
Copy link
Member

Oh, nice, let me try that!

Copy link
Member

@salkinium salkinium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approval should trigger the HAL builds…

@salkinium
Copy link
Member

Oh, it worked!!!

The only limitation is that we cannot approve our own PRs… so how do we trigger this on our own PRs?

@rleh
Copy link
Member Author

rleh commented Mar 22, 2021

https://github.com/modm-io/modm/actions/runs/676361934:

  • Homepage deploy job failed because of sudo (see above)
  • api-docs job runs out of disk space

@salkinium
Copy link
Member

api-docs job runs out of disk space

I've added on-the-fly deduplication to the job, let's see if it remains <10GB or whatever it is.

Copy link
Member

@salkinium salkinium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try this again…

@salkinium
Copy link
Member

Cloning into 'docs/modm.io'...
Host key verification failed.
fatal: Could not read from remote repository.

Something with the key is still wrong?

@salkinium
Copy link
Member

The api docs jobs now runs through, but the upload seems to not have worked correctly, at least the timestamp seems to be from the CircleCI job, not the GHA job.

The compile-all jobs now don't run the devices from the quick-test again, so that should perhaps save a few mins per job.

@salkinium
Copy link
Member

The only limitation is that we cannot approve our own PRs… so how do we trigger this on our own PRs?

I've added both the approving PR trigger and the manual workflow trigger, so it should be fine. (Unless GHA doesn't support both at the same time, but the docs didn't deny it).

@salkinium
Copy link
Member

Ok, @rleh I'm happy now with this splitting and config. 12mins for the fast CI, plus an additional 30mins after approval and manual trigger. I think we should clean this up, fix the docs jobs and then merge it, so that we have something faster now than the CircleCI mess. Then later we can switch to self hosting, perhaps optimize the splitting there again?

Copy link
Member Author

@rleh rleh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Approve

SSH problem in build-upload-docs job ist fixed, see https://github.com/modm-io/modm/actions/runs/683762422.

I squashed the commits together; ready to merge from my side!

@rleh
Copy link
Member Author

rleh commented Mar 24, 2021

https://github.com/modm-io/modm/runs/2186185100:

Error generating documentation for device stm32f405zgt7: [Errno 2] No such file or directory: '/tmp/tmp3i01lxvw/output/develop/api/stm32f437zit7/structmodm_1_1amnb_1_1Response.html'

Where does this error come from?

@salkinium
Copy link
Member

salkinium commented Mar 24, 2021

Race condition from the de-duplicator.

@salkinium
Copy link
Member

There was an issue with the small time between file.unlink() and file.symlink_to(), which caused a race condition. I think I've fixed this now.

@salkinium
Copy link
Member

I fixed the race condition and sped up interlinking code so that it saved 20mins or so… docs.modm.io is up-to-date.
I'll merge this after the CI passes.

Copy link
Member

@salkinium salkinium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if all required jobs pass.

@salkinium salkinium merged commit e062d39 into modm-io:develop Mar 24, 2021
@salkinium
Copy link
Member

\o/ yay

Thanks a lot!

@salkinium
Copy link
Member

Hm, well the manually triggered workflow only works for branches inside the repo, not for Pull Request. So I still cannot trigger it for my own PRs.

Maybe we can do something with an input.

@rleh rleh deleted the feature/ci_gh_actions branch March 25, 2021 01:31
@rleh
Copy link
Member Author

rleh commented Mar 25, 2021

So I still cannot trigger it for my own PRs.

Is that a real issue?

We could use a label (e.g. build-hal) instead of the approval: https://stackoverflow.com/a/62331521

Using multiple labels we could even only run some tests depending on the PR scope. Would save more penguins, but could cause us problems...

@salkinium
Copy link
Member

Is that a real issue?

Maybe not, perhaps the HAL quick tests are enough. Let's see how it goes. But someone needs to approve my PRs now, otherwise I cannot run the whole CI…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants