Feature/docs (#6)
* Describe architecture basics

* Update docs

* Update docs

* Update docs

* Update doc

Co-authored-by: Gertjan Maas <gertjan.maas@philips.com>
npalm and gertjanmaas authored May 12, 2020
1 parent 1e4320f commit 1b9a245
Showing 3 changed files with 106 additions and 2 deletions.
103 changes: 102 additions & 1 deletion README.md
@@ -2,8 +2,109 @@

> WIP: Module is in development

This [Terraform](https://www.terraform.io/) module creates the infrastructure needed to host [GitHub Actions](https://github.com/features/actions) self-hosted runners on [AWS spot instances](https://aws.amazon.com/ec2/spot/). All logic required to handle the lifecycle of an action runner is implemented in AWS Lambda functions.

## Motivation

GitHub Actions `self hosted` runners provide a flexible option to run your CI workloads on compute of your choice. Currently GitHub provides no option to automate the creation and scaling of action runners. This module creates the AWS infrastructure to host action runners on spot instances, and provides lambda modules to orchestrate the lifecycle of the action runners.

Lambda is chosen as the runtime for two major reasons. First, it allows us to create small components with minimal access to AWS and GitHub. Second, it provides a scalable setup at minimal cost that works on repo level and scales to organization level. The lambdas create Linux-based EC2 instances with Docker to serve CI workloads that can run on Linux and/or Docker. The main goal here is to support Docker-based workloads.

A logical question would be: why not Kubernetes? The current approach stays close to the way GitHub action runners are available today: the runner is installed on a host where the required software is available. Another logical choice would be AWS Auto Scaling groups. That choice would typically require granting many more permissions to GitHub at instance level. Besides that, scaling up and down is not trivial.

## Overview

The process of scaling runners on demand starts by registering a GitHub App which sends a [check run event](https://developer.github.com/v3/activity/events/types/#checkrunevent) via a webhook to the API Gateway. The gateway triggers a lambda which verifies the signature and filters for queued build events. Accepted events are posted on an SQS queue. Messages on this queue are delayed for a configurable number of seconds to give the available runners time to pick up the build.
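
As a rough illustration of the delayed queue described above, the sketch below declares an SQS queue whose messages are held back for a configurable number of seconds. The variable name `queue_delay_seconds`, its default and the queue name are assumptions made for this example; they are not taken from the module sources.

```terraform
# Hypothetical input; the module's actual variable name may differ.
variable "queue_delay_seconds" {
  description = "Seconds a queued build event is held back before it can be consumed."
  type        = number
  default     = 30
}

# Queue that receives the accepted check run events from the webhook lambda.
resource "aws_sqs_queue" "queued_builds" {
  name          = "gh-ci-queued-builds"   # illustrative name
  delay_seconds = var.queue_delay_seconds # SQS accepts values from 0 to 900
}
```

A longer delay gives idle runners more time to pick up the build, at the cost of a slower start when a new instance is really needed.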

If the build has not been picked up by then and no limits are reached, the scale up lambda requests a registration token for a new runner at GitHub, stores the token in the SSM parameter store and starts an EC2 instance via a launch template. The EC2 instance installs the required software via a [`user_data`](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html) script, fetches and deletes the registration token from SSM and configures the action runner.
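
The launch template mentioned above could look roughly like the following sketch. It requests a spot instance and passes a `user_data` script. The AMI variable, the instance type, the resource names and the script body are placeholders for illustration, not the module's actual configuration.

```terraform
variable "ami_id" {
  description = "AMI for the runner instances (placeholder input for this sketch)."
  type        = string
}

locals {
  # Placeholder script. A real one would install Docker and the runner,
  # fetch and delete the registration token from SSM, and configure the runner.
  runner_user_data = <<-EOT
    #!/bin/bash
    echo "install software, fetch token from SSM, configure runner"
  EOT
}

resource "aws_launch_template" "runner" {
  name_prefix   = "gh-ci-action-runner-"
  image_id      = var.ami_id
  instance_type = "m5.large" # illustrative instance type

  # Request spot capacity instead of on-demand.
  instance_market_options {
    market_type = "spot"
  }

  # Launch templates expect base64-encoded user data.
  user_data = base64encode(local.runner_user_data)
}
```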

Scaling down the runners is brute-forced at the moment: every configurable number of minutes, a lambda checks each runner (instance) to see whether it is busy. If a runner is not busy, it is removed from GitHub and the instance is terminated in AWS. At the moment there seems to be no other option to scale down more smoothly.
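
One way to run such a periodic check is a CloudWatch Events schedule that invokes the scale down lambda. The sketch below assumes the lambda already exists and receives its ARN and name through two hypothetical variables; the five minute rate and the resource names are examples only.

```terraform
# Hypothetical inputs pointing at an existing scale down lambda.
variable "scale_down_lambda_arn" {
  type = string
}

variable "scale_down_lambda_name" {
  type = string
}

# Fire every five minutes (illustrative interval).
resource "aws_cloudwatch_event_rule" "scale_down" {
  name                = "gh-ci-scale-down"
  schedule_expression = "rate(5 minutes)"
}

# Send the scheduled event to the lambda.
resource "aws_cloudwatch_event_target" "scale_down" {
  rule = aws_cloudwatch_event_rule.scale_down.name
  arn  = var.scale_down_lambda_arn
}

# Allow CloudWatch Events to invoke the lambda.
resource "aws_lambda_permission" "scale_down" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = var.scale_down_lambda_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.scale_down.arn
}
```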

Downloading the GitHub Action Runner distribution can occasionally be slow (more than 10 minutes). Therefore a lambda is introduced that synchronizes the action runner binary from GitHub to an S3 bucket. The EC2 instance fetches the distribution from the S3 bucket instead of the internet.
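
A minimal sketch of the storage side of this setup, assuming a private bucket and a read-only policy for the runner instances. The bucket name variable and the resource names are hypothetical and chosen for this example only.

```terraform
# Hypothetical input; bucket names must be globally unique.
variable "distribution_bucket_name" {
  type = string
}

# Bucket that caches the action runner distribution.
resource "aws_s3_bucket" "runner_distribution" {
  bucket = var.distribution_bucket_name
}

# Keep the cached distribution private.
resource "aws_s3_bucket_public_access_block" "runner_distribution" {
  bucket                  = aws_s3_bucket.runner_distribution.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Read-only access so the runner instances can download the distribution.
data "aws_iam_policy_document" "read_distribution" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.runner_distribution.arn}/*"]
  }
}
```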

![Architecture](docs/component-overview.svg)

Permissions are managed in several places. The most important ones are listed below. For details, check the Terraform sources.

- The GitHub App requires access to actions and publishes `check_run` events to AWS.
- The scale up lambda should have access to EC2 for creating and tagging instances.
- The scale down lambda should have access to EC2 to terminate instances.

Besides these permissions, the lambdas also need permissions for CloudWatch (for logging and scheduling), SSM and S3.
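
To make the split in EC2 permissions concrete, the policy documents below sketch what the two lambdas might be allowed to do. The exact action lists and the SSM parameter path are assumptions for illustration; the authoritative policies are in the Terraform sources and are likely to differ.

```terraform
# Scale up: create and tag instances, and store the registration token.
data "aws_iam_policy_document" "scale_up" {
  statement {
    actions   = ["ec2:RunInstances", "ec2:CreateTags", "ec2:DescribeInstances"]
    resources = ["*"]
  }

  statement {
    actions   = ["ssm:PutParameter"]
    resources = ["arn:aws:ssm:*:*:parameter/gh-ci/*"] # hypothetical parameter path
  }
}

# Scale down: find runner instances and terminate the idle ones.
data "aws_iam_policy_document" "scale_down" {
  statement {
    actions   = ["ec2:DescribeInstances", "ec2:TerminateInstances"]
    resources = ["*"]
  }
}
```

Keeping the two documents separate means the scale down lambda never gains the right to launch instances, which matches the goal of small components with minimal access.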

## Usage

Examples are provided in [the example directory](examples/). Please ensure you have installed the following tools.

- Terraform, or [tfenv](https://github.com/tfutils/tfenv).
- Bash shell or compatible.
- TODO: building lambda ?
- AWS CLI

The module supports two main scenarios for creating runners. On repository level, a runner is dedicated to a single repository; no other repository can use it. On organization level, you can use the runner(s) for all repositories within the organization. See https://help.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners for more information. Before starting the deployment, you have to choose one option.

The setup consists of running Terraform to create all AWS resources and configuring the GitHub App. The Terraform module requires configuration from the GitHub App, and the GitHub App requires output from Terraform. Therefore you should first create the GitHub App and configure the basics, then run Terraform, and finalize the configuration of the GitHub App afterwards.

### Setup GitHub App (part 1)

Go to GitHub and create a new app. Be aware that you can create apps for your organization or for a user. For now we handle only the organization-level app.

1. Create an app in GitHub.
2. Choose a name.
3. Choose a website (mandatory, not required for the module).
4. Disable the webhook for now (we will configure this later).
5. Under repository permissions, enable `Checks` to receive events for new builds.
6. _Only for repo level runners!_ - Repository permissions, `Administration` - Read and Write (to register the runner).
7. _Only for organization level runners!_ - Organization permissions, `Administration` - Read and Write (to register the runner).
8. Save the new app.
9. Next, generate a private key on the General page.
10. Make a note of the following app parameters: app ID, client ID, and client secret.

### Setup terraform module

1. Create a Terraform workspace and instantiate the module; see the examples for more details.

```terraform
module "runners" {
  source      = "git::https://github.com/philips-labs/terraform-aws-github-runner/"
  aws_region  = "eu-west-1"
  vpc_id      = "vpc-123"
  subnet_ids  = ["subnet-123", "subnet-456"]
  environment = "gh-ci"

  github_app = {
    key_base64     = "base64string"
    id             = "1"
    client_id      = "c-123"
    client_secret  = "secret"
    webhook_secret = "secret"
  }

  enable_organization_runners = true
}
```

2. Run Terraform using the following commands:

```bash
terraform init
terraform apply
```

3. Check the Terraform output for the API Gateway URL, which you need in the next step.
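
The example in step 1 hardcodes the GitHub App values. One possible way to avoid committing secrets is sketched below, under the assumption that `key_base64` expects the base64-encoded contents of the private key file generated in part 1. The file name and the two variable names are hypothetical.

```terraform
# Hypothetical inputs so the secrets stay out of version control.
variable "github_app_client_secret" {
  type = string
}

variable "github_app_webhook_secret" {
  type = string
}

locals {
  github_app = {
    # filebase64 reads the file and returns its contents base64 encoded.
    key_base64     = filebase64("${path.module}/github-app.private-key.pem") # hypothetical file name
    id             = "1"
    client_id      = "c-123"
    client_secret  = var.github_app_client_secret
    webhook_secret = var.github_app_webhook_secret
  }
}
```

You can then pass `local.github_app` as the `github_app` argument of the module and supply the two secrets through, for example, `TF_VAR_github_app_client_secret` and `TF_VAR_github_app_webhook_secret` environment variables.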

### Setup GitHub App (part 2)

Go back to the GitHub App and update the following settings.

1. Enable the webhook.
2. Provide the webhook URL, which should be part of the Terraform output.
3. Provide the webhook secret.
4. Enable the `Check run` event for the webhook.

## Examples

## Philips Forest

@@ -14,7 +115,7 @@ This module is part of the Philips Forest.
/ __\__ _ __ ___ ___| |_
/ _\/ _ \| '__/ _ \/ __| __|
/ / | (_) | | | __/\__ \ |_
\/ \___/|_| \___||___/\__|
\/ \___/|_| \___||___/\__|
Infrastructure
```
2 changes: 1 addition & 1 deletion docs/architecture.drawio
@@ -1 +1 @@
<mxfile host="Electron" modified="2020-04-30T12:17:18.456Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/13.0.1 Chrome/80.0.3987.163 Electron/8.2.1 Safari/537.36" etag="9ghISsadq6Y1mu9wezsm" version="13.0.1" type="device"><diagram id="L_F16tsUUGYpfsHa_sY0" name="Page-1">7Vzbcts2EP0azbQP1ZAAb3qU5EvSOm1sJ+OmLx5IhCnGEMGCUCT76wtQoEQClE1HtCXVSuwJsQRx2d1zsFiQ6cDhdHHOUDr5RENMOsAKFx140gHAtnpA/CMlD0tJL+gtBRGLQ1VpLbiOH3HxpJLO4hBnlYqcUsLjtCoc0yTBY16RIcbovFrtjpJqrymKsCG4HiNiSm/ikE+W0gD4a/kHHEeTomfbU/OboqKymkk2QSGdl0TwtAOHjFK+vJouhphI5RV6WT53tuHuamAMJ7zJAxahX8nCu/zny+P4CnwLvqHL778BZZ4fiMzUjPs310IwJHQWqoHzh0IbKY0TnmvUHYgf0eHQ6rjizlCWusDVBHrZrwpssyTbqAr0sl8V2Hrztta/rQ+wJDBKleYtrX+rNEDxAwd0xkmc4OHK9ywhjBgKY2GTISWUCVlCE6G9wYRPiSjZ4nI+iTm+TtFYanUucCNkdzThyvttUJSV4mWrwntSeT1dRBJoXTTPnG7E6CzNu/wo/L/27q24vB3nxhSNcEbvcTGwDoDi75n0lsFdTIg24B+Y8VgAoU/iSLbNqewKqRLBd1y2KGYRJ9FFXjqBlhp5XRchyiY4VNMxnVf5s+wVL0oi5cznmE4xZw+iSnHXU8BSzFIU52uYeo67lE1KEO35qiJS1BCtml6jR1woAL0ATJ6BJQNAOBTsooqU8QmNaILI6Vo6EEZLwpWe1nUuqNR/7j/fMecPylnQjNOqd21UbUZnbIyfGH/Br4hFmD9Rz1nWk3N50lAME8TjH1UmbV3rtqH1/uePQnCOOJ6jB8MEtaDd5LQ6mMW9nuOenIHSvZOYiYbiHICJNJiGJ/GM24fWwK1D4F3+R4dHgb0LNMLkM81i1fyIck6nz4JzLEaFWdUtniMYlKVLddzFCxxuYhyGl1605JuBKNYxD0rj20ipvxWsw6CKdRvaXdeAux+YaC9krbuduXDe4NGE0vufwHzJTDgJ+zJskcYmdHwvRWSUlwub526EGC/qKb4WT57FpGjHcDTXcQeeYxDMz7MFbMgWdkO2KBnStpS5GzOIau6zjE/WbgN8zW2K4KxoYjlJ9VQ5dtIb8vSGgsL/iqaWejCayn1rNctG7vbno3fh8WDu3d7/yX+/v/lj7KSFtkvu9kH0LCyH2TTOMkEPWR7essNacmqcqHb6TZ1oS4fp+YVZlaFB4L2eoZ/SSMnQ5zGfzEZC1k/TDvCIjLtGws5exFe6L9m7spDULnYlyxkrBtFWnNWNn1yS9FVkjgWtRTTrRmpWW68NOjTrl4aaQNB3N7vHVkuDf+BxoNOQ2d29igOdxgvythHgmR+cWs7LIsATyx3a/ruJAAmajkLUTvDnwr0L/mzbcKq9RrTbNFaz9grS7iZIX86wkADr+lLmqsxF8ZcTMb4HHH7CWYYi/Gv7FAAdJ3jhJnAwtKHrvRsKyP7N2sG/p0fxu8c/MJf4IcNix5sdFi8UeH+WGIC3V8RQjLuk/zxnfzUTeGZZDSEIpI9nTGhoLEdpvwIjHIOCtwwKdFIAPTP9+7YhQWC45KmYjhzy3+J3Giezw6MHr2nc4O8VPUBzL1DYIpQpybP/qRkg3Csz2A0ORI6B2KvTbn76N0d8PGmHeu1gz6h3NYVDTbgU9Pk8z/baBnh9HhR6QcXETqAlzl85CWo3SKFt
xRzqzKQZMajKe8cJ+dsYmImlbflShr2JJ7CsccvjKb4doSx/tgUe0A/lds8DZgj2RR6PJAe4Mes1XfKD/Vrye4YJhgSjZJautmbHfdch77v0ZOzuQW8emG1ytZqXnMxj8r43DLyyA9gb7avbTPPaVVMtqB1YVbX7NWp3g5pDrtWD7W+ygKF40BXlKxzFGc/1b7HcEodFvaDm+GuXZ9K2Znqov7/2yrEYMDfT/ZxfpYGvlIEt80DkSOyvv7Oj03TGcUsUA6uvPgTOrpndTB/UOB48Ot6BO56eyt2540Ezn2B3T+g8IRSF0vniJOOIkINb2ZpuKsB+veELzE2FTgTyaiA2eiyu2eltSwie1YfQfxkhAN+37feTY8zg6yxCHtw1F3iHhXHY9LVcuF9vb0FzG9cujo8Zv/Yz/w7YNTr9A0Nn0xexQOuJ/u30bL6J9TUNEcf5mtv+NzbHEPxNk3oQVHHtWt6OcW1m8mu2fi2vEEe/e/Otn55Nfsut31M5vbqvL/5iEUriR5SbS3e8d/7RBbRq7NbOJxeiuP5GfpnTXP9PA/D0Pw==</diagram></mxfile>
<mxfile host="Electron" modified="2020-05-12T08:11:10.930Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/13.0.3 Chrome/80.0.3987.163 Electron/8.2.1 Safari/537.36" etag="C6iqQcv9BustWJvTIf3e" version="13.0.3" type="device"><diagram id="L_F16tsUUGYpfsHa_sY0" name="Page-1">7Vttc5s4EP41nrn7kAwg3vzRL0mu1/aaNO1lel88spExDUaukGMnv/4kEBgk2SExzkuTtjNFi5Bg99lndyW5Awbz9RmBi9lnHKC4YxnBugOGHcsyu2aX/cclt7mk6wtBSKJAdNoILqM7JISGkC6jAKW1jhTjmEaLunCCkwRNaE0GCcGrercpjuuzLmCIFMHlBMaq9CoK6CyX+pa3kf+FonBWzGy64vvmsOgsviSdwQCvKiJw0gEDgjHNr+brAYq58gq95M+dbrlbvhhBCW3ygBHj7/Havfjv293kq/XD/wEvfh7xB/gwNzBeii/uXV0ywSDGy0C8OL0ttLHAUUIzjTp99o9NODA6Drsz4K1jy5EEcturC0y1xceoC+S2VxeY8vCmNL8pv2BFoLRqwxvS/EblBdk/0MdLGkcJGpTYM5gwJDCImE0GOMaEyRKcMO31Z3Qes5bJLleziKLLBZxwra6Y3zDZFCdUoN+0irZQPB+VoWfBr+frkDvaMVyl9nFI8HKRTfmB4V97d8QuR5PMmGwQSvA1Kl6sYwH295SjpT+N4lh64RtEaMQcoRdHIR+bYj4VFK0YTSkfkX1FlISfstYQGOLNdVMEMJ2hQHyOCl6BZz4rWldEAsxnCM8RJbesS3HXFY4lmMUWzdXGTV3byWWziov6BbVAQQ1hOfTGe9iFcCC9M33p+0M4I9NRAs3Rx/Pedfrh36Ou4ktDvEpiDJkjGeMogWKOqkehgNGNaGJCZzjECYxPNtI+s2ISlIrb9PmEuUEyQP1ElN4K9MAlxXW4bdV1ipdkgnaxAxCMC0mI6K6O4tP51+y0HUExpNFNnVx1hhCPnnPC2djctZyazd2uWx8if1Px1MacPULgbaWb4LHt83S79XmAxK0P688u8jfYYKvUSSO47bROBW9fl4yJSKqgTOPeChc4PXfgu1VHNbeygMxOks+XQ7Xg5o5Zd3Og8XNLqLvm55YnYeMxfq5VvKvo/fzL5TcO/xskcPfELh7DMYrPcRrRKIsDE/YeiFTM90nqUCdzpXth7TGmFM/34pAiabuPQuy2GWQvE5uKiXvnH5jgDFK0giqNazOBbZFQzhDYva7tDE+tyr1hRNhAubESjggpSGdeBoy+o3PlafZHjrnbsFAaeXfEL0FSwd19WQtMF7k6ptEaBdvSGIJyFOVJTJ81dekMXESjUKi/FWYBBqgxi2mpzOL5KrEUstZBp+biV2g8w/j6EZRSMRJKgh6vhLipYzy55qJ4nLULi2cggoQW/USMYE+eRnExjhoxbKfv2gp/PZ4rmqYbZkOuqBjSNIRx98xALK8ejsp6rxgi/0glA1EHcuWB/GOnUTLTVv6gpg9nEZ0tx0zWWyw6lhvzDH9M2FVIS7tWUFhjFy0DVlBY0AjJHUkJWaKg2EZS8ygIYl1ZUt6QqWWFGNpDnB6H4qv2JgxX5gug8oWm4PCc7SDbq96wfDUBRL+WKOUwI8vkJWUk95i3HmoKjBwgHbEbUozTdjryqDrEAfW6whJsc9C6wm4chvbNek49/8SwH5b1DA1nYHpvJuuJ4XwcwJZKKUMKOU+Y8OgJTF0xeQ6uejybOE0TFuNF0IltSbW0sXtZQ+5vG0+wrOFso5+LJWISy7i84AvUan7yx5Ap7xYFn1GawhD92T5dAdv2H1ik9QcmcNw3Q1fp
r/QwxVm5xPNcXGXqkq0wSrm6s2wr4RfPn2i9pKUfQ6XHf+7cTy71V+7o+h/69/XVx4m9aFzP7Vm7mbZEZ/IQLa0ey6WdbYKdNHtP/zrNbp4uXgdPpyk6SIVYGLCC+Xxr1DLKlWZjuXjPC193XgikvBD41pNxrX4hTOXab4jMowRSpG5uvHGSbbr1Jox9ZBwbwBcK3pNQHcOrAcfxvEaEeiAO08dtzS4ZmuMb9B61dwOqtai9AZ7jmdZ+wHuCmKfWpGrM4/v671Hv94p6DvCfOeqpyOtl5k9r2OvzgySRJg7uiz/X6AHgPQx/lueZ5hsqb0E72HOljN/tPnN1W1TXr2XhDTTeKmz9ZNJ+ejZ/Uz2D1jdM9uJSoGZ97fKl2CFvttonOr84JswWMhA54ZuEqRhEy47ZNuKIRnM0GsM0e/YAEfgpWVCPGu+VeWfT/Yf2z2fut5aqHq/6vsjPyWrUn3lIR3uYYGsdVJ4KYIE8uoPj0gfqJ9iHWvXvxIYC7/K3B2KWTvV4/5ZCCLjArUF/z7KodKjaoEdSNd5K0bSrFNQdJflCQpgwE2T0Jpv3jZ8gAcXRkCc4QbKTQGqOGECK+vrT6u8l7qsqcW3nhZUZ1iOzskMbWGMvcJwuMB1FSUphMtHkeqeO7wBbg3OR7bVgP8f1a/azNWfffc3R90LWuv3Uo0EDgt4X5RvtfO78DUfTRVSWOnSlM2Fl7b5v8uBJYJOHaHqy1TPqL+jbEhxb2ma1Pf127tbTLLv7b9tmXUtqltV++C0MNUi/s+auJEsqK21XzbKelDXV3wu9229nkizZT5O1tGQ/1tz8ADt3183P2MHJ/w==</diagram></mxfile>
3 changes: 3 additions & 0 deletions docs/component-overview.svg