feat: Strict label check and replace disable_check_wokflow_job_labels by opt in enable_workflow_job_labels_check (philips-labs#1591)

* Check strict labels
* feat: Replace disable_check_wokflow_job_labels by opt in enable_workflow_job_labels_check, and check all labels.
* Make check strict
* update docs
* cleanup
mcaulifn · Jan 10, 2022 · 405b11d
1 parent 27e974d
Showing 16 changed files with 233 additions and 124 deletions.
diff --git a/README.md b/README.md
@@ -83,7 +83,7 @@ Besides these permissions, the lambdas also need permission to CloudWatch (for l
 To be able to support a number of use-cases the module has quite a lot configuration options. We try to choose reasonable defaults. The several examples also shows for the main cases how to configure the runners.
 
 - Org vs Repo level. You can configure the module to connect the runners in GitHub on a org level and share the runners in your org. Or set the runners on repo level. The module will install the runner to the repo. This can be multiple repo's but runners are not shared between repo's.
-- Checkrun vs Workflow job event. You can configure the webhook in GitHub to send checkrun or workflow job events to the webhook. Workflow job events are introduced by GitHub in September 2021 and are designed to support scalable runners. We advise when possible to use the workflow job event, you can set `disable_check_wokflow_job_labels = true` to disable the label check. 
+- Checkrun vs Workflow job event. You can configure the webhook in GitHub to send checkrun or workflow job events to the webhook. Workflow job events were introduced by GitHub in September 2021 and are designed to support scalable runners. We advise using the workflow job event when possible. You can set `runner_enable_workflow_job_labels_check = true` to let the webhook accept only jobs whose labels match the configured labels. The webhook checks the custom labels provided via the variable `runner_extra_labels` and the GitHub managed labels: "self-hosted", OS, and architecture. The OS and architecture are derived from the settings. By default the check is disabled.
 - Linux vs Windows. you can configure the os types linux and win. Linux will be used by default.
 - Re-use vs Ephemeral. By default runners are re-used for till detected idle, once idle they will be removed from the pool. To improve security we are introducing ephemeral runners. Those runners are only used for one job. Ephemeral runners are only working in combination with the workflow job event. We also suggest to use a pre-build AMI to improve the start time of jobs.
 - GitHub cloud vs GitHub enterprise server (GHES). The runner support GitHub cloud as well GitHub enterprise service. For GHES we rely on our community to test and support. We have no possibility to test ourselves on GHES.
@@ -382,7 +382,6 @@ In case the setup does not work as intended follow the trace of events:
 | <a name="input_cloudwatch_config"></a> [cloudwatch\_config](#input\_cloudwatch\_config) | (optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details. | `string` | `null` | no |
 | <a name="input_create_service_linked_role_spot"></a> [create\_service\_linked\_role\_spot](#input\_create\_service\_linked\_role\_spot) | (optional) create the serviced linked role for spot instances that is required by the scale-up lambda. | `bool` | `false` | no |
 | <a name="input_delay_webhook_event"></a> [delay\_webhook\_event](#input\_delay\_webhook\_event) | The number of seconds the event accepted by the webhook is invisible on the queue before the scale up lambda will receive the event. | `number` | `30` | no |
-| <a name="input_disable_check_wokflow_job_labels"></a> [disable\_check\_wokflow\_job\_labels](#input\_disable\_check\_wokflow\_job\_labels) | Disable the the check of workflow labels for received workflow job events. | `bool` | `false` | no |
 | <a name="input_enable_cloudwatch_agent"></a> [enable\_cloudwatch\_agent](#input\_enable\_cloudwatch\_agent) | Enabling the cloudwatch agent on the ec2 runner instances, the runner contains default config. Configuration can be overridden via `cloudwatch_config`. | `bool` | `true` | no |
 | <a name="input_enable_ephemeral_runners"></a> [enable\_ephemeral\_runners](#input\_enable\_ephemeral\_runners) | Enable ephemeral runners, runners will only be used once. | `bool` | `false` | no |
 | <a name="input_enable_organization_runners"></a> [enable\_organization\_runners](#input\_enable\_organization\_runners) | Register runners to organization, instead of repo level | `bool` | `false` | no |
@@ -426,7 +425,8 @@ In case the setup does not work as intended follow the trace of events:
 | <a name="input_runner_boot_time_in_minutes"></a> [runner\_boot\_time\_in\_minutes](#input\_runner\_boot\_time\_in\_minutes) | The minimum time for an EC2 runner to boot and register as a runner. | `number` | `5` | no |
 | <a name="input_runner_ec2_tags"></a> [runner\_ec2\_tags](#input\_runner\_ec2\_tags) | Map of tags that will be added to the launch template instance tag specificatons. | `map(string)` | `{}` | no |
 | <a name="input_runner_egress_rules"></a> [runner\_egress\_rules](#input\_runner\_egress\_rules) | List of egress rules for the GitHub runner instances. | <pre>list(object({<br>    cidr_blocks      = list(string)<br>    ipv6_cidr_blocks = list(string)<br>    prefix_list_ids  = list(string)<br>    from_port        = number<br>    protocol         = string<br>    security_groups  = list(string)<br>    self             = bool<br>    to_port          = number<br>    description      = string<br>  }))</pre> | <pre>[<br>  {<br>    "cidr_blocks": [<br>      "0.0.0.0/0"<br>    ],<br>    "description": null,<br>    "from_port": 0,<br>    "ipv6_cidr_blocks": [<br>      "::/0"<br>    ],<br>    "prefix_list_ids": null,<br>    "protocol": "-1",<br>    "security_groups": null,<br>    "self": null,<br>    "to_port": 0<br>  }<br>]</pre> | no |
-| <a name="input_runner_extra_labels"></a> [runner\_extra\_labels](#input\_runner\_extra\_labels) | Extra labels for the runners (GitHub). Separate each label by a comma | `string` | `""` | no |
+| <a name="input_runner_enable_workflow_job_labels_check"></a> [runner\_enable\_workflow\_job\_labels\_check](#input\_runner\_enable\_workflow\_job\_labels\_check) | If set to true, all labels in the workflow job event are matched against the custom labels and GitHub labels (os, architecture and `self-hosted`). When the labels do not match, the event is dropped at the webhook. | `bool` | `false` | no |
+| <a name="input_runner_extra_labels"></a> [runner\_extra\_labels](#input\_runner\_extra\_labels) | Extra (custom) labels for the runners (GitHub). Separate each label by a comma. Label checks on the webhook can be enforced by setting `runner_enable_workflow_job_labels_check`. GitHub read-only labels should not be provided. | `string` | `""` | no |
 | <a name="input_runner_group_name"></a> [runner\_group\_name](#input\_runner\_group\_name) | Name of the runner group. | `string` | `"Default"` | no |
 | <a name="input_runner_iam_role_managed_policy_arns"></a> [runner\_iam\_role\_managed\_policy\_arns](#input\_runner\_iam\_role\_managed\_policy\_arns) | Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role | `list(string)` | `[]` | no |
 | <a name="input_runner_log_files"></a> [runner\_log\_files](#input\_runner\_log\_files) | (optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details. | <pre>list(object({<br>    log_group_name   = string<br>    prefix_log_group = bool<br>    file_path        = string<br>    log_stream_name  = string<br>  }))</pre> | `null` | no |
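
The strict label check described in the README hunk above can be pictured roughly as follows. This is a minimal sketch, not the handler code from this commit: the function name `jobLabelsMatch`, its signature, and the case-insensitive comparison are assumptions made for illustration. The idea is that every label requested by the workflow job must be present in the runner's label set (the GitHub managed labels plus `runner_extra_labels`).

```typescript
// Hypothetical helper, not taken from the module: returns true only when
// every label requested by the job is offered by the runner.
function jobLabelsMatch(jobLabels: string[], runnerLabels: string[]): boolean {
  // Assumption of this sketch: labels are compared case-insensitively.
  const available = new Set(runnerLabels.map((label) => label.toLowerCase()));
  // Strict check: a single unknown label rejects the whole job.
  return jobLabels.every((label) => available.has(label.toLowerCase()));
}

// Example label set: "self-hosted", OS, architecture, then the custom labels.
const runnerLabels = ['self-hosted', 'linux', 'x64', 'default', 'example'];

console.log(jobLabelsMatch(['self-hosted', 'linux', 'example'], runnerLabels)); // true
console.log(jobLabelsMatch(['self-hosted', 'windows'], runnerLabels)); // false
```

With a check of this shape, a job that asks for `windows` would be dropped at the webhook of a Linux deployment instead of triggering a scale-up that can never serve it.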

diff --git a/examples/ephemeral/main.tf b/examples/ephemeral/main.tf
@@ -35,6 +35,9 @@ module "runners" {
   enable_organization_runners = true
   runner_extra_labels         = "default,example"
 
+  # enable workflow labels check
+  # runner_enable_workflow_job_labels_check = true
+
   # enable access to the runners via SSM
   enable_ssm_on_runners = true
 
@@ -55,12 +58,12 @@ module "runners" {
   enable_ephemeral_runners = true
 
   # configure your pre-built AMI
-  # enabled_userdata = false
-  # ami_filter       = { name = ["github-runner-amzn2-x86_64-2021*"] }
-  # ami_owners       = [data.aws_caller_identity.current.account_id]
+  enabled_userdata = false
+  ami_filter       = { name = ["github-runner-amzn2-x86_64-2021*"] }
+  ami_owners       = [data.aws_caller_identity.current.account_id]
 
   # Enable logging
-  # log_level = "debug"
+  log_level = "debug"
 
   # Setup a dead letter queue, by default scale up lambda will kepp retrying to process event in case of scaling error.
   # redrive_policy_build_queue = {

diff --git a/examples/windows/main.tf b/examples/windows/main.tf
@@ -31,7 +31,7 @@ module "runners" {
   runner_extra_labels = "default,example"
 
   # Set the OS to Windows
-  runner_os = "win"
+  runner_os = "windows"
   # we need to give the runner time to start because this is windows.
   runner_boot_time_in_minutes = 20
 

diff --git a/main.tf b/main.tf
@@ -67,8 +67,10 @@ module "webhook" {
   lambda_zip                       = var.webhook_lambda_zip
   lambda_timeout                   = var.webhook_lambda_timeout
   logging_retention_in_days        = var.logging_retention_in_days
-  runner_extra_labels              = var.runner_extra_labels
-  disable_check_wokflow_job_labels = var.disable_check_wokflow_job_labels
+
+  # labels
+  enable_workflow_job_labels_check = var.runner_enable_workflow_job_labels_check
+  runner_labels                    = "self-hosted,${var.runner_os},${var.runner_architecture},${var.runner_extra_labels}"
 
   role_path                 = var.role_path
   role_permissions_boundary = var.role_permissions_boundary
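
One detail worth noting about the composed `runner_labels` value: `runner_extra_labels` defaults to `""`, so the interpolated string can end with a trailing comma (for example `self-hosted,linux,x64,`). A consumer of that string probably wants to drop empty entries. The sketch below shows one way to do that; `parseRunnerLabels` is a hypothetical name and this is not claimed to be how the webhook lambda parses the value.

```typescript
// Hypothetical parser for a comma-separated label string such as the one
// composed in main.tf. Not taken from the module.
function parseRunnerLabels(raw: string): string[] {
  return raw
    .split(',')
    .map((label) => label.trim()) // tolerate spaces around labels
    .filter((label) => label.length > 0); // drop empties from trailing or doubled commas
}

// With the default empty runner_extra_labels the string ends in a comma.
console.log(parseRunnerLabels('self-hosted,linux,x64,'));
// With custom labels, as in the examples in this commit.
console.log(parseRunnerLabels('self-hosted,linux,x64,default,example'));
```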

diff --git a/modules/runners/logging.tf b/modules/runners/logging.tf
@@ -12,19 +12,19 @@ locals {
       {
         "log_group_name" : "user_data",
         "prefix_log_group" : true,
-        "file_path" : var.runner_os == "win" ? "C:/UserData.log" : "/var/log/user-data.log",
+        "file_path" : var.runner_os == "windows" ? "C:/UserData.log" : "/var/log/user-data.log",
         "log_stream_name" : "{instance_id}"
       },
       {
         "log_group_name" : "runner",
         "prefix_log_group" : true,
-        "file_path" : var.runner_os == "win" ? "C:/actions-runner/_diag/Runner_*.log" : "/home/runners/actions-runner/_diag/Runner_**.log",
+        "file_path" : var.runner_os == "windows" ? "C:/actions-runner/_diag/Runner_*.log" : "/home/runners/actions-runner/_diag/Runner_**.log",
         "log_stream_name" : "{instance_id}"
       },
       {
         "log_group_name" : "runner-startup",
         "prefix_log_group" : true,
-        "file_path" : var.runner_os == "win" ? "C:/runner-startup.log" : "/var/log/runner-startup.log",
+        "file_path" : var.runner_os == "windows" ? "C:/runner-startup.log" : "/var/log/runner-startup.log",
         "log_stream_name" : "{instance_id}"
       }
     ]

diff --git a/modules/runners/main.tf b/modules/runners/main.tf
@@ -16,23 +16,23 @@ locals {
   kms_key_arn           = var.kms_key_arn != null ? var.kms_key_arn : ""
 
   default_ami = {
-    "win"   = { name = ["Windows_Server-20H2-English-Core-ContainersLatest-*"] }
-    "linux" = var.runner_architecture == "arm64" ? { name = ["amzn2-ami-hvm-2*-arm64-gp2"] } : { name = ["amzn2-ami-hvm-2.*-x86_64-ebs"] }
+    "windows" = { name = ["Windows_Server-20H2-English-Core-ContainersLatest-*"] }
+    "linux"   = var.runner_architecture == "arm64" ? { name = ["amzn2-ami-hvm-2*-arm64-gp2"] } : { name = ["amzn2-ami-hvm-2.*-x86_64-ebs"] }
   }
 
   default_userdata_template = {
-    "win"   = "${path.module}/templates/user-data.ps1"
-    "linux" = "${path.module}/templates/user-data.sh"
+    "windows" = "${path.module}/templates/user-data.ps1"
+    "linux"   = "${path.module}/templates/user-data.sh"
   }
 
   userdata_install_runner = {
-    "win"   = "${path.module}/templates/install-runner.ps1"
-    "linux" = "${path.module}/templates/install-runner.sh"
+    "windows" = "${path.module}/templates/install-runner.ps1"
+    "linux"   = "${path.module}/templates/install-runner.sh"
   }
 
   userdata_start_runner = {
-    "win"   = "${path.module}/templates/start-runner.ps1"
-    "linux" = "${path.module}/templates/start-runner.sh"
+    "windows" = "${path.module}/templates/start-runner.ps1"
+    "linux"   = "${path.module}/templates/start-runner.sh"
   }
 
   ami_filter = coalesce(var.ami_filter, local.default_ami[var.runner_os])

diff --git a/modules/runners/scale-down.tf b/modules/runners/scale-down.tf
@@ -1,8 +1,8 @@
 locals {
   # Windows Runners can take their sweet time to do anything
   min_runtime_defaults = {
-    "win"   = 15
-    "linux" = 5
+    "windows" = 15
+    "linux"   = 5
   }
 }
 resource "aws_lambda_function" "scale_down" {

diff --git a/modules/runners/variables.tf b/modules/runners/variables.tf
@@ -96,7 +96,7 @@ variable "runner_os" {
   default     = "linux"
 
   validation {
-    condition     = contains(["linux", "win"], var.runner_os)
+    condition     = contains(["linux", "windows"], var.runner_os)
-    error_message = "Valid values for runner_os are (linux, win)."
+    error_message = "Valid values for runner_os are (linux, windows)."
   }
 }

diff --git a/modules/webhook/README.md b/modules/webhook/README.md
@@ -74,6 +74,7 @@ No modules.
 |------|-------------|------|---------|:--------:|
 | <a name="input_aws_region"></a> [aws\_region](#input\_aws\_region) | AWS region. | `string` | n/a | yes |
 | <a name="input_disable_check_wokflow_job_labels"></a> [disable\_check\_wokflow\_job\_labels](#input\_disable\_check\_wokflow\_job\_labels) | Disable the the check of workflow labels. | `bool` | `false` | no |
+| <a name="input_enable_workflow_job_labels_check"></a> [enable\_workflow\_job\_labels\_check](#input\_enable\_workflow\_job\_labels\_check) | If set to true, all labels in the workflow job event are matched against the custom labels and GitHub labels (os, architecture and `self-hosted`). When the labels do not match, the event is dropped at the webhook. | `bool` | `false` | no |
 | <a name="input_environment"></a> [environment](#input\_environment) | A name that identifies the environment, used as prefix and for tagging. | `string` | n/a | yes |
 | <a name="input_github_app_webhook_secret_arn"></a> [github\_app\_webhook\_secret\_arn](#input\_github\_app\_webhook\_secret\_arn) | n/a | `string` | n/a | yes |
 | <a name="input_kms_key_arn"></a> [kms\_key\_arn](#input\_kms\_key\_arn) | Optional CMK Key ARN to be used for Parameter Store. | `string` | `null` | no |
@@ -86,7 +87,7 @@ No modules.
 | <a name="input_repository_white_list"></a> [repository\_white\_list](#input\_repository\_white\_list) | List of repositories allowed to use the github app | `list(string)` | `[]` | no |
 | <a name="input_role_path"></a> [role\_path](#input\_role\_path) | The path that will be added to the role; if not set, the environment name will be used. | `string` | `null` | no |
 | <a name="input_role_permissions_boundary"></a> [role\_permissions\_boundary](#input\_role\_permissions\_boundary) | Permissions boundary that will be added to the created role for the lambda. | `string` | `null` | no |
-| <a name="input_runner_extra_labels"></a> [runner\_extra\_labels](#input\_runner\_extra\_labels) | Extra labels for the runners (GitHub). Separate each label by a comma | `string` | `""` | no |
+| <a name="input_runner_labels"></a> [runner\_labels](#input\_runner\_labels) | Labels for the runners (GitHub). Separate each label by a comma. Labels are used to check events when `enable_workflow_job_labels_check` is set to `true`. | `string` | `""` | no |
 | <a name="input_sqs_build_queue"></a> [sqs\_build\_queue](#input\_sqs\_build\_queue) | SQS queue to publish accepted build events. | <pre>object({<br>    id  = string<br>    arn = string<br>  })</pre> | n/a | yes |
 | <a name="input_sqs_build_queue_fifo"></a> [sqs\_build\_queue\_fifo](#input\_sqs\_build\_queue\_fifo) | Enable a FIFO queue to remain the order of events received by the webhook. Suggest to set to true for repo level runners. | `bool` | `false` | no |
 | <a name="input_tags"></a> [tags](#input\_tags) | Map of tags that will be added to created resources. By default resources will be tagged with name and environment. | `map(string)` | `{}` | no |

diff --git a/modules/webhook/lambdas/webhook/src/lambda.test.ts b/modules/webhook/lambdas/webhook/src/lambda.test.ts
@@ -0,0 +1,108 @@
+import { APIGatewayEvent, Context } from 'aws-lambda';
+import { mocked } from 'ts-jest/utils';
+import { githubWebhook } from './lambda';
+import { handle } from './webhook/handler';
+import { logger } from './webhook/logger';
+
+const event: APIGatewayEvent = {
+  body: JSON.stringify(''),
+  headers: { abc: undefined },
+  httpMethod: '',
+  isBase64Encoded: false,
+  multiValueHeaders: { abc: undefined },
+  multiValueQueryStringParameters: null,
+  path: '',
+  pathParameters: null,
+  queryStringParameters: null,
+  stageVariables: null,
+  resource: '',
+  requestContext: {
+    authorizer: null,
+    accountId: '123456789012',
+    resourceId: '123456',
+    stage: 'prod',
+    requestId: 'c6af9ac6-7b61-11e6-9a41-93e8deadbeef',
+    requestTime: '09/Apr/2015:12:34:56 +0000',
+    requestTimeEpoch: 1428582896000,
+    identity: {
+      cognitoIdentityPoolId: null,
+      accountId: null,
+      cognitoIdentityId: null,
+      caller: null,
+      accessKey: null,
+      sourceIp: '127.0.0.1',
+      cognitoAuthenticationType: null,
+      cognitoAuthenticationProvider: null,
+      userArn: null,
+      userAgent: 'Custom User Agent String',
+      user: null,
+      clientCert: null,
+      apiKey: null,
+      apiKeyId: null,
+      principalOrgId: null,
+    },
+    path: '/prod/path/to/resource',
+    resourcePath: '/{proxy+}',
+    httpMethod: 'POST',
+    apiId: '1234567890',
+    protocol: 'HTTP/1.1',
+  },
+};
+
+const context: Context = {
+  awsRequestId: '1',
+  callbackWaitsForEmptyEventLoop: false,
+  functionName: '',
+  functionVersion: '',
+  getRemainingTimeInMillis: () => 0,
+  invokedFunctionArn: '',
+  logGroupName: '',
+  logStreamName: '',
+  memoryLimitInMB: '',
+  done: () => {
+    return;
+  },
+  fail: () => {
+    return;
+  },
+  succeed: () => {
+    return;
+  },
+};
+
+jest.mock('./webhook/handler');
+
+describe('Test webhook lambda wrapper.', () => {
+  it('Happy flow, resolve.', async () => {
+    const mock = mocked(handle);
+    mock.mockImplementation(() => {
+      return new Promise((resolve) => {
+        resolve({ statusCode: 200 });
+      });
+    });
+
+    const result = await githubWebhook(event, context);
+    expect(result).toEqual({ statusCode: 200 });
+  });
+
+  it('An expected error, resolve.', async () => {
+    const mock = mocked(handle);
+    mock.mockImplementation(() => {
+      return new Promise((resolve) => {
+        resolve({ statusCode: 400 });
+      });
+    });
+
+    const result = await githubWebhook(event, context);
+    expect(result).toEqual({ statusCode: 400 });
+  });
+
+  it('Errors are not thrown.', async () => {
+    const mock = mocked(handle);
+    const logSpy = jest.spyOn(logger, 'error');
+    mock.mockRejectedValue(new Error('some error'));
+    const result = await githubWebhook(event, context);
+    expect(result).toMatchObject({ statusCode: 500 });
+    expect(logSpy).toBeCalledTimes(1);
+  });
+});
diff --git a/modules/webhook/lambdas/webhook/src/lambda.ts b/modules/webhook/lambdas/webhook/src/lambda.ts
@@ -6,14 +6,18 @@ export interface Response {
   statusCode: number;
   body?: string;
 }
-
-export const githubWebhook = async (event: APIGatewayEvent, context: Context, callback: Callback): Promise<void> => {
+export async function githubWebhook(event: APIGatewayEvent, context: Context): Promise<Response> {
   logger.setSettings({ requestId: context.awsRequestId });
   logger.debug(JSON.stringify(event));
+  let result: Response;
   try {
-    const response = await handle(event.headers, event.body as string);
-    callback(null, response);
+    result = await handle(event.headers, event.body as string);
   } catch (e) {
-    callback(e as Error);
+    logger.error(e);
+    result = {
+      statusCode: 500,
+      body: 'Check the Lambda logs for the error details.',
+    };
   }
-};
+  return result;
+}