Module - Scale runners

This module creates resources required to run the GitHub action runner on AWS EC2 spot instances. The life cycle of the runners on AWS is managed by two lambda functions. One function will handle scaling up, the other scaling down.

Overview

Action runners on EC2

The action runners are created via a launch template; in the launch template only the subnet needs to be provided. During launch the installation is handled via a user data script. The configuration is fetched from SSM parameter store.

Lambda scale up

The scale up lambda is triggered by events on an SQS queue. Events on this queue are delayed, which gives the workflow some time to start running on an already available runner. For each event the lambda checks whether the workflow is still queued and whether any limit (such as the maximum number of runners) has been reached. If the workflow is still queued and no limit is reached, the lambda creates a new EC2 instance. The lambda only needs to know which launch template to use and which subnets are available; one of the available subnets is chosen at random. Once the instance is created the event is considered handled, and the workflow is expected to start once the new instance is ready.
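The decision described above can be sketched as a small pure function. This is an illustrative sketch only, not the module's actual implementation: the names `ScaleUpInput`, `ScaleUpDecision` and `decideScaleUp` are hypothetical, and the real lambda obtains these values from GitHub and AWS rather than receiving them as arguments.

```typescript
// Hypothetical input shape; the real lambda derives these values from
// the SQS event, the GitHub API and EC2.
interface ScaleUpInput {
  workflowQueued: boolean; // is the workflow still queued on GitHub?
  currentRunners: number;  // runners currently managed by this deployment
  maximumRunners: number;  // assumed to map to runners_maximum_count
  subnetIds: string[];     // assumed to map to subnet_ids
}

type ScaleUpDecision =
  | { launch: false; reason: string }
  | { launch: true; subnetId: string };

// Decide whether a new instance should be launched and, if so, in which subnet.
// The random source is injectable so the choice can be made deterministic in tests.
function decideScaleUp(
  input: ScaleUpInput,
  random: () => number = Math.random,
): ScaleUpDecision {
  if (!input.workflowQueued) {
    return { launch: false, reason: "workflow is no longer queued" };
  }
  if (input.currentRunners >= input.maximumRunners) {
    return { launch: false, reason: "maximum number of runners reached" };
  }
  if (input.subnetIds.length === 0) {
    return { launch: false, reason: "no subnets available" };
  }
  // Pick one of the available subnets at random.
  const index = Math.floor(random() * input.subnetIds.length);
  return { launch: true, subnetId: input.subnetIds[index] };
}
```

Keeping the decision separate from the AWS and GitHub calls is a design choice of this sketch; it makes the limits easy to unit test without mocking any client.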

Lambda scale down

The scale down lambda is triggered via a CloudWatch event. The event is triggered by a cron expression defined in the variable scale_down_schedule_expression (https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html). GitHub does not yet provide a good API for scaling down, so the scale down runs periodically on this schedule (every 5 minutes by default). Each time the lambda is triggered, it tries to remove all runners managed by this deployment that are older than a configurable number of minutes (see minimum_running_time_in_minutes). If a runner can be removed from GitHub, which means it is not executing a workflow, the lambda terminates the EC2 instance.
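The scale down pass described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the module's actual code: `Runner`, `ScaleDownDeps` and `scaleDown` are hypothetical names, and the two callbacks stand in for the GitHub and EC2 calls that the real lambda makes.

```typescript
// Hypothetical runner record; the real lambda reads this from EC2.
interface Runner {
  instanceId: string;
  launchTime: Date;
}

// Hypothetical callbacks standing in for the GitHub and EC2 clients.
interface ScaleDownDeps {
  // Resolves to false when GitHub refuses the removal, e.g. because the runner is busy.
  removeFromGitHub: (runner: Runner) => Promise<boolean>;
  terminateInstance: (instanceId: string) => Promise<void>;
}

// Terminates every runner that has run for at least the minimum running time
// and that could be removed from GitHub. Returns the terminated instance ids.
async function scaleDown(
  runners: Runner[],
  minimumRunningTimeInMinutes: number,
  now: Date,
  deps: ScaleDownDeps,
): Promise<string[]> {
  const thresholdMs = minimumRunningTimeInMinutes * 60 * 1000;
  const terminated: string[] = [];
  for (const runner of runners) {
    const ageMs = now.getTime() - runner.launchTime.getTime();
    if (ageMs < thresholdMs) {
      continue; // too young, keep it so it has a chance to pick up work
    }
    // Only terminate the instance once GitHub has accepted the removal.
    if (await deps.removeFromGitHub(runner)) {
      await deps.terminateInstance(runner.instanceId);
      terminated.push(runner.instanceId);
    }
  }
  return terminated;
}
```

Removing the runner from GitHub before terminating the instance mirrors the order given in the text: a busy runner cannot be removed, so a failed removal is what protects a running workflow from being interrupted.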

Usage

Usage examples are available in the root module. By default the root module will assume local zip files containing the lambda distribution are available. See the download lambda module for more information.

Lambda Function

The Lambda function is written in TypeScript and requires Node 12.x and yarn. Sources are located in ./lambdas/runners. The two lambda functions share the same sources, with one entry point for scaleDown and another for scaleUp.

Install

cd lambdas/runners
yarn install

Test

Tests are implemented with Jest; calls to AWS and GitHub are mocked.

yarn run test

Package

ncc is used to compile all TypeScript/JavaScript sources into a single file.

yarn run dist

Requirements

Name Version
terraform >= 0.14.1
aws >= 3.38

Providers

Name Version
aws >= 3.38

Modules

No modules.

Resources

Name Type
aws_cloudwatch_event_rule.scale_down resource
aws_cloudwatch_event_target.scale_down resource
aws_cloudwatch_log_group.gh_runners resource
aws_cloudwatch_log_group.scale_down resource
aws_cloudwatch_log_group.scale_up resource
aws_iam_instance_profile.runner resource
aws_iam_role.runner resource
aws_iam_role.scale_down resource
aws_iam_role.scale_up resource
aws_iam_role_policy.cloudwatch resource
aws_iam_role_policy.describe_tags resource
aws_iam_role_policy.dist_bucket resource
aws_iam_role_policy.runner_session_manager_aws_managed resource
aws_iam_role_policy.scale_down resource
aws_iam_role_policy.scale_down_logging resource
aws_iam_role_policy.scale_up resource
aws_iam_role_policy.scale_up_logging resource
aws_iam_role_policy.service_linked_role resource
aws_iam_role_policy.ssm_parameters resource
aws_iam_role_policy_attachment.managed_policies resource
aws_iam_role_policy_attachment.scale_down_vpc_execution_role resource
aws_iam_role_policy_attachment.scale_up_vpc_execution_role resource
aws_lambda_event_source_mapping.scale_up resource
aws_lambda_function.scale_down resource
aws_lambda_function.scale_up resource
aws_lambda_permission.scale_down resource
aws_lambda_permission.scale_runners_lambda resource
aws_launch_template.runner resource
aws_security_group.runner_sg resource
aws_ssm_parameter.cloudwatch_agent_config_runner resource
aws_ssm_parameter.runner_agent_mode resource
aws_ssm_parameter.runner_config_run_as resource
aws_ssm_parameter.runner_enable_cloudwatch resource
aws_ami.runner data source
aws_caller_identity.current data source
aws_iam_policy_document.lambda_assume_role_policy data source

Inputs

Name Description Type Default Required
ami_filter Map of lists used to create the AMI filter for the action runner AMI. map(list(string))
{
"name": [
"amzn2-ami-hvm-2.*-x86_64-ebs"
]
}
no
ami_owners The list of owners used to select the AMI of action runner instances. list(string)
[
"amazon"
]
no
aws_region AWS region. string n/a yes
block_device_mappings The EC2 instance block device configuration. Takes the following keys: device_name, delete_on_termination, volume_type, volume_size, encrypted, iops map(string) {} no
cloudwatch_config (optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details. string null no
create_service_linked_role_spot (optional) create the service linked role for spot instances that is required by the scale-up lambda. bool false no
egress_rules List of egress rules for the GitHub runner instances.
list(object({
cidr_blocks = list(string)
ipv6_cidr_blocks = list(string)
prefix_list_ids = list(string)
from_port = number
protocol = string
security_groups = list(string)
self = bool
to_port = number
description = string
}))
[
{
"cidr_blocks": [
"0.0.0.0/0"
],
"description": null,
"from_port": 0,
"ipv6_cidr_blocks": [
"::/0"
],
"prefix_list_ids": null,
"protocol": "-1",
"security_groups": null,
"self": null,
"to_port": 0
}
]
no
enable_cloudwatch_agent Enables the CloudWatch agent on the EC2 runner instances. The runner contains a default config, which can be overridden via cloudwatch_config. bool true no
enable_organization_runners n/a bool n/a yes
enable_ssm_on_runners Enable to allow access to the runner instances for debugging purposes via SSM. Note that this adds additional permissions to the runner instances. bool n/a yes
environment A name that identifies the environment, used as prefix and for tagging. string n/a yes
ghes_ssl_verify GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). bool true no
ghes_url GitHub Enterprise Server URL. DO NOT SET IF USING PUBLIC GITHUB string null no
github_app_parameters Parameter Store for GitHub App Parameters.
object({
key_base64 = map(string)
id = map(string)
})
n/a yes
idle_config List of time periods, each defined as a cron expression, during which a minimum number of runners is kept active instead of scaling down to 0. By defining this list you can ensure that a runner is kept idle in time periods that match the cron expression within 5 seconds.
list(object({
cron = string
timeZone = string
idleCount = number
}))
[] no
instance_profile_path The path that will be added to the instance_profile, if not set the environment name will be used. string null no
instance_type [DEPRECATED] See instance_types. string "m5.large" no
instance_types List of instance types for the action runner. list(string) null no
key_name Key pair name string null no
kms_key_arn Optional CMK Key ARN to be used for Parameter Store. string null no
lambda_s3_bucket S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly. any null no
lambda_security_group_ids List of security group IDs associated with the Lambda function. list(string) [] no
lambda_subnet_ids List of subnets in which the lambda will be launched. The subnets need to be subnets in the vpc_id. list(string) [] no
lambda_timeout_scale_down Time out for the scale down lambda in seconds. number 60 no
lambda_timeout_scale_up Time out for the scale up lambda in seconds. number 60 no
lambda_zip File location of the lambda zip file. string null no
log_level Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'. string "info" no
log_type Logging format for lambda logging. Valid values are 'json', 'pretty', 'hidden'. string "pretty" no
logging_retention_in_days Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. number 180 no
market_options Market options for the action runner instances. string "spot" no
metadata_options Metadata options for the ec2 runner instances. map(any)
{
"http_endpoint": "enabled",
"http_put_response_hop_limit": 1,
"http_tokens": "optional"
}
no
minimum_running_time_in_minutes The minimum time an EC2 action runner should run before it is terminated when not busy. number 5 no
overrides This map provides the possibility to override some defaults. The following attributes are supported: name_sg overrides the Name tag for all security groups created by this module. name_runner_agent_instance overrides the Name tag for the EC2 instance defined in the auto launch configuration. name_docker_machine_runners overrides the Name tag for spot instances created by the runner agent. map(string)
{
"name_runner": "",
"name_sg": ""
}
no
role_path The path that will be added to the role; if not set, the environment name will be used. string null no
role_permissions_boundary Permissions boundary that will be added to the created role for the lambda. string null no
runner_additional_security_group_ids (optional) List of additional security groups IDs to apply to the runner list(string) [] no
runner_architecture The platform architecture of the runner instance_type. string "x64" no
runner_as_root Run the action runner under the root user. bool false no
runner_boot_time_in_minutes The minimum time for an EC2 runner to boot and register as a runner. number 5 no
runner_ec2_tags Map of tags that will be added to the launch template instance tag specifications. map(string) {} no
runner_extra_labels Extra labels for the runners (GitHub). Separate each label by a comma string "" no
runner_group_name Name of the runner group. string "Default" no
runner_iam_role_managed_policy_arns Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role list(string) [] no
runner_log_files (optional) List of logfiles to send to CloudWatch, will only be used if enable_cloudwatch_agent is set to true. Object description: log_group_name: Name of the log group, prefix_log_group: If true, the log group name will be prefixed with /github-self-hosted-runners/<var.environment>, file_path: path to the log file, log_stream_name: name of the log stream.
list(object({
log_group_name = string
prefix_log_group = bool
file_path = string
log_stream_name = string
}))
[
{
"file_path": "/var/log/messages",
"log_group_name": "messages",
"log_stream_name": "{instance_id}",
"prefix_log_group": true
},
{
"file_path": "/var/log/user-data.log",
"log_group_name": "user_data",
"log_stream_name": "{instance_id}",
"prefix_log_group": true
},
{
"file_path": "/var/log/runner-startup.log",
"log_group_name": "runner-startup",
"log_stream_name": "{instance_id}",
"prefix_log_group": true
},
{
"file_path": "/home/ec2-user/actions-runner/diag/Runner**.log",
"log_group_name": "runner",
"log_stream_name": "{instance_id}",
"prefix_log_group": true
}
]
no
runners_lambda_s3_key S3 key for runners lambda function. Required if using S3 bucket to specify lambdas. any null no
runners_lambda_s3_object_version S3 object version for runners lambda function. Useful if S3 versioning is enabled on source bucket. any null no
runners_maximum_count The maximum number of runners that will be created. number 3 no
s3_bucket_runner_binaries n/a
object({
arn = string
})
n/a yes
s3_location_runner_binaries S3 location of runner distribution. string n/a yes
scale_down_schedule_expression Schedule expression that determines how often the scale down check runs. string "cron(*/5 * * * ? *)" no
scale_up_reserved_concurrent_executions Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. number 1 no
sqs_build_queue SQS queue to consume accepted build events.
object({
arn = string
})
n/a yes
subnet_ids List of subnets in which the action runners will be launched. The subnets need to be subnets in the vpc_id. list(string) n/a yes
tags Map of tags that will be added to created resources. By default resources will be tagged with name and environment. map(string) {} no
enabled_userdata Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI bool true no
userdata_post_install User-data script snippet to insert after GitHub action runner install string "" no
userdata_pre_install User-data script snippet to insert before GitHub action runner install string "" no
userdata_template Alternative user-data template, replacing the default template. By providing your own user_data you have to take care of installing all required software, including the action runner. Variables userdata_pre/post_install are ignored. string null no
volume_size Size of runner volume number 30 no
vpc_id The VPC for the security groups. string n/a yes

Outputs

Name Description
lambda_scale_down n/a
lambda_scale_up n/a
launch_template n/a
role_runner n/a
role_scale_down n/a
role_scale_up n/a

Philips Forest

This module is part of the Philips Forest.


                                                     ___                   _
                                                    / __\__  _ __ ___  ___| |_
                                                   / _\/ _ \| '__/ _ \/ __| __|
                                                  / / | (_) | | |  __/\__ \ |_
                                                  \/   \___/|_|  \___||___/\__|

                                                                 Infrastructure

Talk to the forestkeepers in the forest-channel on Slack.
