This repository contains Terraform modules for creating a production-ready ECS cluster in AWS. It is designed to be used with datacube-ecs to define ECS tasks.
- [Requirements](#requirements)
- [Config](#config)
- [Create It](#create-it)
- [Information](#information)
- [License](#license)
- [Contacts](#contacts)
## Requirements
- terraform > v0.10.0
- ecs-cli > 1.0.0 (bda91d5)
- awslogs (recommended)
- A Public Route53 hosted zone, that will be used as the default endpoint for our apps
- An Amazon Certificate Manager generated wildcard certificate for the default hosted zone
- An S3 bucket to store infrastructure state
- A DynamoDB table with the primary key name `LockID`. This is used to lock changes to the state, ensuring only one person can deploy at once (a sketch for creating the bucket and table follows this list)
- An SSH KeyPair to be used to access the servers
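If the state bucket and lock table do not already exist, one way to create them (outside this repository) is a small standalone Terraform configuration along the lines of the sketch below. The bucket and table names simply mirror the `backend.cfg` example in the Config section and are illustrative, not part of this repository.

```hcl
# Illustrative only: pre-creates the state bucket and lock table required above.
# Replace the names with values that match your own backend.cfg.
resource "aws_s3_bucket" "tfstate" {
  bucket = "ga-aws-dea-dev-tfstate"
  acl    = "private"

  versioning {
    enabled = true
  }
}

resource "aws_dynamodb_table" "terraform_lock" {
  name           = "terraform"
  hash_key       = "LockID"
  read_capacity  = 1
  write_capacity = 1

  attribute {
    name = "LockID"
    type = "S"
  }
}
```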
## Config
To separate the config for your clusters you should create a new folder in the `workspaces` folder for each cluster. This folder should include:
- backend.cfg
- terraform.tfvars
To enable separation between dev and prod infrastructure we use workspaces. Each workspace requires a `backend.cfg` file that identifies the backend configuration for that cluster. This has been designed to enable a multi-account workflow, separating dev and prod. (How these values are consumed at `terraform init` time is sketched after the table below.)
bucket="ga-aws-dea-dev-tfstate"
key = "datacube-dev/terraform.tfstate"
region = "ap-southeast-2"
dynamodb_table = "terraform"
Name | Description | Type | Default | Required |
---|---|---|---|---|
bucket | The S3 bucket to be used to store terraform state files | string | `` | yes |
key | The path of the statefile within the bucket | string | `` | yes |
region | The region of the bucket | string | `` | yes |
dynamodb_table | A DynamoDB table with the primary key : LockID, used to lock modifications to the statefile | string | `` | yes |
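For reference, Terraform consumes a `backend.cfg` like this through its partial backend configuration mechanism: the root module declares an empty `s3` backend and the values are supplied at init time, which the deploy script described below is presumably doing with something like `terraform init -backend-config=workspaces/<cluster>/backend.cfg`. A minimal sketch of the declaration side:

```hcl
# Partial backend configuration: bucket, key, region and dynamodb_table are
# deliberately left blank here and injected from backend.cfg at init time.
terraform {
  backend "s3" {}
}
```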
The cluster requires a number of variables to be defined in the `terraform.tfvars` file. These variables are described below, and an example file follows the table.
Name | Description | Type | Default | Required |
---|---|---|---|---|
vpc_cidr | IP range used by the Virtual Private Cloud in CIDR format | string | 10.0.0.0/16 | yes |
cluster | Cluster name used to separate resources from other clusters in the same account | string | `` | yes |
state_bucket | The S3 bucket that is used to store Terraform state, stored in Parameter Store to enable ECS tasks to run Terraform | string | `` | yes |
workspace | Used to separate infrastructure, should match the workspace folder | string | `` | yes |
public_subnet_cidrs | List of IP Ranges in CIDR format, must be within vpc_cidr range | List | `` | yes |
private_subnet_cidrs | List of IP Ranges in CIDR format, must be within vpc_cidr range | List | `` | yes |
database_subnet_cidrs | List of IP Ranges in CIDR format, must be within vpc_cidr range | List | `` | yes |
availability_zones | List of AZs to create resources in | List | `` | yes |
max_size | Max number of worker nodes | number | `` | yes |
min_size | Min number of worker nodes, should be at least 2 for High Availability | number | `` | yes |
desired_capacity | Default number of worker nodes to start the cluster with, this will be changed dynamically | number | `` | yes |
instance_type | The type of instance to run as a worker node | string | `` | yes |
owner | A team name to tag on all resources | string | `` | yes |
enable_jumpbox | If a bastion/jumpbox should be created | boolean | `` | yes |
key_name | The ssh keypair to associate with the jumpbox and worker nodes | string | `` | yes |
db_multi_az | Should the Database be created in multiple AZs (Higher Availability), or not (cheaper) | boolean | `` | yes |
dns_zone | The default DNS zone to associate with the cluster, apps will be given the DNS name app.{dns_zone} | string | `` | yes |
dns_zone | The region your *.{dns_zone} cert was issued in | string | `` | yes |
config_root | A variable stored in Parameter Store that enables apps to find cluster-specific config | string | `` | yes |
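A `terraform.tfvars` for a hypothetical dev cluster might look like the sketch below. Every value is illustrative and should be replaced with settings for your own account; the CIDRs simply follow the subnet conventions described in the Information section.

```hcl
# Example values only; replace with settings for your own account and cluster.
cluster               = "datacube"
workspace             = "datacube-dev"
owner                 = "my-team"
state_bucket          = "ga-aws-dea-dev-tfstate"
config_root           = "/datacube-dev/config"

vpc_cidr              = "10.0.0.0/16"
availability_zones    = ["ap-southeast-2a", "ap-southeast-2b", "ap-southeast-2c"]
public_subnet_cidrs   = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
private_subnet_cidrs  = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
database_subnet_cidrs = ["10.0.20.0/24", "10.0.21.0/24", "10.0.22.0/24"]

instance_type         = "m4.large"
min_size              = 2
max_size              = 6
desired_capacity      = 2

enable_jumpbox        = true
key_name              = "my-keypair"
db_multi_az           = false
dns_zone              = "dev.example.com"
```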
## Create It
- Set secrets as environment variables to be injected into the script:
  - `TF_VAR_ssh_ip_address` will restrict SSH access to the jumpbox to this IP address, if the jumpbox is enabled
  - `TF_db_admin_username` will set the admin username of the RDS instance to this value
  - `TF_db_admin_password` will set the admin password of the RDS instance to this value
- Deploy your workspace: `./deploy.sh workspace-name`, e.g. `./deploy.sh datacube-dev`
## Information
ECS stands for EC2 Container Service and is the AWS platform for running Docker containers. The full documentation for ECS can be found here, and the developer guide can be found here. A more fun read is The Hitchhiker's Guide to AWS ECS and Docker.
To understand ECS it helps to state the obvious differences from competitors such as Kubernetes or DC/OS Mesos. The major differences are that ECS cannot be run on-premises and that it lacks advanced features. These two differences can be seen either as weaknesses or as strengths.
As stated above, ECS needs EC2 worker nodes to run Docker containers on, and those nodes need supporting infrastructure. This repository creates the following:
- VPC with a /16 ip address range and an internet gateway
- We choose a region and a number of availability zones to use (default of three, for high availability)
- In every availability zone we are creating a private, public and database subnet with a /24 ip address range
- Public subnet convention is 10.x.0.x, 10.x.1.x, etc.
- Private subnet convention is 10.x.10.x, 10.x.11.x, etc.
- Database subnet convention is 10.x.20.x, 10.x.21.x, etc.
- In the public subnets we place either NAT instances or a NAT Gateway. NAT instances are the default as they are cheaper for high-throughput applications.
- We create an Autoscaling group to host our ECS worker nodes
- We create an ECS cluster that the worker nodes connect to
To create this infrastructure we use Terraform. The code is packaged as Terraform modules so that it can easily be reused by others.
Creating one big module does not really give the benefits of modules, so the ECS module itself is made up of smaller modules. This makes it easier for others to make changes, swap modules, or use pieces from this repository even when not setting up ECS.
Details regarding how a module works, or why it is set up a certain way, are described in the module itself where needed.
The modules need to be composed to create infrastructure. For an example of how to use them to create a working ECS cluster, see ecs.tf and ecs.tfvars.
Note: you need to use Terraform version 0.10.0 or above.
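As a rough illustration of what such a composition looks like, the sketch below wires a few of the variables above into hypothetical modules. The module names and arguments here are illustrative only; see ecs.tf in this repository for the real wiring.

```hcl
# Hypothetical sketch: module names and arguments are illustrative,
# not the repository's actual ones.
module "vpc" {
  source               = "./modules/vpc"
  cidr                 = "${var.vpc_cidr}"
  availability_zones   = "${var.availability_zones}"
  public_subnet_cidrs  = "${var.public_subnet_cidrs}"
  private_subnet_cidrs = "${var.private_subnet_cidrs}"
}

module "ecs_instances" {
  source           = "./modules/ecs_instances"
  cluster          = "${var.cluster}"
  instance_type    = "${var.instance_type}"
  min_size         = "${var.min_size}"
  max_size         = "${var.max_size}"
  desired_capacity = "${var.desired_capacity}"
  key_name         = "${var.key_name}"
}
```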
You should not put your ECS instances directly on the internet. You should not allow SSH access to the instances directly but use a bastion/jumpbox server for that. Having SSH access to the dev workspace is fine but you should not allow SSH access to production instances. You don't want to make any manual changes in the production workspace.
All ECS logs are pushed to CloudWatch Logs. We recommend using awslogs to monitor your logs; it can be installed using Python's pip: `pip install awslogs`
Because we create the EC2 nodes ourselves, we need to make sure they are up to date and secure. It is possible to build your own AMI with your own OS, Docker, ECS agent and everything else, but it is much easier to use the ECS-optimized AMIs maintained by AWS, which come with a secure Amazon Linux, regular security patches, and recommended versions of the ECS agent, Docker and more.
This cluster is configured to pull the latest version of the ECS-optimized AMI; running Terraform on a schedule will ensure your worker nodes are updated semi-regularly.
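Looking up the latest Amazon-maintained ECS-optimized AMI in Terraform is typically done with an `aws_ami` data source along these lines; this is a sketch, and the exact filter used by this repository may differ.

```hcl
# Sketch of a lookup for the latest Amazon-maintained ECS-optimized AMI;
# the name filter may differ from the one this repository actually uses.
data "aws_ami" "ecs_optimized" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn-ami-*-amazon-ecs-optimized"]
  }
}
```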
To know when to update your EC2 nodes you can subscribe to AWS ECS AMI updates, as described here. Note: we cannot provide a sample module for this because Terraform does not support the email protocol on SNS.
If you need to perform an update, update the AMI information in ecs_instances and then apply the changes to the cluster. This will only create a new launch_configuration; it will not touch the running instances. You therefore need to replace your instances one by one. There are three ways to do this:
1. Terminate the instances. This may cause disruption to your application users; when an instance is terminated, a new one is started with the new launch_configuration.
2. Double the size of your cluster and your applications, and once everything is up and running scale the cluster back down. This can be a costly operation, and you also need to specify or protect the new instances so that AWS Auto Scaling does not terminate them instead of the old ones.
3. The best option is to drain the containers from an ECS instance, as described here. You can then terminate the instance without disrupting your application users. This can be done by doubling the EC2 instances in your cluster, or by adding just one and replacing instances slowly, one by one. Currently there is no automated/scripted way to do this. See issue #3.
## License
This code is licensed under the MIT license. See the license deed for more details.
## Contacts
Tom Butler
Lead Developer
tom.butler@ga.gov.au