Elasticsearch is a core part of Pelias and a complex piece of infrastructure.
There are many ways to successfully set up an Elasticsearch cluster, and many different possible sets of requirements.
This repository attempts to collect best practices for a production-ready Elasticsearch cluster run in AWS using Terraform.
- Terraform 0.11.x (Terraform 0.12 is not yet supported)

This project is compatible with Elasticsearch 7 only. Use historical releases before v7.0.0 to support Elasticsearch 5 or 6. Going forward, the major version of this project will track the supported Elasticsearch major version.
Terraform will need an AWS IAM user account with permissions to create all the required resources. The Terraform user will need the AmazonEC2FullAccess policy attached, as well as IAM permissions. For IAM permissions, the IAMFullAccess policy can be used, or, for more fine-grained control, use this policy document:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1498231117000",
      "Effect": "Allow",
      "Action": [
        "iam:AddRoleToInstanceProfile",
        "iam:AttachRolePolicy",
        "iam:AttachUserPolicy",
        "iam:CreateRole",
        "iam:UpdateAssumeRolePolicy",
        "iam:CreateInstanceProfile",
        "iam:DeleteInstanceProfile",
        "iam:GetInstanceProfile",
        "iam:ListInstanceProfilesForRole",
        "iam:DeleteRole",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "iam:CreatePolicy",
        "iam:DetachRolePolicy",
        "iam:DeletePolicy",
        "iam:ListAttachedRolePolicies",
        "iam:ListPolicyVersions",
        "iam:DeletePolicyVersion",
        "iam:GetRole",
        "iam:PutRolePolicy",
        "iam:GetRolePolicy",
        "iam:DeleteRolePolicy",
        "iam:RemoveRoleFromInstanceProfile",
        "iam:PassRole"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```
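If you prefer the command line to the AWS console, the user and policies can be set up with the AWS CLI along these lines (a rough sketch: the user name `terraform`, the policy name, the local file name, and the account ID are all illustrative, not part of this project):

```bash
# create the IAM user that Terraform will authenticate as (name is illustrative)
aws iam create-user --user-name terraform

# attach the AWS-managed EC2 policy
aws iam attach-user-policy --user-name terraform \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess

# create the custom IAM policy above, saved locally as terraform-iam.json,
# then attach it (substitute your own AWS account ID in the ARN)
aws iam create-policy --policy-name terraform-elasticsearch-iam \
  --policy-document file://terraform-iam.json
aws iam attach-user-policy --user-name terraform \
  --policy-arn arn:aws:iam::123456789012:policy/terraform-elasticsearch-iam
```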
Once the AWS user has credentials, they need to be made available to Terraform. The easiest way is to put them in `~/.aws/credentials`. This file even supports several accounts, which is quite nice:
```
$ cat ~/.aws/credentials
[default]
aws_access_key_id = defaultKey
aws_secret_access_key = defaultSecret
region = us-east-1
output = json

[site1]
aws_access_key_id = key1
aws_secret_access_key = secret1
region = us-east-1
output = json

[site2]
aws_access_key_id = key2
aws_secret_access_key = secret2
region = us-east-1
output = json
```
Now, different keys can be selected with `export AWS_PROFILE=site1`. Run that command before anything below and the credentials will be picked up automatically.
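For example, to switch to the `site1` profile and sanity-check which identity is active (the check assumes the AWS CLI is installed):

```bash
export AWS_PROFILE=site1

# confirm the expected account and user are being used
aws sts get-caller-identity
```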
Once the Terraform user has been set up, create an access key and keep the credentials handy for the next section.
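If the user was created via the CLI as sketched above, an access key can be generated the same way (the user name `terraform` is again an assumption):

```bash
# prints an AccessKeyId and SecretAccessKey; copy them into ~/.aws/credentials
aws iam create-access-key --user-name terraform
```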
Packer images are used to avoid lengthy startup times when launching new Elasticsearch instances. See the instructions in pelias/packer-elasticsearch. Once the Packer images are built, they are automatically detected by the Terraform configuration.
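To sanity-check that a matching AMI exists before running Terraform, something like the following can be used (a sketch that assumes your images carry an `env` tag, matching the `ami_env_tag_filter` variable shown below):

```bash
# list AMIs owned by this account and tagged env=prod;
# an empty result means Terraform will have no image to launch
aws ec2 describe-images --owners self \
  --filters "Name=tag:env,Values=prod" \
  --query 'Images[].[ImageId,Name,CreationDate]' --output table
```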
While it can be run directly, this directory's code is best used as a Terraform module. Create a file, for example `elasticsearch.tf`, with contents like the following:
```hcl
# define this once, possibly in another file if you want to run multiple clusters
provider "aws" {
  region  = "us-east-1"
  version = "~> 1.60"
}

provider "template" {
  version = "~> 2.1"
}

module "elasticsearch-prod-a" {
  source = "github.com/pelias/terraform-elasticsearch?ref=v7.2.0" # check GitHub for the latest tagged releases

  aws_vpc_id   = "vpc-1234" # the ID of an existing VPC in which to create the instances
  ssh_key_name = "ssh-key-to-use"

  availability_zones = "us-east-1a,us-east-1b,us-east-1c" # list of availability zones to use. Optional in us-east-1, required otherwise

  environment = "dev" # or whatever unique environment you choose

  elasticsearch_max_instances       = 2 # 2 r5.large instances is suitable for a minimal full-planet production build with replicas
  elasticsearch_min_instances       = 2
  elasticsearch_desired_instances   = 2
  elasticsearch_data_volume_size    = 350
  elasticsearch_instance_type      = "r5.large"
  elasticsearch_heap_memory_percent = 50

  ssh_ip_range       = "172.20.0.0/16" # adjust this if you'd like SSH access to be limited, or remove if you don't want that
  ami_env_tag_filter = "prod"          # this variable can be adjusted if you tag your AMIs differently, or removed to use the latest AMI

  subnet_name_filter          = "us-east-*" # if you only want to launch Elasticsearch instances in some subnets, provide a filter to find the subnets. Remove if all subnets are ok
  subnet_name_filter_property = "tag:Name"  # change this if you would like to filter subnets on a tag value other than name. This can be used to create more complex selections of subnets than the prefix matching allowed in `subnet_name_filter`

  # the following section is all optional, and if configured, will load an existing snapshot from S3 on startup
  snapshot_s3_bucket = "name-of-your-s3-bucket" # required to load snapshot
  snapshot_base_path = "path/to/your/snapshot"  # required to load snapshot
  #snapshot_name     = "name-of-your-snapshot"  # optional, will load first snapshot if omitted
  snapshot_alias_name    = "pelias" # if you'd like an alias created, use this variable
  snapshot_replica_count = 1        # 1 is the default, modify as desired

  # you must set at least one tag as a workaround to https://github.com/pelias/terraform-elasticsearch/issues/12
  tags {
    env = "dev"
  }
}
```
Adjust any variables for your use case. All that should be needed to create everything required for Elasticsearch is to run:

```bash
terraform init
```

to initialize Terraform and fetch the module code, and then:

```bash
terraform apply
```
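Optionally, the pending changes can be previewed between those two steps without creating anything:

```bash
# dry run: list the resources Terraform would create
terraform plan
```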
Once that's done, it will print out the DNS name of the load balancer used to access Elasticsearch. Here's some example output:

```
Outputs:

aws_elb = internal-search-dev-elasticsearch-elb-XXXXXXXX.us-east-1.elb.amazonaws.com
```
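The same value can be printed again later without re-running `apply`:

```bash
# read the load balancer DNS name back out of the Terraform state
terraform output aws_elb
```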
If using this code with the Pelias Helm chart, this section is useful. Copy the DNS name from the Terraform output and use it to replace the `elasticsearch.host` value in the Kubernetes chart. Update the chart with `helm upgrade pelias ./pelias/kubernetes -f yourValues.yaml` or similar, and new API instances with the correct settings will automatically be launched.
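As a sketch, the value can also be overridden on the command line rather than by editing the values file, using the release name and chart path from the example above:

```bash
# point the Pelias API at the new Elasticsearch load balancer
helm upgrade pelias ./pelias/kubernetes -f yourValues.yaml \
  --set elasticsearch.host=internal-search-dev-elasticsearch-elb-XXXXXXXX.us-east-1.elb.amazonaws.com
```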
Thanks to the following sources for inspiration and code:

- https://github.com/nadnerb/terraform-elasticsearch
- https://github.com/floragunncom/packer-elasticsearch/blob/master/elastic.json