This Terraform module automates populating some Tamr config variables that are generated as outputs from other AWS scale-out modules.
The smallest complete, fully working example. It might require extra resources to run.
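As a rough illustration of what a minimal invocation could look like, the sketch below sets only the required inputs plus a few optional ones. The `source` path, the module label `tamr_config`, and every value shown are placeholders assumed for illustration. They are not taken from this repo's example directory.

```hcl
# Hypothetical caller configuration; all values below are placeholders.
variable "rds_pg_password" {
  type        = string
  description = "Supplied by the caller so the password is not hard-coded."
}

module "tamr_config" {
  # Placeholder source: point this at wherever the module actually lives.
  source = "../../"

  # Optional: template to populate and where to write the result.
  config_template_path = "./tamr-config.yml"
  rendered_config_path = "./rendered/tamr-config.yml"

  # Required inputs (normally wired from the outputs of other modules).
  ephemeral_spark_configured = false
  es_domain_endpoint         = "example-es-endpoint.example.com"
  rds_pg_hostname            = "example-rds-hostname.example.com"
  rds_pg_password            = var.rds_pg_password
  spark_cluster_log_uri      = "s3n://example-logs-bucket/spark-logs/"
  tamr_data_bucket           = "example-tamr-data-bucket"

  # Only relevant when ephemeral Spark clusters are not used (placeholder ID).
  spark_emr_cluster_id = "j-EXAMPLE12345"
}
```

In a real deployment, most of these values would be references such as `module.<name>.<output>` rather than literals.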
This module creates:
- A template_file data source which renders the contents of a populated Tamr config.
- If `rendered_config_path` is provided, the populated Tamr config will be output to a YAML file at this path.
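The two behaviors listed above can be sketched with the `template` and `local` providers. This is a simplified illustration under assumptions, not the module's actual source: the resource names and the single template variable are hypothetical, and it expects a template file to exist at `config_template_path`.

```hcl
variable "config_template_path" {
  type    = string
  default = "./tamr-config.yml"
}

variable "rendered_config_path" {
  type    = string
  default = ""
}

# Render the config template with a map of string values.
# The key and value here are hypothetical placeholders.
data "template_file" "tamr_config" {
  template = file(var.config_template_path)

  vars = {
    rds_pg_hostname = "example-rds-hostname.example.com"
  }
}

# Write the rendered config to disk only when a path was provided.
resource "local_file" "rendered_config" {
  count = var.rendered_config_path == "" ? 0 : 1

  filename = var.rendered_config_path
  content  = data.template_file.tamr_config.rendered
}
```

The `count` expression is one common way to make a resource conditional on an optional input. The module itself may implement the condition differently.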
Name | Version |
---|---|
terraform | >= 0.13 |
Name | Version |
---|---|
local | n/a |
template | n/a |
Name | Description | Type | Default | Required |
---|---|---|---|---|
ephemeral_spark_configured | True if EMR was configured for ephemeral spark clusters. | bool | n/a | yes |
es_domain_endpoint | Endpoint of Elasticsearch domain. | string | n/a | yes |
rds_pg_hostname | Hostname of RDS postgres instance. | string | n/a | yes |
rds_pg_password | Master password for RDS postgres database instance. | string | n/a | yes |
spark_cluster_log_uri | The path to the S3 location where logs for the Spark cluster are stored. | string | n/a | yes |
tamr_data_bucket | Name of Tamr root directory bucket. | string | n/a | yes |
additional_templated_variables | Mapping of additional Tamr variables (not included in template) to its value. If a variable name in this map defines the same key as an input variable, the value specified in this map takes precedence. | map(string) | {} | no |
apps_dms_default_cloud_provider | Defines the default cloud service provider for DMS when APPS_DMS_ENABLED is set to true | string | "s3" | no |
apps_dms_enabled | Set to true to enable the Data Movement Service (DMS) | bool | true | no |
config_template_path | Path to Tamr config template. | string | "./tamr-config.yml" | no |
core_ebs_size | The core EBS volume size, in gibibytes (GiB). | string | "" | no |
core_ebs_type | Type of volumes to attach to the core nodes. Valid options are gp2, io1, standard and st1. | string | "" | no |
core_ebs_volumes_count | Number of volumes to attach to the core nodes. | string | "" | no |
core_group_instance_count | Number of Amazon EC2 instances used to execute the job flow. | string | "" | no |
core_instance_type | The EC2 instance type of the core nodes. | string | "" | no |
emr_additional_core_sg_id | Security group ID of the EMR Additional Core Security Group. | string | "" | no |
emr_additional_master_sg_id | Security group ID of the EMR Additional Master Security Group. | string | "" | no |
emr_cluster_name_prefix | A prefix to add to the name of created EMR Spark clusters | string | "tamr-emr-" | no |
emr_instance_profile_name | Name of instance profile for EMR EC2 instances. | string | "" | no |
emr_key_pair_name | Name of the Key Pair that will be attached to the EMR EC2 instances. | string | "" | no |
emr_managed_core_sg_id | Security group ID of the EMR Managed Core Security Group. | string | "" | no |
emr_managed_master_sg_id | Security group ID of the EMR Managed Master Security Group. | string | "" | no |
emr_release_label | The release label for the Amazon EMR release. | string | "emr-5.29.0" | no |
emr_root_volume_size | The size, in GiB, of the EBS root device volume of the Linux AMI that is used for each EMR EC2 instance. | string | "10" | no |
emr_service_access_sg_id | Security group ID of EMR Service Access Security Group. | string | "" | no |
emr_service_role_name | Name of IAM service role for EMR cluster. | string | "" | no |
emr_subnet_id | ID of the subnet where the EMR cluster will be created. | string | "" | no |
emr_tags | Map of tags to add to new resources in EMR | map(string) | {} | no |
emrfs_dynamodb_table_name | Name for the EMRFS DynamoDB table. | string | "" | no |
es_enabled | Whether or not to enable Elasticsearch by setting TAMR_ES_ENABLED flag | bool | true | no |
hbase_config_path | Path to HBase configuration in EMR root directory bucket. | string | "config/hbase/conf.dist/" | no |
hbase_namespace | n/a | string | "tamr" | no |
hbase_number_of_regions | Number of regions to create by default in HBase | string | "1000" | no |
hbase_number_of_salt_values | Number of distinct salt values to be used for prefixing row keys in HBase tables. Must be >= hbase_number_of_regions | string | "1000" | no |
hbase_storage_mode | Storage mode for HBase. Valid values: SHARED, DEDICATED | string | "SHARED" | no |
master_ebs_size | The master EBS volume size, in gibibytes (GiB). | string | "" | no |
master_ebs_type | Type of volumes to attach to the master nodes. Valid options are gp2, io1, standard and st1. | string | "" | no |
master_ebs_volumes_count | Number of volumes to attach to the master nodes. | string | "" | no |
master_instance_type | The EC2 instance type of the master nodes. | string | "" | no |
rds_pg_db_port | The RDS postgres database port. | number | 5432 | no |
rds_pg_dbname | RDS postgres database name. | string | "doit" | no |
rds_pg_username | Master username for RDS postgres database instance. | string | "tamr" | no |
rendered_config_path | If provided, the populated Tamr config will be output to this path. Include a file name (E.g. /path/to/config.yml). NOTE: Any required parent directories will be created automatically, and any existing file with the given name will be overwritten. | string | "" | no |
spark_driver_memory | n/a | string | "5G" | no |
spark_emr_cluster_id | Spark cluster ID. Value will not be used if deployment is spinning up ephemeral Spark clusters. | string | "" | no |
spark_executor_cores | n/a | number | 2 | no |
spark_executor_instances | n/a | number | 2 | no |
spark_executor_memory | n/a | string | "8G" | no |
tamr_backup_emr_cluster_id | ID of the static EMR cluster to run s3distcp on when backing up to or restoring from S3. | string | "" | no |
tamr_data_path | Path in root directory bucket (bucket provided for the tamr_data_bucket input) to write data to. | string | "tamr/unify-data" | no |
tamr_external_storage_providers | Filesystem connection information for external storage providers. | string | "" | no |
tamr_file_based_hbase_backup_enabled | Whether to backup contents of HBase root directory to backup path | bool | true | no |
tamr_spark_config_override | A list of spark config overrides. If not set all jobs will run with the default spark settings. Used for setting job-by-job spark resource settings. | string | "" | no |
tamr_spark_properties_override | JSON blob of spark properties to override. If not set, will use a default set of properties that should work for most use cases. | string | "" | no |
tamr_unify_backup_aws_role_based_access | Set to true if Tamr should use EC2 instance profile (role-based) credentials instead of static credentials | bool | true | no |
tamr_unify_backup_es | Defines whether or not to back up Elasticsearch | bool | false | no |
tamr_unify_backup_path | Identifies the path for storing backup files | string | "tamr/backups" | no |
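
The precedence rule described for `additional_templated_variables` matches how Terraform's built-in `merge()` function behaves: when the same key appears in more than one map, the value from the later argument wins. The standalone sketch below only illustrates that rule. The local names and the key `EXAMPLE_EXTRA_VARIABLE` are hypothetical, and the exact key format the module expects is an assumption that should be checked against the config template.

```hcl
locals {
  # Values derived from the module's regular inputs (illustrative only).
  values_from_inputs = {
    TAMR_ES_ENABLED  = "true"
    APPS_DMS_ENABLED = "true"
  }

  # Caller-supplied map, analogous to additional_templated_variables.
  additional_templated_variables = {
    TAMR_ES_ENABLED        = "false"         # same key: overrides the value above
    EXAMPLE_EXTRA_VARIABLE = "example-value" # hypothetical extra variable
  }

  # merge() gives precedence to later arguments for duplicate keys.
  effective_values = merge(
    local.values_from_inputs,
    local.additional_templated_variables
  )
}

output "effective_values" {
  value = local.effective_values
}
```

With these inputs, `effective_values` contains `TAMR_ES_ENABLED = "false"`, the unchanged `APPS_DMS_ENABLED = "true"`, and the extra variable.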
Name | Description |
---|---|
rendered | Rendered Tamr config |
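
A caller can pass the `rendered` output along, for example by re-exporting it from the root module. The sketch below assumes the module was instantiated under the hypothetical label `tamr_config`. Because the rendered config is populated from inputs such as `rds_pg_password`, it likely contains credentials, so the output is marked sensitive here.

```hcl
# Assumes a `module "tamr_config"` block exists in the same configuration.
output "tamr_config_rendered" {
  description = "Populated Tamr config, as rendered by the module."
  value       = module.tamr_config.rendered

  # Keeps the value out of plan/apply console output.
  sensitive = true
}
```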
This repo is based on:
Run `make terraform/docs` to generate the section of docs around Terraform inputs, outputs, and requirements.
Run `make lint`. This runs `terraform fmt`, along with a few other checks to detect whitespace issues.
NOTE: This requires a working Docker installation on the machine running the checks.
- Update the version contained in `VERSION`
- Document changes in `CHANGELOG.md`
- Create a tag in GitHub for the commit associated with the version
Apache 2 Licensed. See LICENSE for full details.