
Commit

fix: update location and region handling to allow deploy outside of the US (#226)

* delete unused region input

* make cloud build respect the region defined by the user

* make the creation of the app engine application respect the region defined by the user

* fix usage of region and location in the examples

* inline zone local

* add note to explain region, location, and trusted_location

* replace region with pubsub_resource_location and remove unused outputs

* rename  input in READMEs
daniel-cit authored Dec 15, 2021
1 parent 2800ec1 commit 16eaff7
Showing 22 changed files with 95 additions and 65 deletions.
16 changes: 12 additions & 4 deletions README.md
@@ -33,7 +33,9 @@ module "secured_data_warehouse" {
terraform_service_account = TERRAFORM_SERVICE_ACCOUNT
access_context_manager_policy_id = ACCESS_CONTEXT_MANAGER_POLICY_ID
bucket_name = DATA_INGESTION_BUCKET_NAME
region = REGION
pubsub_resource_location = PUBSUB_RESOURCE_LOCATION
location = LOCATION
trusted_locations = TRUSTED_LOCATIONS
dataset_id = DATASET_ID
confidential_dataset_id = CONFIDENTIAL_DATASET_ID
cmek_keyring_name = CMEK_KEYRING_NAME
@@ -47,6 +49,12 @@ module "secured_data_warehouse" {
}
```

**Note:** There are three inputs related to GCP locations in the module:

- `pubsub_resource_location`: defines which GCP location will be used to [Restrict Pub/Sub resource locations](https://cloud.google.com/pubsub/docs/resource-location-restriction). This policy ensures that messages published to a topic are never persisted outside of the Google Cloud regions you specify, regardless of where the publish requests originate. **Zones and multi-region locations are not supported**.
- `location`: defines which GCP location will be used for all other resources created: [Cloud Storage buckets](https://cloud.google.com/storage/docs/locations), [BigQuery datasets](https://cloud.google.com/bigquery/docs/locations), and [Cloud KMS key rings](https://cloud.google.com/kms/docs/locations). **Multi-region locations are supported**.
- `trusted_locations`: a list of locations used to set an [Organization Policy](https://cloud.google.com/resource-manager/docs/organization-policy/defining-locations#location_types) that restricts the GCP locations that can be used in the projects of the Secured Data Warehouse. Both `pubsub_resource_location` and `location` must respect this restriction.

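For example, a hypothetical configuration for a deployment outside of the US could set the three inputs together (the values below are illustrative, not module defaults):

```hcl
module "secured_data_warehouse" {
  source = "../.."  # adjust to however you reference this module

  # Must be a single region accepted by the Pub/Sub resource location policy.
  pubsub_resource_location = "europe-west1"

  # May be a region or a multi-region.
  location = "europe-west1"

  # Organization Policy value group that must cover both values above.
  trusted_locations = ["eu-locations"]

  # ... all other required inputs as shown in the example above.
}
```
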
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Inputs

@@ -79,17 +87,17 @@ module "secured_data_warehouse" {
| delete\_contents\_on\_destroy | (Optional) If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present. | `bool` | `false` | no |
| key\_rotation\_period\_seconds | Rotation period for keys. The default value is 30 days. | `string` | `"2592000s"` | no |
| kms\_key\_protection\_level | The protection level to use when creating a key. Possible values: ["SOFTWARE", "HSM"] | `string` | `"HSM"` | no |
| location | The location for the KMS Customer Managed Encryption Keys, Bucket, and Bigquery dataset. This location can be a multiregion, if it is empty the region value will be used. | `string` | `""` | no |
| location | The location for the KMS Customer Managed Encryption Keys, Cloud Storage Buckets, and Bigquery datasets. This location can be a multi-region. | `string` | `"us-east4"` | no |
| network\_administrator\_group | Google Cloud IAM group that reviews network configuration. Typically, this includes members of the networking team. | `string` | n/a | yes |
| non\_confidential\_data\_project\_id | The ID of the project in which the Bigquery will be created. | `string` | n/a | yes |
| org\_id | GCP Organization ID. | `string` | n/a | yes |
| perimeter\_additional\_members | The list additional members to be added on perimeter access. Prefix user: (user:email@email.com) or serviceAccount: (serviceAccount:my-service-account@email.com) is required. | `list(string)` | `[]` | no |
| region | The region in which the resources will be deployed. | `string` | `"us-east4"` | no |
| pubsub\_resource\_location | The location in which the messages published to Pub/Sub will be persisted. This location cannot be a multi-region. | `string` | `"us-east4"` | no |
| sdx\_project\_number | The Project Number to configure Secure data exchange with egress rule for the dataflow templates. | `string` | n/a | yes |
| security\_administrator\_group | Google Cloud IAM group that administers security configurations in the organization(org policies, KMS, VPC service perimeter). | `string` | n/a | yes |
| security\_analyst\_group | Google Cloud IAM group that monitors and responds to security incidents. | `string` | n/a | yes |
| terraform\_service\_account | The email address of the service account that will run the Terraform code. | `string` | n/a | yes |
| trusted\_locations | This is a list of trusted regions where location-based GCP resources can be created. ie us-locations eu-locations. | `list(string)` | <pre>[<br> "us-locations",<br> "eu-locations"<br>]</pre> | no |
| trusted\_locations | This is a list of trusted regions where location-based GCP resources can be created. | `list(string)` | <pre>[<br> "us-locations"<br>]</pre> | no |
| trusted\_subnetworks | The URI of the subnetworks where resources are going to be deployed. | `list(string)` | `[]` | no |

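For illustration, `perimeter_additional_members` entries carry the required `user:` or `serviceAccount:` prefixes, and `trusted_locations` takes Organization Policy value groups (the identities below are hypothetical):

```hcl
perimeter_additional_members = [
  "user:data-analyst@example.com",
  "serviceAccount:deployer@my-project.iam.gserviceaccount.com",
]
trusted_locations = ["us-locations", "eu-locations"]
```
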
## Outputs
3 changes: 1 addition & 2 deletions examples/batch-data-ingestion/README.md
@@ -36,7 +36,7 @@ configure with the firewall rules and DNS configurations described below.
This example uses a [csv file with sample data](./assets/cc_10000_records.csv) as input for the dataflow job.
You can create new files with different sizes using the [sample-cc-generator](../../helpers/sample-cc-generator/README.md) helper.
This new file must be placed in the [assets folder](./assets).
You need to change value of the local `cc_file_name` in the [main.tf](./main.tf#L23) file to use the new sample file:
You need to change value of the local `cc_file_name` in the [main.tf](./main.tf#L22) file to use the new sample file:

```hcl
locals {
  # ...
}
```

@@ -84,7 +84,6 @@ since the Batch Dataflow job ends when the pipeline finishes.
| controller\_service\_account | The Service Account email that will be used to identify the VMs in which the jobs are running. |
| dataflow\_temp\_bucket\_name | The name of the dataflow temporary bucket. |
| df\_job\_network | The URI of the VPC being created. |
| df\_job\_region | The region of the newly created Dataflow job. |
| df\_job\_subnetwork | The name of the subnetwork used for create Dataflow job. |
| project\_id | The data ingestion project's ID. |
| scheduler\_id | Cloud Scheduler Job id created. |
2 changes: 1 addition & 1 deletion examples/batch-data-ingestion/httpRequest.tmpl
@@ -2,7 +2,7 @@
"jobName": "batch-dataflow-flow",
"environment": {
"maxWorkers": 5,
"zone": "${location}",
"zone": "${zone}",
"ipConfiguration": "WORKER_IP_PRIVATE",
"enableStreamingEngine": true,
"network": "${network_self_link}",
13 changes: 6 additions & 7 deletions examples/batch-data-ingestion/main.tf
@@ -14,8 +14,7 @@
* limitations under the License.
*/
locals {
region = "us-east4"
location = "us-east4-a"
location = "us-east4"
schema_file = "schema.json"
transform_code_file = "transform.js"
dataset_id = "dts_data_ingestion"
@@ -26,7 +25,7 @@ locals {
httpRequestTemplate = templatefile(
"${path.module}/httpRequest.tmpl",
{
location = local.location,
zone = "us-east4-a",
network_self_link = var.network_self_link,
dataflow_service_account = module.data_ingestion.dataflow_controller_service_account_email,
subnetwork_self_link = var.subnetwork_self_link,
@@ -53,8 +52,8 @@ module "data_ingestion" {
terraform_service_account = var.terraform_service_account
access_context_manager_policy_id = var.access_context_manager_policy_id
bucket_name = "data-ingestion"
location = local.region
region = local.region
pubsub_resource_location = local.location
location = local.location
dataset_id = local.dataset_id
cmek_keyring_name = "cmek_keyring"
delete_contents_on_destroy = var.delete_contents_on_destroy
@@ -130,7 +129,7 @@ resource "google_cloud_scheduler_job" "scheduler" {
# Cloud Scheduler needs App Engine enabled in the project, in the same region where the job is going to be deployed.
# If you are using App Engine in us-central, you will need to use us-central1 as the region for Cloud Scheduler.
# You will get a "resource not found" error if you just use us-central.
region = local.region
region = local.location
project = var.data_ingestion_project_id

http_target {
@@ -139,7 +138,7 @@
"Accept" = "application/json"
"Content-Type" = "application/json"
}
uri = "https://dataflow.googleapis.com/v1b3/projects/${var.data_ingestion_project_id}/locations/${local.region}/templates"
uri = "https://dataflow.googleapis.com/v1b3/projects/${var.data_ingestion_project_id}/locations/${local.location}/templates"
oauth_token {
service_account_email = module.data_ingestion.scheduler_service_account_email
}
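
The comments in the scheduler resource above refer to the App Engine application that Cloud Scheduler depends on. A minimal sketch of creating it in the user-defined region (an assumed resource layout, not part of this commit):

```hcl
# App Engine uses the short location IDs "us-central" and "europe-west";
# Cloud Scheduler expects the corresponding regions "us-central1" and "europe-west1".
resource "google_app_engine_application" "app" {
  project     = var.data_ingestion_project_id
  location_id = local.location == "us-central1" ? "us-central" : local.location
}
```
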
5 changes: 0 additions & 5 deletions examples/batch-data-ingestion/outputs.tf
@@ -34,11 +34,6 @@ output "dataflow_temp_bucket_name" {
value = module.data_ingestion.data_ingestion_dataflow_bucket_name
}

output "df_job_region" {
description = "The region of the newly created Dataflow job."
value = local.region
}

output "df_job_network" {
description = "The URI of the VPC being created."
value = var.network_self_link
1 change: 1 addition & 0 deletions examples/bigquery-confidential-data/main.tf
@@ -45,6 +45,7 @@ module "secured_data_warehouse" {
terraform_service_account = var.terraform_service_account
access_context_manager_policy_id = var.access_context_manager_policy_id
bucket_name = "data-ingestion"
pubsub_resource_location = local.location
location = local.location
dataset_id = local.non_confidential_dataset_id
confidential_dataset_id = local.confidential_dataset_id
3 changes: 1 addition & 2 deletions examples/dataflow-with-dlp/README.md
@@ -11,7 +11,7 @@ It uses:
## Prerequisites

1. The [Secured data warehouse](../../README.md#requirements) module requirements to create the Secured data warehouse infrastructure.
1. A `crypto_key` and `wrapped_key` pair. Contact your Security Team to obtain the pair. The `crypto_key` location must be the same location where DLP, Storage and BigQuery are going to be created (`local.region`). There is a [Wrapped Key Helper](../../helpers/wrapped-key/README.md) python script which generates a wrapped key.
1. A `crypto_key` and `wrapped_key` pair. Contact your Security Team to obtain the pair. The `crypto_key` location must be the same location where DLP, Storage and BigQuery are going to be created (`local.location`). There is a [Wrapped Key Helper](../../helpers/wrapped-key/README.md) python script which generates a wrapped key.
1. The identity deploying the example must have permission to grant roles `roles/cloudkms.cryptoKeyDecrypter` and `roles/cloudkms.cryptoKeyEncrypter` on the KMS `crypto_key`. These roles will be granted to the Data ingestion Dataflow worker service account created by the Secured Data Warehouse module (see the sketch after this list).
1. The identity deploying the example must have permission to grant role `roles/artifactregistry.reader` in the docker repo of the Flex templates.
1. A network and subnetwork in the data ingestion project [configured for Private Google Access](https://cloud.google.com/vpc/docs/configure-private-google-access).
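
A minimal sketch of the KMS grants described above, assuming the example's own names (`var.crypto_key` and the module's Dataflow controller service account output):

```hcl
# Hypothetical illustration of the grants the deploying identity must be able to make.
resource "google_kms_crypto_key_iam_member" "wrapped_key_decrypter" {
  crypto_key_id = var.crypto_key
  role          = "roles/cloudkms.cryptoKeyDecrypter"
  member        = "serviceAccount:${module.data_ingestion.dataflow_controller_service_account_email}"
}

resource "google_kms_crypto_key_iam_member" "wrapped_key_encrypter" {
  crypto_key_id = var.crypto_key
  role          = "roles/cloudkms.cryptoKeyEncrypter"
  member        = "serviceAccount:${module.data_ingestion.dataflow_controller_service_account_email}"
}
```
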
@@ -78,7 +78,6 @@ locals {
| bucket\_data\_ingestion\_name | The name of the bucket. |
| controller\_service\_account | The Service Account email that will be used to identify the VMs in which the jobs are running. |
| df\_job\_subnetwork | The name of the subnetwork used for create Dataflow job. |
| dlp\_location | The location of the DLP resources. |
| project\_id | The project's ID. |
| template\_id | The ID of the Cloud DLP de-identification template that is created. |

12 changes: 7 additions & 5 deletions examples/dataflow-with-dlp/main.tf
@@ -15,7 +15,7 @@
*/

locals {
region = "us-east4"
location = "us-east4"
dataset_id = "dts_data_ingestion"
cc_file_name = "cc_10000_records.csv"
cc_file_path = "${path.module}/assets"
@@ -34,6 +34,8 @@ module "data_ingestion" {
bucket_name = "data-ingestion"
dataset_id = local.dataset_id
cmek_keyring_name = "cmek_keyring"
pubsub_resource_location = local.location
location = local.location
delete_contents_on_destroy = var.delete_contents_on_destroy
perimeter_additional_members = var.perimeter_additional_members
data_engineer_group = var.data_engineer_group
@@ -65,7 +67,7 @@ module "de_identification_template" {
terraform_service_account = var.terraform_service_account
crypto_key = var.crypto_key
wrapped_key = var.wrapped_key
dlp_location = local.region
dlp_location = local.location
template_file = "${path.module}/deidentification.tmpl"
dataflow_service_account = module.data_ingestion.dataflow_controller_service_account_email

@@ -78,7 +80,7 @@ resource "google_artifact_registry_repository_iam_member" "docker_reader" {
provider = google-beta

project = var.external_flex_template_project_id
location = local.region
location = local.location
repository = "flex-templates"
role = "roles/artifactregistry.reader"
member = "serviceAccount:${module.data_ingestion.dataflow_controller_service_account_email}"
@@ -94,7 +96,7 @@ module "regional_dlp" {
project_id = var.data_ingestion_project_id
name = "regional-flex-java-gcs-dlp-bq"
container_spec_gcs_path = var.de_identify_template_gs_path
region = local.region
region = local.location
service_account_email = module.data_ingestion.dataflow_controller_service_account_email
subnetwork_self_link = var.subnetwork_self_link
kms_key_name = module.data_ingestion.cmek_data_ingestion_crypto_key
@@ -108,7 +110,7 @@
datasetName = local.dataset_id
batchSize = 1000
dlpProjectId = var.data_governance_project_id
dlpLocation = local.region
dlpLocation = local.location
deidentifyTemplateName = module.de_identification_template.template_full_path

}
5 changes: 0 additions & 5 deletions examples/dataflow-with-dlp/outputs.tf
@@ -34,11 +34,6 @@ output "bucket_data_ingestion_name" {
value = module.data_ingestion.data_ingestion_bucket_name
}

output "dlp_location" {
description = "The location of the DLP resources."
value = local.region
}

output "template_id" {
description = "The ID of the Cloud DLP de-identification template that is created."
value = module.de_identification_template.template_id
14 changes: 8 additions & 6 deletions examples/regional-dlp/main.tf
@@ -16,6 +16,7 @@

locals {
bq_schema = "book:STRING, author:STRING"
location = "us-east4"
}

module "data_ingestion" {
@@ -31,7 +32,8 @@ module "data_ingestion" {
bucket_name = "dlp-flex-data-ingestion"
dataset_id = "dlp_flex_data_ingestion"
cmek_keyring_name = "dlp_flex_data-ingestion"
region = "us-east4"
pubsub_resource_location = local.location
location = local.location
delete_contents_on_destroy = var.delete_contents_on_destroy
perimeter_additional_members = var.perimeter_additional_members
data_engineer_group = var.data_engineer_group
@@ -49,7 +51,7 @@ module "de_identification_template_example" {
dataflow_service_account = module.data_ingestion.dataflow_controller_service_account_email
crypto_key = var.crypto_key
wrapped_key = var.wrapped_key
dlp_location = "us-east4"
dlp_location = local.location
template_file = "${path.module}/templates/deidentification.tpl"

depends_on = [
@@ -61,7 +63,7 @@ resource "google_artifact_registry_repository_iam_member" "docker_reader" {
provider = google-beta

project = var.external_flex_template_project_id
location = "us-east4"
location = local.location
repository = "flex-templates"
role = "roles/artifactregistry.reader"
member = "serviceAccount:${module.data_ingestion.dataflow_controller_service_account_email}"
@@ -75,7 +77,7 @@ resource "google_artifact_registry_repository_iam_member" "python_reader" {
provider = google-beta

project = var.external_flex_template_project_id
location = "us-east4"
location = local.location
repository = "python-modules"
role = "roles/artifactregistry.reader"
member = "serviceAccount:${module.data_ingestion.dataflow_controller_service_account_email}"
@@ -92,7 +94,7 @@ module "regional_dlp" {
name = "regional-flex-python-pubsub-dlp-bq"
container_spec_gcs_path = var.flex_template_gs_path
job_language = "PYTHON"
region = "us-east4"
region = local.location
service_account_email = module.data_ingestion.dataflow_controller_service_account_email
subnetwork_self_link = var.subnetwork_self_link
kms_key_name = module.data_ingestion.cmek_data_ingestion_crypto_key
@@ -103,7 +105,7 @@
parameters = {
input_topic = "projects/${var.data_ingestion_project_id}/topics/${module.data_ingestion.data_ingestion_topic_name}"
deidentification_template_name = "${module.de_identification_template_example.template_full_path}"
dlp_location = "us-east4"
dlp_location = local.location
dlp_project = var.data_governance_project_id
bq_schema = local.bq_schema
output_table = "${var.non_confidential_data_project_id}:${module.data_ingestion.data_ingestion_bigquery_dataset.dataset_id}.classical_books"
2 changes: 2 additions & 0 deletions examples/simple-example/main.tf
@@ -27,6 +27,8 @@ module "secured_data_warehouse" {
bucket_name = "bucket_simple_example"
dataset_id = "dataset_simple_example"
cmek_keyring_name = "key_name_simple_example"
pubsub_resource_location = "us-east4"
location = "us-east4"
delete_contents_on_destroy = var.delete_contents_on_destroy
perimeter_additional_members = var.perimeter_additional_members
data_engineer_group = var.data_engineer_group
9 changes: 8 additions & 1 deletion examples/tutorial-standalone/README.md
@@ -32,7 +32,14 @@ The required infrastructure includes:
- A Cloud KMS key
- A traffic encryption key for DLP Templates

This example will be deployed at the `us-east4` location, to deploy in another location change the local `location` in example [main.tf](./main.tf#L18) file.
## Google Cloud Locations

This example will be deployed in the `us-east4` location. To deploy in another location,
change the local `location` in the example [main.tf](./main.tf#L18) file.
By default, the Secured Data Warehouse module sets an [Organization Policy](https://cloud.google.com/resource-manager/docs/organization-policy/defining-locations)
that only allows the creation of resources in `us-locations`.
To deploy in other locations, update the input [trusted_locations](../../README.md#inputs) with
the appropriate locations in the call to the [main module](./main.tf#L33).

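For instance, a hypothetical EU deployment would change both the local and the Organization Policy input (illustrative values):

```hcl
locals {
  location = "europe-west1"
}

module "secured_data_warehouse" {
  # ...
  pubsub_resource_location = local.location
  location                 = local.location
  trusted_locations        = ["eu-locations"]
  # ...
}
```
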
## Usage

2 changes: 2 additions & 0 deletions examples/tutorial-standalone/main.tf
@@ -42,7 +42,9 @@ module "secured_data_warehouse" {
terraform_service_account = var.terraform_service_account
access_context_manager_policy_id = var.access_context_manager_policy_id
bucket_name = "data-ingestion"
pubsub_resource_location = local.location
location = local.location
trusted_locations = ["us-locations"]
dataset_id = local.non_confidential_dataset_id
confidential_dataset_id = local.confidential_dataset_id
cmek_keyring_name = "cmek_keyring"