fix: Regional DLP readme review (#216)
* Adds documentation links and removes duplicated requirements

* add links and extra requirement

* remove unused variable

* add links for DNS configurations

* Adds period

Co-authored-by: Daniel da Silva Andrade <dandrade@ciandt.com>
amandakarina and daniel-cit authored Dec 10, 2021
1 parent 66ed0a0 commit 6a1ba40
Showing 4 changed files with 15 additions and 199 deletions.
203 changes: 15 additions & 188 deletions examples/regional-dlp/README.md
@@ -6,204 +6,32 @@ It uses:

- The [Secured data warehouse](../../README.md) module to create the Secured data warehouse infrastructure,
- The [de-identification template](../../modules/de-identification-template/README.md) submodule to create the regional structured DLP template,
- A Dataflow flex template to deploy the de-identification job.
- The [Dataflow Flex Template Job](../../modules/dataflow-flex-job/README.md) submodule to deploy a Dataflow Python flex template de-identification job.

## Prerequisites

1. A `crypto_key` and `wrapped_key` pair. Contact your Security Team to obtain the pair. The `crypto_key` location must be the same location used for the `location` variable.
1. An Existing GCP Project
1. The [Secured data warehouse](../../README.md#requirements) module requirements to create the Secured data warehouse infrastructure.
1. A `crypto_key` and `wrapped_key` pair. Contact your Security Team to obtain the pair. The `crypto_key` location must be the same location used for the `location` variable. There is a [Wrapped Key Helper](../../helpers/wrapped-key/README.md) Python script that generates a wrapped key.
1. The identity deploying the example must have permission to grant the roles `roles/cloudkms.cryptoKeyDecrypter` and `roles/cloudkms.cryptoKeyEncrypter` on the KMS `crypto_key`. These roles will be granted to the Data ingestion Dataflow worker service account created by the Secured Data Warehouse module.
1. An Existing GCP Project.
1. A pre-built Python Regional DLP de-identification flex template. See [Flex templates](../../flex-templates/README.md).
1. The identity deploying the example must have permission to grant the role `roles/artifactregistry.reader` on the Docker and Python repositories of the Flex templates.
1. A network and subnetwork in the data ingestion project [configured for Private Google Access](https://cloud.google.com/vpc/docs/configure-private-google-access).
1. A network and subnetwork in the Data Ingestion project [configured for Private Google Access](https://cloud.google.com/vpc/docs/configure-private-google-access).
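The last prerequisite can be sketched with `gcloud`. This is a non-authoritative illustration: the network name, subnet name, project ID, region, and IP range below are placeholders, not values from this example.

```bash
# Create a custom-mode VPC network (placeholder names throughout).
gcloud compute networks create dataflow-network \
  --project=PROJECT_ID \
  --subnet-mode=custom

# Create a subnetwork with Private Google Access enabled,
# so Dataflow workers without external IPs can reach Google APIs.
gcloud compute networks subnets create dataflow-subnet \
  --project=PROJECT_ID \
  --network=dataflow-network \
  --region=us-east4 \
  --range=10.0.0.0/24 \
  --enable-private-ip-google-access
```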

### Firewall rules

- All egress should be denied.
- Allow only Restricted API egress by TCP on port 443.
- Allow only Private API egress by TCP on port 443.
- Allow ingress to Dataflow workers by TCP on ports 12345 and 12346.
- Allow egress from Dataflow workers by TCP on ports 12345 and 12346.
- [All egress should be denied](https://cloud.google.com/vpc-service-controls/docs/set-up-private-connectivity#configure-firewall).
- [Allow only Restricted API egress by TCP on port 443](https://cloud.google.com/vpc-service-controls/docs/set-up-private-connectivity#configure-firewall).
- [Allow only Private API egress by TCP on port 443](https://cloud.google.com/vpc-service-controls/docs/set-up-private-connectivity#configure-firewall).
- [Allow ingress to Dataflow workers by TCP on ports 12345 and 12346](https://cloud.google.com/dataflow/docs/guides/routes-firewall#example_firewall_ingress_rule).
- [Allow egress from Dataflow workers by TCP on ports 12345 and 12346](https://cloud.google.com/dataflow/docs/guides/routes-firewall#example_firewall_egress_rule).
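As a hedged sketch, the deny-all-egress and restricted-API rules above could be created as follows. The rule names, network name, and priorities are placeholder assumptions; `199.36.153.4/30` is the documented `restricted.googleapis.com` range.

```bash
# Deny all other egress at low priority (placeholder rule and network names).
gcloud compute firewall-rules create deny-all-egress \
  --network=dataflow-network \
  --direction=EGRESS \
  --action=DENY \
  --rules=all \
  --destination-ranges=0.0.0.0/0 \
  --priority=65534

# Allow TCP 443 egress only to the restricted Google APIs VIP.
gcloud compute firewall-rules create allow-restricted-api-egress \
  --network=dataflow-network \
  --direction=EGRESS \
  --action=ALLOW \
  --rules=tcp:443 \
  --destination-ranges=199.36.153.4/30 \
  --priority=1000
```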

### DNS configurations

- Restricted Google APIs
- Private Google APIs
- Restricted gcr.io
- Restricted Artifact Registry

## Requirements

### Terraform plugins

- [Terraform](https://www.terraform.io/downloads.html) 0.13.x
- [terraform-provider-google](https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_versions#google) plugin ~> v3.77.x
- [terraform-provider-google beta](https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_versions#google-beta) plugin ~> v3.77.x

### Configured GCP project

### GCP user account

A user account to run this code, impersonating a service account, with the following IAM roles:

- Project level:
- Service Account User: `roles/iam.serviceAccountUser`
- Service Account Token Creator: `roles/iam.serviceAccountTokenCreator`

You can use the following commands to grant these roles; replace the `<project-id>` placeholder
with your project ID and `<your-email-account>` with your account:

```bash
export project_id=<project-id>
export user_account=<your-email-account>

gcloud projects add-iam-policy-binding ${project_id} \
--member="user:${user_account}" \
--role="roles/iam.serviceAccountUser"

gcloud projects add-iam-policy-binding ${project_id} \
--member="user:${user_account}" \
--role="roles/iam.serviceAccountTokenCreator"
```

#### A service account to run Terraform

The service account that will be used to invoke this module must have the following IAM roles:

- Project level:
- Bigquery Admin: `roles/bigquery.admin`
- Storage Admin: `roles/storage.admin`
- Pub/Sub Admin: `roles/pubsub.admin`
- Service Account User: `roles/iam.serviceAccountUser`
- Create Service Accounts: `roles/iam.serviceAccountCreator`
- Delete Service Accounts: `roles/iam.serviceAccountDeleter`
- Service Accounts Token Creator: `roles/iam.serviceAccountTokenCreator`
- Security Reviewer: `roles/iam.securityReviewer`
- Compute Network Admin: `roles/compute.networkAdmin`
- Compute Security Admin: `roles/compute.securityAdmin`
- DNS Admin: `roles/dns.admin`
- Artifact Registry Administrator: `roles/artifactregistry.admin`
- Cloud KMS Admin: `roles/cloudkms.admin`
- Dataflow Developer: `roles/dataflow.developer`
- DLP User: `roles/dlp.user`
- DLP De-identify Templates Editor: `roles/dlp.deidentifyTemplatesEditor`
- Organization level
- Billing User: `roles/billing.user`
- Organization Policy Administrator: `roles/orgpolicy.policyAdmin`
- Access Context Manager Admin: `roles/accesscontextmanager.policyAdmin`
- Organization Administrator: `roles/resourcemanager.organizationAdmin`
- Organization Shared VPC Admin: `roles/compute.xpnAdmin`
- VPC Access Admin: `roles/vpcaccess.admin`

You can use the following commands to grant these roles; replace the placeholders with the correct values.

```bash
export project_id=<project-id>
export organization_id=<organization-id>
export sa_email=<service-account-email>

gcloud organizations add-iam-policy-binding ${organization_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/billing.user"

gcloud organizations add-iam-policy-binding ${organization_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/orgpolicy.policyAdmin"

gcloud organizations add-iam-policy-binding ${organization_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/accesscontextmanager.policyAdmin"

gcloud organizations add-iam-policy-binding ${organization_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/resourcemanager.organizationAdmin"

gcloud organizations add-iam-policy-binding ${organization_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/compute.xpnAdmin"

gcloud organizations add-iam-policy-binding ${organization_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/vpcaccess.admin"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/storage.admin"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/pubsub.admin"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/compute.networkAdmin"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/compute.securityAdmin"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/bigquery.admin"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/dns.admin"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/iam.serviceAccountCreator"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/iam.serviceAccountDeleter"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/iam.serviceAccountTokenCreator"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/iam.serviceAccountUser"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/browser"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/artifactregistry.admin"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/cloudkms.admin"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/dataflow.developer"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/dlp.deidentifyTemplatesEditor"

gcloud projects add-iam-policy-binding ${project_id} \
--member="serviceAccount:${sa_email}" \
--role="roles/dlp.user"
```

#### Set up the Access Context Manager policy

Obtain the value for the `access_context_manager_policy_id` variable. It can be obtained by running `gcloud access-context-manager policies list --organization YOUR-ORGANIZATION-ID --format="value(name)"`.

If the command returns no value, you will need to [create the access context manager policy](https://cloud.google.com/access-context-manager/docs/create-access-policy) for the organization.

```bash
gcloud access-context-manager policies create \
--organization YOUR-ORGANIZATION-ID --title POLICY_TITLE
```

**Troubleshooting:**
If your user does not have the necessary roles to run the commands above, you can [impersonate](https://cloud.google.com/iam/docs/impersonating-service-accounts) the Terraform service account that will be used in the deployment by appending `--impersonate-service-account=<sa-email>` to the commands.
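For example, the policy lookup above could be run with impersonation; `<sa-email>` remains a placeholder for the Terraform service account email:

```bash
# List Access Context Manager policies while impersonating
# the Terraform service account (placeholder email).
gcloud access-context-manager policies list \
  --organization YOUR-ORGANIZATION-ID \
  --format="value(name)" \
  --impersonate-service-account=<sa-email>
```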

- [Restricted Google APIs](https://cloud.google.com/vpc-service-controls/docs/set-up-private-connectivity#configure-routes).
- [Private Google APIs](https://cloud.google.com/vpc/docs/configure-private-google-access).
- [Restricted gcr.io](https://cloud.google.com/vpc-service-controls/docs/set-up-gke#configure-dns).
- [Restricted Artifact Registry](https://cloud.google.com/vpc-service-controls/docs/set-up-gke#configure-dns).
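As a hedged illustration of the first configuration, a private DNS zone that maps `*.googleapis.com` to the restricted VIP could be created as follows. The zone and network names are placeholders; the `199.36.153.4`-`199.36.153.7` addresses are the documented restricted VIP.

```bash
# Private zone overriding googleapis.com for the placeholder network.
gcloud dns managed-zones create restricted-googleapis \
  --dns-name=googleapis.com. \
  --visibility=private \
  --networks=dataflow-network \
  --description="Maps googleapis.com to the restricted VIP"

# A record pointing restricted.googleapis.com at the restricted VIP.
gcloud dns record-sets create restricted.googleapis.com. \
  --zone=restricted-googleapis \
  --type=A \
  --ttl=300 \
  --rrdatas=199.36.153.4,199.36.153.5,199.36.153.6,199.36.153.7

# CNAME sending every other googleapis.com name through the restricted VIP.
gcloud dns record-sets create "*.googleapis.com." \
  --zone=restricted-googleapis \
  --type=CNAME \
  --ttl=300 \
  --rrdatas=restricted.googleapis.com.
```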

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Inputs
@@ -222,7 +50,6 @@ If your user does not have the necessary roles to run the commands above you can
| flex\_template\_gs\_path | The Google Cloud Storage `gs` path to the built flex template JSON file that supports DLP de-identification. | `string` | `""` | no |
| location | The location of Artifact registry. Run `gcloud artifacts locations list` to list available locations. | `string` | `"us-east4"` | no |
| network\_administrator\_group | Google Cloud IAM group that reviews network configuration. Typically, this includes members of the networking team. | `string` | n/a | yes |
| network\_self\_link | The URI of the network where Dataflow is going to be deployed. | `string` | n/a | yes |
| non\_confidential\_data\_project\_id | The ID of the project in which the Bigquery will be created. | `string` | n/a | yes |
| org\_id | GCP Organization ID. | `string` | n/a | yes |
| perimeter\_additional\_members | The list of all members to be added on perimeter access, except the service accounts created by this module. Prefix user: (user:email@email.com) or serviceAccount: (serviceAccount:my-service-account@email.com) is required. | `list(string)` | n/a | yes |
5 changes: 0 additions & 5 deletions examples/regional-dlp/variables.tf
@@ -60,11 +60,6 @@ variable "terraform_service_account" {
type = string
}

variable "network_self_link" {
description = "The URI of the network where Dataflow is going to be deployed."
type = string
}

variable "subnetwork_self_link" {
description = "The URI of the subnetwork where Dataflow is going to be deployed."
type = string
1 change: 0 additions & 1 deletion test/fixtures/regional-dlp/main.tf
@@ -59,7 +59,6 @@ module "regional_dlp_example" {
terraform_service_account = var.terraform_service_account
access_context_manager_policy_id = var.access_context_manager_policy_id
flex_template_gs_path = var.python_de_identify_template_gs_path
network_self_link = var.data_ingestion_network_self_link[1]
subnetwork_self_link = var.data_ingestion_subnets_self_link[1]
delete_contents_on_destroy = true
perimeter_additional_members = []
5 changes: 0 additions & 5 deletions test/fixtures/regional-dlp/variables.tf
@@ -44,11 +44,6 @@ variable "non_confidential_data_project_id" {
type = list(string)
}

variable "data_ingestion_network_self_link" {
description = "The URI of the network where Dataflow is going to be deployed."
type = list(string)
}

variable "data_ingestion_subnets_self_link" {
description = "The URI of the subnetwork where Dataflow is going to be deployed."
type = list(string)
