Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use GCP ADC for auth in Terraform Deployer #662

Closed
wants to merge 13 commits into from
Closed

Use GCP ADC for auth in Terraform Deployer #662

wants to merge 13 commits into from

Conversation

endorama
Copy link
Member

Terraform Google Provider allow to pass credentials file content in GOOGLE_CREDENTIALS env var.

More about this in the Terraform docs: https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#full-reference

Unfortunately this does not seem to work other GCP Cloud SDK components,
as I've not been able to find references to that environment variable.

As Terraform provider and components support ADC (Application Defatul
Credentials) this commit changes how Terraform Deployer setup
credentials to use ADC instead of Terraform dedicated variable.

@elasticmachine
Copy link
Collaborator

elasticmachine commented Jan 24, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-02-11T09:06:19.218+0000

  • Duration: 27 min 55 sec

Test stats 🧪

Test Results
Failed 0
Passed 548
Skipped 0
Total 548

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@endorama endorama requested a review from mtojek January 24, 2022 16:19
@endorama
Copy link
Member Author

/test

1 similar comment
@endorama
Copy link
Member Author

/test

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to extend the terraform deployer with such a change, we have to add a test package here and make sure the flow works.

Let's decide first if this is the way we'd like to follow.

@endorama Could you please clarify (describe) your use case? Is there any PR we can look at?

@@ -16,6 +16,15 @@ cleanup() {
}
trap cleanup EXIT INT TERM

# Save GCP credentials on disk and perform authentication
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terraform deployer is not used just for GCP, but also another cloud provider like AWS, so you can't add a provider-specific code in a common place without any conditions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sorry for the bad implementation :) This is an easy fix, adding a guard clause for presence of the GOOGLE_CREDENTIALS variable. Before adding it I'd like to discuss the other points.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't a matter of adding a guard, but is it really necessary to put it here? see: #662 (comment)

@@ -16,6 +16,15 @@ cleanup() {
}
trap cleanup EXIT INT TERM

# Save GCP credentials on disk and perform authentication
# NOTE: this is required for bq (and maybe other gcloud related tools) to authenticate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought: have you considered using gcloud emulators? We're already using them here.

echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS"
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS"
# NOTE: Terraform support authentication through GOOGLE_CREDENTIALS and usual gcloud ADC but other
# tools (like bq) don't support the first, so we always rely on gcloud ADC.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please point me to the integration/system test, where you intend to use bq? I'm wondering if something changed since that time and maybe we're fine to use a terraform module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is being used in this PR elastic/integrations#2312

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a terraform resource google_bigquery_job that would allow doing this without bq cli, but it requires source data to be in a GCS bucket.
This approach has multiple disadvantages: it requires more infrastructure to be created which increase complexity, it prevent use of emulator and there is no support to waiting for the load job completion.

I tried asking in the GCP Community Slack if they were considering local file support but did not receive an answer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach has multiple disadvantages: it requires more infrastructure to be created which increase complexity

What do you mean by "more infrastructure"? It sounds easier to load data on S3, than refactoring service deployer :)

it prevent use of emulator and there is no support to waiting for the load job completion.

This is really unfortunate and without a proper waiter it won't be possible to perform it.

Thanks for the explainaton.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds easier to load data on S3, than refactoring service deployer :)

😅 totally true.

If google_bigquery_job was working as expected I would have chosen that way, because at least is predictable and all within terraform, but I think we could only have something fragile at the moment.

# NOTE: this is required for bq (and maybe other gcloud related tools) to authenticate
export "GOOGLE_APPLICATION_CREDENTIALS=/root/.config/gcloud/application_default_credentials.json"
echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS"
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is required by bq, maybe we can execute it just before running bq?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we are going to use other tools, something possible given the new Docker image, I think it make sense to unify authentication so all tools leverage the same credentials and should not do it multiple times (with possible concurrency risks).
Unifying authentication also prevents that Terraform and tools run with different credentials, something that would make the CI harder to debug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the pro and cons of moving authentication where bq is used. The main cons I see is around how this affects authentication globally. gcloud auth state is globally shared across all instances of gcloud (and related tools). So doing that within terraform code would create side effects that may be unintuitive or difficult to debug.

What is the downside of the current approach? (given that it does not trigger always but only when it makes sense)

Copy link
Contributor

@mtojek mtojek Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the downside of the current approach? (given that it does not trigger always but only when it makes sense)

I think you included the answer here. If there is a way of not introducing the coupling on auth, why should we do this? Please remember that it isn't a dedicated GCP service.

gcloud auth state is globally shared across all instances of gcloud (and related tools). So doing that within terraform code would create side effects that may be unintuitive or difficult to debug.

It means that you can call it this way:

gcloud auth login
bq
gcloud auth revoke

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are at risk of race conditions if multiple resources are using this commands. Terraform runs in parallel and this command affects a global state.

If there is a way of not introducing the coupling on auth, why should we do this?

My reasoning about this is that coupling is already present as in "terraform requires credentials to run", so if I use it with GCP resources is coupled with GCP auth, if I use it with AWS is coupled with AWS auth. How could we avoid this? I agree that ideally credentials would be passed in env vars and we should not rely on other tools.

Please remember that it isn't a dedicated GCP service.

Could we consider having a dedicated deployer for gcp? Would this help untangling this?

# NOTE: this is required for bq (and maybe other gcloud related tools) to authenticate
export "GOOGLE_APPLICATION_CREDENTIALS=/root/.config/gcloud/application_default_credentials.json"
echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS"
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any default location, so that cred-file is not needed to be defined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not find any documentation allowing to use a default location, the argument is required as is like providing username/password.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it looks like we need to confirm that it works as expected. It would be great if you prepare a similar test package like the special aws. It's the AWS integration with stripped data streams, only one left with a system test using Terraform service deployer to create the EC2 machine.

Here is my proposal on the plan:

You need to do a similar exercise with GCP. You can use this PR to push the sample package, but please make sure it's stripped and CI discovers it. Then, please you can execute the system test, but I'm afraid that it may fail due to configured environment variables. It's about modifying the Jenkinsfile to enable those envs. Then, we should observe some issues around bq and terraform service deployer, and eliminate them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it looks like we need to confirm that it works as expected.

May you clarify what should be confirmed?

What's the goal for this test package? Is to test if tests are being able to be executed correctly through terraform deployer?

Copy link
Contributor

@mtojek mtojek Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the goal - to verify if the terraform service deployer works correctly with Google Cloud: auth, bq, Jenkins changes.

Otherwise, we will end up with ping-backs between both repositories (Integrations and elastic-package), when something doesn't work as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek I added the test package, the pipeline is green but is it really running some tests? https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Felastic-package/detail/PR-662/8/pipeline/279

@mtojek mtojek requested a review from a team January 25, 2022 10:15
@endorama endorama self-assigned this Jan 27, 2022
@mtojek mtojek mentioned this pull request Feb 2, 2022
@mtojek
Copy link
Contributor

mtojek commented Feb 2, 2022

@endorama Please pull the latest main branch.

@endorama
Copy link
Member Author

endorama commented Feb 8, 2022

The CI is failing because the GCP package does not pass the lint check. This is the shown error:

Error: checking package failed: linting package failed: found 4 validation errors:
   1. item [.gitignore] is not allowed in folder [.../test/packages/parallel/gcp/data_stream/billing/_dev/deploy/tf]
   2. item [.terraform.lock.hcl] is not allowed in folder [.../test/packages/parallel/gcp/data_stream/billing/_dev/deploy/tf]
   3. item [billing-schema.json] is not allowed in folder [.../test/packages/parallel/gcp/data_stream/billing/_dev/deploy/tf]
   4. item [test-data.ndjson.tftpl] is not allowed in folder **[.../test/packages/parallel/gcp/data_stream/billing/_dev/deploy/tf]**

File 3 and 4 are mandatory.
File 2 has been added because Terraform suggest to do it for dependency management. In my opinion is a great idea as it makes the build reproducible.
File 1 has been added to prevent local folder or state files to be included.

@mtojek I'll open an issue in https://github.com/elastic/package-spec to discuss this further.

@endorama
Copy link
Member Author

I updated github.com/elastic/package-spec to v1.4.1 for testing waiting for #693

Terraform Google Provider allow to pass credentials file content in
GOOGLE_CREDENTIALS env var.

More about this in the Terraform docs: https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#full-reference

Unfortunately this does not seem to work other GCP Cloud SDK components,
as I've not been able to find references to that environment variable.

As Terraform provider and components support ADC (Application Defatul
Credentials) this commit changes how Terraform Deployer setup
credentials to use ADC instead of Terraform dedicated variable.
With the addition of the `gcp_auth` function the logic flow was split
into multiple pieces.

This commit introduce a main like area where all logic is handled.
.gitignore is not currently supported in package-spec, so package
validation fails.

See elastic/package-spec#273
fi
}

if [[ "${BASH_SOURCE[0]}" = "$0" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this condition required here?

Copy link
Member Author

@endorama endorama Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not required, but it avoid running the code in case the file is sourced and identify a "main" like are, like the main function in go or C. From that point onward all code will be executed only when the file is explicitly executed (bash file.sh or ./file.sh). It helps grouping the relevant "main" code in the same area, preventing arbitrary code being added between functions.

Consider it a good way of writing BASH files, as it aids reading them (like __file__ == __main__ does for Python)

Copy link
Contributor

@mtojek mtojek Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point of view, but please keep it simple as it was before. This file is intended to be called by container engine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is run by container engine but is still read by developers :) The point was to simplifying reading the code flow. Anyway I removed it (will push once rebased onto main)

@@ -2,25 +2,41 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any container logs for the GCP test package in beats-ci-temp-internal/Ingest-manager/elastic-package/PR-662-16/insecure-logs/gcp. Is it intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No is not, from the pipeline logs it seems it not running system tests (command start at line 244, hidden by the task header)

I set up the env.yml file in data_stream/billing/_dev/deploy/tf. Is this not enough to run system tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked your branch and you haven't configured any test policies. Test policies are used by elastic-package during system tests. Otherwise the elastic-package won't know what you re trying to test. It's covered in our manual.

Please take a look at the AWS test package and ec2_metrics tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test package with first test case (not requiring changes in this PR) added in #701

@mtojek mtojek marked this pull request as draft February 11, 2022 15:05
@mtojek
Copy link
Contributor

mtojek commented Feb 11, 2022

@endorama I converted this PR to draft as it's still in progress (missing tests, unaddressed comments).

@mtojek mtojek requested a review from jsoriano February 11, 2022 15:06
@endorama endorama mentioned this pull request Feb 15, 2022
@mtojek
Copy link
Contributor

mtojek commented Mar 16, 2022

@endorama Are you going to rebase this PR as we merged the other one?

@mtojek
Copy link
Contributor

mtojek commented Sep 23, 2022

Closing it for now as it's a stale PR.

@mtojek mtojek closed this Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants