-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use GCP ADC for auth in Terraform Deployer #662
Conversation
/test |
1 similar comment
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to extend the terraform deployer with such a change, we have to add a test package here and make sure the flow works.
Let's decide first if this is the way we'd like to follow.
@endorama Could you please clarify (describe) your use case? Is there any PR we can look at?
@@ -16,6 +16,15 @@ cleanup() { | |||
} | |||
trap cleanup EXIT INT TERM | |||
|
|||
# Save GCP credentials on disk and perform authentication |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terraform deployer is not used just for GCP, but also another cloud provider like AWS, so you can't add a provider-specific code in a common place without any conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry for the bad implementation :) This is an easy fix, adding a guard clause for presence of the GOOGLE_CREDENTIALS
variable. Before adding it I'd like to discuss the other points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't a matter of adding a guard, but is it really necessary to put it here? see: #662 (comment)
@@ -16,6 +16,15 @@ cleanup() { | |||
} | |||
trap cleanup EXIT INT TERM | |||
|
|||
# Save GCP credentials on disk and perform authentication | |||
# NOTE: this is required for bq (and maybe other gcloud related tools) to authenticate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thought: have you considered using gcloud emulators? We're already using them here.
echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS" | ||
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS" | ||
# NOTE: Terraform support authentication through GOOGLE_CREDENTIALS and usual gcloud ADC but other | ||
# tools (like bq) don't support the first, so we always rely on gcloud ADC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please point me to the integration/system test, where you intend to use bq
? I'm wondering if something changed since that time and maybe we're fine to use a terraform module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is being used in this PR elastic/integrations#2312
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a terraform resource google_bigquery_job that would allow doing this without bq
cli, but it requires source data to be in a GCS bucket.
This approach has multiple disadvantages: it requires more infrastructure to be created which increase complexity, it prevent use of emulator and there is no support to waiting for the load job completion.
I tried asking in the GCP Community Slack if they were considering local file support but did not receive an answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach has multiple disadvantages: it requires more infrastructure to be created which increase complexity
What do you mean by "more infrastructure"? It sounds easier to load data on S3, than refactoring service deployer :)
it prevent use of emulator and there is no support to waiting for the load job completion.
This is really unfortunate and without a proper waiter it won't be possible to perform it.
Thanks for the explainaton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds easier to load data on S3, than refactoring service deployer :)
😅 totally true.
If google_bigquery_job
was working as expected I would have chosen that way, because at least is predictable and all within terraform, but I think we could only have something fragile at the moment.
# NOTE: this is required for bq (and maybe other gcloud related tools) to authenticate | ||
export "GOOGLE_APPLICATION_CREDENTIALS=/root/.config/gcloud/application_default_credentials.json" | ||
echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS" | ||
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is required by bq
, maybe we can execute it just before running bq
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case we are going to use other tools, something possible given the new Docker image, I think it make sense to unify authentication so all tools leverage the same credentials and should not do it multiple times (with possible concurrency risks).
Unifying authentication also prevents that Terraform and tools run with different credentials, something that would make the CI harder to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the pro and cons of moving authentication where bq
is used. The main cons I see is around how this affects authentication globally. gcloud auth
state is globally shared across all instances of gcloud
(and related tools). So doing that within terraform code would create side effects that may be unintuitive or difficult to debug.
What is the downside of the current approach? (given that it does not trigger always but only when it makes sense)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the downside of the current approach? (given that it does not trigger always but only when it makes sense)
I think you included the answer here. If there is a way of not introducing the coupling on auth, why should we do this? Please remember that it isn't a dedicated GCP service.
gcloud auth state is globally shared across all instances of gcloud (and related tools). So doing that within terraform code would create side effects that may be unintuitive or difficult to debug.
It means that you can call it this way:
gcloud auth login
bq
gcloud auth revoke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are at risk of race conditions if multiple resources are using this commands. Terraform runs in parallel and this command affects a global state.
If there is a way of not introducing the coupling on auth, why should we do this?
My reasoning about this is that coupling is already present as in "terraform requires credentials to run", so if I use it with GCP resources is coupled with GCP auth, if I use it with AWS is coupled with AWS auth. How could we avoid this? I agree that ideally credentials would be passed in env vars and we should not rely on other tools.
Please remember that it isn't a dedicated GCP service.
Could we consider having a dedicated deployer for gcp? Would this help untangling this?
# NOTE: this is required for bq (and maybe other gcloud related tools) to authenticate | ||
export "GOOGLE_APPLICATION_CREDENTIALS=/root/.config/gcloud/application_default_credentials.json" | ||
echo "$GOOGLE_CREDENTIALS" > "$GOOGLE_APPLICATION_CREDENTIALS" | ||
gcloud auth login --cred-file "$GOOGLE_APPLICATION_CREDENTIALS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any default location, so that cred-file
is not needed to be defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not find any documentation allowing to use a default location, the argument is required as is like providing username/password.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so it looks like we need to confirm that it works as expected. It would be great if you prepare a similar test package like the special aws. It's the AWS integration with stripped data streams, only one left with a system test using Terraform service deployer to create the EC2 machine.
Here is my proposal on the plan:
You need to do a similar exercise with GCP. You can use this PR to push the sample package, but please make sure it's stripped and CI discovers it. Then, please you can execute the system test, but I'm afraid that it may fail due to configured environment variables. It's about modifying the Jenkinsfile to enable those envs. Then, we should observe some issues around bq
and terraform service deployer, and eliminate them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so it looks like we need to confirm that it works as expected.
May you clarify what should be confirmed?
What's the goal for this test package? Is to test if tests are being able to be executed correctly through terraform deployer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the goal - to verify if the terraform service deployer works correctly with Google Cloud: auth, bq, Jenkins changes.
Otherwise, we will end up with ping-backs between both repositories (Integrations and elastic-package), when something doesn't work as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mtojek I added the test package, the pipeline is green but is it really running some tests? https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Felastic-package/detail/PR-662/8/pipeline/279
@endorama Please pull the latest main branch. |
The CI is failing because the GCP package does not pass the lint check. This is the shown error:
File 3 and 4 are mandatory. @mtojek I'll open an issue in https://github.com/elastic/package-spec to discuss this further. |
I updated |
Terraform Google Provider allow to pass credentials file content in GOOGLE_CREDENTIALS env var. More about this in the Terraform docs: https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#full-reference Unfortunately this does not seem to work other GCP Cloud SDK components, as I've not been able to find references to that environment variable. As Terraform provider and components support ADC (Application Defatul Credentials) this commit changes how Terraform Deployer setup credentials to use ADC instead of Terraform dedicated variable.
With the addition of the `gcp_auth` function the logic flow was split into multiple pieces. This commit introduce a main like area where all logic is handled.
.gitignore is not currently supported in package-spec, so package validation fails. See elastic/package-spec#273
fi | ||
} | ||
|
||
if [[ "${BASH_SOURCE[0]}" = "$0" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this condition required here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not required, but it avoid running the code in case the file is source
d and identify a "main" like are, like the main
function in go or C. From that point onward all code will be executed only when the file is explicitly executed (bash file.sh
or ./file.sh
). It helps grouping the relevant "main" code in the same area, preventing arbitrary code being added between functions.
Consider it a good way of writing BASH files, as it aids reading them (like __file__ == __main__
does for Python)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your point of view, but please keep it simple as it was before. This file is intended to be called by container engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is run by container engine but is still read by developers :) The point was to simplifying reading the code flow. Anyway I removed it (will push once rebased onto main)
@@ -2,25 +2,41 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any container logs for the GCP test package in beats-ci-temp-internal/Ingest-manager/elastic-package/PR-662-16/insecure-logs/gcp
. Is it intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No is not, from the pipeline logs it seems it not running system tests (command start at line 244, hidden by the task header)
I set up the env.yml
file in data_stream/billing/_dev/deploy/tf
. Is this not enough to run system tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked your branch and you haven't configured any test policies. Test policies are used by elastic-package during system tests. Otherwise the elastic-package won't know what you re trying to test. It's covered in our manual.
Please take a look at the AWS test package and ec2_metrics tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test package with first test case (not requiring changes in this PR) added in #701
@endorama I converted this PR to draft as it's still in progress (missing tests, unaddressed comments). |
@endorama Are you going to rebase this PR as we merged the other one? |
Closing it for now as it's a stale PR. |
Terraform Google Provider allow to pass credentials file content in
GOOGLE_CREDENTIALS
env var.More about this in the Terraform docs: https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#full-reference
Unfortunately this does not seem to work other GCP Cloud SDK components,
as I've not been able to find references to that environment variable.
As Terraform provider and components support ADC (Application Defatul
Credentials) this commit changes how Terraform Deployer setup
credentials to use ADC instead of Terraform dedicated variable.