-
Notifications
You must be signed in to change notification settings - Fork 825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for critical GKE/GCP Terraform Bugs #1373
Conversation
/cc @chrisst who provided much of these Terraform cleanup / review previously. If you would like to take a look that would be great. |
@@ -23,24 +23,31 @@ data "google_client_config" "default" {} | |||
# Run `terraform taint null_resource.test-setting-variables` before second execution | |||
resource "null_resource" "test-setting-variables" { | |||
provisioner "local-exec" { | |||
command = "${format("echo Current variables set as following - name: %s, project: %s, machineType: %s, initialNodeCount: %s, zone: %s", | |||
command = <<EOT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main reason for this cleanup was because the previous formatting totally broke the Terraform plugin for IntelliJ.
Also, I think it reads nicer?
Build Succeeded 👏 Build Id: 9a22ad46-4c02-42ee-bbeb-25dd481dac9b The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
@drichardson: changing LGTM is restricted to collaborators In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
- network - the name of the network you want your cluster and firewall rules to reside (default is "default") | ||
|
||
{{% alert title="Warning" color="warning"%}} | ||
On the lines that read `source = "git::https://github.com/googleforgames/agones.git//install/terraform/modules/gke/?ref=master"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @aLekSer had left this pointed at the master branch (rather than a release branch) on purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working on that, will provide more details soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a brittle approach to me? How do we ensure that we don't accidentally break production systems with development changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we can use both options. The master branch was used so that anyone can play and test terraform apply
with these modules before the first release of these terraform configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we keep optimising for our development, rather than our users -- we should really be optimising for our users.
We can build extra tools for ourselves to handle our development scenarios, but we shouldn't be foisting that onto our users, in my humble opinion.
releases. | ||
|
||
For example, if you are targeting release {{< release-branch >}}, then you will want to have | ||
`source = "git::https://github.com/googleforgames/agones.git//install/terraform/modules/gke/?ref=release-{{< release-branch >}}"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the right approach, can we can do this automatically instead of asking users to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So stack overflow says this can't be dynamic by design (since terraform has to know the value is the same at init
and apply
time), but I didn't actually try it.
I'll try it and find out.
@@ -14,7 +14,7 @@ | |||
|
|||
|
|||
// Run: | |||
// terraform apply [-var agones_version="1.3.0"] | |||
// terraform apply -var project="<YOUR_GCP_ProjectID>" [-var agones_version="1.1.0"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change the agones version to something older? If anything, what about using 1.4.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copy/pasted the previous command, but you are right, should update it to latest. Will do 👍
@@ -87,6 +87,17 @@ Configurable parameters: | |||
- agones_version - the version of agones to install (default is the latest version from the [Helm repository](https://agones.dev/chart/stable)) | |||
- machine_type - machine type for hosting game servers (default is "n1-standard-4") | |||
- node_count - count of game server nodes for the default node pool (default is "4") | |||
- network - the name of the network you want your cluster and firewall rules to reside (default is "default") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't customizable in the gke-local or gke-module. They both hard code the network to default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's a good point! I was thinking of the terraform/gke module. Will remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From @chrisst 's comments, actually added them to the module definitions - so leaving this in, and adding zone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First want to say that overall I agree with these changes.
But after fetching them, I got an issue with previously working make target:
make gcloud-terraform-cluster
Error: No configuration files
Apply requires configuration to be present.
Ah good catch. I will fix this. I figure long term, this make target will go away, as it will get merged into our |
// Update ?ref= to the agones release you are installing. For example, ?ref=release-1.3.0 corresponds | ||
// to Agones version 1.3.0 | ||
// *************************************************************************************************** | ||
source = "git::https://github.com/googleforgames/agones.git//install/terraform/modules/gke/?ref=master" | ||
|
||
cluster = { | ||
"zone" = "us-west1-c" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should make a var out of zone also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 done.
} | ||
} | ||
|
||
module "helm_agones" { | ||
|
||
source = "../../../install/terraform/modules/helm" | ||
|
||
agones_version = var.agones_version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next line:
values_file = ""
Should be changed to
values_file = var.values_file
and please create a new variable values_file
It should contain
-var values_file="../../../helm/agones/values.yaml"
when we install from current branch, not release master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why we want to do this?
Do we want to be sending users to look into our helm files? Seems like something they shouldn't need or want to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed for developer tooling, we can fix this after 1.4, but what this is optional:
- there could be values from the helm chart;
- For development we should have an option to change something to values file and it would be reflected in the deployment
repository = "${data.helm_repository.agones.metadata.0.name}" | ||
chart = "${var.chart}" | ||
repository = data.helm_repository.agones.metadata.0.name | ||
chart = var.chart | ||
timeout = 420 | ||
|
||
# Use terraform of the latest >=0.12 version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"${length(var.values_file) == 0 ? "" : file("${var.values_file}")}",
This is where values_file
would be used (line 94)
https://github.com/googleforgames/agones/pull/1373/files#diff-56b861bb69f02c3b94e1de0aef454864R94
} | ||
} | ||
resource "google_container_cluster" "primary" { | ||
name = var.cluster["name"] | ||
location = var.cluster["zone"] | ||
project = var.cluster["project"] | ||
provider = google-beta | ||
network = var.cluster["network"] | ||
provider = google |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to specify this. The provider is effectively determined by string parsing the resource name so this resource will default to google
if not explicitly overridden.
Also FWIW we don't consider the beta provider as an unstable provider. Instead we just provide it as a way for people to call the beta GCP APIs if they are interested in using features/fields that are only available in the `beta version of the product. There are a few more details about the provider versions here. That being said if you aren't using beta features then there is no need to use the beta provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cleanup and explanation 👍 removing the reference.
My concern with the google beta api surface isn't stable, was that if we happen to inadvertently use a beta feature. And since we aren't, there doesn't seem to be a good reason to take the risk.
|
||
cluster = { | ||
"zone" = "us-west1-c" | ||
"name" = var.name | ||
"machineType" = var.machine_type | ||
"initialNodeCount" = var.node_count | ||
"project" = var.project | ||
"network" = "default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you planning on changing this to a var and setting the default in the variable instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Done.
@aLekSer looks like you are handling all the fixes for terraform.mk n the developer tooling? I figure it's fine to leave that broken for this PR and for stable release, since that's developer tooling, and it doesn't matter if we fix it post 1.4.0 - then post stable release we can look at fixing that after stable release. How do you feel about that? |
This fixes several critical issues, and also cleans up other aspect along the way. Primarily that the firewall rule would never allow traffic to the cluster, since they were erroneously configured and on a completely different network to the cluster. Bug Fixes: 1. Remove the creation of a separate network for each cluster 1. Make sure the firewall rules don't filter on tags, but target on network tags. 1. Be able to pass in a `network` argument (defaulting to "default"), such that if you want to run a cluster on a different network it is easy to configure - this means that the firewall rule is always on the same network as the cluster. 1. Controller deployment was set to "Always" image pull policy on both the controller and the sidecar. This should never be configured this way for production systems, and is therefore an unsafe default. Cleanup: 1. Remove duplicate helm.tf and cluster.tf in root ./install/terraform path, as they were diverging, and served no value. Both ./gke and ./gke-local installation scripts have also been updated, as well as the documentation. 1. Needed to upgrade Terraform in the build shell, since all modules used a later version. 1. Cleaned up deprecation comments 1. Removed the broken symlinks in ./build I would like to also make it easier to have only one GCP firewall rule for n number of clusters, but we can tackle that in a later PR. Closes #1372
af9b89a
to
ce97e66
Compare
Build Failed 😱 Build Id: 26ae2521-d4da-4658-86db-2605827175b1 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: be84611c-4f1f-42fe-b4d0-2441051336e5 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with that approach, I assume that terraform.mk
could be fixed with this terraform configs structure.
Doing the fix in a separate PR
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aLekSer, drichardson, markmandel The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
New changes are detected. LGTM label has been removed. |
LGTM |
Build Succeeded 👏 Build Id: d89274bf-e741-45b7-bbac-5a3588364fd6 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
This fixes several critical issues, and also cleans up other aspect along the way. Primarily that the firewall rule would never allow traffic to the cluster, since they were erroneously configured and on a completely different network to the cluster. Bug Fixes: 1. Remove the creation of a separate network for each cluster 1. Make sure the firewall rules don't filter on tags, but target on network tags. 1. Be able to pass in a `network` argument (defaulting to "default"), such that if you want to run a cluster on a different network it is easy to configure - this means that the firewall rule is always on the same network as the cluster. 1. Controller deployment was set to "Always" image pull policy on both the controller and the sidecar. This should never be configured this way for production systems, and is therefore an unsafe default. Cleanup: 1. Remove duplicate helm.tf and cluster.tf in root ./install/terraform path, as they were diverging, and served no value. Both ./gke and ./gke-local installation scripts have also been updated, as well as the documentation. 1. Needed to upgrade Terraform in the build shell, since all modules used a later version. 1. Cleaned up deprecation comments 1. Removed the broken symlinks in ./build I would like to also make it easier to have only one GCP firewall rule for n number of clusters, but we can tackle that in a later PR. Closes googleforgames#1372
This fixes several critical issues, and also cleans up other aspect along the way. Primarily that the firewall rule would never allow traffic to the cluster, since they were erroneously configured and on a
completely different network to the cluster.
Bug Fixes:
network
argument (defaulting to "default"), such that if you want to run a cluster on a different network it is easy to configure - this means that the firewall rule is always on the same network as the cluster.google-beta
since it is not the stable provider.Cleanup:
I would like to also make it easier to have only one GCP firewall rule for n number of clusters, but we can tackle that in a later PR.
Closes #1372