Multiple clusters management in EKS #616
Conversation
Signed-off-by: Aylei <rayingecho@gmail.com>
deploy/aws/eks/manifests/crd.yaml
Outdated
@@ -0,0 +1,103 @@
apiVersion: apiextensions.k8s.io/v1beta1
Can we symlink this instead of copying?
No, this module is supposed to be workable outside this repository. But it is definitely not a good idea to keep a copy here. How about `kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.0.0-beta.3/manifests/crd.yaml`?
Why does this need to work outside this repository? I have been referencing manifests with symlinks in the GCP terraform. The deployment doesn't work without a good way to reference our charts and manifests.
The suggested update is problematic because it will fall out of sync. Better would be to have a `manifests` variable, which could be set to `../../manifests` or `https://raw.githubusercontent.com/pingcap/tidb-operator/v1.0.0-beta.3`.
Does it help to move the manifests directory under the deploy directory?
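The `manifests` variable idea could look roughly like the sketch below. This is a minimal, hypothetical sketch assuming Terraform 0.12 syntax; the variable name follows the comment, but the `null_resource` wiring is an illustration, not code from this PR.

```hcl
# Hypothetical: let callers choose between a local checkout and a pinned
# remote base URL for the manifests.
variable "manifests" {
  description = "Base path or URL of the tidb-operator manifests"
  type        = string
  default     = "../../manifests"
  # or: "https://raw.githubusercontent.com/pingcap/tidb-operator/v1.0.0-beta.3/manifests"
}

# kubectl apply accepts both local paths and URLs, so either value works here.
resource "null_resource" "tidb_operator_crd" {
  provisioner "local-exec" {
    command = "kubectl apply -f ${var.manifests}/crd.yaml"
  }
}
```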
> Why does this need to work outside this repository? I have been referencing manifests with symlinks in the GCP terraform. The deployment doesn't work without a good way to reference our charts and manifests.
Cases that use these modules outside this repository:
- terraform stores state in the working dir, so users may copy this module to manage different EKS instances (one of our colleagues has tried this and got hurt by the symlink);
- for advanced users, it is possible to compose the `tidb-operator` module and the `tidb-cluster` module into their own terraform scripts. A self-contained module will be easier to compose in this case.
> Does it help to move the manifests directory under the deploy directory?

Yes, to some extent. But the `manifests` directory is still necessary because the `ebs-gp2` storage class and `local-volume-provisioner.yaml` are dedicated to AWS. I've considered using a small overlay yaml to customize the base `local-volume-provisioner.yaml` via kustomize, but this has a low priority.
I used kustomize for GCP; it was quite easy: https://github.com/pingcap/tidb-operator/tree/master/deploy/gcp/manifests/local-ssd
With respect to state, it seems that users are solving a problem by copying, and in response we are copying :)
I want to get away from copying as a solution to anything. The way I solve this is by having users instantiate a module, rather than just changing terraform variable files. A user instantiates the module in a directory representing their environment (staging, production). The usability is about the same for one instantiation, but when you do multiple, everything is much better.
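For illustration, per-environment instantiation could look like the following. A minimal sketch; the module path and variable names are assumptions, not taken from this PR.

```hcl
# environments/staging/main.tf — hypothetical layout. Each environment
# directory instantiates the same module and keeps its own terraform state.
module "tidb" {
  source = "../../modules/aws-tidb" # hypothetical path to the shared module

  cluster_name = "staging-tidb" # note: no underscores in cluster names
  region       = "us-west-2"
}
```

An `environments/production/main.tf` would repeat the same block with production values; nothing inside the module itself is copied or edited.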
I think in our case we are missing a nice mechanism to depend on our manifest files. The only way I can see to do it is to use the `file` function to read the file in (and we can write it out to a new local file to avoid memory consumption). But then I think we still end up needing the entire git repo, so it doesn't really seem better than a symlink.
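A sketch of that `file()` approach, assuming the `local` provider; the resource name and paths are illustrative:

```hcl
# Read the manifest from the repo checkout and write it out next to this
# module, avoiding a symlink. This still requires the whole git repo on disk.
resource "local_file" "crd" {
  content  = file("${path.module}/../../manifests/crd.yaml")
  filename = "${path.module}/rendered/crd.yaml"
}
```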
Co-Authored-By: Greg Weber <greg@gregweber.info>
Signed-off-by: Aylei <rayingecho@gmail.com>
@@ -38,6 +38,8 @@ spec:
      - key: dedicated
        operator: Exists
        effect: "NoSchedule"
      nodeSelector:
        localVolume: "true"
`local-ssd` would be a slightly better name. GKE adds this tag automatically now: cloud.google.com/gke-local-ssd
deploy/aws/aws-tutorial.tfvars
Outdated
default_cluster_cluster_name = "aws_tutorial"
The cluster name should not contain underscores.
Rest LGTM
deploy/aws/README.md
Outdated
}
```

## Multiple Cluster Management
Suggested change:
- ## Multiple Cluster Management
+ ## Multiple TiDB Cluster Management
Signed-off-by: Aylei <rayingecho@gmail.com>
As per my comments above, I would like to demonstrate usage that is safe for multiple environments. The good news is we have already created a module; we just need to show how to use it as one! The only change we need to make is to all the usages of … Then the usage as a module is just a new directory with a file …
The problem with this is then the suggestion to create multiple TiDB clusters by editing `clusters.tf`. I suggest we have the user instantiate the `tidb-cluster` modules themselves. We can move all our top-level terraform to a separate directory, perhaps called "vpc-setup". Then we have a file …
Great work on this, the above comment shows how easy it is to start changing things around now that things have been properly modularized :)
Yes, and (luckily) all the … The example is great and I'm going to add it to the README 👍
Glad that makes sense. I think if we add it to the README but also explain how to modify the vpc-setup module directly, it will lead to problems where someone will have already modified a module directly. I don't know how to fix the state file at that point to move it to the multiple environment setup.
IMHO, this UX is more applicable for advanced users with hands-on terraform experience, because after all a little (maybe a lot of) glue code is necessary.
We should document that users should avoid reusing the top-level module because it is not modularized. Then the state of the top-level module will be stored in …
Signed-off-by: Aylei <rayingecho@gmail.com>
The glue code is almost entirely default variables. But those are all just used in `clusters.tf`, so we can move the default values into the clusters module itself. We can also provide a default environment to use. So the user only needs to …
Yes, but the user has already made edits to the top-level module directly. This will end up affecting …
I mean, the top-level module is indeed an example of composing the sub-modules. Whether this module is located in …
Let me demonstrate, to check that I understand you correctly now: maybe a better UX is moving the VPC setup to a separate sub-module, and moving the current top-level module to the …
Then:
Does this make sense?
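A hedged sketch of that organization; the directory names follow the comment above, while the file contents are purely illustrative:

```hcl
# Hypothetical layout:
#   deploy/aws/vpc-setup/      VPC + EKS control plane (and tidb-operator)
#   deploy/aws/tidb-cluster/   node pools + helm release, one per TiDB cluster
#   deploy/aws/examples/       the current top-level module, kept as a demo
#
# examples/main.tf (illustrative):
module "setup" {
  source = "../vpc-setup"
}

module "demo_cluster" {
  source       = "../tidb-cluster"
  cluster_name = "demo-cluster"
  # wiring of VPC/EKS outputs into the cluster module is elided here
}
```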
Agreed, I think we can merge this PR now and improve the UX in follow-up PRs.
LGTM
@aylei yes, that is a good module organization.
What problem does this PR solve?
Support managing multiple TiDB clusters in EKS, with some updates per #575.
What is changed and how does it work?
The control plane (Kubernetes master and tidb-operator) and the data plane (node pools and helm release) are separated into two different modules: `tidb-operator` and `tidb-cluster`. A top-level module composes one `tidb-operator` module and one or more `tidb-cluster` modules to support multiple cluster management.
See the README of the top-level module, the README of the tidb-operator module, and the README of the tidb-cluster module for details.
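For orientation, a hedged sketch of that composition; the module sources and variable names here are illustrative, see the linked READMEs for the real interface:

```hcl
# Top-level module: one control plane, one or more data planes.
module "tidb_operator" {
  source = "./tidb-operator" # EKS control plane + tidb-operator
}

module "default_cluster" {
  source       = "./tidb-cluster" # node pools + helm release
  cluster_name = "default-cluster"
}

module "second_cluster" {
  source       = "./tidb-cluster"
  cluster_name = "second-cluster"
}
```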
Check List
Tests
Code changes
Related changes
Does this PR introduce a user-facing change?:
Limitations
- Multiple clusters cannot be fully configured via `.tfvars`, so it is inevitable to edit `clusters.tf` directly for multiple cluster management now.

@tennix @jlerche @gregwebs PTAL, most of the work was done by @tennix in bd2343f; I did some cleanup and re-organization of the code in the follow-up commits.
@jlerche I changed the `aws-tutorial.tfvars` to keep compatibility; hope this won't break your tutorial 😅