Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates for usability of GKE self managed setup script #759

Merged
merged 8 commits into from
Jun 4, 2019

Conversation

KatrinaHoffert
Copy link
Contributor

For issue #758.

This does several things to make this setup much easier. But in short, it allows a single command to be used to setup for the vast majority of cases (with just ./gke-self-managed.sh -n CLUSTER_NAME -z ZONE).

  1. The CLI flags are updated to actually work (ie, represent those used by the GLBC now).
  2. The gce.conf file is automatically populated from querying the cluster. All options can be instead provided as CLI flags if the querying gets them wrong or to speed things up slightly (the time to query is pretty negligible compared to the time to restart the cluster to turn off the default GLBC, though).
  3. The glbc.yaml file is automatically setup with the image instead of letting users create an invalid deployment that won't be caught till runtime.
  4. The script defaults to building and pushing the image, making the script usable for quickly testing changes. The image to use can be specified as a flag instead. The registry to use is customizable the same way make push does.
  5. We handle progress better. No more winging it by hoping the API server is up and running.

Opted not to make a test for it this ticket. We really should have some kinda test (since the script got terribly broken in 2 ways), but it's quite difficult to test given that it requires cluster ownership, needs push access to some container registry, requires an existing cluster, and is very slow no matter how you cut it (it needs to restart a cluster and realistically, a test would also need to create an ingress, which takes a few minutes). I'll make a new issue for testing it, but it's likely to be lower priority as the ROI is complicated.

This is a non-breaking change, since the existing gce.conf file is used as a template and if it were already filled in, those fields would just be ignored. The files that need modification are now copied to and .gitignored, too, so no risks of accidental commits or annoyance of having them show up in the git status.

@k8s-ci-robot
Copy link
Contributor

Hi @KatrinaHoffert. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 24, 2019
@k8s-ci-robot k8s-ci-robot requested review from bowei and freehan May 24, 2019 19:30
@KatrinaHoffert
Copy link
Contributor Author

KatrinaHoffert commented May 24, 2019

And yes, my bash skills are not that great. There's probably tons of room for micro-optimizations and all with the tools I use for filtering text and all. I tried to use gcloud's formatting as much as possible. I don't think it's a big area to nitpick, though, since the amount of time to update the cluster massively dominates the runtime.

There's some areas for future improvement that I did not do due to time constraints. Future improvements could be:

  1. Check the cluster actually exists, as some commands assume it does.
  2. Don't require the CWD to be the script's location.
  3. Better handling of resources already existing (can often be skipped to allow resuming).
  4. Stream the make output with tee.
  5. There's probably some gcloud --format magic that could be used to avoid that ugly cut for getting instance group names from the cluster.
  6. We use the output of gcloud container clusters describe up to three times and it could be cached.
  7. We should check the gcloud config stuff and permissions before we do anything.
  8. We are dependent on the default GLBC HTTP backend already existing. We probably don't need to (just allow a random node port in that case?).
  9. The heck is manual_glbc_provision?
  10. Shebang is not on first line.
  11. Most new commands I added don't handle errors.

@MrHohn
Copy link
Member

MrHohn commented May 24, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 24, 2019
.gitignore Outdated Show resolved Hide resolved
docs/deploy/resources/glbc.yaml Outdated Show resolved Hide resolved
docs/deploy/gke/gke-self-managed.sh Show resolved Hide resolved
docs/deploy/gke/gke-self-managed.sh Outdated Show resolved Hide resolved
Copy link
Contributor

@rramkumar1 rramkumar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, looks good to me. Just a couple more nits before I will lgtm officially.

docs/deploy/resources/gce.conf Outdated Show resolved Hide resolved
docs/deploy/gke/README.md Outdated Show resolved Hide resolved
docs/deploy/gke/gke-self-managed.sh Outdated Show resolved Hide resolved
docs/deploy/gke/gke-self-managed.sh Show resolved Hide resolved
- Removed arguments that no longer exist and those that are defaults.
- Updated command to not depend on sh (not available by default). Use
  same pattern as most GKE containers do.
- Image URL now ensures that the YAML would be invalid (more obvious if
  you miss it).
All values are overridable in case the generation messes up.
We build and push automatically if the user doesn't provide
--image-url.
Better UX this way, too. Also more output about progress and
consistency in file names.
Also fixed details on what the script does. It was confusing before
and had some order wrong.
1. Restored confirmation for cluster being ready after disabling
   the default GLBC. Also made this affected by `--no-confirm`,
   since the intent of that was to allow non-interactive usage
   (cause forced interactive scripts are gross). Made that set
   `--quiet` for gcloud commands while I was at it, since some of
   those can confirm (and made it a versatile "provide arbitrary
   extra gcloud flags" thing). Only applied to mutating gcloud
   commands.
2. Build and push no longer implicit. As a result, either the
   `--image-url` or `--build-and-push` must be set. This is now
   validated.
3. Added validation for if CONTAINER_PREFIX is set while I was at
   it, since it can break build-and-push. I doubt anyone sets it,
   but better safe than sorry. This just avoids an inevitible error.
4. Placeholder used in GLBC YAML changed.
5. Using `.gen` as extension for generated files.
6. Make command is now logged so user sees what we're doing.
7. Clarified help text.
This is very useful for testing. It uses kubectl --dry-run where
possible, so we're actually testing the structure of the YAML files
(the creation of the key does not use this since it depends on
running a real command).

This command can also be used to quickly and cleanly get the commands
this script would run for manual usage.

Fixed bug with --cleanup  where it would ignore any args after the
--cleanup tag (which could result in running the wrong things!).
Also fixed dumb quote bug that wouldn't play nice with eval.
@KatrinaHoffert
Copy link
Contributor Author

All done, @rramkumar1

@rramkumar1
Copy link
Contributor

/lgtm

Thanks for all these changes!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KatrinaHoffert, rramkumar1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2019
@k8s-ci-robot k8s-ci-robot merged commit 688894c into kubernetes:master Jun 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants