Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create all statefulsets immediately #581

Closed
wants to merge 16 commits into from

Conversation

gregwebs
Copy link
Contributor

@gregwebs gregwebs commented Jun 13, 2019

TODO

  • run stability tests
  • decide on renaming wait-for-pd

As per #469 this creates statefulsets and pending pods immediately to help autoscalers scale more quickly.

Design doc is here

A blocking init container wait-for-pd is used to prevent TiKV and TiDB from immediately starting up. This code calls the PD API directly. For TiDB it will call the new status is_initialized API if it is available.

Error handling is also generally improved in the member managers: as a side effect this may fix bugs where the controller sync loop is not re-ran during errors.

The PD API was updated to 3.0 which seems to be backwards compatible for our usage (more testing needed).

Testing notes

I have created 50+ clusters with the operator that has the changes from the first commit. It has the desired effect for autoscaling. The second commit is still being tested.

What is changed and how does it work?

  • When PD is not available for TiKV, or TiKV is not available for TiDB, go ahead and create the statefulset.
  • Don't break from the update function immediately when a RequeueError is seen.

Check List

Tests

  • Unit test: using existing tests
  • E2E test: using existing e2e tests
  • Stability test: ?
  • Manual test: create single clusters of various sizes (multiple cluster creation often runs into issues with the current operator). Test with pd/tikv/tidb version 3.0.0-rc1 and version 3.0.0

Side effects

Increased image size
Originally this change meant shipping another 50M binary. However, overall image size is smaller since this PR was originally started. This work was merged in other Pull requests:

Pending Pod warnings

  • May surprise existing users that pre-provision capacity and monitor for pending pods. Alerts that alert for pending pods could get triggered.

Upgrade procedure

  • When upgrading the tidb-operator image from before this change one must first upgrade the tidb-operator helm chart templates. This is expected, but it could surprise someone since it often isn't necessary.

The PD API was updated to 3.0 which seems to be backwards compatible for our usage. We will probably want to see some additional testing here.

Related changes

  • Need to update the documentation

Does this PR introduce a user-facing change?

Yes, to the extent users look at pod creation (which we have in our tutorials).

Create TiKV/TiDB pending pods immediately to help autoscalers scale more quickly. They wait for their dependencies (e.g. PD) to be ready with an Init container, so they will show up as in state Init while waiting.

@gregwebs gregwebs requested review from weekface, cofyc and tennix June 13, 2019 22:58
@gregwebs gregwebs force-pushed the create-statefulset-immediately branch from 03a6fd1 to 5e5d615 Compare June 16, 2019 13:24
@gregwebs gregwebs changed the title [WIP] create all statefulsets immediately create all statefulsets immediately Jun 18, 2019
@gregwebs gregwebs force-pushed the create-statefulset-immediately branch from d3732eb to 6fdfd02 Compare June 24, 2019 23:34
@pingcap pingcap deleted a comment from tennix Jun 24, 2019
@gregwebs gregwebs force-pushed the create-statefulset-immediately branch 8 times, most recently from f36c244 to bbe51cc Compare June 25, 2019 22:17
@pingcap pingcap deleted a comment from tennix Jun 25, 2019
pkg/apis/pdapi/pdapi.go Outdated Show resolved Hide resolved
pkg/controller/pd_control.go Outdated Show resolved Hide resolved
@gregwebs
Copy link
Contributor Author

gregwebs commented Jul 5, 2019

The usage of PD APIs in wait-for-pd should be updated. For TiDB 3.0 the TiDB component should check the PD api cluster_status for the is_initialized flag as per #586

Create the statefulsets immediately.
A blocking init container wait-for-pd is used to prevent TiKV and TiDB from immediately starting up.
@gregwebs gregwebs force-pushed the create-statefulset-immediately branch 2 times, most recently from 9966e1f to 4ffef82 Compare July 9, 2019 23:32
This gives us the ClusterVersion field for the Config API.
@gregwebs gregwebs force-pushed the create-statefulset-immediately branch from 4ffef82 to a79e221 Compare July 9, 2019 23:55
@gregwebs gregwebs force-pushed the create-statefulset-immediately branch from a79e221 to 07e3ee4 Compare July 10, 2019 00:07
@gregwebs
Copy link
Contributor Author

I see an issue where the operator will expand the tidb statefulset when it is waiting in init for a long time

demo-discovery-5bb58f7bb5-wklmq   1/1     Running    0          18m
demo-monitor-6f9c9fdffb-59bkp     1/1     Running    0          18m
demo-pd-0                         1/1     Running    0          18m
demo-pd-1                         1/1     Running    0          18m
demo-pd-2                         1/1     Running    1          18m
demo-tidb-0                       0/2     Init:0/1   0          18m
demo-tidb-1                       0/2     Init:0/1   0          18m
demo-tidb-2                       0/2     Init:0/1   0          9m23s
demo-tidb-3                       0/2     Init:0/1   0          9m23s
demo-tidb-4                       0/2     Init:0/1   0          4m23s
demo-tikv-0                       0/1     Pending    0          18m
demo-tikv-1                       1/1     Running    0          18m
demo-tikv-2                       1/1     Running    0          18m

@gregwebs
Copy link
Contributor Author

When this came out of the init state the extra tidb pods did get terminated.

@gregwebs
Copy link
Contributor Author

okay, I added some code to handle that case

This only works on newer versions of TiDB
@gregwebs gregwebs force-pushed the create-statefulset-immediately branch from e50a407 to 02e8b73 Compare July 25, 2019 15:46
@CLAassistant
Copy link

CLAassistant commented Aug 9, 2019

CLA assistant check
All committers have signed the CLA.

cmd/wait-for-pd/main.go Show resolved Hide resolved
pkg/manager/member/member.go Show resolved Hide resolved
@gregwebs gregwebs force-pushed the create-statefulset-immediately branch from 7d7825c to 37fdfea Compare September 13, 2019 18:12
@aylei
Copy link
Contributor

aylei commented Dec 23, 2019

@gregwebs Cut of release-1.1 branch is targeted at December 31th, could you please confirm whether this feature is expected to be releases in 1.1 or should be re-targeted to 1.2? Thanks!

@gregwebs
Copy link
Contributor Author

Probably needs to be 1.2

@aylei
Copy link
Contributor

aylei commented Dec 25, 2019

Okay, removed from v1.1 milestone for now, feel free to add it back if this is necessary to land in v1.1

@aylei aylei removed this from the v1.1.0 milestone Dec 25, 2019
@github-actions
Copy link

This pr is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days

@github-actions github-actions bot closed this Mar 15, 2020
yahonda pushed a commit that referenced this pull request Dec 27, 2021
* zh: update warning about deleting PVs

Signed-off-by: Ran <huangran@pingcap.com>

* Update zh/configure-storage-class.md

Co-authored-by: DanielZhangQD <36026334+DanielZhangQD@users.noreply.github.com>

Co-authored-by: DanielZhangQD <36026334+DanielZhangQD@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants