etcd: avoid creating large leases #96836

Closed
mborsz opened this issue Nov 24, 2020 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@mborsz
Member

mborsz commented Nov 24, 2020

Currently we create a single etcd lease for each 1m of events (code). With high event throughput, this can result in a large number of objects sharing the same lease. The lease_revoke operation in etcd is atomic, so revoking a lease with many attached objects blocks all other operations for a long period of time.

Currently, in #96038, we are seeing occasional restarts of the event etcd. All of them happen approximately 1h after cluster start and correlate with lease_revoke operations on the initial events. After the lease_revoke I see a number of errors like /health error; QGET failed etcdserver: request timed out (status code 503).

To fix this issue (a long-running lease_revoke blocking etcd long enough to make the health check fail), we shouldn't be creating large etcd leases.

Proposal: Let's introduce a limit on the number of objects attached to a single lease. When the "prevLease" in leaseManager reaches the object limit, we force the start of a new one. The exact limit still needs to be determined (e.g. by running a scalability test with additional logs, or by adding a new metric to kube-apiserver (what exactly?)).

/cc @wojtek-t

@mborsz mborsz added the kind/bug Categorizes issue or PR as related to a bug. label Nov 24, 2020
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 24, 2020

@wojtek-t
Member

/sig scalability

@k8s-ci-robot k8s-ci-robot added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 24, 2020
@wojtek-t wojtek-t added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 24, 2020
@mborsz
Member Author

mborsz commented Nov 25, 2020

In fact, we may consider using leaseReuseDurationSeconds with a value lower than 1m instead of introducing an object limit (or maybe both). This will reduce the number of objects per lease and will also spread the deletion of events over the whole 1m (instead of scheduling the deletion of all 1m of events at the same time).

@pacoxu
Member

pacoxu commented Nov 25, 2020

How about making leaseReuseDurationSeconds configurable as a tuning option first?

@mborsz
Member Author

mborsz commented Nov 25, 2020

> How about making leaseReuseDurationSeconds configurable as a tuning option first?

Sounds reasonable to me.

@goku321

goku321 commented Nov 28, 2020

/assign

@mborsz
Member Author

mborsz commented Nov 30, 2020

After making leaseReuseDurationSeconds configurable, we still need better observability to be able to tune this value deliberately:

  1. Add a Prometheus metric with the size (= number of objects) of a lease
  2. Log a warning if the size of a lease exceeds some threshold (TBD)

The rationale for adding 2 when we already have 1 is that, e.g., in scalability tests the most problematic case is when we create a huge number of events in a short period of time during cluster bootstrap, before Prometheus is running. I think those tests can be a good starting point for parameter tuning.


@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2021
@wojtek-t
Member

This has been addressed by linked PRs.
