ETCD-199: bump etcd v3.4.16 #83
When URLs are synonyms, the current approach of sorting the URLs lexicographically and then comparing them fails with frustrating errors. Make sure to do a full comparison between every set of PeerURLs before failing. Fixes etcd-io#11013
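As a rough illustration of the "full comparison" idea, here is a minimal sketch, not the actual etcd code. The `aliases` table, `canonical`, and `sameURLSet` are hypothetical names introduced for this example; a real implementation may resolve hostnames instead of using a fixed table. The sketch pairs each URL in one set with an unused equivalent URL in the other, so the result does not depend on sort order.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// aliases is a hypothetical synonym table used only for this sketch.
var aliases = map[string]string{"localhost": "127.0.0.1"}

// canonical rewrites a URL so that synonymous hosts compare equal.
// It reports false when the URL cannot be parsed or has no host.
func canonical(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return "", false
	}
	host := u.Hostname()
	if alias, ok := aliases[host]; ok {
		host = alias
	}
	return u.Scheme + "://" + net.JoinHostPort(host, u.Port()), true
}

// sameURLSet reports whether every URL in a can be paired with a distinct,
// equivalent URL in b, regardless of the order in which they are listed.
func sameURLSet(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	used := make([]bool, len(b))
	for _, ua := range a {
		ca, ok := canonical(ua)
		if !ok {
			return false
		}
		found := false
		for i, ub := range b {
			cb, ok := canonical(ub)
			if ok && !used[i] && ca == cb {
				used[i] = true
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	// Sorted lexicographically, these two lists line up differently,
	// so an index-by-index comparison after sorting would reject them.
	a := []string{"http://localhost:2380", "http://127.0.0.2:2380"}
	b := []string{"http://127.0.0.1:2380", "http://127.0.0.2:2380"}
	fmt.Println(sameURLSet(a, b))
}
```

The greedy pairing is sufficient here because equivalence is defined through a canonical form; a comparison that is not transitive would need a more exhaustive search.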
Use golang.org/x/sys/unix for F_OFD_* constants. This fixes the issue that F_OFD_GETLK was defined incorrectly, resulting in bugs such as moby/moby#31182
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
[3.4 backport] pkg/fileutil: fix F_OFD_ constants
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
vendor: bump gorilla/websocket
This fixes etcd being unable to send any message longer than 64 KB as a notification over the websocket. This was because the older version of grpc-websocket-proxy was used and WithMaxRespBodyBufferSize option wasn't set.
etcdserver: Fix 64 KB websocket notification message limit
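To make the 64 KB figure concrete, the following sketch uses only Go's standard library. It assumes the proxy's limit behaves like the default maximum token size of `bufio.Scanner` (64 KB); the source only names the `WithMaxRespBodyBufferSize` option, so this is an analogy rather than the proxy's code. `scanOne` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanOne builds a single newline-terminated message of msgSize bytes and
// reads it back with a bufio.Scanner. maxSize <= 0 keeps the scanner's
// default maximum token size (bufio.MaxScanTokenSize, 64 KB).
func scanOne(msgSize, maxSize int) (int, error) {
	msg := strings.Repeat("x", msgSize) + "\n"
	sc := bufio.NewScanner(strings.NewReader(msg))
	if maxSize > 0 {
		// Raise the ceiling on how large a single token may grow.
		sc.Buffer(make([]byte, 0, 64*1024), maxSize)
	}
	if sc.Scan() {
		return len(sc.Bytes()), nil
	}
	return 0, sc.Err()
}

func main() {
	// With the default limit, a 100 KB message cannot be read as one token.
	_, err := scanOne(100*1024, 0)
	fmt.Println("default limit:", err)

	// With a 1 MB limit, the same message is read in full.
	n, err := scanOne(100*1024, 1<<20)
	fmt.Println("raised limit:", n, err)
}
```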
…de health check in debug level ref. etcd-io#12677 ref. etcd-io@0b9cfa8
[Backport-3.4] etcdserver/api/etcdhttp: log successful etcd server side health check in debug level
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Manual cherry pick of etcd-io#12448 on release 3.4
There are situations where we don't wish to fsync but we do want to write the data. Typically this occurs in clusters where fsync latency (often the result of firmware) transiently spikes. For Kubernetes clusters this causes (many) elections, which have knock-on effects: the API server transiently fails, causing other components to fail in turn.
By writing the data (buffered and asynchronously flushed, so in most situations the write is fast) and avoiding the fsync, we no longer trigger this situation and opportunistically write out the data.
Anecdotally: because the fsync is missing, there is the argument that certain types of failure events will cause data corruption or loss; in testing this wasn't seen. If this were to occur, the expectation is that the member can be re-added to a cluster or, worst case, restored from a robust persisted snapshot.
The etcd members are deployed across isolated racks with different power feeds. An instantaneous failure of all of them simultaneously is unlikely.
Testing was usually of the form:
* create (Kubernetes) etcd write-churn by creating replicasets of some 1000s of pods
* break/fail the leader
Failure testing included:
* hard node power-off events
* disk removal
* orderly reboots/shutdown
In all cases when the node recovered it was able to rejoin the cluster and synchronize.
When using --unsafe-no-fsync still write out the data
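The behaviour described in this commit can be sketched as follows. This is a simplified illustration and not the etcd WAL code: `persist` and `writeAndRead` are hypothetical helpers, and the `noFsync` parameter stands in for the effect of `--unsafe-no-fsync`. The point is that the write always happens and only the `Sync` call is skipped.

```go
package main

import (
	"fmt"
	"os"
)

// persist always writes data to f. When noFsync is set it skips only the
// fsync, leaving the data in the OS page cache to be flushed later.
func persist(f *os.File, data []byte, noFsync bool) error {
	if _, err := f.Write(data); err != nil {
		return err
	}
	if noFsync {
		return nil
	}
	return f.Sync()
}

// writeAndRead writes data to a temporary file via persist and reads it
// back, showing that the data is visible even without an fsync.
func writeAndRead(data string, noFsync bool) (string, error) {
	f, err := os.CreateTemp("", "fsync-sketch-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if err := persist(f, []byte(data), noFsync); err != nil {
		return "", err
	}
	b, err := os.ReadFile(f.Name())
	return string(b), err
}

func main() {
	got, err := writeAndRead("entry", true)
	fmt.Println(got, err)
}
```

Reading the file back only shows that the data reached the kernel; without the fsync there is no guarantee it survives a sudden power loss, which is the trade-off the commit message discusses.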
The integration jobs fail with timeouts slightly over 3s; increase this marginally so that false failures are less prevalent.
integration: relax leader timeout from 3s to 4s
…tion etcdserver: Fix PeerURL validation
Manual cherry-pick of 9571325 for release-3.4.
etcdserver: fix incorrect metrics generated when clients cancel watches
As go 1.12.2 is both what is tested in CI and the version recommended for builds, we should also pin to this in the go directive version.
[release-3.4]: Pin go version in go.mod to 1.12
Currently in CI the tests are only run with go v1.12; this also adds go v1.15.11. Excludes certain variants for v1.15.
This patch is needed due to go 1.15 erroring on: "Setctty set but Ctty not valid in child".
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
To make it easier to root-cause when the /health check fails. For example, we are using a load balancer to health check each etcd instance, and when one etcd node gets terminated, it's hard to tell whether the etcd "server" was really failing or the client (or load balancer) failed to reach the etcd cluster, which also shows up as a failure in the load balancer health check.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
…r side health check in debug level
…s not part of member list and dataDir exists Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
infra route53 /test configmap-scale
/test e2e-aws-upgrade
/test e2e-aws-serial
/test configmap-scale
/retest
/test configmap-scale
/test configmap-scale
Sadly there is an AWS limit we reached, they are still working on it, so not sure when this will pass. Maybe worth waiting until Monday to retest again?
/lgtm
/approve
🎉
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hexfusion, lilic
/test configmap-scale
route53 ...
infra .............. /test configmap-scale
/retest Please review the full test history for this PR and help us cut down flakes.
This PR bumps etcd to v3.4.16 from v3.4.14 using golang 1.12 and is part of a multi PR migration[1] to get to etcd 3.5.0.
v3.4.16 (2021-05-11)
See code changes and v3.4 upgrade guide for any breaking changes.
etcd server
* --experimental-warning-apply-duration flag which allows apply duration threshold to be configurable.
* --unsafe-no-fsync to still write-out data avoiding corruption (most of the time).
Metrics
v3.4.15 (2021-02-26)
See code changes and v3.4 upgrade guide for any breaking changes.
etcd server
Package fileutil
* F_OFD_ constants.
Dependency
* gorilla/websocket to v1.4.2.
[1] #85