ETCD-199: bump etcd v3.4.16 #83

hexfusion · 2021-06-16T19:46:47Z

This PR bumps etcd from v3.4.14 to v3.4.16 using golang 1.12 and is part of a multi-PR migration [1] to get to etcd 3.5.0.

v3.4.16 (2021-05-11)

See code changes and v3.4 upgrade guide for any breaking changes.

etcd server

Add --experimental-warning-apply-duration flag which allows apply duration threshold to be configurable.
Fix --unsafe-no-fsync to still write-out data avoiding corruption (most of the time).
Reduce around 30% memory allocation by logging range response size without marshal.
Add the option to conditionally exclude alarms from the health check.

Metrics

Fix incorrect metrics generated when clients cancel watches, back-ported from etcd-io/etcd#12196 (etcdserver: fix incorrect metrics generated when clients cancel watches).

v3.4.15 (2021-02-26)

See code changes and v3.4 upgrade guide for any breaking changes.

etcd server

Package `fileutil`

Fix F_OFD_ constants.

Dependency

Bump up gorilla/websocket to v1.4.2.

[1] #85

There are situations where we don't wish to fsync but we do want to write the data. Typically this occurs in clusters where fsync latency (often the result of firmware) transiently spikes. For Kubernetes clusters this causes (many) elections, which have knock-on effects such that the API server transiently fails, causing other components to fail in turn.

By writing the data (buffered and asynchronously flushed, so in most situations the write is fast) and avoiding the fsync, we no longer trigger this situation and opportunistically write out the data.

Anecdotally: because the fsync is missing, there is the argument that certain types of failure events will cause data corruption or loss; in testing this wasn't seen. If this were to occur, the expectation is that the member can be re-added to a cluster or, worst case, restored from a robust persisted snapshot.

The etcd members are deployed across isolated racks with different power feeds. An instantaneous failure of all of them simultaneously is unlikely.

Testing was usually of the form:

* create (Kubernetes) etcd write-churn by creating replicasets of some 1000s of pods
* break/fail the leader

Failure testing included:

* hard node power-off events
* disk removal
* orderly reboots/shutdown

In all cases when the node recovered it was able to rejoin the cluster and synchronize.

When using --unsafe-no-fsync still write out the data
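The behaviour described in this commit message can be illustrated with a minimal Go sketch. It is not etcd's WAL code: `persist` and `writeAndReadBack` are hypothetical names, and the sketch only assumes that "write but don't fsync" means calling `Write` unconditionally and skipping `Sync` when the unsafe option is set.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// persist always writes data to f. The write lands in the OS page cache and
// is flushed asynchronously by the kernel. The fsync (f.Sync) is skipped only
// when unsafeNoFsync is true. Hypothetical helper, not etcd's API.
func persist(f *os.File, data []byte, unsafeNoFsync bool) error {
	if _, err := f.Write(data); err != nil {
		return err
	}
	if unsafeNoFsync {
		return nil // data was written, durability is left to the kernel
	}
	return f.Sync()
}

// writeAndReadBack writes one record to a temporary file with persist and
// returns what a later reader sees.
func writeAndReadBack(unsafeNoFsync bool) (string, error) {
	dir, err := os.MkdirTemp("", "nofsync-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "record.tmp")
	f, err := os.Create(path)
	if err != nil {
		return "", err
	}
	if err := persist(f, []byte("entry-1"), unsafeNoFsync); err != nil {
		f.Close()
		return "", err
	}
	if err := f.Close(); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	for _, noFsync := range []bool{false, true} {
		got, err := writeAndReadBack(noFsync)
		fmt.Printf("unsafeNoFsync=%v read=%q err=%v\n", noFsync, got, err)
	}
}
```

In both modes the data is readable after the write. What differs is only whether it is guaranteed to survive a sudden power loss, which is the trade-off the commit message discusses.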

To make it easier to root-cause when the /health check fails. For example, we are using a load balancer to health check each etcd instance, and when one etcd node gets terminated, it's hard to tell whether the etcd "server" was really failing or the client (or load balancer) failed to reach the etcd cluster, which also counts as a failure in the load balancer health check.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>

…r side health check in debug level

hexfusion · 2021-06-17T17:27:09Z

infra route53

/test configmap-scale

hexfusion · 2021-06-17T19:09:49Z

failed to acquire lease for "aws-quota-slice": resources not found

/test e2e-aws-upgrade

hexfusion · 2021-06-17T19:10:19Z

failed to acquire lease for "aws-quota-slice": resources not found

/test e2e-aws-serial

hexfusion · 2021-06-17T22:26:49Z

/test configmap-scale

lilic · 2021-06-18T06:21:41Z

/retest

rsevilla87 · 2021-06-18T10:23:24Z

/test configmap-scale

rsevilla87 · 2021-06-18T11:30:56Z

/test configmap-scale

lilic · 2021-06-18T11:51:37Z

Sadly we reached an AWS limit; they are still working on it, so I'm not sure when this will pass. Maybe worth waiting until Monday to retest again?

lilic

/lgtm
/approve

🎉

openshift-ci · 2021-06-18T12:12:00Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, lilic

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [hexfusion]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hexfusion · 2021-06-18T12:22:46Z

/test configmap-scale

hexfusion · 2021-06-18T13:18:51Z

route53 ...
/test configmap-scale

hexfusion · 2021-06-18T15:05:52Z

infra ..............

/test configmap-scale

openshift-bot · 2021-06-18T18:08:18Z

/retest

Please review the full test history for this PR and help us cut down flakes.

dbavatar and others added 30 commits September 16, 2019 11:49

etcdserver: Fix PeerURL validation

3b8f812

In case of URLs that are synonyms, the current lexicographic sorting and compare of the URLs fails with frustrating errors. Make sure to do a full comparison between every set of PeerURLs before failing. Fixes etcd-io#11013
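To illustrate why a sort-and-compare can fail where a full comparison succeeds, here is a small Go sketch. It is not the etcd implementation: `canonical`, `sameURLSet` and `matchMembers` are hypothetical names, and the `canon` map is an assumed stand-in for whatever resolution makes two URLs synonyms (for example a hostname and its IP address).

```go
package main

import "fmt"

// canonical maps a URL to its canonical form using the assumed synonym table.
func canonical(u string, canon map[string]string) string {
	if c, ok := canon[u]; ok {
		return c
	}
	return u
}

// sameURLSet reports whether a and b contain the same URLs once synonyms
// are resolved, ignoring order.
func sameURLSet(a, b []string, canon map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[string]int)
	for _, u := range a {
		counts[canonical(u, canon)]++
	}
	for _, u := range b {
		c := canonical(u, canon)
		if counts[c] == 0 {
			return false
		}
		counts[c]--
	}
	return true
}

// matchMembers compares every local PeerURL set against every unused remote
// set before giving up, instead of pairing members by sorted position.
// Greedy pairing is enough here because the equivalence is transitive.
func matchMembers(local, remote [][]string, canon map[string]string) bool {
	if len(local) != len(remote) {
		return false
	}
	used := make([]bool, len(remote))
	for _, l := range local {
		found := false
		for j, r := range remote {
			if !used[j] && sameURLSet(l, r, canon) {
				used[j], found = true, true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	canon := map[string]string{
		"http://a.example:2380": "http://10.0.0.1:2380",
		"http://b.example:2380": "http://10.0.0.2:2380",
	}
	// Sorted lexicographically, position 0 would pair 10.0.0.1 with 10.0.0.2.
	local := [][]string{{"http://10.0.0.1:2380"}, {"http://b.example:2380"}}
	remote := [][]string{{"http://10.0.0.2:2380"}, {"http://a.example:2380"}}
	fmt.Println(matchMembers(local, remote, canon))
}
```

With these inputs a position-by-position comparison of the sorted lists would report a mismatch, while the full comparison pairs each member with its synonym.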

pkg/fileutil: fix F_OFD_ constants

bea35fd

Use golang.org/x/sys/unix for F_OFD_* constants. This fixes the issue that F_OFD_GETLK was defined incorrectly, resulting in bugs such as moby/moby#31182.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Merge pull request etcd-io#12551 from kolyshkin/3.4-fix-lock

0880605

[3.4 backport] pkg/fileutil: fix F_OFD_ constants

vendor: bump gorilla/websocket

becc228

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

Merge pull request etcd-io#12645 from hexfusion/bump-dep

d51c6c6

vendor: bump gorilla/websocket

etcdserver: Fix 64 KB websocket notification message limit

a40f14d

This fixes etcd being unable to send any message longer than 64 KB as a notification over the websocket. This was because the older version of grpc-websocket-proxy was used and WithMaxRespBodyBufferSize option wasn't set.
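A 64 KB ceiling of this kind is what Go's `bufio.Scanner` produces by default, since `bufio.MaxScanTokenSize` is 64 KiB. The sketch below assumes that this is the mechanism behind the limit and that a buffer-size option such as `WithMaxRespBodyBufferSize` raises it through `Scanner.Buffer`. It uses only the standard library; `scanOneMessage` and `makeMessage` are hypothetical helpers, not part of grpc-websocket-proxy.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// makeMessage builds a newline-terminated message with n payload bytes.
func makeMessage(n int) string {
	return strings.Repeat("x", n) + "\n"
}

// scanOneMessage reads one newline-delimited message and returns its length.
// maxSize <= 0 keeps the scanner's default token limit of 64 KiB.
func scanOneMessage(input string, maxSize int) (int, error) {
	s := bufio.NewScanner(strings.NewReader(input))
	if maxSize > 0 {
		// Raise the limit: start with a 64 KiB buffer, allow growth to maxSize.
		s.Buffer(make([]byte, 0, 64*1024), maxSize)
	}
	if s.Scan() {
		return len(s.Bytes()), nil
	}
	return 0, s.Err()
}

func main() {
	msg := makeMessage(70 * 1024) // just over the default limit

	_, err := scanOneMessage(msg, 0)
	fmt.Println("default limit:", err)

	n, err := scanOneMessage(msg, 1<<20)
	fmt.Println("raised limit:", n, err)
}
```

With the default settings the scanner stops with `bufio.ErrTooLong`. Once a larger maximum is supplied, the same message is read in full.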

Merge pull request etcd-io#12402 from vitalif/release-3.4

a1c5f59

etcdserver: Fix 64 KB websocket notification message limit

[Backport-3.4] etcdserver/api/etcdhttp: log successful etcd server si…

f27ef4d

…de health check in debug level ref. etcd-io#12677 ref. etcd-io@0b9cfa8

Merge pull request etcd-io#12679 from chaochn47/backport_3.4_#12677

3be9460

[Backport-3.4] etcdserver/api/etcdhttp: log successful etcd server side health check in debug level

version: 3.4.15

aa71268

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>

server: Added config parameter experimental-warning-apply-duration

9aeabe4

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

Merge pull request etcd-io#12740 from hexfusion/cp-12448--release-3.4

afd6d8a

Manual cherry pick of etcd-io#12448 on release 3.4

Merge pull request etcd-io#12751 from cwedgwood/nofsyncdowrite

2702f9e

When using --unsafe-no-fsync still write out the data

integration: relax leader timeout from 3s to 4s

c499d9b

The integration jobs fail with timeouts slightly over 3s, increase this marginally so false failures are less prevalent.

Merge pull request etcd-io#12816 from cwedgwood/3.4-relax-gate-timeout

16fe9a8

integration: relax leader timeout from 3s to 4s

Merge pull request etcd-io#12815 from dbavatar/release-3.4-peervalida…

30799c9

…tion etcdserver: Fix PeerURL validation

etcdserver: fix incorrect metrics generated when clients cancel watches

656dc63

Manual cherry-pick of 9571325 for release-3.4.

Merge pull request etcd-io#12803 from cwedgwood/metrics-3.4

82eae92

etcdserver: fix incorrect metrics generated when clients cancel watches

go.mod: Pin go to 1.12 version

ef415e3

As go 1.12.2 is what is tested in CI as well as recommended to be built with 1.12.2 we should also pin to this in the go directive version.

go.sum, go.mod: Run go mod tidy with go 1.12

8557cb2

vendor: Run go mod vendor

b19eb0f

pkg/testutil/leak.go: Allowlist created by testing.runTests.func1

91bed2e

Merge pull request etcd-io#12839 from lilic/fix-go-version

b7e5f5b

[release-3.4]: Pin go version in go.mod to 1.12

.travis.yml: Test with go v1.15.11

62596fa

Currently in CI the tests are only run with go v1.12, this adds also go v1.15.11. Excludes certain variants for v1.15.

integration,raft,tests: Comply with go v1.15 gofmt

35bd924

etcdserver,wal: Convert int to string using rune()

0b7e418
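For context on this commit title: since Go 1.15, `go vet` flags `string(i)` for an integer `i`, because the conversion yields a one-rune string rather than the decimal digits. The following sketch (with hypothetical helper names, unrelated to the actual etcd call sites) shows the two explicit spellings.

```go
package main

import (
	"fmt"
	"strconv"
)

// asCodePoint makes the "integer is a Unicode code point" intent explicit,
// which is the form the commit title refers to.
func asCodePoint(i int) string {
	return string(rune(i))
}

// asDecimal is the alternative when the decimal digits are wanted instead.
func asDecimal(i int) string {
	return strconv.Itoa(i)
}

func main() {
	fmt.Printf("%q %q\n", asCodePoint(65), asDecimal(65))
}
```

Wrapping the value in `rune()` keeps the old behaviour and satisfies the vet check; `strconv.Itoa` is the choice when digits were intended.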

go.mod,go.sum: Comply with go v1.15

cfc08e5

go.mod,go.sum: Bump github.com/creack/pty that includes patch

4276c33

This patch is needed due to go 1.15 erroring on: "Setctty set but Ctty not valid in child".

vendor: Run go mod vendor

eeefd61

hexfusion and others added 7 commits June 17, 2021 11:49

UPSTREAM: <carry>: *: ensure zap logger is set before use

c235e61

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

UPSTREAM: <carry>: server: add support for log rotation (etcd-io#12774)

490e5c6

UPSTREAM: <carry>: etcdserver/api/etcdhttp: log successful etcd serve…

b10b841

…r side health check in debug level

DOWNSTREAM: <carry>: discover-etcd-initial-cluster: retry if member i…

4312a12

…s not part of member list and dataDir exists

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

DOWNSTREAM: <carry>: Updating ose-etcd images to be consistent with ART

162058e

Reconciling with https://github.com/openshift/ocp-build-data/tree/691e628254f318ce56efda5edc7448ec743c37b8/images/ose-etcd.yml

DOWNSTREAM: <carry>: vendor: tidy

0956f6d

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

hexfusion force-pushed the bump-3.4.16 branch from fa0139a to 0956f6d Compare June 17, 2021 15:50

hexfusion mentioned this pull request Jun 17, 2021

plan for migration to 3.5.0 in openshift-4.9 #85

Closed

8 tasks

hexfusion changed the title ~~bump etcd v3.4.16~~ ETCD-199: bump etcd v3.4.16 Jun 17, 2021

hexfusion removed the do-not-merge/hold label Jun 17, 2021

lilic approved these changes Jun 18, 2021

openshift-ci bot assigned lilic Jun 18, 2021

openshift-ci bot added the lgtm label Jun 18, 2021

openshift-merge-robot merged commit 4fd092b into openshift:openshift-4.9 Jun 18, 2021

hexfusion deleted the bump-3.4.16 branch June 18, 2021 18:16

hexfusion mentioned this pull request Jun 18, 2021

ETCD-200: ci-operator/config/openshift/etcd: use golang 1.15 openshift/release#19457

Merged
