Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade etcd dependency to v3.3.15 #7403

Closed
wants to merge 2 commits into from
Closed

Conversation

jsok
Copy link
Contributor

@jsok jsok commented Sep 2, 2019

Contains an important fix in clientv3 that allows vault to
successfully failover to another etcdv3 endpoint in the event that the
current active connection becomes unavailable.

See also:

Fixes #4961
Fixed #4349

@jsok
Copy link
Contributor Author

jsok commented Sep 2, 2019

Looks like ASF git is having issues right now: https://status.apache.org/incidents/63030p4241xj

@xtrusia
Copy link

xtrusia commented Sep 17, 2019

Hello, Have you tested this commit your test env?
I'm interested in this commit as well. and I'm looking forward to this PR can be merged.

ADDED 1

I still have the issue jsok mentioned.

fatal: unable to access 'https://git.apache.org/thrift.git/': Failed to connect to git.apache.org port 443: Connection timed out

ADDED 2
I solved #1 with export GOPROXY setting but I needed to set below version instead of 0.5.0

google.golang.org/api v0.6.0

@xtrusia
Copy link

xtrusia commented Sep 17, 2019

Hello, Have you tested this commit your test env?
I'm interested in this commit as well. and I'm looking forward to this PR can be merged.

ADDED 1

I still have the issue jsok mentioned.

fatal: unable to access 'https://git.apache.org/thrift.git/': Failed to connect to git.apache.org port 443: Connection timed out

ADDED 2
I solved #1 with export GOPROXY setting but I needed to set below version instead of 0.5.0

google.golang.org/api v0.6.0

Even If I use this branch, It is not working great after shutting down one etcd node.
it takes over 20mins to recover.

@jefferai
Copy link
Member

I get a very different set of go.mod changes when I run go get go.etcd.io/etcd.

@jsok
Copy link
Contributor Author

jsok commented Sep 18, 2019

I get a very different set of go.mod changes when I run go get go.etcd.io/etcd.

vendoring etcd is a bit confusing to me as well - i've tried to pull master in at the commish where 3.3.15 was released, as opposed to using the 3.3.15 tag as between 3.3.14 and 3.3.15 they flip flopped between glide, go.mod and back to glide again 😕

We also don't want to just pull from HEAD of master as that's now 3.4.x stream of code and probably larger an upgrade than we need for this fix.

@jefferai
Copy link
Member

If you pull in from the 3.3.15 tag it is still marked as incompatible but does update deps. I think that's likely the right path for now, assuming you can verify that the pulled in code works properly.

Copy link

@xtrusia xtrusia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my test, test was ok only when #5297 is patched to source.

Added, this is wrong, shutdown test is failed.

golang.org/x/crypto v0.0.0-20190701094942-4def268fd1a4
golang.org/x/net v0.0.0-20190620200207-3b0461eec859
golang.org/x/net v0.0.0-20190813000000-74dc4d7220e7
golang.org/x/oauth2 v0.0.0-20190402181905-9f3314589c9a
google.golang.org/api v0.5.0
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI This should be v0.6.0 in my test.

go.mod Outdated
@@ -42,7 +42,7 @@ require (
github.com/go-test/deep v1.0.2
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for thrift error, as master branch, we need to put below before require block

replace git.apache.org/thrift.git => github.com/apache/thrift v0.12.0

Contains an important fix in clientv3 that allows vault to
successfully failover to another etcdv3 endpoint in the event that the
current active connection becomes unavailable.

See also:
 * etcd-io/etcd#9949
 * etcd-io/etcd#10911
 * https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.3.md#v3314-2019-08-16

Fixes hashicorp#4961
@jefferai
Copy link
Member

jefferai commented Sep 23, 2019

Note that things don't actually build successfully just by updating the deps. Making them build likely means essentially reverting the work that was done to adapt to the new APIs and standardizing on the github.com/coreos/etcd repo names, and then changing them back later.

The likely better approach is to just go to v3.4.1. Updating to that (and an undeclared but necessary update to google.golang.org/api) gives the following set, which is far more manageable, but also has some deep changes (prometheus, grpc, bbolt):

diff --git a/go.mod b/go.mod
index befb49285..0c091b7a4 100644
--- a/go.mod
+++ b/go.mod
@@ -110,22 +110,22 @@ require (
        github.com/pkg/errors v0.8.1
        github.com/posener/complete v1.2.1
        github.com/pquerna/otp v1.1.0
-       github.com/prometheus/client_golang v0.9.3-0.20190127221311-3c4408c8b829
-       github.com/prometheus/common v0.2.0
+       github.com/prometheus/client_golang v1.0.0
+       github.com/prometheus/common v0.4.1
        github.com/ryanuber/columnize v2.1.0+incompatible
        github.com/ryanuber/go-glob v1.0.0
        github.com/samuel/go-zookeeper v0.0.0-20180130194729-c4fab1ac1bec
        github.com/shopspring/decimal v0.0.0-20180709203117-cd690d0c9e24 // indirect
        github.com/streadway/amqp v0.0.0-20190404075320-75d898a42a94 // indirect
        github.com/stretchr/testify v1.3.0
-       go.etcd.io/bbolt v1.3.2
-       go.etcd.io/etcd v0.0.0-20190412021913-f29b1ada1971
+       go.etcd.io/bbolt v1.3.3
+       go.etcd.io/etcd v0.0.0-20190917205325-a14579fbfb1a
        golang.org/x/crypto v0.0.0-20190701094942-4def268fd1a4
        golang.org/x/net v0.0.0-20190813141303-74dc4d7220e7
-       golang.org/x/oauth2 v0.0.0-20190402181905-9f3314589c9a
-       google.golang.org/api v0.5.0
+       golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45
+       google.golang.org/api v0.10.0
        google.golang.org/genproto v0.0.0-20190801165951-fa694d86fc64
-       google.golang.org/grpc v1.22.0
+       google.golang.org/grpc v1.23.1
        gopkg.in/mgo.v2 v2.0.0-20180705113604-9856a29383ce
        gopkg.in/ory-am/dockertest.v3 v3.3.4
        gopkg.in/square/go-jose.v2 v2.3.1

@chrishoffman @briankassouf please weigh in

@xtrusia
Copy link

xtrusia commented Sep 24, 2019

This is working well in my test.

3 etcd 3.3.15 servers + vault master built with below diff,

shutting down etcd server one by one and check
$etcdctl endpoint status --cluster
$vault secrets list -detailed.

Note that things don't actually build successfully just by updating the deps. Making them build likely means essentially reverting the work that was done to adapt to the new APIs and standardizing on the github.com/coreos/etcd repo names, and then changing them back later.

The likely better approach is to just go to v3.4.1. Updating to that (and an undeclared but necessary update to google.golang.org/api) gives the following set, which is far more manageable, but also has some deep changes (prometheus, grpc, bbolt):

diff --git a/go.mod b/go.mod
index befb49285..0c091b7a4 100644
--- a/go.mod
+++ b/go.mod
@@ -110,22 +110,22 @@ require (
        github.com/pkg/errors v0.8.1
        github.com/posener/complete v1.2.1
        github.com/pquerna/otp v1.1.0
-       github.com/prometheus/client_golang v0.9.3-0.20190127221311-3c4408c8b829
-       github.com/prometheus/common v0.2.0
+       github.com/prometheus/client_golang v1.0.0
+       github.com/prometheus/common v0.4.1
        github.com/ryanuber/columnize v2.1.0+incompatible
        github.com/ryanuber/go-glob v1.0.0
        github.com/samuel/go-zookeeper v0.0.0-20180130194729-c4fab1ac1bec
        github.com/shopspring/decimal v0.0.0-20180709203117-cd690d0c9e24 // indirect
        github.com/streadway/amqp v0.0.0-20190404075320-75d898a42a94 // indirect
        github.com/stretchr/testify v1.3.0
-       go.etcd.io/bbolt v1.3.2
-       go.etcd.io/etcd v0.0.0-20190412021913-f29b1ada1971
+       go.etcd.io/bbolt v1.3.3
+       go.etcd.io/etcd v0.0.0-20190917205325-a14579fbfb1a
        golang.org/x/crypto v0.0.0-20190701094942-4def268fd1a4
        golang.org/x/net v0.0.0-20190813141303-74dc4d7220e7
-       golang.org/x/oauth2 v0.0.0-20190402181905-9f3314589c9a
-       google.golang.org/api v0.5.0
+       golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45
+       google.golang.org/api v0.10.0
        google.golang.org/genproto v0.0.0-20190801165951-fa694d86fc64
-       google.golang.org/grpc v1.22.0
+       google.golang.org/grpc v1.23.1
        gopkg.in/mgo.v2 v2.0.0-20180705113604-9856a29383ce
        gopkg.in/ory-am/dockertest.v3 v3.3.4
        gopkg.in/square/go-jose.v2 v2.3.1

@chrishoffman @briankassouf please weigh in

@xtrusia
Copy link

xtrusia commented Sep 25, 2019

hmm.. it doesn;t work well on 2nd test.

This may be related to mine
etcd-io/etcd#11176

This is working well in my test.

3 etcd 3.3.15 servers + vault master built with below diff,

shutting down etcd server one by one and check
$etcdctl endpoint status --cluster
$vault secrets list -detailed.

Note that things don't actually build successfully just by updating the deps. Making them build likely means essentially reverting the work that was done to adapt to the new APIs and standardizing on the github.com/coreos/etcd repo names, and then changing them back later.
The likely better approach is to just go to v3.4.1. Updating to that (and an undeclared but necessary update to google.golang.org/api) gives the following set, which is far more manageable, but also has some deep changes (prometheus, grpc, bbolt):

diff --git a/go.mod b/go.mod
index befb49285..0c091b7a4 100644
--- a/go.mod
+++ b/go.mod
@@ -110,22 +110,22 @@ require (
        github.com/pkg/errors v0.8.1
        github.com/posener/complete v1.2.1
        github.com/pquerna/otp v1.1.0
-       github.com/prometheus/client_golang v0.9.3-0.20190127221311-3c4408c8b829
-       github.com/prometheus/common v0.2.0
+       github.com/prometheus/client_golang v1.0.0
+       github.com/prometheus/common v0.4.1
        github.com/ryanuber/columnize v2.1.0+incompatible
        github.com/ryanuber/go-glob v1.0.0
        github.com/samuel/go-zookeeper v0.0.0-20180130194729-c4fab1ac1bec
        github.com/shopspring/decimal v0.0.0-20180709203117-cd690d0c9e24 // indirect
        github.com/streadway/amqp v0.0.0-20190404075320-75d898a42a94 // indirect
        github.com/stretchr/testify v1.3.0
-       go.etcd.io/bbolt v1.3.2
-       go.etcd.io/etcd v0.0.0-20190412021913-f29b1ada1971
+       go.etcd.io/bbolt v1.3.3
+       go.etcd.io/etcd v0.0.0-20190917205325-a14579fbfb1a
        golang.org/x/crypto v0.0.0-20190701094942-4def268fd1a4
        golang.org/x/net v0.0.0-20190813141303-74dc4d7220e7
-       golang.org/x/oauth2 v0.0.0-20190402181905-9f3314589c9a
-       google.golang.org/api v0.5.0
+       golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45
+       google.golang.org/api v0.10.0
        google.golang.org/genproto v0.0.0-20190801165951-fa694d86fc64
-       google.golang.org/grpc v1.22.0
+       google.golang.org/grpc v1.23.1
        gopkg.in/mgo.v2 v2.0.0-20180705113604-9856a29383ce
        gopkg.in/ory-am/dockertest.v3 v3.3.4
        gopkg.in/square/go-jose.v2 v2.3.1

@chrishoffman @briankassouf please weigh in

@xtrusia
Copy link

xtrusia commented Sep 27, 2019

test env,
3 etcds, 1 vault
shut down etcd node one by one ( even if it is leader or not ) several times

I built test program which is keep asking member list as below
https://pastebin.ubuntu.com/p/pcynNM4W3W/

built vault with above dependency changes and added
DialOptions: []grpc.DialOption{grpc.WithBlock()},

my test program works well even if I shutdown one of etcd node one by one several times
https://pastebin.ubuntu.com/p/HmznhGK5Rx/
there is one context deadline exceeded but it is recovered soon

but vault got error and didn't recovered well ( it is checked with "vault secrets list -detailed" ), it took around 20mins

vault log
https://pastebin.ubuntu.com/p/5RbB6FBwr3/

@michelvocks
Copy link
Contributor

Hi @jsok!

We are interested in getting this merged soon. Would you be able to address the review comments mentioned above by @jefferai in the near term? If not, we can also overtake this issue for you.

Please let us know what you think.

Cheers,
Michel

@jsok
Copy link
Contributor Author

jsok commented Dec 16, 2019

If not, we can also overtake this issue for you

@michelvocks that would be fine, I don't think I'll get to it until early next year at this rate.

Would also be worth jumping to 3.3.18 too potentially.

@michelvocks
Copy link
Contributor

michelvocks commented Dec 17, 2019

Hi @jsok!

Thank you for your quick reply and your effort.
I close this PR in favor of #8037.

Cheers,
Michel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants