HA cluster reset runs into etcd panic #2131

Closed
ElisaMeng opened this issue Aug 15, 2020 · 2 comments
Labels
kind/bug Something isn't working status/blocker

@ElisaMeng

Environmental Info:
K3s Version:

master branch with Kubernetes 1.18.6 (k3s v1.18.6+k3s-026584e1 per the log below)

Node(s) CPU architecture, OS, and Version:

k3os

Cluster Configuration:

3 masters
Describe the bug:

When performing a cluster reset to bring up a cluster after it lost quorum, k3s panics:


4T15:09:17.889190241+08:00] Starting k3s v1.18.6+k3s-026584e1 (026584e1)
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2020-08-14 15:09:17.905293 I | embed: peerTLS: cert = /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.crt, key = /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.key, trusted-ca = /var/lib/rancher/k3s/server/tls/etcd/peer-ca.crt, client-cert-auth = true, crl-file =
2020-08-14 15:09:17.905760 I | embed: name = master1-5c63d33d
2020-08-14 15:09:17.905775 I | embed: force new cluster
2020-08-14 15:09:17.905779 I | embed: data dir = /var/lib/rancher/k3s/server/db/etcd
2020-08-14 15:09:17.905837 I | embed: member dir = /var/lib/rancher/k3s/server/db/etcd/member
2020-08-14 15:09:17.905843 I | embed: heartbeat = 500ms
2020-08-14 15:09:17.905881 I | embed: election = 5000ms
2020-08-14 15:09:17.905920 I | embed: snapshot count = 100000
2020-08-14 15:09:17.905987 I | embed: advertise client URLs = https://172.20.1.168:2379
2020-08-14 15:09:17.905996 I | embed: initial advertise peer URLs = https://172.20.1.168:2380
2020-08-14 15:09:17.906060 I | embed: initial cluster =
2020-08-14 15:09:17.958057 C | etcdserver: ConfChange Type should be either ConfChangeAddNode or ConfChangeRemoveNode!
panic: ConfChange Type should be either ConfChangeAddNode or ConfChangeRemoveNode!

goroutine 1 [running]:
github.com/rancher/k3s/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc00100fba0, 0x430cca6, 0x4b, 0x0, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:83 +0x135
github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver.getIDs(0x0, 0x0, 0xc0013aa000, 0x183e, 0x1871, 0x0, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver/raft.go:703 +0x5ef
github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver.restartAsStandaloneNode(0xc00102a860, 0x13, 0x0, 0x0, 0x0, 0x0, 0xc001299500, 0x1, 0x1, 0xc001299400, ...)
	/go/src/github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver/raft.go:608 +0x246
github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver.NewServer(0xc00102a860, 0x13, 0x0, 0x0, 0x0, 0x0, 0xc001299500, 0x1, 0x1, 0xc001299400, ...)
	/go/src/github.com/rancher/k3s/vendor/go.etcd.io/etcd/etcdserver/server.go:482 +0x1060
github.com/rancher/k3s/vendor/go.etcd.io/etcd/embed.StartEtcd(0xc0002ad800, 0xc000ff6680, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/go.etcd.io/etcd/embed/etcd.go:211 +0x9e9
github.com/rancher/k3s/pkg/daemons/executor.Embedded.ETCD(0xc000e6de60, 0x19, 0xc000ab79e0, 0x2d, 0x41ce290, 0x3, 0xc000e6de20, 0x13, 0xc000ab7a40, 0x30, ...)
	/go/src/github.com/rancher/k3s/pkg/daemons/executor/etcd.go:23 +0xa9
github.com/rancher/k3s/pkg/daemons/executor.ETCD(...)
	/go/src/github.com/rancher/k3s/pkg/daemons/executor/executor.go:107
github.com/rancher/k3s/pkg/etcd.(*ETCD).cluster(0xc0005bc400, 0x4c54660, 0xc000ba90c0, 0x1, 0xc000e6de60, 0x19, 0xc000ab79e0, 0x2d, 0x41ce290, 0x3, ...)
	/go/src/github.com/rancher/k3s/pkg/etcd/etcd.go:374 +0x59b
github.com/rancher/k3s/pkg/etcd.(*ETCD).newCluster(0xc0005bc400, 0x4c54660, 0xc000ba90c0, 0x4c54601, 0xc000ba90c0, 0x0)
	/go/src/github.com/rancher/k3s/pkg/etcd/etcd.go:358 +0x241
github.com/rancher/k3s/pkg/etcd.(*ETCD).Reset(0xc0005bc400, 0x4c54660, 0xc000ba90c0, 0x0, 0xc000451ec0, 0xc00016ab60)
	/go/src/github.com/rancher/k3s/pkg/etcd/etcd.go:120 +0x88
github.com/rancher/k3s/pkg/cluster.(*Cluster).start(0xc000d0c5a0, 0x4c54660, 0xc000ba90c0, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/pkg/cluster/managed.go:49 +0x6a
github.com/rancher/k3s/pkg/cluster.(*Cluster).Start(0xc000d0c5a0, 0x4c54660, 0xc000ba90c0, 0x0, 0x0, 0x41ebcfe)
	/go/src/github.com/rancher/k3s/pkg/cluster/cluster.go:33 +0x78
github.com/rancher/k3s/pkg/daemons/control.prepare(0x4c54660, 0xc000ba90c0, 0xc0005ff908, 0xc0005b3880, 0x1a, 0x0)
	/go/src/github.com/rancher/k3s/pkg/daemons/control/server.go:358 +0x286c
github.com/rancher/k3s/pkg/daemons/control.Server(0x4c54660, 0xc000ba90c0, 0xc0005ff908, 0xc0006adb70, 0xc0006adb70)
	/go/src/github.com/rancher/k3s/pkg/daemons/control/server.go:89 +0x155
github.com/rancher/k3s/pkg/server.StartServer(0x4c54660, 0xc000ba90c0, 0xc0005ff900, 0xc000ba90c0, 0x2)
	/go/src/github.com/rancher/k3s/pkg/server/server.go:55 +0x90
github.com/rancher/k3s/pkg/cli/server.run(0xc000ba88c0, 0x744b0a0, 0x1, 0xc0005e34c0)
	/go/src/github.com/rancher/k3s/pkg/cli/server/server.go:220 +0x132f
github.com/rancher/k3s/pkg/cli/server.Run(0xc000ba88c0, 0xc000759c70, 0x0)
	/go/src/github.com/rancher/k3s/pkg/cli/server/server.go:35 +0x37
github.com/rancher/k3s/pkg/cli/cmds.InitLogging.func1(0xc000ba88c0, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/pkg/cli/cmds/log.go:73 +0xaa
github.com/rancher/k3s/vendor/github.com/rancher/spur/cli.(*Command).Run(0xc000e7d0e0, 0xc000a0d5c0, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/github.com/rancher/spur/cli/command.go:164 +0x4b9
github.com/rancher/k3s/vendor/github.com/rancher/spur/cli.(*App).RunContext(0xc000ac2000, 0x4c546a0, 0xc0000e4010, 0xc000213bf0, 0x3, 0x3, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/github.com/rancher/spur/cli/app.go:308 +0x5ed
github.com/rancher/k3s/vendor/github.com/rancher/spur/cli.(*App).Run(...)
	/go/src/github.com/rancher/k3s/vendor/github.com/rancher/spur/cli/app.go:225
main.main()
	/go/src/github.com/rancher/k3s/cmd/server/main.go:46 +0x3a6

Steps To Reproduce:

  • Installed K3s with embedded HA (3 masters)
  • After the cluster lost quorum, performed a cluster reset

Expected behavior:

Cluster reset should complete successfully.
Actual behavior:

k3s panics with the etcd error shown above.
Additional context / logs:

This only happens since the etcd learner feature was introduced.
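One plausible reading of the stack trace: with `force new cluster` set, etcd goes through `restartAsStandaloneNode`, which calls `getIDs` to replay the membership (conf-change) entries in the WAL. The sketch below is a simplified, self-contained model of that replay. It assumes, without confirmation in this thread, that the vendored etcd only recognizes add, remove and update conf changes, so a learner-add entry written when a member joined as a learner falls through to the panic. `ConfChange`, `ConfChangeType`, `getIDs` and `tryGetIDs` here are hypothetical stand-ins, not etcd's API.

```go
package main

import (
	"fmt"
	"sort"
)

// ConfChangeType is a hypothetical stand-in for etcd's raftpb.ConfChangeType.
type ConfChangeType int

const (
	AddNode ConfChangeType = iota
	RemoveNode
	UpdateNode
	AddLearnerNode // assumed to be written when a member joins as a learner
)

// ConfChange is a hypothetical, trimmed-down membership log entry.
type ConfChange struct {
	Type   ConfChangeType
	NodeID uint64
}

// getIDs replays membership entries to rebuild the member set, modeled on
// the behavior suggested by the log. Learner additions are deliberately not
// handled, so they reach the default branch and panic.
func getIDs(entries []ConfChange) []uint64 {
	ids := make(map[uint64]bool)
	for _, cc := range entries {
		switch cc.Type {
		case AddNode:
			ids[cc.NodeID] = true
		case RemoveNode:
			delete(ids, cc.NodeID)
		case UpdateNode:
			// Membership is unchanged by an update.
		default:
			panic("ConfChange Type should be either ConfChangeAddNode or ConfChangeRemoveNode!")
		}
	}
	out := make([]uint64, 0, len(ids))
	for id := range ids {
		out = append(out, id)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// tryGetIDs runs getIDs and reports whether it panicked instead of crashing.
func tryGetIDs(entries []ConfChange) (ids []uint64, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return getIDs(entries), false
}

func main() {
	// Members added directly as voters: the replay succeeds.
	fmt.Println(tryGetIDs([]ConfChange{{AddNode, 1}, {AddNode, 2}, {AddNode, 3}})) // [1 2 3] false
	// A member that first joined as a learner: the replay panics.
	fmt.Println(tryGetIDs([]ConfChange{{AddNode, 1}, {AddLearnerNode, 2}})) // [] true
}
```

In this model a single learner-add entry in the log is enough to abort the reset, which would match the report that older builds without learner joins reset fine. The real etcd function also seeds the member set from the snapshot, which the sketch leaves out.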

@ElisaMeng
Author

This seems to happen only after #2066; when I run an older version, the problem does not occur.

@rancher-max
Contributor

This should be fixed now on master. See: #2227 (comment)

The issue is the same, but the version on master is now 1.19.1. Validated using commit ID: beab211685805a481ffa61070ec53ddefd1adf03
