-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clientv3/balance: fixed flaky balancer tests #14204
Conversation
9a1b977
to
78c72ec
Compare
fixes #14158 |
Looking at the end of the log from failed run https://github.com/etcd-io/etcd/runs/7049120841?check_suite_focus=true we see {"level":"info","msg":"state changed","picker":"picker-error","balancer-id":"ckyp592dedhk","connected":true,"subconn":"0xaa01b00","subconn-size":5,"address":"127.0.0.1:38193","old-state":"CONNECTING","new-state":"READY"} |
78c72ec
to
6a3da5e
Compare
clientv3/balancer/balancer_test.go
Outdated
available := make(map[string]struct{}) | ||
// cycle through all peers to indirectly verify that balancer subconn list is fully loaded | ||
// otherwise we can't reliably count switches in the next step | ||
for len(available) < tc.serverCount { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, technically we only need 2 different peers for switch counting logic to work
clientv3/balancer/balancer_test.go
Outdated
var picked string | ||
available := make(map[string]struct{}) | ||
// cycle through all peers to indirectly verify that balancer subconn list is fully loaded | ||
// otherwise we can't reliably count switches in the next step | ||
for len(available) < serverCount { | ||
picked, err = reqFunc(context.Background()) | ||
if err != nil { | ||
t.Fatalf("Unexpected failure %v", err) | ||
} | ||
available[picked] = struct{}{} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see exact the same code multiple times, can you get them wrapped in a common function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've followed the same style as original tests, there is a bunch of duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand it. Since this is a low hang fruit, let's try to do a little better as we can. thx.
func waitSubconnReady(count int, reqFunc func(context) (?, error) ) map[string]struct{} {
}
6a3da5e
to
0b3a09e
Compare
Please fix the linux-amd64-fmt failure, and ignore Release/release failure for now. |
- added verification step to indirectly verify that all peers are in balancer subconn list Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
0b3a09e
to
185f203
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you @lavacat
I saw another issue ,
https://github.com/etcd-io/etcd/runs/7293195456?check_suite_focus=true |
@ahrtr interesting, I took a quick look but could find the root cause. Seams like after stopping 1 peer subconn-size went from 5 to 1 to 0. I'll investigate more |
Just raised a new issue to track this |
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.