ringhash: port e2e tests from c-core #7271
I think the overall approach looks good. So, I will wait for you to move it out of draft before I give any further comments. Thanks.
I implemented 8 of the 23 tests in https://github.com/grpc/grpc/blob/master/test/cpp/end2end/xds/xds_ring_hash_end2end_test.cc. I suggest we get those in first and add more tests in a separate commit. Regarding potential flakiness: I ran those tests with -count 1000, which didn't yield any failures.
I see that this is still assigned to you. Please assign it to me when you think it is ready for me to take a pass. Thanks.
This is ready for review.
Initial pass.
Many of these comments inside of the test apply to multiple tests instead of just the one where I've made the comment.
Thanks.
I looked into this, and I think there is a race which is not obvious.
then Lines 843 to 848 in e40eb2e
However, since that happens in a separate goroutine, there's no guarantee that the listener is closed before the first connection attempts later in the test. I think the best way to work around this is to create listeners directly instead of creating stubservers. This avoids the asynchronous close behavior described above, while still reserving an available port and making sure it's not used by another server. I did that in the latest changes.
Thanks for taking care of all the comments.
primaryClusterName := "new_cluster_1"
primaryServiceName := "new_eds_service_1"
secondaryClusterName := "new_cluster_2"
secondaryServiceName := "new_eds_service_2"
clusterName := "aggregate_cluster"
Ah, sorry, I should have caught this much earlier. Could we please use consts here, and in the other tests as well? Thanks.
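As an illustration of the request, the five names from the diff could be declared in a single const block, roughly as sketched here. Whether the block lives at package scope or inside each test function is an assumption; the PR's final layout is not shown in this thread.

```go
package main

import "fmt"

// Names copied from the diff under review; grouping them in one const
// block is only a sketch of the reviewer's suggestion.
const (
	primaryClusterName   = "new_cluster_1"
	primaryServiceName   = "new_eds_service_1"
	secondaryClusterName = "new_cluster_2"
	secondaryServiceName = "new_eds_service_2"
	clusterName          = "aggregate_cluster"
)

func main() {
	fmt.Println(clusterName, primaryClusterName, primaryServiceName,
		secondaryClusterName, secondaryServiceName)
}
```

Since these values never change, constants make that explicit and prevent accidental reassignment inside a test.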
I fixed that in the last commit.
LGTM
Yay!!!
Thanks @atollena for taking care of this!
Follow up to grpc#7271 to fix grpc#6072. This adds a dozen more end to end tests. There are tests that I did not port, specifically:
- TestRingHash_TransientFailureSkipToAvailableReady was flaky when I ported it, so I removed it while investigating.
- TestRingHash_SwitchToLowerPriorityAndThenBack was also flaky, I also removed it while investigating.
- TestRingHash_ContinuesConnectingWithoutPicksOneSubchannelAtATime, I'm not sure we implement this behavior, and if we do, it's not working the same way as in c-core, where the order of subchannel connection attempts is based on the resolver address order rather than the ring order.
I will follow up with fixes for each one of the remaining tests.
Fixes #6072.
RELEASE NOTES: none