Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cdsbalancer: test cleanup part 3/N #6564

Merged
merged 6 commits into from
Aug 25, 2023

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Aug 18, 2023

This PR cleans up aggregate cluster tests in the cds LB policy. It makes the tests verify behavior rather than internal state. Cleaning up these tests helped identify the issue described in #6562.

#resource-agnostic-xdsclient-api

RELEASE NOTES: none

@easwars easwars requested a review from zasweq August 18, 2023 06:13
@easwars easwars added this to the 1.58 Release milestone Aug 18, 2023
@easwars
Copy link
Contributor Author

easwars commented Aug 18, 2023

@zasweq : Please ignore the one line change in xds/internal/balancer/clusterresolver/resource_resolver.go for now. Once #6563 is merged, I will undo it.

@easwars easwars changed the title cdsbalancer: cleanup aggregate cluster tests cdsbalancer: test cleanup part 3/N Aug 18, 2023
Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, particularly about lost assertions.

xds/internal/balancer/cdsbalancer/cluster_handler_test.go Outdated Show resolved Hide resolved
xds/internal/balancer/cdsbalancer/cluster_handler_test.go Outdated Show resolved Hide resolved
@@ -18,1078 +18,650 @@ package cdsbalancer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall comment on this file: I like how you rewrote the tests to expect a child config to the cluster_resolver corresponding to the leaf clusters. However, I feel like by switching the tests over we drop a lot of the functionality wrt the cds balancer not sending an update yet due to the aggregate cluster graph not being filled yet (i.e. all the roots). You have a case for this from A->A not sending but drop a lot of the verifications that it doesn't send in many tests. Maybe add them by only adding management server resources for different clusters when needed, and have assertions it doesn't send before adding the management server resources for a specific cluster (instead of filling out the full tree at the beginning of the test).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A specific example of a dropped assertion: https://github.com/grpc/grpc-go/pull/6564/files#diff-6eccd90237cb4345aac6d80e2e740c5d686f42cab9799b3eaf261e392785af54L288. I think these checks are important, but luckily we can plumb it into the e2e style way as well :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this check to the following tests:

  • TestAggregateClusterSuccess_ThenUpdateChildClusters
  • TestAggregatedClusterSuccess_DiamondDependency
  • TestAggregatedClusterSuccess_IgnoreDups

The first one checks the base case for aggregate clusters, while the next two test some other interesting cases.

xds/internal/balancer/cdsbalancer/cluster_handler_test.go Outdated Show resolved Hide resolved
xds/internal/balancer/cdsbalancer/cluster_handler_test.go Outdated Show resolved Hide resolved
xds/internal/balancer/cdsbalancer/cluster_handler_test.go Outdated Show resolved Hide resolved
xds/internal/balancer/cdsbalancer/cluster_handler_test.go Outdated Show resolved Hide resolved
xds/internal/balancer/cdsbalancer/cluster_handler_test.go Outdated Show resolved Hide resolved
Comment on lines -752 to -753
// This shouldn't cause an update to be written to the update buffer,
// as cluster C has not received a cluster update yet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example of a scenario that is lost. This is a corner case but is important because cluster C gets a callback which is unknown before specifying the same D, and tests the flip from this aggregate cluster is ready to update since the whole tree has updates (including both B and C's children) vs. it's not yet ready.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, tests this complicated algorithm/data structure of clusters seen: https://github.com/grpc/grpc-go/blob/master/xds/internal/balancer/cdsbalancer/cluster_handler.go#L215.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@zasweq zasweq assigned easwars and unassigned zasweq Aug 22, 2023
@easwars easwars assigned zasweq and unassigned easwars Aug 24, 2023
@easwars easwars force-pushed the cdsbalancer_test_cleanup_part_3 branch from eea79f9 to 9b1a660 Compare August 24, 2023 12:38
@easwars
Copy link
Contributor Author

easwars commented Aug 24, 2023

@zasweq : Had to force push since I had to rebase to master.

@dfawley dfawley modified the milestones: 1.58 Release, 1.59 Release Aug 24, 2023
Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for adding back assertions. E2E tests look solid here. I remember writing the left red side two months into my tenure :). LGTM.

mgmtServer, nodeID, _, _, _, _, _ := setupWithManagementServer(t)

// Configure the management server with the aggregate cluster resource
// pointing to two child clusters. But don't include resource corresponding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify the child clusters, and mention you include EDS cluster. Applies to all aggregate cluster comments added in commit 3 explaining not send update yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOne.

Comment on lines 160 to 168
// Tests the case where the cluster resource requested by the cds LB policy is
// an aggregate cluster root pointing to two child clusters, one of type EDS and
// the other of type LogicalDNS. Verifies that the load balancing configuration
// pushed to the cluster_resolver LB policy is as expected. The test then
// updates the aggregate cluster to point to two child clusters, the same leaf
// cluster of type EDS and a different leaf cluster of type LogicalDNS and
// verifies that the load balancing configuration pushed to the cluster_resolver
// LB policy contains the expected discovery mechanisms corresponding to the
// child clusters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For cases where you added no cluster update sent verification, please add that to top level testing comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOne.

@zasweq zasweq assigned easwars and unassigned zasweq Aug 24, 2023
@easwars easwars merged commit 2ce7ecd into grpc:master Aug 25, 2023
1 check passed
@easwars easwars deleted the cdsbalancer_test_cleanup_part_3 branch August 25, 2023 02:21
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants