-
Notifications
You must be signed in to change notification settings - Fork 674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17381 SolrJ fix to fetch entire ClusterState if asked #2853
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: These params.set calls are passing "true" | "false" but there's a nice boolean (i.e. a "sugar" method) for the primitive. Could you change that please? It's a minor matter I know but you wrote this code not long ago.
I understand how this change fixes the bug but it feels like the solution optimizes for the smallest possible adequate fix. I look at the method's caller, fetchClusterState. By its name, it should be fetching the cluster's state, but it's kind of working in two modes -- everything or one collection. Can we have fetchClusterState do only the whole cluster (and pass null to submitClusterStateRequest for the request type to signify everything) and have a different/new fetchCollectionState method that only gets the state of one collection, returning a DocCollection? That will be the common code path, and this recommendation I propose would clarify what little work is pertinent to that code path. If this seems like too much, feel free to push back.
Also, I observe that the clusterProperties param is unused (always given null, I mean). Unsure if this is a recent bug/oversight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want my feedback to seen as blocking this fix; maybe this should be merged and we move on.
(notwithstanding "tidy" of course)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so much better!
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java
Outdated
Show resolved
Hide resolved
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java
Outdated
Show resolved
Hide resolved
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent
String collectionName = e.getKey(); | ||
Map<String, Object> collStateMap = (Map<String, Object>) e.getValue(); | ||
cs = | ||
cs.copyWith( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling copyWith in a loop of collections is O(N^2)
(N=#collections); right? Not a problem introduced in this PR but I just want to confer with you on this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is indeed O(N^2).
* Use GenericSolrRequest not QueryRequest * Removed unused collection property loading
List<String> liveNodesList = (List<String>) cluster.get("live_nodes"); | ||
if (liveNodesList != null) { | ||
Set<String> liveNodes = new HashSet<>(liveNodesList); | ||
this.liveNodes = liveNodes; | ||
this.liveNodes = Set.copyOf(liveNodesList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small optimization
Map<String, DocCollection> collStateByName = new LinkedHashMap<>(collectionsNl.size()); | ||
for (Entry<String, NamedList<?>> entry : collectionsNl) { | ||
final var collStateMap = entry.getValue().asMap(10); | ||
final int zNodeVersion = (int) collStateMap.get("znodeVersion"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed a critical bug in your last version. We must use the ZK version for the DocCollection, not a bogus -1!
var collectionsNl = (NamedList<NamedList<?>>) cluster.get("collections"); | ||
|
||
Map<String, DocCollection> collStateByName = new LinkedHashMap<>(collectionsNl.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer using ClusterState.copyWith, solving a perf problem
if (clusterProperties != null) { | ||
Map<String, Object> properties = (Map<String, Object>) cluster.get("properties"); | ||
if (properties != null) { | ||
clusterProperties.putAll(properties); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was dead code; nobody calls this method with clusterProperties anymore
* remove useless singletonMap * liveNodes immutable
Test equivalency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be ready, assuming tests pass. @aparnasuresh85 happy to review with you.
|
||
@SuppressWarnings({"rawtypes", "unchecked"}) | ||
private DocCollection fillPrs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I inlined fillPrs
into getDocCollectionFromObjects
and added a bit more responsibility here so that the callers needn't handle zk version.
params.set("action", "CLUSTERSTATUS"); | ||
|
||
params.set("includeAll", false); // will flip flor CLUSTER_STATE | ||
switch (requestType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch is more idiomatic
@@ -117,4 +149,39 @@ private Instant getCreationTimeFromClusterStatus(String collectionName) | |||
Map<String, Object> collection = (Map<String, Object>) collections.get(collectionName); | |||
return Instant.ofEpochMilli((long) collection.get("creationTimeMillis")); | |||
} | |||
|
|||
@Test | |||
public void testClusterStateProvider() throws SolrServerException, IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the process of writing this test, I discovered some keys like "health" in the base ZkNodeProps that prevented equivalency with the ZK one. So I enhanced HttpCSP to remove these keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored this test to a parameterized test to guarantee we test with both impls.
return new ZkClientClusterStateProvider(cluster.getZkStateReader()); | ||
} | ||
|
||
private final Supplier<ClusterStateProvider> cspSupplier; | ||
|
||
public ClusterStateProviderTest(String method) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by using a String arg that refers to a method, the printed test execution is clear as to what is being invoked. This uses reflection allows deferred execution of construction of the CSP, which is useful.
case FETCH_NODE_ROLES -> params.set("roles", true); | ||
} | ||
params.set("prs", true); | ||
var request = new GenericSolrRequest(METHOD.GET, "/admin/collections", params); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formerly was QueryRequest which was wrong even though it apparently worked.
LGTM - could be merged when all test pass |
The test failure is known flaky |
#2853) When using the HTTP ClusterStateProvider (not ZK), getClusterState() wasn't working correctly; a regression from the first PR for this JIRA issue (not released). Also, * Optimization: fix O(N^2) algorithm to be O(N) for the number of collections when calling getClusterState * liveNodes is now immutable, and probably a non-sorted ordering * removed "health" and some other keys from DocCollection that aren't present when using ZK CSP Minor details: * clearly differentiate internal ClusterState processing from getting one DocCollection * Use GenericSolrRequest not QueryRequest * test the both CSPs better --------- Co-authored-by: David Smiley <dsmiley@salesforce.com> (cherry picked from commit d2045f6)
Description
Address a test failure in
ClusterStateProviderTest.testGetClusterStatus
whenClusterStateProvider
(CSP) chosen isHttp2ClusterStateProvider
. More importantly, whileClusterStateProvider.getClusterStatus()
is highly discouraged, it should still be functional since there are codepaths using it.Solution
If a call to get CLUSTERSTATUS intending to get collections data specifies a null value for collection parameter, proceed to fetch entire cluster status since fetching the remaining (live_nodes, cluster properties, roles etc) should not be time consuming anyway.
Tests
Ensure ClusterStateProviderTest.testGetClusterState() passes when chosen CSP is Http2ClusterStateProvider. (Currently the test fails with a NPE)
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.