Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16257: ZkStateReader changes to avoid race condition between collectionWatches and watchedCollectionStates #909

Conversation

patsonluk
Copy link
Contributor

@patsonluk patsonluk commented Jun 17, 2022

https://issues.apache.org/jira/browse/SOLR-16257

Description

As described in the Jira issue, ZkStateReader fields collectionWatches and watchedCollectionStates can run into race conditions.

And such condition might produce stale clusterState for some collection which can no longer be updated (until a new watch is registered for such collection)

The existing design is intended to synchronize the 2 map fields watchedCollectionStates and collectionWatches (ie if a collection is no longer watched in collectionWatches, then no entry for that collection should exist in watchedCollectionStates). However, it is hard to guarantee such "synchronization", one way to break it is demonstrated in "Steps to reproduce a race condition" within the Jira issue

Take note that even though in most cases watchedCollectionStates and collectionWatches should have the same keys, there seems to one valid case which if a collection to be created could be in collectionWatches but not yet in watchedCollectionStates.

Solution

Perhaps it's easier to simply eliminate watchedCollectionStates and add such state (as DocCollection) to the CollectionWatch itself (now added a new class DocCollectionWatch with contains a ref to the DocCollection) , such that we no longer need to worry about entry removed in collectionWatches somehow still remains in watchedCollectionStates.

Take note that there were 2 cases which used watchedCollectionStates.keySet(), that requires some extra consideration:

  1. In forciblyRefreshAllClusterStateSlow, which we are proposing to just replace watchedCollectionStates.keySet() with collectionWatches.keySet(), the reasoning is that we should forcibly update every collection that is registered in collectionWatches, even if previous fetch on such collection returned null (ie watchedCollectionStates would have no entry on it). This is a minor change of behavior but I think it is more "correct"? Would really need more experienced Solr dev to share their thoughts here 🙏🏼
  2. In getCurrentCollections, which we are proposing to add ONLY collections in collectionWatches with DocCollection currentDoc that is non null (ie, only collection that does exist from last update)

Tests

Added testForciblyRefreshAllClusterState, testGetCurrentCollections and testWatchRaceCondition to ZkStateReaderTest.

The first 2 are to verify the refactoring maintain the same behaviors for those 2 methods. And the last one is to test specifically race condition mentioned (hard to reproduce tho).

Since I am very new to this repo, I don't know the convention for junit testing on private methods (or is it a nono to test on them?). I usually would just make them package private for other projects but I see that ZkStateReaderTest is not within the same package of ZkStateReader hence I cannot do that. For now I changed those 2 methods to public, which feels wrong 😅

Please kindly advise the common approach for testing private methods 🙇🏼

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Remarks

The ./gradlew check failed once on test case TestTlogReplica with warning of

WARN  (ZkTestServer Run Thread) [] o.a.s.c.ZkTestServer Watch limit violations:
...

Re-running ./gradlew check, the test no longer fails. Going to spend a bit more time to see whether the failure is related to this change.

so far no luck reproducing it with gradlew :solr:core:test --tests "org.apache.solr.cloud.TestTlogReplica.testRebalanceLeaders" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=120m" -Ptests.seed=C004E5CD6B2D226F -Ptests.file.encoding=US-ASCII

@patsonluk
Copy link
Contributor Author

patsonluk commented Jun 17, 2022

The details of one test failure that I cannot reproduce: Probably a red herring, this is likely just a warning that get printed but is expected for our case. Since we repeatedly register/remove watches

  2> 1444230 INFO  (SUITE-TestTlogReplica-seed#[C004E5CD6B2D226F]-worker) [n:127.0.0.1:56946_solr] o.a.s.c.ZkTestServer Shutting down ZkTestServer.
  2> 1444438 WARN  (ZkTestServer Run Thread) [] o.a.s.c.ZkTestServer Watch limit violations: 
  2> Maximum concurrent create/delete watches above limit:
  2> 
  2> 	33	/solr/collections/tlog_replica_test_create_delete/terms/shard1
  2> 	33	/solr/configs/conf
  2> 	31	/solr/collections/tlog_replica_test_create_delete/terms/shard2
  2> 	20	/solr/collections/tlog_replica_test_remove_leader/terms/shard1
  2> 	20	/solr/collections/tlog_replica_test_create_delete/state.json
  2> 	19	/solr/collections/tlog_replica_test_recovery/terms/shard1
  2> 	18	/solr/collections/tlog_replica_test_real_time_get/terms/shard1
  2> 	17	/solr/collections/tlog_replica_test_basic_leader_election/terms/shard1
  2> 	17	/solr/collections/tlog_replica_test_kill_leader/terms/shard1
  2> 	15	/solr/collections/tlog_replica_test_add_docs/terms/shard1
  2> 	11	/solr/collections/tlog_replica_test_delete_by_id/terms/shard1
  2> 	11	/solr/collections/tlog_replica_test_only_leader_indexes/terms/shard1
  2> 	10	/solr/collections/tlog_replica_test_rebalance_leaders/terms/shard1
  2> 	9	/solr/collections/tlog_replica_test_out_of_order_db_qwith_in_place_updates/terms/shard1
  2> 	9	/solr/aliases.json
  2> 	9	/solr/clusterprops.json
  2> 	8	/solr/packages.json
  2> 	8	/solr/security.json
  2> 	8	/solr/collections/tlog_replica_test_add_remove_tlog_replica/terms/shard1
  2> 	8	/solr/collections/tlog_replica_test_add_remove_tlog_replica/terms/shard2
  2> 	8	/solr/collections/tlog_replica_test_real_time_get/state.json
  2> 	7	/solr/collections/tlog_replica_test_add_docs/state.json
  2> 	7	/solr/collections/tlog_replica_test_remove_leader/state.json
  2> 	7	/solr/collections/tlog_replica_test_add_remove_tlog_replica/state.json
  2> 	6	/solr/collections/tlog_replica_test_basic_leader_election/state.json
  2> 	6	/solr/collections/tlog_replica_test_only_leader_indexes/state.json
  2> 	5	/solr/collections/tlog_replica_test_out_of_order_db_qwith_in_place_updates/state.json
  2> 	5	/solr/collections/tlog_replica_test_kill_leader/state.json
  2> 	5	/solr/collections/tlog_replica_test_recovery/collectionprops.json
  2> 	5	/solr/collections/tlog_replica_test_rebalance_leaders/state.json
  2> 	5	/solr/collections/tlog_replica_test_delete_by_id/state.json
  2> 	4	/solr/collections/tlog_replica_test_recovery/state.json
  2> 	3	/solr/collections/tlog_replica_test_rebalance_leaders/collectionprops.json
  2> 	3	/solr/collections/tlog_replica_test_kill_leader/collectionprops.json
  2> 	3	/solr/collections/tlog_replica_test_basic_leader_election/collectionprops.json
  2> 	2	/solr/collections/tlog_replica_test_delete_by_id/collectionprops.json
  2> 	2	/solr/collections/tlog_replica_test_add_remove_tlog_replica/collectionprops.json
  2> 	2	/solr/collections/tlog_replica_test_out_of_order_db_qwith_in_place_updates/collectionprops.json
  2> 	2	/solr/collections/tlog_replica_test_add_docs/collectionprops.json
  2> 	2	/solr/collections/tlog_replica_test_remove_leader/collectionprops.json
  2> 	2	/solr/collections/tlog_replica_test_only_leader_indexes/collectionprops.json
  2> 	2	/solr/collections/tlog_replica_test_real_time_get/collectionprops.json
  2> 
  2> Maximum concurrent data watches above limit:
  2> 
  2> 	97	/solr/collections/tlog_replica_test_recovery/state.json
  2> 	79	/solr/collections/tlog_replica_test_create_delete/state.json
  2> 	51	/solr/collections/tlog_replica_test_kill_leader/state.json
  2> 	49	/solr/collections/tlog_replica_test_out_of_order_db_qwith_in_place_updates/state.json
  2> 	48	/solr/collections/tlog_replica_test_basic_leader_election/state.json
  2> 	44	/solr/collections/tlog_replica_test_remove_leader/state.json
  2> 	37	/solr/collections/tlog_replica_test_add_remove_tlog_replica/state.json
  2> 	21	/solr/collections/tlog_replica_test_rebalance_leaders/state.json
  2> 	20	/solr/collections/tlog_replica_test_real_time_get/state.json
  2> 	19	/solr/collections/tlog_replica_test_add_docs/state.json
  2> 	15	/solr/collections/tlog_replica_test_only_leader_indexes/state.json
  2> 	15	/solr/collections/tlog_replica_test_delete_by_id/state.json
  2> 	4	/solr/collections/tlog_replica_test_recovery/leader_elect/shard1/election/72082096014163974-core_node4-n_0000000000
  2> 	4	/solr/overseer_elect/election/72082096014163974-127.0.0.1:56945_solr-n_0000000001
  2> 	3	/solr/collections/tlog_replica_test_create_delete/leader_elect/shard1/election/72082096014163984-core_node11-n_0000000000
  2> 	2	/solr/collections/tlog_replica_test_create_delete/leader_elect/shard1/election/72082096014163984-core_node12-n_0000000000
  2> 	2	/solr/overseer_elect/election/72082096014163984-127.0.0.1:56946_solr-n_0000000005
  2> 	2	/solr/collections/tlog_replica_test_create_delete/leader_elect/shard2/election/72082096014163984-core_node9-n_0000000001
  2> 	2	/solr/collections/tlog_replica_test_create_delete/leader_elect/shard1/election/72082096014163988-core_node14-n_0000000001
  2> 
  2> Maximum concurrent children watches above limit:
  2> 
  2> 	87	/solr/collections
  2> 	52	/solr/live_nodes
  2> 	48	/solr/collections/tlog_replica_test_create_delete/state.json
  2> 	32	/solr/collections/tlog_replica_test_recovery/state.json
  2> 	22	/solr/collections/tlog_replica_test_kill_leader/state.json
  2> 	21	/solr/collections/tlog_replica_test_out_of_order_db_qwith_in_place_updates/state.json
  2> 	20	/solr/collections/tlog_replica_test_basic_leader_election/state.json
  2> 	20	/solr/collections/tlog_replica_test_remove_leader/state.json
  2> 	14	/solr/collections/tlog_replica_test_add_docs/state.json
  2> 	13	/solr/collections/tlog_replica_test_add_remove_tlog_replica/state.json
  2> 	13	/solr/collections/tlog_replica_test_real_time_get/state.json
  2> 	12	/solr/collections/tlog_replica_test_rebalance_leaders/state.json
  2> 	12	/solr/collections/tlog_replica_test_only_leader_indexes/state.json
  2> 	12	/solr/collections/tlog_replica_test_delete_by_id/state.json
  2> 	3	/solr/overseer/queue
  2> 	3	/solr/overseer/collection-queue-work
  2> 	3	/solr/overseer/queue-work
  2> 

…ntain consistent behavior with calls to `keySet` on the now refactored `watchedCollectionStates`
…tches instead of watched collection states, as newly created collection might not yet be reflected in watched collection states and should be refreshed.
…ntain consistent behavior with calls to `keySet` on the now refactored `watchedCollectionStates`
Forbidden method invocation: java.util.concurrent.Executors#newSingleThreadExecutor() [Spawns threads with vague names; use a custom thread factory (Lucene's NamedThreadFactory, Solr's SolrNamedThreadFactory) and name threads so that you can tell (by its name) which executor it is associated with]
  in org.apache.solr.cloud.overseer.ZkStateReaderTest (ZkStateReaderTest.java:359)
Forbidden method invocation: java.util.concurrent.Executors#newSingleThreadExecutor(java.util.concurrent.ThreadFactory) [Spawns threads without MDC logging context; use ExecutorUtil.newMDCAwareSingleThreadExecutor instead]
Copy link
Contributor

@HoustonPutman HoustonPutman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this Patson!

I haven't gone over everything, but took a first-pass look. I think we should probably keep the helper methods instead of sub-classing ConcurrentHashMap. But that's a separate issue than what you are trying to fix really.

@patsonluk
Copy link
Contributor Author

patsonluk commented Jun 21, 2022

The original junit error was

 2> 1393351 INFO  (TEST-TestTlogReplica.testRebalanceLeaders-seed#[C004E5CD6B2D226F]) [] o.a.s.SolrTestCaseJ4 ###Ending testRebalanceLeaders
   >     java.lang.AssertionError: Can not find doc 1 in http://127.0.0.1:56945/solr
   >         at __randomizedtesting.SeedInfo.seed([C004E5CD6B2D226F:DEEBD73D53866E1F]:0)
   >         at org.junit.Assert.fail(Assert.java:89)
   >         at org.junit.Assert.assertTrue(Assert.java:42)
   >         at org.junit.Assert.assertNotNull(Assert.java:713)
   >         at org.apache.solr.cloud.TestTlogReplica.checkRTG(TestTlogReplica.java:1138)
   >         at org.apache.solr.cloud.TestTlogReplica.testRebalanceLeaders(TestTlogReplica.java:839)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
   >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

Seems to not happening anymore after the additional commits . Still happens occasionally with ./gradlew :solr:core:beast -Ptests.dups=10 --tests "org.apache.solr.cloud.TestTlogReplica.testRebalanceLeaders" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=120m" -Ptests.seed=C004E5CD6B2D226F -Ptests.file.encoding=US-ASCII

Not sure if it's related to this change. This failures seems to happen occasionally http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.TestTlogReplica.testRebalanceLeaders . Not sure if my mac book makes a diff tho 😓 (1 out of 3 X 10 tests.dups)

Fixed formatting
@patsonluk
Copy link
Contributor Author

patsonluk commented Jun 22, 2022

Thanks for tackling this Patson!

I haven't gone over everything, but took a first-pass look. I think we should probably keep the helper methods instead of sub-classing ConcurrentHashMap. But that's a separate issue than what you are trying to fix really.

Yes we can discuss about that, I don't have any strong opinion here.

My design of putting it into a separate class is based on the reasoning that updating the doc collection is tightly tied with the state of the of the watcher, especially that the collectionsWatched contains the DocCollection state now 😄 . Therefore I feel that might be reasonable encapsulation.

With the helper approach, it might be slightly harder to figure out the helper modifies the field by inspecting the method signature alone. It's more of a personal preference that I found methods with side effect a bit hard to track sometimes 😅 (even my current approach it still modifies the state ie has side effects too! But I think it's more obvious that the call/change is applied to the collectionsWatched object 😊 )

Thank you for the helpful code review!

@patsonluk
Copy link
Contributor Author

Friendly ping @HoustonPutman . Wondering if you have any extra thoughts about this PR? 😊

Copy link
Contributor

@HoustonPutman HoustonPutman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's related to this change. This failures seems to happen occasionally http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.TestTlogReplica.testRebalanceLeaders . Not sure if my mac book makes a diff tho 😓 (1 out of 3 X 10 tests.dups)

Yeah if it's on that list you generally don't need to worry about your likely unrelated PR making it worse.

My design of putting it into a separate class is based on the reasoning that updating the doc collection is tightly tied with the state of the of the watcher, especially that the collectionsWatched contains the DocCollection state now 😄 . Therefore I feel that might be reasonable encapsulation.

Ok I think I'm on your side now. But I do think there is some confusion between having a null value in collectionsWatches and having a null currentDoc in the value. Can you explain that difference?

Also can we rename currentDoc to currentState? Also DocCollectionWatch would be better named StatefulCollectionWatch, because the important characteristic is that its saving the current state of the collection.

Thanks again for the work here!

@HoustonPutman
Copy link
Contributor

I have a commit to fix the return signature, could you check-off the Allow maintainers to commit option?

@cpoerschke
Copy link
Contributor

... could you check-off the Allow maintainers to commit option?

It seems that the PR might be from an organisation repo and possibly the isaacs/github#1681 might still apply then, just FYI in case it does.

@HoustonPutman
Copy link
Contributor

Ahh thanks for reminding me about that @cpoerschke!

@patsonluk
Copy link
Contributor Author

@madrob @HoustonPutman I added 2 commits to better reproduce the race condition (copied the test changes to main branch and confirmed that it does trigger the failure). https://github.com/apache/solr/compare/24a05df012f1354ab210fa8e9a593c249f8254ad..0feb4e61bd02afdfff282b4495bf2bf8cf5050d3

Seems like we have addressed all the review comments? If so, what are the next steps? Many thanks again! 😊

Comment on lines 333 to 335
currentCollections =
reader.getCurrentCollections(); // should detect both collections (c1 watched, c2 lazy
// loaded)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is strange wrapping, separate the comment onto its own line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I will try again, I ran both ./gradlew :solr:solrj:spotlessApply and ./gradlew :solr:core:spotlessApply and sometimes it produces some weird wrapping...

try {
log.info("Start: artificial delay for {}ms", delay);
Thread.sleep(delay);
log.info("Finish: artificial delay for {}ms", delay);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this to a finally block so that we still log even if we were interrupted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point !

lazyCollectionStates.put(collection, new LazyCollectionRef(collection));
reconstructState.set(true);
CommonTestInjection.injectDelay(); // To unit test race condition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O missed that one. Thanks for catching!

Comment on lines 672 to 688
if (log.isDebugEnabled()) {
log.debug(
"clusterStateSet: interesting [{}] watched [{}] lazy [{}] total [{}]",
collectionWatches.keySet().size(),
watchedCollectionStates.keySet().size(),
collectionWatches.watchedCollections().size(),
collectionWatches.activeCollectionCount(),
lazyCollectionStates.keySet().size(),
clusterState.getCollectionStates().size());
}

if (log.isTraceEnabled()) {
log.trace(
"clusterStateSet: interesting [{}] watched [{}] lazy [{}] total [{}]",
collectionWatches.keySet(),
watchedCollectionStates.keySet(),
collectionWatches.watchedCollections(),
collectionWatches.activeCollections(),
lazyCollectionStates.keySet(),
clusterState.getCollectionStates());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there still technically a race condition here where the watched collections and active collections could change between the successive calls?

Copy link
Contributor Author

@patsonluk patsonluk Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it could change but it should be okay.

As far as I understand, we only need a strong guarantee/synchronization on the write ops. As for read op, it appears none of the read operation requires such guarantee, ie even if the value read has changed or is inconsistent with the watchedCollectionStates, it should not cause any serious issues. (it might print value that does not reflect the latest set or fetch/not fetch something - but these are not new behaviors)

The core of this PR is to remove the race conditions between write operations of 2 separate maps, which can cause stale state. And now since all write ops are on the same map with sync, we should be safe 😊 . As for read ops, it's pretty much the same as before, which does not need to have such strong guarantee I suppose?

Comment on lines +330 to +339
int oldCVersion =
oldState.getPerReplicaStates() == null
? -1
: oldState.getPerReplicaStates().cversion;
int newCVersion =
newState.getPerReplicaStates() == null
? -1
: newState.getPerReplicaStates().cversion;
if (oldState.getZNodeVersion() < newState.getZNodeVersion()
|| oldCVersion < newCVersion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that creates c1, gets the ref, deletes c1, creates c1 again, and then checks that the new ref is correct? The node version for both instances of c1 would be 0, but the cversion should be larger, so I think your logic still works, but I'd like to see a unit test for it. Or maybe this is already covered and I missed where it is in the tests? Thanks! Similar to what we saw in SOLR-16143.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madrob Yes I think a test case to verify different version handling makes sense!

Added a new test case for that. And since we are testing cversion, i added PRS into the unit test (otherwise it will always be -1).

Does this address your concern?
https://github.com/apache/solr/pull/909/files#diff-c92a116c89a724226fb6efbea4c4f90ab70b40a779e1c50155c8eca646520358R227-R342

@patsonluk
Copy link
Contributor Author

patsonluk commented Jul 18, 2022

@HoustonPutman Thanks for triggering the workflow (I assume it was you? 😅 ), Now that all the checks have passed, what are the next steps? (trying to read up on https://cwiki.apache.org/confluence/display/SOLR/Git+commit+process and https://cwiki.apache.org/confluence/display/solr/HowToContribute#HowToContribute-Contributingyourwork)

There's no rush for this PR, but just want to make sure I am not holding off the process because of any missing items from me.

Big thank you again @HoustonPutman and @madrob !

@HoustonPutman
Copy link
Contributor

@patsonluk I think this is good to go! I'll add an entry in the changelog, then commit and backport this! (Gonna run the full test suite first though)

@HoustonPutman
Copy link
Contributor

Actually I can't push to your PR.

Can you add this at the end of the 9.1.0 "Improvements" section:

* SOLR-16257: Improve ZkStateReader to avoid race condition between collectionWatches and watchedCollectionStates
  (Patson Luk, Houston Putman, Mike Drob)

@HoustonPutman
Copy link
Contributor

Tests pass, this is good to go when we get the Changelog entry

@patsonluk
Copy link
Contributor Author

@HoustonPutman done! (hopefully this is the right place 😅 https://github.com/apache/solr/pull/909/files#diff-cdd1ae1b418ea74ca689c6e68e61f4e5b4c6f77f6408a4e6cf76ed8421b24d09)

@HoustonPutman HoustonPutman merged commit 97458a5 into apache:main Jul 20, 2022
@HoustonPutman
Copy link
Contributor

Thanks so much for the work here @patsonluk , it was a long one, but I think very beneficial to the ZkStateReader!

HoustonPutman pushed a commit that referenced this pull request Jul 20, 2022
Race condition existed between collectionWatches and watchedCollectionStates.

(cherry picked from commit 97458a5)
@patsonluk
Copy link
Contributor Author

patsonluk commented Jul 20, 2022

Thank you so much for all the great reviews and code suggestions! @HoustonPutman @madrob 🎉

Hopefully such fix can eliminate occasional issue with Solr cluster state having the stale state, we do run into it once in a while in our prod and just once earlier this morning 😓

@risdenk
Copy link
Contributor

risdenk commented Aug 8, 2022

FYI this has been causing NPE in ZkStateReaderTest - #966

@patsonluk
Copy link
Contributor Author

FYI this has been causing NPE in ZkStateReaderTest - #966

Thanks for fixing it! 🙇🏼

patsonluk added a commit to fullstorydev/lucene-solr that referenced this pull request Aug 17, 2022
hiteshk25 pushed a commit to fullstorydev/lucene-solr that referenced this pull request Sep 6, 2022
…d collection states (#176)

* Changes to avoid race condition in ZkStateReader on watched collection states

Downstream changes from apache/solr#909
and apache/solr#966

* Resolve merge conflict
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants