New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

JAVA-3077: ListenersIT intermittently failing with: Wanted but not in… #1670

Merged

hhughes merged 1 commit into 4.x from JAVA-3077

Jul 20, 2023

Contributor

hhughes commented Jun 30, 2023

…voked: schemaListener1.onSessionReady

ListenersIT.java:

Add comment explaining what methods are called from which thread
add 500ms timeout to mockito verify to wait for slow callbacks
add latches to wait for methods on custom listener subclasses

absurdfarce self-requested a review

June 30, 2023 19:58

absurdfarce reviewed

View reviewed changes

Contributor

absurdfarce left a comment

I think the changes here will resolve the issue identified by the ticket. But I'm afraid we're doing quite a bit more as well, not all of which is necessary given the current structure of the driver code. That isn't necessarily a huge problem, but in this case I'd argue that some of these changes make the test a modest amount harder to read, which in turn obfuscates it's intent a bit. I'd prefer to avoid that if we possibly can.

Full disclosure: I haven't confirmed all of my assumptions against the code so it's quite possible I'm misunderstanding something.

integration-tests/src/test/java/com/datastax/oss/driver/core/session/ListenersIT.java Outdated

-                    inOrder1.verify(nodeListener1).onSessionReady(session);
-                    inOrder1.verify(nodeListener1).onUp(nodeCaptor1.capture());
+                    inOrder1.verify(nodeListener1, timeout(500).times(1)).onSessionReady(session);
+                    inOrder1.verify(nodeListener1, timeout(500).times(1)).onUp(nodeCaptor1.capture());

Contributor

absurdfarce Jul 5, 2023

The timeout argument shouldn't be necessary for NodeStateListener.onUp() here. The test is built so that all NSLs are actually SafeInitNodeStateListeners which handle the possible out-of-order calls of on*() in relation to onSessionReady(). As a result any on*() method will get replayed once the session is available.

Contributor

absurdfarce Jul 6, 2023

As I think about this I don't think the analysis above is right. SafeInitNodeStateListener will make sure we don't lose any event calls but it won't handle the concern about whether another thread has called this method when we aren't doing anything to ensure synchronization with that thread.

There is a big part of me that wishes we could just explicitly block until the admin thread has had a chance to notify all its listeners. I'm not sure what such an implementation would look like (and we wouldn't want to connect it too strongly to driver internals) but it feels like we're dancing around that kind of solution here without actually embracing it.

Contributor Author

hhughes Jul 6, 2023

NodeStateListener#onSessionReady is called before the session's initFuture is completed so you right that both #onSessionReady and #onUp (called in SafeInitNodeStateListener#onSessionReady) will have been called by the time we get to this point in the test

integration-tests/src/test/java/com/datastax/oss/driver/core/session/ListenersIT.java Outdated

                     InOrder inOrder1 = inOrder(nodeListener1);
-                    inOrder1.verify(nodeListener1).onSessionReady(session);
-                    inOrder1.verify(nodeListener1).onUp(nodeCaptor1.capture());
+                    inOrder1.verify(nodeListener1, timeout(500).times(1)).onSessionReady(session);

Contributor

absurdfarce Jul 5, 2023

We shouldn't need a timeout for this call, right? I haven't validated against the code but won't this be called before session initialization is completed?

integration-tests/src/test/java/com/datastax/oss/driver/core/session/ListenersIT.java Outdated

+                    verify(schemaListener1, timeout(500).times(1)).onSessionReady(session);
+                    verify(schemaListener2, timeout(500).times(1)).onSessionReady(session);
+                    verify(requestTracker1, timeout(500).times(1)).onSessionReady(session);

Contributor

absurdfarce Jul 5, 2023

RequestTracker only has the single callback method (so no interleaving between threads) and is called by the thread handling session initialization. Like my comment above: do we need the timeout here or are we guaranteed that this will always be called before the session is returned?

integration-tests/src/test/java/com/datastax/oss/driver/core/session/ListenersIT.java Outdated

+                    assertThat(
+                            Uninterruptibles.awaitUninterruptibly(
+                                MyRequestTracker.onSessionReadyLatch, 500, TimeUnit.MILLISECONDS))
+                        .isTrue();

Contributor

absurdfarce Jul 5, 2023

Similar argument for the instances declared via config. Since MyNodeStateListener already extends SafeInitNodeStateListener we should be okay here (as long as onSessionReady() is checked before onUp() for NSL ops), right?

Similar arguments for the other listeners.

integration-tests/src/test/java/com/datastax/oss/driver/core/session/ListenersIT.java Outdated

+                  assertThat(
+                          Uninterruptibles.awaitUninterruptibly(
+                              MyRequestTracker.closeLatch, 500, TimeUnit.MILLISECONDS))
+                      .isTrue();

Contributor

absurdfarce Jul 5, 2023

Do we think the close() methods are a problem yet? This seems like a lot of code to address that case. I guess I'm wondering if a single timeout check (or even a CountDownLatch perhaps) would do the job here.

I say all that having not actually looked at the relevant code yet.

Contributor Author

hhughes Jul 6, 2023

close for all listeners is called before the session's closeFuture is completed so you are right and there were no issues with how close was being checked previously in the test, I'll revert those changes

integration-tests/src/test/java/com/datastax/oss/driver/core/session/ListenersIT.java

-                    assertThat(MySchemaChangeListener.onSessionReadyCalled).isTrue();
-                    assertThat(MyRequestTracker.onSessionReadyCalled).isTrue();
+                    verify(schemaListener1, timeout(500).times(1)).onSessionReady(session);

Contributor

absurdfarce Jul 5, 2023

This seems like the heart of the problem; an event that's called from a thread totally unrelated to the thread governing where we're at in this code. Seems like this is the truly essential change we need here.

Contributor Author

hhughes Jul 6, 2023

Thank you for the review! You are correct, only the schema listener's callbacks are not being waited for by other mechanisms (session init/close) so these should be the only ones that need to change. I think we also need to keep the code that waits for MySchemaChangeListener#onSessionReady because the bg thread may not have gone through all the schema listeners, even if we verfied one was called earlier. I'll revert all the unnecessary synchronization/waiting and update the comments with my learnings here.

hhughes force-pushed the JAVA-3077 branch 2 times, most recently from 93dc3b4 to 13542ff Compare

July 7, 2023 00:03

hhughes requested a review from absurdfarce

July 7, 2023 00:04

absurdfarce approved these changes

View reviewed changes

Contributor

absurdfarce left a comment

Nice, I like where this landed!


          JAVA-3077: ListenersIT intermittently failing with: Wanted but not in…

2ed0d58

…voked: schemaListener1.onSessionReady

ListenersIT.java:
- Add 500ms wait on SchemaListener#onSessionReady call verification
- Add latch to wait for MySchemaChangeListener#onSessionReady
- Add some comments around which listeners/methods need synchronizations and why/not

hhughes force-pushed the JAVA-3077 branch from 13542ff to 2ed0d58 Compare

July 20, 2023 19:06

hhughes merged commit e8b25ab into 4.x

2 of 3 checks passed

hhughes deleted the JAVA-3077 branch

July 26, 2023 19:03

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet