Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JAVA-3077: ListenersIT intermittently failing with: Wanted but not in… #1670

Merged
merged 1 commit into from
Jul 20, 2023

Conversation

hhughes
Copy link
Contributor

@hhughes hhughes commented Jun 30, 2023

…voked: schemaListener1.onSessionReady

ListenersIT.java:

  • Add comment explaining what methods are called from which thread
  • add 500ms timeout to mockito verify to wait for slow callbacks
  • add latches to wait for methods on custom listener subclasses

@absurdfarce absurdfarce self-requested a review June 30, 2023 19:58
Copy link
Contributor

@absurdfarce absurdfarce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes here will resolve the issue identified by the ticket. But I'm afraid we're doing quite a bit more as well, not all of which is necessary given the current structure of the driver code. That isn't necessarily a huge problem, but in this case I'd argue that some of these changes make the test a modest amount harder to read, which in turn obfuscates it's intent a bit. I'd prefer to avoid that if we possibly can.

Full disclosure: I haven't confirmed all of my assumptions against the code so it's quite possible I'm misunderstanding something.

inOrder1.verify(nodeListener1).onSessionReady(session);
inOrder1.verify(nodeListener1).onUp(nodeCaptor1.capture());
inOrder1.verify(nodeListener1, timeout(500).times(1)).onSessionReady(session);
inOrder1.verify(nodeListener1, timeout(500).times(1)).onUp(nodeCaptor1.capture());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout argument shouldn't be necessary for NodeStateListener.onUp() here. The test is built so that all NSLs are actually SafeInitNodeStateListeners which handle the possible out-of-order calls of on*() in relation to onSessionReady(). As a result any on*() method will get replayed once the session is available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think about this I don't think the analysis above is right. SafeInitNodeStateListener will make sure we don't lose any event calls but it won't handle the concern about whether another thread has called this method when we aren't doing anything to ensure synchronization with that thread.

There is a big part of me that wishes we could just explicitly block until the admin thread has had a chance to notify all its listeners. I'm not sure what such an implementation would look like (and we wouldn't want to connect it too strongly to driver internals) but it feels like we're dancing around that kind of solution here without actually embracing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeStateListener#onSessionReady is called before the session's initFuture is completed so you right that both #onSessionReady and #onUp (called in SafeInitNodeStateListener#onSessionReady) will have been called by the time we get to this point in the test

InOrder inOrder1 = inOrder(nodeListener1);
inOrder1.verify(nodeListener1).onSessionReady(session);
inOrder1.verify(nodeListener1).onUp(nodeCaptor1.capture());
inOrder1.verify(nodeListener1, timeout(500).times(1)).onSessionReady(session);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need a timeout for this call, right? I haven't validated against the code but won't this be called before session initialization is completed?

verify(schemaListener1, timeout(500).times(1)).onSessionReady(session);
verify(schemaListener2, timeout(500).times(1)).onSessionReady(session);

verify(requestTracker1, timeout(500).times(1)).onSessionReady(session);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RequestTracker only has the single callback method (so no interleaving between threads) and is called by the thread handling session initialization. Like my comment above: do we need the timeout here or are we guaranteed that this will always be called before the session is returned?

assertThat(
Uninterruptibles.awaitUninterruptibly(
MyRequestTracker.onSessionReadyLatch, 500, TimeUnit.MILLISECONDS))
.isTrue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar argument for the instances declared via config. Since MyNodeStateListener already extends SafeInitNodeStateListener we should be okay here (as long as onSessionReady() is checked before onUp() for NSL ops), right?

Similar arguments for the other listeners.

assertThat(
Uninterruptibles.awaitUninterruptibly(
MyRequestTracker.closeLatch, 500, TimeUnit.MILLISECONDS))
.isTrue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think the close() methods are a problem yet? This seems like a lot of code to address that case. I guess I'm wondering if a single timeout check (or even a CountDownLatch perhaps) would do the job here.

I say all that having not actually looked at the relevant code yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close for all listeners is called before the session's closeFuture is completed so you are right and there were no issues with how close was being checked previously in the test, I'll revert those changes

assertThat(MySchemaChangeListener.onSessionReadyCalled).isTrue();

assertThat(MyRequestTracker.onSessionReadyCalled).isTrue();
verify(schemaListener1, timeout(500).times(1)).onSessionReady(session);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the heart of the problem; an event that's called from a thread totally unrelated to the thread governing where we're at in this code. Seems like this is the truly essential change we need here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review! You are correct, only the schema listener's callbacks are not being waited for by other mechanisms (session init/close) so these should be the only ones that need to change. I think we also need to keep the code that waits for MySchemaChangeListener#onSessionReady because the bg thread may not have gone through all the schema listeners, even if we verfied one was called earlier. I'll revert all the unnecessary synchronization/waiting and update the comments with my learnings here.

@hhughes hhughes force-pushed the JAVA-3077 branch 2 times, most recently from 93dc3b4 to 13542ff Compare July 7, 2023 00:03
@hhughes hhughes requested a review from absurdfarce July 7, 2023 00:04
Copy link
Contributor

@absurdfarce absurdfarce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like where this landed!

…voked: schemaListener1.onSessionReady

ListenersIT.java:
- Add 500ms wait on SchemaListener#onSessionReady call verification
- Add latch to wait for MySchemaChangeListener#onSessionReady
- Add some comments around which listeners/methods need synchronizations and why/not
@hhughes hhughes merged commit e8b25ab into 4.x Jul 20, 2023
2 of 3 checks passed
@hhughes hhughes deleted the JAVA-3077 branch July 26, 2023 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants