-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JAVA-3077: ListenersIT intermittently failing with: Wanted but not in… #1670
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the changes here will resolve the issue identified by the ticket. But I'm afraid we're doing quite a bit more as well, not all of which is necessary given the current structure of the driver code. That isn't necessarily a huge problem, but in this case I'd argue that some of these changes make the test a modest amount harder to read, which in turn obfuscates it's intent a bit. I'd prefer to avoid that if we possibly can.
Full disclosure: I haven't confirmed all of my assumptions against the code so it's quite possible I'm misunderstanding something.
inOrder1.verify(nodeListener1).onSessionReady(session); | ||
inOrder1.verify(nodeListener1).onUp(nodeCaptor1.capture()); | ||
inOrder1.verify(nodeListener1, timeout(500).times(1)).onSessionReady(session); | ||
inOrder1.verify(nodeListener1, timeout(500).times(1)).onUp(nodeCaptor1.capture()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout argument shouldn't be necessary for NodeStateListener.onUp() here. The test is built so that all NSLs are actually SafeInitNodeStateListeners which handle the possible out-of-order calls of on*() in relation to onSessionReady(). As a result any on*() method will get replayed once the session is available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I think about this I don't think the analysis above is right. SafeInitNodeStateListener will make sure we don't lose any event calls but it won't handle the concern about whether another thread has called this method when we aren't doing anything to ensure synchronization with that thread.
There is a big part of me that wishes we could just explicitly block until the admin thread has had a chance to notify all its listeners. I'm not sure what such an implementation would look like (and we wouldn't want to connect it too strongly to driver internals) but it feels like we're dancing around that kind of solution here without actually embracing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodeStateListener#onSessionReady is called before the session's initFuture is completed so you right that both #onSessionReady and #onUp (called in SafeInitNodeStateListener#onSessionReady) will have been called by the time we get to this point in the test
InOrder inOrder1 = inOrder(nodeListener1); | ||
inOrder1.verify(nodeListener1).onSessionReady(session); | ||
inOrder1.verify(nodeListener1).onUp(nodeCaptor1.capture()); | ||
inOrder1.verify(nodeListener1, timeout(500).times(1)).onSessionReady(session); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need a timeout for this call, right? I haven't validated against the code but won't this be called before session initialization is completed?
verify(schemaListener1, timeout(500).times(1)).onSessionReady(session); | ||
verify(schemaListener2, timeout(500).times(1)).onSessionReady(session); | ||
|
||
verify(requestTracker1, timeout(500).times(1)).onSessionReady(session); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RequestTracker only has the single callback method (so no interleaving between threads) and is called by the thread handling session initialization. Like my comment above: do we need the timeout here or are we guaranteed that this will always be called before the session is returned?
assertThat( | ||
Uninterruptibles.awaitUninterruptibly( | ||
MyRequestTracker.onSessionReadyLatch, 500, TimeUnit.MILLISECONDS)) | ||
.isTrue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar argument for the instances declared via config. Since MyNodeStateListener already extends SafeInitNodeStateListener we should be okay here (as long as onSessionReady() is checked before onUp() for NSL ops), right?
Similar arguments for the other listeners.
assertThat( | ||
Uninterruptibles.awaitUninterruptibly( | ||
MyRequestTracker.closeLatch, 500, TimeUnit.MILLISECONDS)) | ||
.isTrue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think the close() methods are a problem yet? This seems like a lot of code to address that case. I guess I'm wondering if a single timeout check (or even a CountDownLatch perhaps) would do the job here.
I say all that having not actually looked at the relevant code yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
close for all listeners is called before the session's closeFuture is completed so you are right and there were no issues with how close was being checked previously in the test, I'll revert those changes
assertThat(MySchemaChangeListener.onSessionReadyCalled).isTrue(); | ||
|
||
assertThat(MyRequestTracker.onSessionReadyCalled).isTrue(); | ||
verify(schemaListener1, timeout(500).times(1)).onSessionReady(session); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the heart of the problem; an event that's called from a thread totally unrelated to the thread governing where we're at in this code. Seems like this is the truly essential change we need here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review! You are correct, only the schema listener's callbacks are not being waited for by other mechanisms (session init/close) so these should be the only ones that need to change. I think we also need to keep the code that waits for MySchemaChangeListener#onSessionReady because the bg thread may not have gone through all the schema listeners, even if we verfied one was called earlier. I'll revert all the unnecessary synchronization/waiting and update the comments with my learnings here.
93dc3b4
to
13542ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like where this landed!
…voked: schemaListener1.onSessionReady ListenersIT.java: - Add 500ms wait on SchemaListener#onSessionReady call verification - Add latch to wait for MySchemaChangeListener#onSessionReady - Add some comments around which listeners/methods need synchronizations and why/not
…voked: schemaListener1.onSessionReady
ListenersIT.java: