Fix intermittent test failures in ProxyPublishConsumeTest.socketTest #253

Merged
merged 1 commit into from
Mar 1, 2017

rdhabalia
Contributor

Motivation

#237: WebSocketClient.stop() sometimes gets stuck, which causes the test case to fail. So, wait 2 seconds for the clients to close and then proceed regardless. Stack trace of the stuck thread:

    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.eclipse.jetty.io.ManagedSelector$CloseEndPoints.await(ManagedSelector.java:725)
    at org.eclipse.jetty.io.ManagedSelector.doStop(ManagedSelector.java:114)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
    at org.eclipse.jetty.io.SelectorManager.doStop(SelectorManager.java:290)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
    at org.eclipse.jetty.websocket.client.io.ConnectionManager.doStop(ConnectionManager.java:160)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
    at org.eclipse.jetty.websocket.client.WebSocketClient.doStop(WebSocketClient.java:288)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
    at com.yahoo.pulsar.websocket.proxy.ProxyPublishConsumeTest.socketTest(ProxyPublishConsumeTest.java:105)

@rdhabalia rdhabalia added this to the 1.17 milestone Feb 28, 2017
@rdhabalia rdhabalia self-assigned this Feb 28, 2017
} catch (Exception e) {
log.error(e.getMessage());
}
newFixedThreadPool(1).submit(() -> {
Contributor

The executor thread will be leaked in this case

Contributor Author

yes, updated it.

@rdhabalia rdhabalia force-pushed the socket_test branch 2 times, most recently from cea3e06 to 1988a30 Compare February 28, 2017 21:59
}
});
// let's wait for clients to be stopped
Thread.sleep(2000);
Contributor

Instead of always sleeping for 2 seconds, we could use the future returned by executor.submit() and call get() with a 2-second timeout.

Also, can you make the same change in ProxyPublishConsumeWithoutZKTest and ProxyPublishConsumeTls?
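
For illustration, here is a minimal sketch of the suggestion above: submit the stop call to a single-thread executor, wait on the returned future with a timeout, and always shut the executor down so its thread is not leaked. The class and method names (`BoundedStop`, `stopWithTimeout`) are hypothetical and are not taken from the merged change; the stop action is passed in as a `Callable` so the sketch does not depend on Jetty.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the Pulsar code base.
final class BoundedStop {

    /**
     * Runs stopAction on a helper thread and waits at most timeoutMillis for it.
     * Returns true only if the action returned normally within the timeout.
     */
    static boolean stopWithTimeout(Callable<Void> stopAction, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Void> future = executor.submit(stopAction);
        try {
            // Returns as soon as the action finishes, instead of always sleeping.
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The action is stuck: give up waiting and ask the worker to stop.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The action itself threw; treat it as "not stopped cleanly".
            return false;
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status.
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupts the worker if still running, so the thread is not leaked
            // (assuming the stuck call responds to interruption).
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A fast action completes well within the timeout.
        System.out.println(stopWithTimeout(() -> null, 2000));
        // A slow action is abandoned after 200 ms.
        System.out.println(stopWithTimeout(() -> {
            Thread.sleep(10_000);
            return null;
        }, 200));
    }
}
```

In a test, the call might look like `stopWithTimeout(() -> { client.stop(); return null; }, 2000)`, where `client` stands for whichever WebSocketClient instance is being closed.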

Contributor

@merlimat merlimat left a comment

👍

@rdhabalia rdhabalia requested review from saandrews and removed request for saandrews March 1, 2017 00:46
@rdhabalia rdhabalia merged commit 8ec61b2 into apache:master Mar 1, 2017
@rdhabalia rdhabalia deleted the socket_test branch June 21, 2017 18:55
sijie pushed a commit to sijie/pulsar that referenced this pull request Mar 4, 2018
hangc0276 pushed a commit to hangc0276/pulsar that referenced this pull request May 26, 2021
Fixes apache#247

The original implementation of message publishing cannot guarantee that a topic partition's pending writes are completed in order. This PR fixes the issue by refactoring the handler for Produce requests.

The basic steps of handling a single partition's produce request are:

1. Get PersistentTopic from the topic manager;
2. Convert Kafka's MemoryRecords to Pulsar's ByteBuf;
3. Call PersistentTopic#publishMessages to write the ByteBuf to BK asynchronously.

This PR adds a PendingProduce class that composes the pending steps 1 and 2 into a CompletableFuture, then adds the PendingProduce object to a queue (PendingProduceQueue) that is associated with the partition name in a map named pendingProduceQueueMap. When steps 1 and 2 are completed, the queue tries to remove all ready PendingProduce objects from its head and calls PersistentTopic#publishMessages for each of them. The synchronized keyword is used to make the remove operation thread-safe.

Therefore, the first two steps can be executed in parallel, but the third step is executed in order.
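
The queueing idea described above can be sketched roughly as follows. This is an illustrative simplification, not the code from the referenced commit: `OrderedPublishQueue` and its `publisher` callback are hypothetical names, a generic `CompletableFuture<T>` stands in for the composed steps 1 and 2, and the callback stands in for PersistentTopic#publishMessages.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical stand-in for a per-partition pending-produce queue.
final class OrderedPublishQueue<T> {
    private final Queue<CompletableFuture<T>> pending = new ArrayDeque<>();
    private final Consumer<T> publisher;

    OrderedPublishQueue(Consumer<T> publisher) {
        this.publisher = publisher;
    }

    /** Enqueues a preparation future; arrival order defines publish order. */
    void add(CompletableFuture<T> prepared) {
        synchronized (this) {
            pending.add(prepared);
        }
        // Whenever any preparation finishes, try to drain from the head.
        prepared.whenComplete((value, error) -> drainReady());
    }

    /** Publishes every completed entry at the head, stopping at the first unfinished one. */
    private synchronized void drainReady() {
        while (!pending.isEmpty() && pending.peek().isDone()) {
            CompletableFuture<T> head = pending.poll();
            if (!head.isCompletedExceptionally()) {
                publisher.accept(head.join());
            }
            // Failed preparations are dropped here; a real handler would report the error.
        }
    }

    public static void main(String[] args) {
        OrderedPublishQueue<String> queue = new OrderedPublishQueue<>(System.out::println);
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();
        queue.add(first);
        queue.add(second);
        second.complete("batch-2"); // ready early, but must wait for batch-1
        first.complete("batch-1");  // unblocks the head; both are published in order
    }
}
```

Publishing while holding the lock keeps the drain simple and ordered, at the cost of serializing publishers on one partition; that trade-off is an assumption of this sketch.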

In addition, this PR fixes the current message order test by using different batch sizes. In the original test, all messages were batched into a single batch, so disorder could never happen because there was only one batch.


* Refactor handleProduceRequest to fix message disorder issue

* Fix existed message order test

* Synchronize PendingProduce#publishMessages

* Test message order with different batch.size config