-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][test] Optimize TransactionEndToEndTest #18522
[improve][test] Optimize TransactionEndToEndTest #18522
Conversation
## Motivation 1. fix flaky test #18466 cause by txn async send method 2. decrease run time by optimize receive method ## Modification 1. modify `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).send();` to `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).send();` 2. modify ` Message<byte[]> message = consumer.receive(5, TimeUnit.SECONDS); Assert.assertNull(message);` to ` Message<byte[]> message = consumer.receive(300, TimeUnit.MILLISECONDS); Assert.assertNull(message);` 2. modify `message = consumer.receive();` to `message = consumer.receive(5, TimeUnit.SECONDS);`
Codecov Report
@@ Coverage Diff @@
## master #18522 +/- ##
============================================
+ Coverage 43.96% 47.31% +3.35%
+ Complexity 10358 9256 -1102
============================================
Files 757 618 -139
Lines 72773 58568 -14205
Branches 7818 6093 -1725
============================================
- Hits 31993 27714 -4279
+ Misses 37104 27828 -9276
+ Partials 3676 3026 -650
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering what's the root cause of the TransactionEndToEndTest
flaky test?
This test failed because of out of time limit (300 seconds), so I guess it was caused by the receive
with infinite timeout:
pulsar/pulsar-broker/src/test/java/org/apache/pulsar/client/impl/TransactionEndToEndTest.java
Lines 163 to 167 in 79d1e52
int receiveCnt = 0; | |
for (int i = 0; i < txnMessageCnt; i++) { | |
message = consumer.receive(); | |
Assert.assertNotNull(message); | |
receiveCnt ++; |
Could you please explain why changing sendAsync
to send
can fix this flaky test?
@labuladong The root cause is the logic of sequence ID and serializeAndSendMessage command not being contained in the same synchronized block. And then make the message drop due to deduplication. |
I suggest keeping the same timeout. |
@nodece I have already tried to unify the timeout time. But this will cause some test instability. Some tests may have special considerations and need to increase the timeout time, while some tests require a very short timeout time. So for those stable tests with a timeout, we should not modify it. |
@liangyepianzhou Please check the CI failed. |
## Motivation 1. fix flaky test apache#18466 caused by txn async send method 2. decrease run time by optimizing receive method ## Modification 1. fix flaky test * modify `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).sendAsync();` to `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).send();` This also can be resolved by apache#17836 and apache#18486 later. 2. decrease run time by optimizing receive method * modify ` Message<byte[]> message = consumer.receive(5, TimeUnit.SECONDS); Assert.assertNull(message);` to ` Message<byte[]> message = consumer.receive(300, TimeUnit.MILLISECONDS); Assert.assertNull(message);` * modify `message = consumer.receive();` to `message = consumer.receive(5, TimeUnit.SECONDS);` * keep other `consumer.receive(x, y)` no change.
## Motivation 1. fix flaky test apache#18466 caused by txn async send method 2. decrease run time by optimizing receive method ## Modification 1. fix flaky test * modify `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).sendAsync();` to `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).send();` This also can be resolved by apache#17836 and apache#18486 later. 2. decrease run time by optimizing receive method * modify ` Message<byte[]> message = consumer.receive(5, TimeUnit.SECONDS); Assert.assertNull(message);` to ` Message<byte[]> message = consumer.receive(300, TimeUnit.MILLISECONDS); Assert.assertNull(message);` * modify `message = consumer.receive();` to `message = consumer.receive(5, TimeUnit.SECONDS);` * keep other `consumer.receive(x, y)` no change.
Motivation
Modification
producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).sendAsync();
toproducer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).send();
This also can be resolved by [fix][client] moving get sequenceId into the sync code segment #17836 and [fix][client] Avoid redelivering duplicated messages when batching is enabled #18486 later.
Message<byte[]> message = consumer.receive(5, TimeUnit.SECONDS); Assert.assertNull(message);
toMessage<byte[]> message = consumer.receive(300, TimeUnit.MILLISECONDS); Assert.assertNull(message);
message = consumer.receive();
tomessage = consumer.receive(5, TimeUnit.SECONDS);
consumer.receive(x, y)
no change.Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: liangyepianzhou#12