[improve][test] Optimize TransactionEndToEndTest #18522

Merged
merged 2 commits into from
Nov 24, 2022
Conversation

liangyepianzhou
Contributor

@liangyepianzhou liangyepianzhou commented Nov 17, 2022

Motivation

  1. Fix the flaky test TransactionEndToEndTest.produceCommitTest (#18466), which is caused by the transactional async send method.
  2. Reduce the test run time by optimizing the receive calls.

Modification

  1. Fix the flaky test.
    • Change producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).sendAsync(); to producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).send();
  2. Reduce the run time by optimizing the receive calls.
    • Change
      Message<byte[]> message = consumer.receive(5, TimeUnit.SECONDS); Assert.assertNull(message);
      to
      Message<byte[]> message = consumer.receive(300, TimeUnit.MILLISECONDS); Assert.assertNull(message);
    • Change message = consumer.receive(); to message = consumer.receive(5, TimeUnit.SECONDS);
    • Leave all other consumer.receive(x, y) calls unchanged.
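
The saving from the shorter negative check can be sketched without a broker. The example below is only an illustration: it uses a JDK `BlockingQueue` as a stand-in for the Pulsar consumer (an assumption, not the test's real setup), and the class and method names are hypothetical. A check that expects no message always waits for the full timeout, so each such check costs about 300 ms instead of about 5 s.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a BlockingQueue stands in for a Pulsar consumer.
class NegativeReceiveSketch {

    // Stand-in for consumer.receive(timeout, unit): returns null if nothing arrives in time.
    static String receiveOrNull(BlockingQueue<String> queue, long timeout, TimeUnit unit) {
        try {
            return queue.poll(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Runs a "no message expected" check against an empty queue and
    // returns how long the check took, in milliseconds.
    static long elapsedMillisForEmptyCheck(long timeout, TimeUnit unit) {
        BlockingQueue<String> empty = new LinkedBlockingQueue<>();
        long start = System.nanoTime();
        String message = receiveOrNull(empty, timeout, unit);
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        if (message != null) {
            throw new AssertionError("expected no message");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // The check waits out the whole timeout, so a shorter timeout directly shortens the test.
        long shortCheck = elapsedMillisForEmptyCheck(300, TimeUnit.MILLISECONDS);
        System.out.println("300 ms negative check took about " + shortCheck + " ms");
    }
}
```

The trade-off is that a very short timeout can let a late message slip past the negative check, which is one reason the PR leaves the other timed receive calls as they are.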

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: liangyepianzhou#12

## Motivation
1. Fix flaky test #18466, caused by the txn async send method.
2. Decrease run time by optimizing the receive method.
## Modification
1. Change `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).sendAsync();` to `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).send();`
2. Change `Message<byte[]> message = consumer.receive(5, TimeUnit.SECONDS); Assert.assertNull(message);` to `Message<byte[]> message = consumer.receive(300, TimeUnit.MILLISECONDS); Assert.assertNull(message);`
3. Change `message = consumer.receive();` to `message = consumer.receive(5, TimeUnit.SECONDS);`
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Nov 17, 2022
@liangyepianzhou liangyepianzhou requested review from congbobo184, gaoran10 and codelipenghui and removed request for congbobo184 November 17, 2022 15:38
@liangyepianzhou liangyepianzhou self-assigned this Nov 18, 2022
@Technoboy- Technoboy- added this to the 2.12.0 milestone Nov 18, 2022
@Technoboy- Technoboy- closed this Nov 18, 2022
@Technoboy- Technoboy- reopened this Nov 18, 2022
@codecov-commenter

codecov-commenter commented Nov 18, 2022

Codecov Report

Merging #18522 (4b67634) into master (545f33f) will increase coverage by 3.35%.
The diff coverage is n/a.


@@             Coverage Diff              @@
##             master   #18522      +/-   ##
============================================
+ Coverage     43.96%   47.31%   +3.35%     
+ Complexity    10358     9256    -1102     
============================================
  Files           757      618     -139     
  Lines         72773    58568   -14205     
  Branches       7818     6093    -1725     
============================================
- Hits          31993    27714    -4279     
+ Misses        37104    27828    -9276     
+ Partials       3676     3026     -650     
Flag Coverage Δ
unittests 47.31% <ø> (+3.35%) ⬆️

Impacted Files Coverage Δ
.../apache/pulsar/broker/admin/impl/PackagesBase.java 54.12% <0.00%> (-13.77%) ⬇️
...ce/schema/validator/StructSchemaDataValidator.java 52.38% <0.00%> (-9.53%) ⬇️
.../org/apache/pulsar/broker/service/PulsarStats.java 80.53% <0.00%> (-6.20%) ⬇️
...pache/pulsar/broker/admin/v2/PersistentTopics.java 71.68% <0.00%> (-2.14%) ⬇️
...rg/apache/pulsar/broker/web/PulsarWebResource.java 56.43% <0.00%> (-1.87%) ⬇️
.../org/apache/pulsar/broker/admin/v2/Namespaces.java 57.65% <0.00%> (-1.64%) ⬇️
.../org/apache/pulsar/client/impl/ConnectionPool.java 37.43% <0.00%> (-1.03%) ⬇️
...pache/pulsar/broker/admin/impl/NamespacesBase.java 63.04% <0.00%> (-0.59%) ⬇️
.../org/apache/pulsar/broker/admin/AdminResource.java 65.43% <0.00%> (-0.47%) ⬇️
...sar/broker/loadbalance/impl/LoadManagerShared.java 43.85% <0.00%> (-0.44%) ⬇️
... and 198 more

@Technoboy- Technoboy- changed the title [improve][test]Optimize TransactionEndToEndTest [improve][test] Optimize TransactionEndToEndTest Nov 19, 2022
Contributor

@labuladong labuladong left a comment

I'm wondering what's the root cause of the TransactionEndToEndTest flaky test?

This test failed because it exceeded the time limit (300 seconds), so I guess the cause was the receive call with an infinite timeout:

    int receiveCnt = 0;
    for (int i = 0; i < txnMessageCnt; i++) {
        message = consumer.receive();
        Assert.assertNotNull(message);
        receiveCnt++;
    }

Could you please explain why changing sendAsync to send can fix this flaky test?
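
To illustrate the concern about the unbounded receive, here is a minimal sketch of the same loop with a bounded wait. It is not the test's code: a JDK `BlockingQueue` stands in for the consumer, and the class and method names are hypothetical. When a message is missing, the loop stops after one timeout instead of hanging until the 300-second test limit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a BlockingQueue stands in for a Pulsar consumer.
class BoundedReceiveLoopSketch {

    // Tries to receive `expected` messages and stops at the first receive
    // that times out. Returns the number of messages actually received.
    static int receiveUpTo(BlockingQueue<String> queue, int expected, long timeout, TimeUnit unit) {
        int receiveCnt = 0;
        for (int i = 0; i < expected; i++) {
            String message;
            try {
                // Stand-in for consumer.receive(timeout, unit).
                message = queue.poll(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            if (message == null) {
                // A real test would fail here, for example via Assert.assertNotNull(message).
                break;
            }
            receiveCnt++;
        }
        return receiveCnt;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("Hello Txn - 0");
        queue.add("Hello Txn - 1");
        // Three messages are expected, but only two were delivered.
        int received = receiveUpTo(queue, 3, 200, TimeUnit.MILLISECONDS);
        System.out.println("received " + received + " of 3");
    }
}
```

With a bounded wait, a lost message shows up as a quick assertion failure at the receive call rather than as a generic test timeout.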

@liangyepianzhou
Contributor Author

I'm wondering what's the root cause of the TransactionEndToEndTest flaky test?

This test failed because it exceeded the time limit (300 seconds), so I guess the cause was the receive call with an infinite timeout:

    int receiveCnt = 0;
    for (int i = 0; i < txnMessageCnt; i++) {
        message = consumer.receive();
        Assert.assertNotNull(message);
        receiveCnt++;
    }

Could you please explain why changing sendAsync to send can fix this flaky test?

@labuladong The root cause is that the sequence ID logic and the serializeAndSendMessage command are not contained in the same synchronized block, so a message can be dropped by deduplication.
https://github.com/apache/pulsar/pull/17836 has already fixed this.
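
The effect described in this reply can be sketched with a simplified model. This is not Pulsar's implementation: the deduplication rule (accept only IDs higher than the highest one seen so far) and every name in the example are assumptions made for illustration. If messages reach the deduplicating side out of sequence-ID order, the late one looks like a duplicate and is dropped. Assigning the ID and handing the message off inside one critical section keeps the two orders aligned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of sequence-ID deduplication.
class SequenceIdDedupSketch {

    // Accepts a message only if its sequence ID is higher than the highest
    // ID accepted so far; anything else is treated as a duplicate and dropped.
    static List<Long> deduplicate(List<Long> arrivalOrder) {
        List<Long> accepted = new ArrayList<>();
        long highest = -1;
        for (long id : arrivalOrder) {
            if (id > highest) {
                accepted.add(id);
                highest = id;
            }
        }
        return accepted;
    }

    // Sender that assigns the ID and hands the message off in one
    // synchronized method, so hand-off order always matches ID order.
    static final class OrderedSender {
        private long nextSequenceId = 0;
        private final List<Long> wire = new ArrayList<>();

        synchronized void send() {
            long id = nextSequenceId++;
            wire.add(id);
        }

        synchronized List<Long> sentOrder() {
            return new ArrayList<>(wire);
        }
    }

    // Sends from several threads through one OrderedSender and returns the hand-off order.
    static List<Long> sendConcurrently(int threads, int perThread) {
        OrderedSender sender = new OrderedSender();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    sender.send();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return sender.sentOrder();
    }

    public static void main(String[] args) {
        // Message 1 arrives after message 2, so it is dropped: prints [0, 2, 3]
        System.out.println(deduplicate(Arrays.asList(0L, 2L, 1L, 3L)));
        // With the single critical section nothing is dropped: prints true
        List<Long> sent = sendConcurrently(4, 1000);
        System.out.println(deduplicate(sent).size() == sent.size());
    }
}
```

If `send()` were split into two separately synchronized steps, one thread could take ID 1, be overtaken by a thread holding ID 2, and then lose its message to the check in `deduplicate`.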

@nodece
Member

nodece commented Nov 21, 2022

I suggest keeping the same timeout.

@liangyepianzhou
Contributor Author

I suggest keeping the same timeout.

@nodece I have already tried to unify the timeouts, but doing so makes some tests unstable. Some tests have special considerations and need a longer timeout, while others require a very short one. So we should not modify the timeout in tests that already use one and are stable.

@nodece
Member

nodece commented Nov 21, 2022

@liangyepianzhou Please check the failed CI.

@liangyepianzhou liangyepianzhou merged commit f3ac2e6 into apache:master Nov 24, 2022
lifepuzzlefun pushed a commit to lifepuzzlefun/pulsar that referenced this pull request Dec 9, 2022
## Motivation
1. fix flaky test apache#18466 caused by txn async send method
2. decrease run time by optimizing receive method 
## Modification
1. fix flaky test
   * modify `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).sendAsync();` to `producer.newMessage(txn1).value(("Hello Txn - " + i).getBytes(UTF_8)).send();`
   This can also be resolved by apache#17836 and apache#18486 later.
2. decrease run time by optimizing receive method
   * modify `Message<byte[]> message = consumer.receive(5, TimeUnit.SECONDS); Assert.assertNull(message);` to `Message<byte[]> message = consumer.receive(300, TimeUnit.MILLISECONDS); Assert.assertNull(message);`
   * modify `message = consumer.receive();` to `message = consumer.receive(5, TimeUnit.SECONDS);`
   * keep other `consumer.receive(x, y)` calls unchanged.
lifepuzzlefun pushed a commit to lifepuzzlefun/pulsar that referenced this pull request Jan 10, 2023
@liangyepianzhou liangyepianzhou deleted the xiangying/improve/test/optimizeTransactionEndToEndTest branch February 27, 2023 03:27
Labels
area/test doc-not-needed Your PR changes do not impact docs ready-to-test
5 participants