Fix flake in C++ negative acknowledgement tests #7099

Merged
merged 1 commit into from
May 31, 2020

merlimat
Contributor

Motivation

Negative acknowledgement runs in the background on a consumer and
triggers redelivery of messages. The tests verify that messages do
indeed get redelivered, and which messages they are, for the base
case, batching and partitioned consumers.

There's a fundamental dependency on timing in the base case. If 100ms
pass between consumer creation and receiving the last message in the
first receive loop, redelivery will be triggered and the message
order asserted by the test will no longer hold.

This first case can be fixed by moving the negative ack to run after
all messages have been received. However, this can then still fail
for the batch case.

If the negative ack tracker kicks off during the loop that negatively
acks the messages, then the redelivery will happen twice (and possibly
more times, depending on how many times it manages to run).

For this reason, if we want the test to be deterministic, we need to
prevent the tracker from kicking off redelivery while we mark the
messages as negatively acked.
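
The race described above can be sketched with a toy model. This is not the Pulsar client code: `ToyNackTracker`, `setEnabled`, `onTimerTick` and `nackTenMessages` are hypothetical names, and the background timer is replaced by an explicit call so the bad timing can be reproduced on demand. It only illustrates why pausing the tracker while the test marks messages collapses several redelivery rounds into one.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for a negative-ack tracker (hypothetical, not the Pulsar class).
class ToyNackTracker {
public:
    // Record a message id as negatively acknowledged.
    void add(int messageId) { pending_.insert(messageId); }

    // Hypothetical test-only switch: while disabled, timer ticks do nothing.
    void setEnabled(bool enabled) { enabled_ = enabled; }

    // Stands in for the background timer firing. Every flush of a
    // non-empty pending set counts as one redelivery round.
    void onTimerTick() {
        if (!enabled_ || pending_.empty()) return;
        rounds_.emplace_back(pending_.begin(), pending_.end());
        pending_.clear();
    }

    const std::vector<std::vector<int>>& rounds() const { return rounds_; }

private:
    bool enabled_ = true;
    std::set<int> pending_;
    std::vector<std::vector<int>> rounds_;
};

// Negatively acks ids 0..9. The timer is made to fire once in the middle
// of the loop (after id 4) to mimic unlucky scheduling, and once after it.
std::vector<std::vector<int>> nackTenMessages(bool disableWhileNacking) {
    ToyNackTracker tracker;
    if (disableWhileNacking) tracker.setEnabled(false);
    for (int id = 0; id < 10; ++id) {
        tracker.add(id);
        if (id == 4) tracker.onTimerTick();  // simulated mid-loop firing
    }
    tracker.setEnabled(true);
    tracker.onTimerTick();  // the tick that follows the loop
    assert(!tracker.rounds().empty());
    return tracker.rounds();
}
```

In this model, `nackTenMessages(false)` produces two redelivery rounds because the mid-loop tick flushes the first five ids early, while `nackTenMessages(true)` produces a single round containing all ten ids. A real test would still need to wait for the actual timer; the sketch only covers the ordering argument.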

@merlimat merlimat merged commit ae324b1 into apache:master May 31, 2020
Huanli-Meng pushed a commit to Huanli-Meng/pulsar that referenced this pull request Jun 1, 2020
Co-authored-by: Ivan Kelly <ikelly@splunk.com>
Huanli-Meng pushed a commit to Huanli-Meng/pulsar that referenced this pull request Jun 1, 2020
Huanli-Meng pushed a commit to Huanli-Meng/pulsar that referenced this pull request Jun 12, 2020
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020