-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(pubsub): set default stream ACK deadline to subscriptions' #9268
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@pradn Assigned to you, but feel free to delegate to anybody else from the team that will take over the PubSub lib. Also, feel free to ping me for any information about the inner workings of the streaming pull manager. |
googlebot
added
the
cla: yes
This human has signed the Contributor License Agreement.
label
Sep 23, 2019
tseaver
approved these changes
Sep 23, 2019
plamut
changed the title
PubSub: Set default stream ACK deadline to subscriptions'
fix(pubsub): set default stream ACK deadline to subscriptions'
Sep 24, 2019
plamut
added
the
kokoro:force-run
Add this label to force Kokoro to re-run the tests.
label
Sep 24, 2019
yoshi-kokoro
removed
the
kokoro:force-run
Add this label to force Kokoro to re-run the tests.
label
Sep 24, 2019
plamut
force-pushed
the
iss-9252
branch
2 times, most recently
from
September 24, 2019 07:20
86c50cd
to
dfc21f1
Compare
pradn
approved these changes
Sep 25, 2019
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
nit: spelling in commit message ("streamimng")
pradn
reviewed
Sep 25, 2019
When subscribing, it makes sense to use the configured subscription's maximum ACK deadline for the streaming pull, instead of an optimistic minimum of 10 seconds. Using an optimistic deadline affects messages that are put on hold and are not lease managed, because by the time the client dispatches them to the user's callback, the optimistic ACK deadline could have already been missed, resulting in the backend unnecessary re-sending those messages, even if the subscription's ACK deadline has not been hit yet.
pradn
approved these changes
Sep 27, 2019
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
api: pubsub
Issues related to the Pub/Sub API.
cla: yes
This human has signed the Contributor License Agreement.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Closes #9252.
This PR mitigates the reported issue by using the subscription's max ACK deadline when establishing a streaming pull, instead of the previous optimistic ACK deadline of 10 seconds.
(when creating a streaming pull, the ACK timing histogram has no data yet, thus its 99 % value defaults to 10 seconds... also see the commit message for some extra info)
Additionally, it adds a debug log message with the ACK deadline used for the streaming pull, as that important piece of information was missing previously.
How to test
Run the example from the issue description, and verify that the backend does not send any duplicate messages, and the subscription's ACK deadline is respected (300 s in that particular case).
Details
The cause of the bug was that the client overrode the max ACK deadline on the subscription with the default minimum deadline of 10 seconds.
This became problematic for the received messages that exceeded the
FlowControl.max_messages
setting, and were thus put on hold in a buffer. Since these messages were not lease-managed, and their ACK deadlines were not being extended, the overridden ACK deadline of 10 seconds was easily hit, resulting in re-delivery.There are several options on how to deal with these on hold messages:
Discard them immediately, and set their ACK deadlines to 0.
This would cause the backend to re-send them immediately, which would work well with multiple subscribers. However, if using a single subscriber, low
max_messages
setting, and processing each message takes a lot of time, large published message batches would cause a lot of unnecessary traffic.The excessive messages in a batch (those above
max_messages
) would be discarded and re-delivered over and over again, because each time the client would dispatch only a small portion of the batch to user's callbacks.Add all received messages to lease management, including those put on hold.
This would avoid ACK timeouts and message re-delivery, but could shoot the leaser's load well over
1.0
. When user code would process and acknowledge some of the messages, the client would remove them from the lease management, but the load could still stay above1.0
Since the leaser would still be overloaded, no additional messages would be dispatched to the callbacks, and the stream would not be resumed --> deadlock (there would be nothing to trigger resuming the streaming pull).
While there could be a way around that, it would likely require rewriting significant chunks of the streaming pull logic, risking new bugs.
Ignoring the messages on hold until the load allows processing them.
This is the path chosen. Only the messages that are not put on hold are added to the lease management, while the rest sit in the buffer, waiting to be released when the load allows for it. If they are released soon enough, the auto-lease management steps in.
While these messages could still remain in the buffer for too long, missing the ACK deadline, those messages would be re-delivered to other subscribers, as the overloaded subscriber's stream would be paused in the meantime (the subscriber does not resume the stream, if it has any messages on hold).
In the case of a single subscriber client, the messages would indeed be re-delivered to it, but on the other hand the ACK histogram times would pick that up and gradually extend the ACK deadlines that the auto-leaser sets for the messages.