
PubSub: check batch max bytes against request byte size #7108

Closed
relud opened this issue Jan 9, 2019 · 13 comments · Fixed by #9911
Assignees
Labels
api: pubsub Issues related to the Pub/Sub API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@relud
Contributor

relud commented Jan 9, 2019

Is your feature request related to a problem? Please describe.

google.cloud.pubsub_v1.publisher._batch.thread.Batch enforces max bytes on the sum of PubsubMessage.ByteSize() for each message in the batch, but the PublishRequest created by Batch.client.publish is larger than that. As a result I need to specify a non-default max bytes value to guarantee that batches produce valid requests.

Describe the solution you'd like

Enforce max bytes on the size of the PublishRequest created by Batch.client.publish:

diff --git a/pubsub/google/cloud/pubsub_v1/publisher/_batch/thread.py b/pubsub/google/cloud/pubsub_v1/publisher/_batch/thread.py
index f187024b7c..67fc04841a 100644
--- a/pubsub/google/cloud/pubsub_v1/publisher/_batch/thread.py
+++ b/pubsub/google/cloud/pubsub_v1/publisher/_batch/thread.py
@@ -76,7 +76,7 @@ class Batch(base.Batch):
         # any writes to them use the "state lock" to remain atomic.
         self._futures = []
         self._messages = []
-        self._size = 0
+        self._size = types.PublishRequest(topic=topic, messages=[]).ByteSize()
         self._status = base.BatchStatus.ACCEPTING_MESSAGES

         # If max latency is specified, start a thread to monitor the batch and
@@ -281,7 +281,7 @@ class Batch(base.Batch):
             if not self.will_accept(message):
                 return future

-            new_size = self._size + message.ByteSize()
+            new_size = self._size + types.PublishRequest(messages=[message]).ByteSize()
             new_count = len(self._messages) + 1
             overflow = (
                 new_size > self.settings.max_bytes

Describe alternatives you've considered

I can set max bytes low enough to guarantee sufficient headroom, but I feel it would be better if I didn't need to, and checking the actual request size may also result in fewer batches.

Additional context

see also #7107, which suggests adding an option to enforce this setting even when the batch is empty.
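
For illustration, here is a minimal sketch of the discrepancy (using the same protobuf-based types module as above; the topic name is hypothetical, and the numbers are simply what protobuf's wire format predicts):

from google.cloud.pubsub_v1 import types

topic = "projects/example-project/topics/example-topic"  # hypothetical
messages = [types.PubsubMessage(data=b"x" * 1000) for _ in range(10)]

# What the batch tracks today: the sum of the individual message sizes.
tracked_size = sum(msg.ByteSize() for msg in messages)

# What actually goes over the wire: the full PublishRequest, which adds the
# topic field plus per-message framing (field tag + length prefix).
request_size = types.PublishRequest(topic=topic, messages=messages).ByteSize()

print(tracked_size)   # 10030
print(request_size)   # 10107: +47 for this topic field, +3 framing bytes per message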

@relud relud changed the title PubSub: check BatchSettings.max_bytes against PublishRequest.ByteSize PubSub: check batch max bytes against request byte size Jan 9, 2019
@tseaver tseaver added api: pubsub Issues related to the Pub/Sub API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Jan 9, 2019
@kir-titievsky

At the very least, we should ensure that the default settings do not result in request sizes that are too large.

@sduskis
Contributor

sduskis commented May 20, 2019

@kir-titievsky, it looks like the Python client currently behaves like the Java client. This feature request would make the Python logic diverge from the other clients. It's certainly something to consider, but I'm going to remove the "triaged for GA" label for now.

@kir-titievsky

Hm... I would venture to say this is a bug in both Java and Python, rather than a feature request, in that the client's batching API does not take a known property of the service into consideration. But I agree that this is not GA blocking.

@tseaver
Contributor

tseaver commented May 20, 2019

@kir-titievsky Based on my somewhat rusty memory, we actually did raise early on when a single message was larger than max_bytes (#4872), and re-worked it (#5343) per request (#4608): see @theacodes comment that we needed to match Java / Go.

@sduskis
Contributor

sduskis commented May 21, 2019

@tseaver, this issue is about how we calculate the size of a PublishRequest, and not what we do when the size is too large.

The Java equivalent of types.PublishRequest(topic=topic, messages=[]).ByteSize() is 0. The smallest value of types.PublishRequest(messages=[message]).ByteSize() (assuming message is empty) is 2, and it goes up by 2 + messageSize for every message.

If that holds up in all languages, then the current calculations are off by len(messages) * 2 bytes. @kir-titievsky, I personally think that we'd have to consider this a "new feature" for all languages, and that this ought not be a GA blocking feature.
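
For reference, a quick check of those numbers with the Python types (the outputs follow from protobuf's wire format: one tag byte plus a varint length prefix per embedded message):

>>> from google.cloud.pubsub_v1 import types
>>> types.PublishRequest(topic="", messages=[]).ByteSize()
0
>>> empty_msg = types.PubsubMessage()
>>> types.PublishRequest(messages=[empty_msg]).ByteSize()  # 1 tag byte + 1 length byte
2
>>> small_msg = types.PubsubMessage(data=b"x" * 10)
>>> small_msg.ByteSize()
12
>>> types.PublishRequest(messages=[small_msg]).ByteSize()  # 2 + small_msg size
14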

@kir-titievsky

kir-titievsky commented May 21, 2019 via email

@evanj
Contributor

evanj commented Jun 7, 2019

The calculations can be off by more than len(messages)*2 since protobuf length encoding is variable sized. E.g. I have a test program that adds messages of length 999996. The total length of the request goes up by message.ByteSize() + 4 for each message, since it takes 3 bytes to encode the length of the embedded message in the protobuf representation of a repeated field.
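
To make the arithmetic explicit: the per-message overhead is one tag byte plus a varint-encoded length prefix, so it grows with the message's own size. A small illustrative helper (not part of the library):

def varint_len(n):
    """Number of bytes protobuf uses to encode the unsigned varint n."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length


def per_message_overhead(message_byte_size):
    """Bytes added on top of message_byte_size when the message is embedded
    in the repeated PublishRequest.messages field: 1 tag byte + length prefix."""
    return 1 + varint_len(message_byte_size)


print(per_message_overhead(100))        # 2
print(per_message_overhead(1_000_000))  # 4, matching the +4 observed above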

@evanj
Contributor

evanj commented Jun 7, 2019

FWIW: It appears to me that Java uses an extremely low default batch size threshold of 1000 bytes, if I'm understanding the code correctly: https://github.com/googleapis/google-cloud-java/blob/67668c1411169338374b050eae50ed650e318c54/google-cloud-clients/google-cloud-pubsub/src/main/java/com/google/cloud/pubsub/v1/Publisher.java#L451

This would definitely help avoid the problem, since it will call Publish for basically every message.

@plamut plamut removed their assignment Aug 15, 2019
@hnykda

hnykda commented Nov 29, 2019

Hey, we just bumped into this. In the error we saw, the request size was a few bytes over 10 MB; I suspect the extra bytes come from the request overhead. It's very rare though (I estimated something like 0.06% of requests in our use case). I think we "solved" it (worked around it?) by setting MAX_SIZE in BatchSettings to 9.5 MB to reserve more space for the request overhead.
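
For anyone hitting the same thing, the workaround looks roughly like this (a sketch, assuming the standard BatchSettings / PublisherClient constructor; the 9.5 MB figure is the one mentioned above and may need tuning):

from google.cloud import pubsub_v1

# Reserve headroom below the 10 MB server limit for the topic field and the
# per-message framing that the batch size check does not currently count.
batch_settings = pubsub_v1.types.BatchSettings(max_bytes=int(9.5 * 1000 * 1000))
publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)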

@plamut
Contributor

plamut commented Nov 29, 2019

@hnykda Thanks for the report. It might be that the default MAX_SIZE setting is too high, or that there is a bug in computing the total message size (or both), resulting in the behavior reported.

It's good to hear that a (seemingly working) workaround exists, but it will still be worth taking another look at this.

@hnykda

hnykda commented Nov 29, 2019

Happy to help, and I definitely agree this should get fixed 👍; it doesn't seem super complex at first sight.

Thanks for the work!

@plamut
Contributor

plamut commented Nov 29, 2019

There is a size check in the code, but since it does not prevent the "request_size is too large" error in all cases, I am re-classifying this as a bug.

@plamut plamut added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Nov 29, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Nov 29, 2019
@plamut plamut self-assigned this Nov 29, 2019
@plamut plamut removed the 🚨 This issue needs some love. label Nov 29, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Nov 29, 2019
@plamut
Contributor

plamut commented Dec 2, 2019

The total publish request size increases by N + message.ByteSize(), where N depends on the length of the message:

>>> from google.cloud.pubsub import types
>>> req = types.PublishRequest(topic=b"", messages=[])
>>> req.ByteSize()
0
>>> msg = types.PubsubMessage(data=b"foo")
>>> msg.ByteSize()
5
>>> req.messages.append(msg)
>>> req.ByteSize()  # increased by 2 + msg size
7
>>> big_msg = types.PubsubMessage(data=b"x" * 999996)
>>> big_msg.ByteSize()
1000000
>>> req.messages.append(big_msg)
>>> req.ByteSize()  # increased by 4 + big_msg size
1000011

The current batch overflow logic does not take this overhead into account; it only considers the total message count and size.

Let's check the upper bound on the overhead of a single message:

>>> SERVER_MAX_SIZE = 10_000_000
>>> huge_msg = types.PubsubMessage(data=b"x" * (SERVER_MAX_SIZE - 10))
>>> huge_msg.ByteSize()
9999995
>>> req = types.PublishRequest(topic=b"", messages=[])  # ByteSize() == 0
>>> req.messages.append(huge_msg)
>>> req.ByteSize()  # 5 + huge_msg size
10000000

The maximum publish request size that will still be accepted by the backend is 10e6 bytes, which means that the maximum byte size of a message that still fits into it is 9_999_995. At that size, the message framing overhead consumes 5 bytes in the types.PublishRequest message that is sent to the backend.

As an approximation, the batch size computation logic can be adjusted to add 5 bytes for each added message, which is enough to account for the size overhead that causes this issue. A more sophisticated approach could compute the exact overhead instead.
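
A simplified sketch of that approximation (not the actual thread.py code; names are illustrative):

# Worst-case framing overhead per message for messages up to ~10 MB:
# 1 tag byte + a length varint of at most 4 bytes.
_MAX_OVERHEAD_PER_MESSAGE = 5


def would_overflow(current_size, current_count, message, settings):
    """Return True if adding `message` could push the request over the limits."""
    new_size = current_size + message.ByteSize() + _MAX_OVERHEAD_PER_MESSAGE
    new_count = current_count + 1
    return new_size > settings.max_bytes or new_count >= settings.max_messages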


Update:
Size increase in bytes that a message adds to the publish request, depending on the message's byte size (for messages up to a byte size of 10e6):

msg.ByteSize()             overhead in PublishRequest
< 128                      2
>= 128, < 16_384           3
>= 16_384, < 2_097_152     4
>= 2_097_152               5

A possibly slower alternative (needs profiling) is computing the message's size contribution directly as already mentioned in the issue description:

new_size = self._size + types.PublishRequest(messages=[message]).ByteSize()

The benefit of this approach is that any future changes to the internal protobuf encoding are less likely to break it; it should therefore be preferred if its computational overhead is acceptable.
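
If profiling is needed, something along these lines would give a rough idea of the per-message cost of the exact computation (illustrative only; absolute numbers depend on message size and machine):

import timeit

from google.cloud.pubsub_v1 import types

msg = types.PubsubMessage(data=b"x" * 1024)

exact = timeit.timeit(
    lambda: types.PublishRequest(messages=[msg]).ByteSize(), number=10_000
)
approx = timeit.timeit(lambda: msg.ByteSize(), number=10_000)
print(exact, approx)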
