[v2] store: reuse buffers for serializing Series() responses #4535
Conversation
v1 can be found here: #4512

Compared with v1:

- Uses less CPU because Size() is called only once, just as before;
- Uses `protoc-go-inject-field` to inject extra fields into SeriesResponse which are used for marshaling the responses;
- Uses []byte instead of bytes.Buffer because bytes.Buffer allocates 2x the amount of memory, and in our case the responses are more or less of the same size, so less memory is wasted.

I think this version is more maintainable since the details of the allocation are hidden from the Series() function itself; it all happens in the Marshal() function.

I have also added a `$(MAKE) format` step after `$(MAKE) proto` so that the `*.pb.go` files have a consistent style and CI passes.

Benchmarks:

```
name                                                            old time/op    new time/op    delta
BucketSeries/1000000SeriesWith1Samples/1of1000000-16            97.8ms ± 6%    89.4ms ± 8%    -8.63%   (p=0.000 n=49+45)
BucketSeries/1000000SeriesWith1Samples/10of1000000-16           92.7ms ±12%    88.6ms ± 7%    -4.42%   (p=0.000 n=48+47)
BucketSeries/1000000SeriesWith1Samples/1000000of1000000-16      1.14s ± 4%     1.11s ± 3%     -2.89%   (p=0.000 n=50+49)
BucketSeries/100000SeriesWith100Samples/1of10000000-16          6.58ms ± 2%    6.70ms ± 9%    ~        (p=0.125 n=46+50)
BucketSeries/100000SeriesWith100Samples/100of10000000-16        6.73ms ± 5%    6.82ms ± 9%    ~        (p=0.575 n=49+50)
BucketSeries/100000SeriesWith100Samples/10000000of10000000-16   123ms ± 5%     119ms ± 7%     -2.92%   (p=0.000 n=48+50)
BucketSeries/1SeriesWith10000000Samples/1of10000000-16          131µs ± 6%     129µs ± 6%     -0.85%   (p=0.027 n=47+47)
BucketSeries/1SeriesWith10000000Samples/100of10000000-16        129µs ± 2%     129µs ± 8%     ~        (p=0.358 n=44+49)
BucketSeries/1SeriesWith10000000Samples/10000000of10000000-16   38.6ms ± 9%    34.7ms ±10%    -10.11%  (p=0.000 n=49+50)

name                                                            old alloc/op   new alloc/op   delta
BucketSeries/1000000SeriesWith1Samples/1of1000000-16            62.0MB ± 0%    62.0MB ± 0%    ~        (p=0.529 n=47+50)
BucketSeries/1000000SeriesWith1Samples/10of1000000-16           62.1MB ± 0%    62.1MB ± 0%    ~        (p=0.334 n=49+50)
BucketSeries/1000000SeriesWith1Samples/1000000of1000000-16      1.37GB ± 0%    1.27GB ± 0%    -7.03%   (p=0.000 n=49+50)
BucketSeries/100000SeriesWith100Samples/1of10000000-16          4.85MB ± 0%    4.85MB ± 0%    ~        (p=0.365 n=50+50)
BucketSeries/100000SeriesWith100Samples/100of10000000-16        4.86MB ± 0%    4.86MB ± 0%    ~        (p=0.579 n=50+50)
BucketSeries/100000SeriesWith100Samples/10000000of10000000-16   157MB ± 4%     130MB ± 5%     -16.99%  (p=0.000 n=50+50)
BucketSeries/1SeriesWith10000000Samples/1of10000000-16          213kB ± 0%     213kB ± 0%     +0.14%   (p=0.000 n=50+48)
BucketSeries/1SeriesWith10000000Samples/100of10000000-16        213kB ± 0%     213kB ± 0%     +0.14%   (p=0.000 n=50+50)
BucketSeries/1SeriesWith10000000Samples/10000000of10000000-16   115MB ± 0%     62MB ± 8%      -45.98%  (p=0.000 n=49+50)

name                                                            old allocs/op  new allocs/op  delta
BucketSeries/1000000SeriesWith1Samples/1of1000000-16            9.69k ± 0%     9.70k ± 1%     ~        (p=0.143 n=49+50)
BucketSeries/1000000SeriesWith1Samples/10of1000000-16           9.79k ± 0%     9.79k ± 0%     ~        (p=0.845 n=49+49)
BucketSeries/1000000SeriesWith1Samples/1000000of1000000-16      11.0M ± 0%     10.0M ± 0%     -9.06%   (p=0.000 n=49+50)
BucketSeries/100000SeriesWith100Samples/1of10000000-16          1.10k ± 0%     1.10k ± 0%     +0.27%   (p=0.000 n=50+50)
BucketSeries/100000SeriesWith100Samples/100of10000000-16        1.14k ± 0%     1.14k ± 0%     ~        (p=0.622 n=50+50)
BucketSeries/100000SeriesWith100Samples/10000000of10000000-16   1.10M ± 0%     1.00M ± 0%     -9.04%   (p=0.000 n=49+50)
BucketSeries/1SeriesWith10000000Samples/1of10000000-16          200 ± 0%       202 ± 0%       +1.00%   (p=0.000 n=48+50)
BucketSeries/1SeriesWith10000000Samples/100of10000000-16        200 ± 0%       202 ± 0%       +1.00%   (p=0.000 n=49+50)
BucketSeries/1SeriesWith10000000Samples/10000000of10000000-16   167k ± 0%      167k ± 0%      +0.00%   (p=0.000 n=50+50)
```

CPU usage is +/- the same once you take the variance into account; the allocated memory is the real difference, and, as expected, the benchmarks show the most improvement in the cases with lots of callers.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
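A minimal sketch of the reuse pattern the description outlines, assuming injected fields named respBuf/respPool and a payload stand-in for the real protobuf fields (the actual generated code in the PR differs in detail):

```go
package main

import (
	"fmt"
	"sync"
)

// SeriesResponse stands in for the generated type; respBuf and respPool
// model the extra fields injected via protoc-go-inject-field (field names
// here are assumptions, not the PR's exact names).
type SeriesResponse struct {
	payload  []byte     // stand-in for the real protobuf fields
	respBuf  *[]byte    // injected: reusable marshaling buffer
	respPool *sync.Pool // injected: pool the buffer belongs to
}

func (m *SeriesResponse) Size() int { return len(m.payload) }

// Marshal reuses *m.respBuf when it has enough capacity and grows it only
// when needed; Size() is still computed exactly once per response.
func (m *SeriesResponse) Marshal() ([]byte, error) {
	size := m.Size()
	if cap(*m.respBuf) < size {
		*m.respBuf = make([]byte, size) // the pointer now refers to the bigger slice
	}
	data := (*m.respBuf)[:size]
	copy(data, m.payload) // the real code runs gogo-style MarshalToSizedBuffer here
	return data, nil
}

func main() {
	pool := sync.Pool{New: func() interface{} { b := make([]byte, 0, 1024); return &b }}
	buf := pool.Get().(*[]byte)
	resp := &SeriesResponse{payload: []byte("chunk data"), respBuf: buf, respPool: &pool}
	out, _ := resp.Marshal()
	fmt.Println(len(out))
	// The caller puts the *[]byte back; this works even if Marshal swapped
	// the underlying slice for a larger one.
	pool.Put(buf)
}
```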
Force-pushed from cbd90e9 to a459162, then from 23c18e2 to cda699e.
Looks amazing! Awesome work!

I wonder how hard it's going to be to migrate to https://github.com/planetscale/vtprotobuf. It already supports pooling for responses, right? Is this more or less the same thing we achieve with gogo here?
```go
// The following were copied/pasted from gogoprotobuf generated code with changes
// to make it work with sync.Pool / []byte slice.
func (m *SeriesResponse) Marshal() (dAtA []byte, err error) {
```
I know it's a copy-pasta from generated code, but this makes my eyes bleed:

```diff
-func (m *SeriesResponse) Marshal() (dAtA []byte, err error) {
+func (m *SeriesResponse) Marshal() (data []byte, err error) {
```
Thanks for this. I like the attempt at hiding complexity, but there are a couple of worrying parts in terms of readability and maintainability of this code:

- `**[]byte` is something we can definitely avoid. I guess we do that to have extra info on whether this was pooled or not, but this goes away if we fix the next problem.
- We have mixed responsibility for who is supposed to get bytes from the pool: (1) the user of `SeriesResponse` has to get the buffer before and put it back after sending; (2) `SeriesResponse` itself has to use the pool to resize the object.

I can see the reason behind it, but I wonder if we can simplify this by:

- Making sure the pool is used only by SeriesResponse: we "get" the buf in Marshal and resize it if needed.
- Adding a "Close" method to the series response which puts back the buf when we say "it is closed", as in the sketch below.

This way the whole thing is encapsulated: the pool is used only here, and we never pass any outside bytes or have this weird double pointer.
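If I read the suggestion right, the encapsulated variant could look roughly like this; field and method names are illustrative, not code from the PR:

```go
package storepb

import "sync"

// Sketch of the suggested encapsulated design: only SeriesResponse ever
// touches the pool.
type SeriesResponse struct {
	respBuf  *[]byte
	respPool *sync.Pool
}

// Size is generated by the protobuf compiler in the real code.
func (m *SeriesResponse) Size() int { return 0 }

// Marshal fetches the buffer lazily from the pool and resizes it if needed,
// so callers never pass buffers in.
func (m *SeriesResponse) Marshal() ([]byte, error) {
	if m.respBuf == nil {
		m.respBuf = m.respPool.Get().(*[]byte)
	}
	size := m.Size()
	if cap(*m.respBuf) < size {
		*m.respBuf = make([]byte, size)
	}
	data := (*m.respBuf)[:size]
	// ... generated marshaling code would fill data here ...
	return data, nil
}

// Close returns the buffer to the pool once the response has been sent.
func (m *SeriesResponse) Close() {
	if m.respBuf != nil {
		m.respPool.Put(m.respBuf)
		m.respBuf = nil
	}
}
```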
I made it the way it is right now because ideally the same buffer would be shared between marshaling different responses:

- Get a buffer from the sync.Pool;
- Pass it to Marshal() via SeriesResponse, which can make it larger if necessary;
- The original caller, i.e. the Series() function, puts it back; it doesn't know whether the underlying slice has been changed.

This seems completely natural to me because the multiple SeriesResponse structs are for different responses; only the buffer and pool are shared for the actual serialization. A sketch of this flow follows.
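A sketch of the Series()-side flow just described; SetBuffer is a hypothetical setter for the injected field (the PR wires the buffer differently):

```go
// sendAll fetches one buffer, shares it across every response, and puts it
// back at the end.
func sendAll(srv storepb.Store_SeriesServer, pool *sync.Pool, series []*storepb.Series) error {
	buf := pool.Get().(*[]byte)
	// Put back whatever slice the pointer holds at the end; Marshal may have
	// swapped in a larger one, and the caller neither knows nor cares.
	defer pool.Put(buf)

	for _, s := range series {
		resp := storepb.NewSeriesResponse(s) // a distinct response per series...
		resp.SetBuffer(buf)                  // ...sharing one marshaling buffer (hypothetical)
		if err := srv.Send(resp); err != nil {
			return err
		}
	}
	return nil
}
```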
If I understood you correctly, you are proposing to move all of the logic into SeriesResponse itself. Then we would have something like:

```go
var resp *storepb.SeriesResponse
...
resp = storepb.NewSeriesResponse(newSeries, pool)
```

And then in a loop we would have to do:

```go
resp.Result = newSeries
```

As a result, a connection between the SeriesResponse members and a higher-level function, Series(), would appear. We could add a function to "hide" it, but it would be very shallow. I'm not sure the second option is really better 🤔

Given that we will hopefully remove this code in a few months, it's probably not worth spending so much energy on minute details such as this 😄

If you really think it should be done this way, I can make the change, but since you've approved this, I assume such a minor possible change isn't a blocker to merging this PR.
Yep, but only "on the other side". It generates more or less the same methods for marshaling that we could reuse here once we move to it, so the "glue" code in this PR will still remain similar after the move, AFAICT. For unmarshaling it is much nicer: https://github.com/planetscale/vtprotobuf/blob/5a02622d1e2a726cf08df370eccdca6844f11e70/features/pool/pool.go#L86-L95. It's going to be lots of work to move to it IMHO, as evidenced by the Vitess PR vitessio/vitess#8075 😄
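For reference, the vtprotobuf "pool" feature linked above generates helpers along these lines on the receiving side; treat the exact names and signatures as an assumption sketched from that pool.go, not a verified API:

```go
// decode reuses a pooled, already-reset message instead of allocating a
// fresh one per call.
func decode(data []byte) error {
	msg := storepb.SeriesResponseFromVTPool() // generated pool getter
	defer msg.ReturnToVTPool()                // resets it and puts it back

	if err := msg.UnmarshalVT(data); err != nil {
		return err
	}
	// ... use msg before the deferred return to the pool ...
	return nil
}
```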
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Thanks for this! I have two bigger suggestions, otherwise LGTM!

I have to agree with @kakkoyun: https://github.com/planetscale/vtprotobuf can do pooling too, even in exactly the same way as my suggestion 🙈

Still, the path to gogo proto removal is long, so we can merge this in. But this code will be removed when we switch to vtprotobuf, yes.
pkg/store/bucket.go (outdated):

```
@@ -396,6 +399,14 @@ func NewBucketStore(
		enableCompatibilityLabel:    enableCompatibilityLabel,
		postingOffsetsInMemSampling: postingOffsetsInMemSampling,
		enableSeriesResponseHints:   enableSeriesResponseHints,
		respPool: sync.Pool{
			New: func() interface{} {
				// TODO(GiedriusS): we could calibrate the default
```
```diff
-				// TODO(GiedriusS): we could calibrate the default
+				// TODO(GiedriusS): We could calibrate the default
```
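For context, New presumably preallocates a buffer with some default capacity; a sketch of the shape, where the 16 KiB figure is an assumed placeholder rather than the PR's value:

```go
package store

import "sync"

// respPool hands out *[]byte so Marshal can swap the underlying slice for a
// larger one without the caller noticing. The TODO above is about calibrating
// the default capacity against observed Series() response sizes.
var respPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 16*1024) // assumed default
		return &b
	},
}
```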
Force-pushed from ab8e34a to 837e270, then from 837e270 to e3b2055.
@bwplotka I have made changes according to your suggestions. The benchmarks are even more impressive now. Please take a look :)
Let's go!!! 💪🏽
Thanks for this!
```
@@ -1092,8 +1096,17 @@ func (s *BucketStore) Series(req *storepb.SeriesRequest, srv storepb.Store_Serie
	s.metrics.seriesGetAllDuration.Observe(stats.getAllDuration.Seconds())
	s.metrics.seriesBlocksQueried.Observe(float64(stats.blocksQueried))
}

var resp *storepb.SeriesResponse
defer func(r **storepb.SeriesResponse) {
```
👁️ 🩸
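The double pointer is what makes this work: &resp is captured when the defer is set up, but *r is read only when Series() returns, so the cleanup sees the last response assigned inside the loop rather than a stale nil. A stripped-down illustration, where the cleanup body is my guess at the intent:

```go
func sendSeries(srv storepb.Store_SeriesServer, sets []*storepb.Series) error {
	var resp *storepb.SeriesResponse
	// The deferred func receives &resp now; when it runs at return time,
	// *r is the final response, so its shared buffer can be released once.
	defer func(r **storepb.SeriesResponse) {
		if *r != nil {
			// e.g. put the pooled marshaling buffer back here (hypothetical)
		}
	}(&resp)

	for _, set := range sets {
		resp = storepb.NewSeriesResponse(set)
		if err := srv.Send(resp); err != nil {
			return err
		}
	}
	return nil
}
```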
```
	}
} else {
	if cap(*m.respBuf) < size {
		if m.respPool != nil {
```
Not sure we have to put the too-small one back: if it was too small, most likely it will be too small for other responses too.

Not a blocker; it should work both ways, I'm just guessing here a bit. One possible tweak is sketched below.
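A sketch of the alternative this comment hints at: grow without recycling the undersized buffer, so the pool only ever holds slices that were big enough at least once (assumed shape, not the PR's code):

```go
// grow replaces a too-small buffer without putting the old one back in the
// pool: if it could not hold this response, it will likely be too small for
// the next ones as well, so recycling it just wastes a pool slot.
func grow(buf *[]byte, size int) []byte {
	if cap(*buf) < size {
		*buf = make([]byte, size) // the old slice is simply dropped
	}
	return (*buf)[:size]
}
```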
…hanos-io#4535)" This reverts commit 7a8d189. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
* Revert "store: fix marshaling with sync.Pool (#4593)" This reverts commit 8b4c3c9. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Revert "[v2] store: reuse buffers for serializing Series() responses (#4535)" This reverts commit 7a8d189. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>