Store: smaller block size for snappy decoding #6463

Conversation

@MichaHoffmann (Contributor) commented Jun 21, 2023

Wants to address #6434.

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Use a smaller block size for snappy readers when stream-decoding postings.
  • Duplicate some code from extsnappy to ensure the contract that postings were encoded with a valid block size (see the sketch after this list).
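For illustration, a minimal sketch of the block-size contract behind these changes, assuming the klauspost/compress s2 package used here for snappy-framed streams; the helpers encodePostings and decodePostings are hypothetical names, not code from this PR:

```go
package main

import (
	"bytes"
	"io"

	"github.com/klauspost/compress/s2"
)

// encodePostings sketches the writer side: raw posting bytes are written as a
// snappy-compatible stream using small (4 KiB) blocks.
func encodePostings(raw []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := s2.NewWriter(&buf, s2.WriterSnappyCompat(), s2.WriterBlockSize(4<<10))
	if _, err := w.Write(raw); err != nil {
		return nil, err
	}
	// Close flushes the final block and the stream trailer.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodePostings sketches the reader side: the reader's max block size must be
// at least the block size used during encoding, which is the contract enforced
// here by encoding and decoding with the same 4 KiB value.
func decodePostings(compressed []byte) ([]byte, error) {
	r := s2.NewReader(bytes.NewReader(compressed), s2.ReaderMaxBlockSize(4<<10))
	return io.ReadAll(r)
}
```

The point of pairing the two sizes is that decoding then only ever needs a small buffer instead of one sized for 64 KiB blocks.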

Verification

  • Passes the existing unit tests.

Benchmarks

pkg: github.com/thanos-io/thanos/pkg/store
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                                                         │  main.bench   │              pr.bench               │
                                                         │    sec/op     │    sec/op     vs base               │
PostingsEncodingDecoding/10000/snappyStreamed/encode-8      900.7µ ± ∞ ¹   905.5µ ± ∞ ¹        ~ (p=1.000 n=5)
PostingsEncodingDecoding/10000/snappyStreamed/decode-8     17.804µ ± ∞ ¹   1.485µ ± ∞ ¹  -91.66% (p=0.008 n=5)
PostingsEncodingDecoding/100000/snappyStreamed/encode-8     7.514m ± ∞ ¹   8.899m ± ∞ ¹  +18.43% (p=0.008 n=5)
PostingsEncodingDecoding/100000/snappyStreamed/decode-8    17.039µ ± ∞ ¹   1.377µ ± ∞ ¹  -91.92% (p=0.008 n=5)
PostingsEncodingDecoding/1000000/snappyStreamed/encode-8    74.93m ± ∞ ¹   86.94m ± ∞ ¹  +16.03% (p=0.008 n=5)
PostingsEncodingDecoding/1000000/snappyStreamed/decode-8   18.290µ ± ∞ ¹   1.280µ ± ∞ ¹  -93.00% (p=0.008 n=5)
geomean                                                     375.7µ         110.6µ        -70.56%
¹ need >= 6 samples for confidence interval at level 0.95

                                                         │   main.bench   │               pr.bench               │
                                                         │      B/op      │     B/op       vs base               │
PostingsEncodingDecoding/10000/snappyStreamed/encode-8      16.51Ki ± ∞ ¹   17.50Ki ± ∞ ¹   +6.00% (p=0.008 n=5)
PostingsEncodingDecoding/10000/snappyStreamed/decode-8     73.364Ki ± ∞ ¹   2.484Ki ± ∞ ¹  -96.61% (p=0.008 n=5)
PostingsEncodingDecoding/100000/snappyStreamed/encode-8     147.9Ki ± ∞ ¹   152.8Ki ± ∞ ¹   +3.33% (p=0.008 n=5)
PostingsEncodingDecoding/100000/snappyStreamed/decode-8    73.364Ki ± ∞ ¹   2.484Ki ± ∞ ¹  -96.61% (p=0.008 n=5)
PostingsEncodingDecoding/1000000/snappyStreamed/encode-8    1.424Mi ± ∞ ¹   1.477Mi ± ∞ ¹   +3.73% (p=0.008 n=5)
PostingsEncodingDecoding/1000000/snappyStreamed/decode-8   73.364Ki ± ∞ ¹   2.484Ki ± ∞ ¹  -96.61% (p=0.008 n=5)
geomean                                                     105.8Ki         19.90Ki        -81.20%
¹ need >= 6 samples for confidence interval at level 0.95

                                                         │ main.bench  │                pr.bench                 │
                                                         │  allocs/op  │  allocs/op    vs base                   │
PostingsEncodingDecoding/10000/snappyStreamed/encode-8     10.00 ± ∞ ¹    32.00 ± ∞ ¹   +220.00% (p=0.008 n=5)
PostingsEncodingDecoding/10000/snappyStreamed/decode-8     5.000 ± ∞ ¹    5.000 ± ∞ ¹          ~ (p=1.000 n=5) ²
PostingsEncodingDecoding/100000/snappyStreamed/encode-8    25.00 ± ∞ ¹   193.00 ± ∞ ¹   +672.00% (p=0.008 n=5)
PostingsEncodingDecoding/100000/snappyStreamed/decode-8    5.000 ± ∞ ¹    5.000 ± ∞ ¹          ~ (p=1.000 n=5) ²
PostingsEncodingDecoding/1000000/snappyStreamed/encode-8   125.0 ± ∞ ¹   1806.0 ± ∞ ¹  +1344.80% (p=0.008 n=5)
PostingsEncodingDecoding/1000000/snappyStreamed/decode-8   5.000 ± ∞ ¹    5.000 ± ∞ ¹          ~ (p=1.000 n=5) ²
geomean                                                    12.55          33.42         +166.34%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch from 75bd55e to 1f6643e on June 21, 2023 19:02
@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch from 1f6643e to 9c0476f on June 21, 2023 19:10
@yeya24 (Contributor) commented Jun 21, 2023

The test failure seems related.

@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch from 9c0476f to 864c9ad on June 21, 2023 20:30
@MichaHoffmann (Contributor, Author) commented:

> The test failure seems related.

Fixed, and updated the benchmark results!

@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch from 864c9ad to b3ce522 on June 22, 2023 06:01
@MichaHoffmann changed the title from "Store: smaller block size for snappy decoding postings" to "Store: smaller block size for snappy decoding" on June 22, 2023
var (
	readersPool = sync.Pool{
		New: func() interface{} {
			// Cap pooled readers at a 4 KiB max block size.
			return s2.NewReader(nil, s2.ReaderMaxBlockSize(4<<10))
		},
	}
)
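For context, a hedged usage sketch of the pooled reader above; decodeStreamedPostings is a hypothetical name, and the snippet assumes the readersPool definition shown here plus imports of bytes, io and github.com/klauspost/compress/s2:

```go
// decodeStreamedPostings borrows a reader from readersPool, points it at the
// snappy-framed postings, reads the decoded bytes and returns the reader to
// the pool. Hypothetical illustration, not the PR's exact code.
func decodeStreamedPostings(compressed []byte) ([]byte, error) {
	r := readersPool.Get().(*s2.Reader)
	defer readersPool.Put(r)

	// Reset reuses the pooled reader's internal buffer for the new input.
	r.Reset(bytes.NewReader(compressed))
	return io.ReadAll(r)
}
```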
Member:

I thought the other day that perhaps we could avoid specifying a New function; for gRPC we could use the default block size, whereas for postings we could use the average length of the last 10 inputs, for example. What do you think?

Contributor:

The problem is that the reader's max block size needs to be larger than the writer's block size; so if the average goes down, we cannot decode anymore with that approach, I think (I might be wrong)? Using the 4 KB minimum feels more traceable to me somehow.
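A hedged sketch of the failure mode described above, assuming the s2 package and imports of bytes and io; the function name and sizes are illustrative only:

```go
// blockSizeMismatch shows why the reader's max block size must cover the
// writer's block size: a stream written with 64 KiB blocks should be rejected
// by a reader bounded to 4 KiB blocks. Hypothetical illustration.
func blockSizeMismatch() error {
	var buf bytes.Buffer
	w := s2.NewWriter(&buf, s2.WriterBlockSize(64<<10))
	if _, err := w.Write(bytes.Repeat([]byte("posting"), 1<<16)); err != nil {
		return err
	}
	if err := w.Close(); err != nil {
		return err
	}

	r := s2.NewReader(&buf, s2.ReaderMaxBlockSize(4<<10))
	_, err := io.ReadAll(r)
	return err // expected to be non-nil: the stream's blocks exceed the reader's maximum
}
```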

Contributor:

This change still feels backwards-incompatible, since I could have cached postings somewhere that were written with the larger 64 KB block size, which might be problematic.

@GiedriusS (Member), Jun 22, 2023:

I think the s2 reader ensures that the buffer is enlarged if needed (https://github.com/klauspost/compress/blob/master/s2/reader.go#L152), so the only downside AFAICT is that there could be one extra allocation. The only complication I see here is that, ideally, there should be a way to access the capacity of the internal buffer so that we could pre-allocate a more appropriately sized buffer.

Member:

klauspost/compress#832: I'm trying to push something forward here to allow us to better estimate how big the buffer should be.

Contributor (Author):

OK, in the latest change I added some capacity tracking to the reader pool and used the running average of the capacity of the last 10 returned readers as an estimate for new readers.
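A hedged sketch of what such tracking could look like, with hypothetical names, assuming imports of sync and github.com/klauspost/compress/s2; the PR's actual implementation differed and, as noted in the next comment, was reverted:

```go
// sizeTracker remembers the sizes observed for the last 10 readers returned to
// the pool and exposes their average. Hypothetical illustration, not the PR's code.
type sizeTracker struct {
	mu    sync.Mutex
	last  [10]int
	idx   int
	count int
}

func (t *sizeTracker) record(n int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.last[t.idx] = n
	t.idx = (t.idx + 1) % len(t.last)
	if t.count < len(t.last) {
		t.count++
	}
}

func (t *sizeTracker) average() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.count == 0 {
		return 4 << 10 // nothing observed yet: fall back to the 4 KiB floor
	}
	sum := 0
	for i := 0; i < t.count; i++ {
		sum += t.last[i]
	}
	return sum / t.count
}

// newEstimatedReader could serve as the pool's New function: size new readers
// by the recent average, clamped so they neither drop below the 4 KiB floor nor
// creep back toward the 64 KiB value this PR moves away from.
func newEstimatedReader(t *sizeTracker) *s2.Reader {
	size := t.average()
	if size < 4<<10 {
		size = 4 << 10
	}
	if size > 64<<10 {
		size = 64 << 10
	}
	return s2.NewReader(nil, s2.ReaderMaxBlockSize(size))
}
```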

Contributor (Author):

Had to revert my changes again: if we don't bound max block sizes, the readers will just grow to 64 KB again, which would eventually suffer the same issues. I think we can only bound the block sizes.

Member:

Or is it because the benchmarks load up a lot of postings whereas the problematic case is about small postings lists?

Contributor:

That might be the issue with the benchmarks, but if we don't limit the block size, it might easily happen in someone's prod too, right?

@GiedriusS (Member), Jun 23, 2023:

I think it doesn't matter, because the other choice is allocating a []byte for the whole decoded response, which in this case will be much bigger than the limit described in https://github.com/google/snappy/blob/main/framing_format.txt#L106. If smaller postings suddenly come in, then it's also OK to reuse the readers with bigger reader buffers because they will be freed eventually. The block size is capped in snappy.

@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch from b3ce522 to e3ddddc on June 22, 2023 10:27
@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch 3 times, most recently from 7626751 to 5a077c6, on June 22, 2023 17:15
@pull-request-size bot added size/L and removed size/M labels on Jun 22, 2023
@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch 4 times, most recently from 734d145 to 4a93844, on June 22, 2023 17:48
@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch 2 times, most recently from 0fcdfb9 to d02390a, on June 22, 2023 17:57
@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch from d02390a to 9003bae on June 22, 2023 18:20
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
@MichaHoffmann force-pushed the mhoffm-smaller-blocksize-for-streamed-postings branch from 9003bae to 251f15d on June 22, 2023 18:58
@pull-request-size bot added size/M and removed size/L labels on Jun 22, 2023
@MichaHoffmann (Contributor, Author):

Better left to the experts: https://github.com/thanos-io/thanos/pull/6475/files
