Streaming store-gateway Series() #3355

dimitarvdimitrov · 2022-11-02T10:12:35Z

~~This is a POC and is not meant to be merged yet.~~

Overview

Instead of loading all series (label sets and chunks) in memory before
responding to a Series() RPC, we can batch them and load X at a time.
This gives more predictability on the memory utilization of the
store-gateway. The tradeoff is having to do one trip to the index cache
and the bucket for each batch, which will affect overall latency of requests.
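
To make the tradeoff concrete, here is a minimal sketch of batched loading, assuming a hypothetical `batchIterator` type and `load` callback (neither is taken from the PR). Each call to `Next` does one load for at most `batchSize` series, so memory is bounded by the batch while the number of round trips grows with the number of batches.

```go
package main

import "fmt"

// series is a hypothetical stand-in for a label set plus its chunks.
type series struct {
	labels string
}

// batchIterator yields at most batchSize series per call to Next.
// All names here are illustrative, not the PR's actual types.
type batchIterator struct {
	refs      []int                     // series references still to be loaded
	batchSize int                       // maximum series held in memory at once
	load      func(refs []int) []series // one trip to the cache/bucket per batch
	cur       []series
}

// Next loads the next batch and drops the reference to the previous one.
func (it *batchIterator) Next() bool {
	if len(it.refs) == 0 {
		it.cur = nil
		return false
	}
	n := it.batchSize
	if n <= 0 || n > len(it.refs) {
		n = len(it.refs) // a non-positive size means "no batching" in this sketch
	}
	it.cur = it.load(it.refs[:n])
	it.refs = it.refs[n:]
	return true
}

// At returns the batch loaded by the last successful Next.
func (it *batchIterator) At() []series { return it.cur }

func main() {
	loads := 0
	it := &batchIterator{
		refs:      []int{1, 2, 3, 4, 5},
		batchSize: 2,
		load: func(refs []int) []series {
			loads++
			out := make([]series, 0, len(refs))
			for _, r := range refs {
				out = append(out, series{labels: fmt.Sprintf("series-%d", r)})
			}
			return out
		},
	}
	for it.Next() {
		fmt.Println(len(it.At()), "series in batch")
	}
	fmt.Println("round trips:", loads)
}
```

With 5 series and a batch size of 2, the sketch performs 3 loads instead of 1, which is the latency cost described above.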

How to use

This change disables batch series loading by default and adds two flags
to control this - whether it's enabled via
-blocks-storage.bucket-store.batched-series-loading=false and how many
series go into each batch via
-blocks-storage.bucket-store.batch-series-size=65536.
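
As a rough illustration of how two such flags could be wired up, the sketch below uses Go's standard `flag` package. The flag names and defaults are the ones quoted above; the `bucketStoreConfig` struct, `registerFlags`, `parseConfig`, and the validation rule are assumptions made for the example and are not Mimir's actual config code.

```go
package main

import (
	"flag"
	"fmt"
)

// bucketStoreConfig is a hypothetical holder for the two settings.
type bucketStoreConfig struct {
	BatchedSeriesLoading bool
	BatchSeriesSize      int
}

// registerFlags registers both flags with the defaults quoted in the description.
func (c *bucketStoreConfig) registerFlags(fs *flag.FlagSet) {
	fs.BoolVar(&c.BatchedSeriesLoading, "blocks-storage.bucket-store.batched-series-loading", false,
		"Load series in batches instead of all at once.")
	fs.IntVar(&c.BatchSeriesSize, "blocks-storage.bucket-store.batch-series-size", 65536,
		"Number of series loaded per batch.")
}

// parseConfig parses args and applies an assumed sanity check on the batch size.
func parseConfig(args []string) (bucketStoreConfig, error) {
	var cfg bucketStoreConfig
	fs := flag.NewFlagSet("store-gateway-sketch", flag.ContinueOnError)
	cfg.registerFlags(fs)
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if cfg.BatchedSeriesLoading && cfg.BatchSeriesSize <= 0 {
		return cfg, fmt.Errorf("batch-series-size must be positive, got %d", cfg.BatchSeriesSize)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig([]string{
		"-blocks-storage.bucket-store.batched-series-loading=true",
		"-blocks-storage.bucket-store.batch-series-size=10000",
	})
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("batched=%v size=%d\n", cfg.BatchedSeriesLoading, cfg.BatchSeriesSize)
}
```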

Limiting

Ideally we want to put a limit on the number of bytes that we want to
load in each batch instead of the number of series. For now limiting
the number of series should still give us some resilience against "big"
requests, while still being vulnerable to a flurry of many requests.
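
For comparison, a byte-based limit could look roughly like the sketch below. It assumes the size of each series can be estimated before loading, which the description does not claim; `splitByBytes` and its parameters are hypothetical names.

```go
package main

import "fmt"

// splitByBytes groups estimated series sizes into batches whose total
// stays within maxBytes. A single series larger than maxBytes still gets
// a batch of its own, so the limit is best-effort in this sketch.
func splitByBytes(sizes []int, maxBytes int) [][]int {
	var batches [][]int
	var cur []int
	curBytes := 0
	for _, s := range sizes {
		// Close the current batch if adding this series would exceed the budget.
		if len(cur) > 0 && curBytes+s > maxBytes {
			batches = append(batches, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, s)
		curBytes += s
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	// Hypothetical per-series sizes in bytes, with a 100-byte budget per batch.
	for i, b := range splitByBytes([]int{40, 40, 40, 200, 10}, 100) {
		fmt.Printf("batch %d: %v\n", i, b)
	}
}
```

Unlike a fixed series count, this keeps a few very large series from sharing a batch, but it does nothing about many concurrent requests each holding one batch.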

Testing

I've changed all tests within pkg/storegateway to use this new loading
strategy. This should give confidence that it is producing correct
results. Further work should improve testing around resource utilization
(i.e. batches are indeed freed one after the other)
and should test both batched and non-batched strategies.

This commit has TODOs, which should be addressed before merging this.

Signed-off-by: Dimitar Dimitrov dimitar.dimitrov@grafana.com

pracucci

Thanks for working on this! I didn't review every single line of code for correctness, cause I want to focus on the overall design first.

I'm also leaving here some feedback I reported on Slack. Let's keep talking on Slack, because discussing through comments here is harder and slower.

My main feedback is that with this design the query will underperform. Let's say we query 100 blocks. We preload 1 batch from 1 block at a time. Everything is serialized. I would like to keep having parallelization but be in control of it (e.g. max memory allocated).
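
One way to picture "parallel but under control" is a bounded worker pattern, sketched below with a buffered channel as a semaphore. `loadBlocks`, `maxConcurrent`, and the `load` callback are hypothetical; the sketch caps the number of in-flight block loads as a simple proxy, whereas a real memory-based limit would need to weight each load by its expected size.

```go
package main

import (
	"fmt"
	"sync"
)

// loadBlocks runs load for every block, with at most maxConcurrent loads
// in flight at any time. Results keep the order of blockIDs.
func loadBlocks(blockIDs []string, maxConcurrent int, load func(id string) int) []int {
	if maxConcurrent < 1 {
		maxConcurrent = 1 // fall back to fully serialized loading
	}
	results := make([]int, len(blockIDs))
	sem := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup
	for i, id := range blockIDs {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent loads are running
		go func(i int, id string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = load(id)    // each goroutine writes its own index
		}(i, id)
	}
	wg.Wait()
	return results
}

func main() {
	blocks := []string{"block-a", "block-b", "block-c", "block-d"}
	// The callback stands in for loading one batch from one block.
	sizes := loadBlocks(blocks, 2, func(id string) int { return len(id) })
	fmt.Println(sizes)
}
```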

pkg/storegateway/batch_series.go

pkg/storegateway/bucket.go

dimitarvdimitrov · 2022-11-07T19:41:18Z

i gave this a try and memory does indeed flatten. In this graph I did the change to zone-a of the backend component of a read-write deployment mode cluster at around 19:20. Below is the average working set memory in bytes per pod in each zone.

dimitarvdimitrov · 2022-11-09T17:40:46Z

I went back to this to see what latency looked like during that period. We expected to see increased latency, but that wasn't the case. The p99 for the Series() RPC of zone-a during that time also lost its spikes in comparison with the other two zones.

For context, the batch size was set to 10K, and my queries were touching around 550K series over 12h and were sharded 32 ways.

This was a very ad-hoc experiment, and it's probably not as good as it seems. The plan for improvements is to

* add concurrency to this approach to further reduce latency and
* control concurrency based on the memory usage - we can have many small requests in parallel or few big ones
* move to global batching (currently batching is per block per request)
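
As an illustration of the first point, one simple form of concurrency is to preload the next batch while the current one is being consumed. The sketch below is an assumption about how that could look, not the PR's implementation: `iterator`, `sliceIterator`, and `newPreloadingIterator` are hypothetical names, and error handling and cancellation are left out (a consumer that stops early would leak the goroutine).

```go
package main

import "fmt"

// iterator is a hypothetical minimal iterator interface.
type iterator[T any] interface {
	Next() bool
	At() T
}

// sliceIterator is a trivial in-memory source used for the demo.
type sliceIterator[T any] struct {
	items []T
	idx   int
}

func (s *sliceIterator[T]) Next() bool {
	if s.idx >= len(s.items) {
		return false
	}
	s.idx++
	return true
}

func (s *sliceIterator[T]) At() T { return s.items[s.idx-1] }

// preloadingIterator reads ahead from another iterator in a goroutine.
type preloadingIterator[T any] struct {
	ch  chan T
	cur T
}

// newPreloadingIterator buffers up to preload items ahead of the consumer.
// preload must not be negative.
func newPreloadingIterator[T any](from iterator[T], preload int) *preloadingIterator[T] {
	p := &preloadingIterator[T]{ch: make(chan T, preload)}
	go func() {
		defer close(p.ch)
		for from.Next() {
			p.ch <- from.At() // blocks once the buffer is full
		}
	}()
	return p
}

func (p *preloadingIterator[T]) Next() bool {
	v, ok := <-p.ch
	if !ok {
		return false
	}
	p.cur = v
	return true
}

func (p *preloadingIterator[T]) At() T { return p.cur }

func main() {
	src := &sliceIterator[string]{items: []string{"batch-1", "batch-2", "batch-3"}}
	it := newPreloadingIterator[string](src, 1)
	for it.Next() {
		fmt.Println(it.At())
	}
}
```

The buffer size bounds how many batches sit in memory ahead of the consumer, which is where a memory-based control could later plug in.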

I'm now trying to make the code easier to adapt for the 3 changes above: defining more interfaces and breaking up the index and chunk readers.

dimitarvdimitrov · 2022-11-09T21:39:02Z

593d0a4 is my proposal for the interfaces. This commit should help you imagine where the whole PR would go with concurrency, bytes-limits, elegant cleanups and theoretically offloading to disk. It shows the different components (via interfaces) and their dependencies (via factories).

The interfaces are sometimes intentionally brief and omit things like error handling, contexts, loggers, metrics, stats. I believe these can be added later without disruption.

pkg/storage/tsdb/config.go

pkg/storegateway/batch_series_parallel.go

pracucci

You're doing amazing work! I'm super excited about it. The design is much clearer than the initial one. I haven't reviewed every single bit (the PR is too big, we'll need to split it into smaller PRs to get proper reviews, and I will help you do it) but I looked at the overall design, which looks pretty solid to me.

pkg/storage/tsdb/config.go

pkg/storegateway/bucket.go

pkg/storegateway/batch_series.go

pkg/storegateway/batch_series_test.go

replay · 2022-11-29T14:48:48Z

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Signed-off-by: Marco Pracucci <marco@pracucci.com> Fix which context we use for openBlockSeriesChunkRefsSetsIterator Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Streaming store-gateway: Split up blockSeriesChunkRefsSetIterator (#3641) Merge main into dimitar/store-gateway-async-series (#3642) Streaming store-gateway: Move seriesHasher (#3643) Move test limiter Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> streaming store-gateway: move and rename loadingBatchSet (#3650) Renames `loadingBatchSet` to `loadingSeriesChunksSetIterator` and moves it to `series_chunks.go` Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> streaming store-gateway: add preloading for series without chunks (#3649) * streaming store-gateway: add preloading for series without chunks This PR * adds preloading to the seriesSet without chunks * changes the implementation of `preloadingSeriesChunkSetIterator` to be a generic one, so we can reuse it for preloading `seriesChunkRefsSet`s too. * moves some tests of `preloadingSeriesChunkSetIterator` from `batch_series_test.go` to `series_chunks_test.go` * moves `newSeriesSetWithChunks` to `series_chunks.go` Question/note to reviewers: should this iterator be in a new file? I think putting it in preload.go and tests in preload_test.go sounds ok since it now applies to both series_chunks.go and series_refs.go. 
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> streaming store-gateway: Polish batchSetsForBlocks (#3651) * Polish batchSetsForBlocks * rename to batchedSeriesSetForBlocks * remove unused cleanups * move to bucket.go * some formatting * also remove cleanups from synchronousSeriesSet Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Rename batchedSeriesSetForBlocks to streamingSeriesSetForBlocks Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> Merge main into dimitar/store-gateway-async-series (#3655) Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

…3656) Use a simpler test setup. Locally this reduces the time it takes to run the test from 8.417s to 0.351s. This also allows better assertions on the chunk refs that we load from the storage. Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* store-gateway: bucketStore tests with and without streaming This PR also removes the streaming implementation from * `TestSeries_BlockWithMultipleChunks`, which I think is ok because we already have unit tests for multiple chunks in our units * `TestLabelNamesAndValuesHints` label values aren't using streaming still anyway * `TestSeries_ErrorUnmarshallingRequestHints` this should be independent of how series are fetched (streaming or not) pkg/storegateway now takes 109.121s to run on my machine compared to the 94.677s it took before Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Reduce batch size for streaming e2e tests Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Fix test case names Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Run benchmarks with 1K and 10K series Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Move cleanup operations Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Move setupStoreForHintsTest() inside the test run in TestSeries_RequestAndResponseHints Signed-off-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

dimitarvdimitrov · 2022-12-07T11:51:20Z

pkg/storegateway/bucket.go

+	}
+
+	begin := time.Now()
+	err := g.Wait()


This means that we effectively block until we fetch all postings for all blocks before starting to load series and merging them. I think we can do better and fetch them asynchronously.

agreed offline to do this change after merging this PR
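
A possible shape for that follow-up is sketched below: instead of waiting on all blocks before doing anything, each block gets its own result channel, so the consumer can start on a block as soon as its postings arrive. This is one reading of "fetch them asynchronously", not the agreed design; `fetchPostingsAsync`, `postingsResult`, and the `fetch` callback are hypothetical.

```go
package main

import "fmt"

// postingsResult carries the outcome of fetching postings for one block.
type postingsResult struct {
	refs []int
	err  error
}

// fetchPostingsAsync starts one fetch per block and returns immediately
// with one receive-only channel per block, in block order.
func fetchPostingsAsync(blocks []string, fetch func(block string) ([]int, error)) []<-chan postingsResult {
	out := make([]<-chan postingsResult, len(blocks))
	for i, b := range blocks {
		ch := make(chan postingsResult, 1) // buffered so the fetcher never blocks
		out[i] = ch
		go func(b string) {
			refs, err := fetch(b)
			ch <- postingsResult{refs: refs, err: err}
		}(b)
	}
	return out
}

func main() {
	futures := fetchPostingsAsync([]string{"block-a", "block-b"}, func(b string) ([]int, error) {
		return []int{len(b)}, nil // stand-in for expanding postings
	})
	// The consumer waits only for the block it is about to read.
	for i, f := range futures {
		res := <-f
		if res.err != nil {
			fmt.Println("block", i, "failed:", res.err)
			continue
		}
		fmt.Println("block", i, "postings:", res.refs)
	}
}
```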

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

pracucci

Good job, LGTM! I reviewed all this code already, so no big surprises for me. I noticed a few last things I'd suggest taking a look at.

I think this work deserves a CHANGELOG entry. Remember to add all PR numbers of previously merged PRs too.

Can you also double check we haven't left any nolint:unused?

pkg/storage/tsdb/config.go

pkg/storegateway/bucket_chunk_reader.go

pkg/storegateway/series_chunks_test.go

pracucci · 2022-12-07T14:51:53Z

pkg/storegateway/bucket.go

 	var (
-		resMtx   sync.Mutex
-		res      []storepb.SeriesSet
-		cleanups []func()


Note to reviewers: removed because it was not used.

pkg/storegateway/bucket.go

pracucci · 2022-12-07T14:55:50Z

pkg/storegateway/bucket.go

+		if err == nil {
+			return
+		}
+		code := codes.Aborted


The main difference of moving this error wrapping logic here is that now it's applied to all errors returned, while previously it was only applied to some of them. The problem is that I don't know which is correct, because I don't remember what this is used for. Do you? Can you check how the error is handled in the querier (where this function is called via gRPC)?

i think it kind of depends. i had to do this because some of the errors which were previously surfacing here

mimir/pkg/storegateway/bucket.go
Lines 1023 to 1029 in ad526f8

	if err != nil {
		code := codes.Aborted
		if s, ok := status.FromError(errors.Cause(err)); ok {
			code = s.Code()
		}
		return nil, cleanup, status.Error(code, err.Error())
	}

are now surfacing here

mimir/pkg/storegateway/bucket.go
Lines 951 to 954 in 78ed13f

	if seriesSet.Err() != nil {
		err = errors.Wrap(seriesSet.Err(), "expand series set")
		return
	}

so with the collective changes we are removing this case where we return Unknown

mimir/pkg/storegateway/bucket.go
Lines 910 to 913 in ad526f8

	if set.Err() != nil {
		err = status.Error(codes.Unknown, errors.Wrap(set.Err(), "expand series set").Error())
		return
	}

i couldn't find a place which looks for the Unknown code.

to answer your question on the querier. In the querier we just ignore the errors unless it's EOF

mimir/pkg/querier/blocks_store_queryable.go
Lines 737 to 744 in 3592c25

	resp, err := stream.Recv()
	if errors.Is(err, io.EOF) {
		break
	}
	if err != nil {
		level.Warn(spanLog).Log("msg", "failed to receive series", "remote", c.RemoteAddress(), "err", err)
		return nil
	}

Thanks for the details. I also can't see a reason why this change could break any error handling, given the querier is not checking it and io.EOF comes from the client, not the server.

pkg/storegateway/bucket.go

pracucci · 2022-12-07T14:58:54Z

pkg/storegateway/bucket.go

+		var readers *chunkReaders
+		if !req.SkipChunks {
+			readers = newChunkReaders(chunkr)
+		}


[nit] What if, for simplicity, we always call newChunkReaders() regardless of req.SkipChunks?

pracucci · 2022-12-07T15:02:40Z

pkg/storegateway/bucket.go

-		mergeDuration := time.Since(begin)
-		mergeStats.mergeDuration += mergeDuration
-		s.metrics.seriesMergeDuration.Observe(mergeDuration.Seconds())


Has this changed intentionally? Logic is the same, but I've the feeling this has changed unintentionally.

it was changed and then changed back, and in the meantime it got a new form. i will revert it

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

pkg/storegateway/series_refs.go

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

pracucci

3.2.1.... LGTM!

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

The store-gateway can now stream results back to the querier instead of buffering them. This is expected to greatly reduce peak memory consumption while keeping latency the same. You can enable this feature by setting `-blocks-storage.bucket-store.batch-series-size` to a value in the high thousands (5000-10000). This is still an experimental feature and is subject to a changing API and instability. Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

dimitarvdimitrov requested review from osg-grafana and a team as code owners November 2, 2022 10:12

dimitarvdimitrov marked this pull request as draft November 2, 2022 10:12

dimitarvdimitrov mentioned this pull request Nov 2, 2022

Store-gateway: series streaming #3348

Closed

38 tasks

dimitarvdimitrov force-pushed the dimitar/store-gateway-async-series branch from 6e8f461 to e763037 Compare November 2, 2022 10:24

pracucci self-requested a review November 7, 2022 09:32

pracucci reviewed Nov 7, 2022

pkg/storegateway/batch_series.go

pracucci reviewed Nov 7, 2022

pkg/storegateway/bucket.go

pracucci mentioned this pull request Nov 7, 2022

Refactoring: do not store per-query data in bucketIndexReader #3396

Merged

3 tasks

dimitarvdimitrov force-pushed the dimitar/store-gateway-async-series branch from 8f90712 to 241e66e Compare November 7, 2022 14:14

56quarters reviewed Nov 10, 2022

pkg/storage/tsdb/config.go

pracucci reviewed Nov 10, 2022

dimitarvdimitrov mentioned this pull request Nov 23, 2022

Store-gateway: move chunk bytes pooling outside of chunk reader #3506

Merged

dimitarvdimitrov force-pushed the dimitar/store-gateway-async-series branch 2 times, most recently from b5d3ed5 to 4efee8a Compare November 23, 2022 22:57

pracucci reviewed Nov 24, 2022

dimitarvdimitrov commented Nov 28, 2022

pkg/storegateway/batch_series_test.go

This was referenced Nov 28, 2022

Fix preloadingBatchSet race condition and add unit tests #3545

Merged

Improve TestMergedBatchSet #3547

Merged

dimitarvdimitrov force-pushed the dimitar/store-gateway-async-series branch from 25084f2 to 7091e1e Compare November 29, 2022 13:52

replay added the release/notified-changelog-cut label Nov 29, 2022

pracucci mentioned this pull request Nov 29, 2022

Rename loaded structs #3562

Merged

3 tasks

pracucci force-pushed the dimitar/store-gateway-async-series branch 3 times, most recently from 5c980ac to 3aada56 Compare December 1, 2022 15:20

dimitarvdimitrov force-pushed the dimitar/store-gateway-async-series branch from 2c85113 to 625521c Compare December 1, 2022 16:19

dimitarvdimitrov force-pushed the dimitar/store-gateway-async-series branch from 402d122 to 98063ee Compare December 6, 2022 11:42

dimitarvdimitrov and others added 5 commits December 6, 2022 14:21

Rename variable

7808c1a

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Explain why concurrent access is ok

54d2eca

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Reword CLI option

a66873d

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

dimitarvdimitrov commented Dec 7, 2022

Revert useless style refactor

d470053

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

pracucci reviewed Dec 7, 2022

dimitarvdimitrov added 5 commits December 7, 2022 16:25

Add an iterator which measures durations of Next calls (#3661)

ad6f9be

Mention experimental feature

e4e1568

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Rename and document bucketChunkReaders

5bcee11

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Rename newFakeChunkReaderWithSeries

4ea7648

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Remove explicit error handling

78ed13f

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

pracucci reviewed Dec 7, 2022

pkg/storegateway/series_refs.go

dimitarvdimitrov added 4 commits December 7, 2022 16:44

Revert stylistic change

311eaa9

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Remove unused logger

cd9e084

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Add changelog entry

0b7da56

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Fix godoc

10ab3bc

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

pracucci approved these changes Dec 7, 2022

pracucci marked this pull request as ready for review December 7, 2022 16:20

pracucci requested a review from a team as a code owner December 7, 2022 16:20

pracucci and others added 2 commits December 7, 2022 17:29

make doc

44b9310

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Fix whitespace

40c7d9c

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

dimitarvdimitrov merged commit c44a07d into main Dec 7, 2022

dimitarvdimitrov deleted the dimitar/store-gateway-async-series branch December 7, 2022 16:54

dimitarvdimitrov mentioned this pull request Dec 12, 2022

store-gateway: fix expanded postings duration histogram #3697

Merged

dimitarvdimitrov mentioned this pull request Mar 27, 2023

store-gateway: merged series from different blocks concurrently #4596

Closed
