This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Add store-gateway components #701

Merged
merged 52 commits into from
Jun 1, 2023

Conversation

@cyriltovena (Collaborator) commented May 16, 2023

This PR adds a store-gateway component similar to what exists in Mimir. https://github.com/grafana/mimir/blob/main/pkg/storegateway/gateway.go

This is still very early; so far I've decided to depend on Mimir to benefit from its shuffle sharding and replication strategy.

I'm not planning to provide any block persistence for now, only in-memory. In the future we should use memcached for symbols and the TSDB index. The store-gateway opens blocks for each tenant within the window from now-24h to now-2h.

Since we don't have a compactor, a single gateway can download data duplicated by the ingesters' replication factor. This means we now need to deduplicate block data while streaming. To speed this up I've implemented a BufferedIterator that helps merge multiple iterators in parallel. In the future we should probably only download compacted blocks.
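The merge-and-dedup idea can be sketched with a heap-based k-way merge; ints stand in for sorted profile series here, and the real BufferedIterator additionally buffers reads in parallel, which this sketch omits:

```go
package main

import (
	"container/heap"
	"fmt"
)

// item tracks the current head of one source stream in the k-way merge.
type item struct {
	val, src int
}

type minHeap []item

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].val < h[j].val }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

// mergeDedup merges several sorted streams, emitting each value once.
// Replicated ingesters return the same data, so duplicates are dropped
// as the streams are combined.
func mergeDedup(streams [][]int) []int {
	h := &minHeap{}
	pos := make([]int, len(streams))
	for i, s := range streams {
		if len(s) > 0 {
			heap.Push(h, item{s[0], i})
			pos[i] = 1
		}
	}
	var out []int
	for h.Len() > 0 {
		it := heap.Pop(h).(item)
		if len(out) == 0 || out[len(out)-1] != it.val {
			out = append(out, it.val) // skip duplicates from replicas
		}
		if pos[it.src] < len(streams[it.src]) {
			heap.Push(h, item{streams[it.src][pos[it.src]], it.src})
			pos[it.src]++
		}
	}
	return out
}

func main() {
	// Three replicas holding overlapping data.
	fmt.Println(mergeDedup([][]int{{1, 3, 5}, {1, 2, 5}, {2, 3, 4}})) // [1 2 3 4 5]
}
```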

For now we always deduplicate, even in the ingester code where no duplication actually happens; the PR was big enough and I don't think it's a big concern right now.

The store-gateway also replicates data for high availability, which means more duplicates are sent to queriers. We should consider sending block ULIDs and letting the querier select which ones to consider.

To keep things simple on the query path for now, we split queries using the queryStoreAfter configuration. Ultimately, if we dedupe by blocks, we should be able to remove that configuration and select the blocks that need to be queried directly from the querier.

Ultimately I think we should have the store-gateway serve only block IDs, index, and metadata. cc @kolesnikovae

Moves forward #717

Block page on the store-gateway (screenshot omitted).

Trace example (screenshot omitted).

Remaining items left for later:

simonswine and others added 30 commits April 18, 2023 18:06
This is a POC on how to gather block metadata quickly from a bucket by listing by prefix. It will also cache the metadata in a local memcached if available.

To run it against the ops bucket you can:

```
$ docker run --name phlare-memcache -p 11211:11211 -d memcached

$ go run ./pkg/querier/bucket/
level=info msg="created remote cache"
level=info msg="block prefixes to query" prefixes="[]string{\"01GXR\", \"01GXS\", \"01GXT\", \"01GXV\", \"01GXW\", \"01GXX\", \"01GXY\", \"01GXZ\", \"01GY0\", \"01GY1\", \"01GY2\", \"01GY3\", \"01GY4\", \"01GY5\", \"01GY6\", \"01GY7\", \"01GY8\", \"01GY9\", \"01GYA\"}"
level=info msg="found block" prefix=01GXR block_count=95 sample_count=19126470198 series_count=355343
level=info msg="found block" prefix=01GXS block_count=81 sample_count=14836423557 series_count=174551
level=info msg="found block" prefix=01GXT block_count=82 sample_count=17183446330 series_count=248154
level=info msg="found block" prefix=01GXV block_count=80 sample_count=16174008761 series_count=237720
level=info msg="found block" prefix=01GXW block_count=81 sample_count=15844525155 series_count=230524
level=info msg="found block" prefix=01GXX block_count=82 sample_count=17148220671 series_count=233622
level=info msg="found block" prefix=01GXY block_count=81 sample_count=16076945767 series_count=192073
level=info msg="found block" prefix=01GXZ block_count=82 sample_count=16933257482 series_count=250945
level=info msg="found block" prefix=01GY0 block_count=84 sample_count=16886889138 series_count=246416
level=info msg="found block" prefix=01GY1 block_count=101 sample_count=18281724672 series_count=197294
level=info msg="found block" prefix=01GY2 block_count=81 sample_count=14864911834 series_count=136578
level=info msg="found block" prefix=01GY3 block_count=82 sample_count=14822618559 series_count=135249
level=info msg="found block" prefix=01GY4 block_count=82 sample_count=14315337998 series_count=141371
level=info msg="found block" prefix=01GY5 block_count=81 sample_count=15112318609 series_count=159509
level=info msg="found block" prefix=01GY6 block_count=82 sample_count=15164174250 series_count=184329
level=info msg="found block" prefix=01GY7 block_count=81 sample_count=16876956095 series_count=237297
level=info msg="found block" prefix=01GY8 block_count=82 sample_count=16662724596 series_count=223123
level=info msg="found block" prefix=01GY9 block_count=82 sample_count=17089011818 series_count=221571
level=info msg="found block" prefix=01GYA block_count=61 sample_count=15023932010 series_count=201735
```
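For reference, the prefixes in the log above are ULID timestamp prefixes: a ULID encodes its 48-bit millisecond timestamp in its first 10 Crockford base32 characters, so a 5-character prefix identifies a bucket of 2^25 ms (~9.3 hours). A minimal sketch of generating the prefixes that cover a time range follows; the POC presumably uses a ULID library, whereas this hand-rolls the encoding:

```go
package main

import "fmt"

// Crockford base32 alphabet used by ULIDs (no I, L, O, U).
const b32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// ulidPrefix encodes a Unix-millisecond timestamp as the first n
// characters of a ULID. The 48-bit timestamp occupies the first 10
// characters, so a 5-character prefix identifies a 2^25 ms bucket.
func ulidPrefix(ms uint64, n int) string {
	var buf [10]byte
	t := ms
	for i := 9; i >= 0; i-- {
		buf[i] = b32[t&31]
		t >>= 5
	}
	return string(buf[:n])
}

// prefixesBetween lists every 5-char prefix covering [fromMs, toMs],
// so a bucket listing can be restricted to the relevant time range.
func prefixesBetween(fromMs, toMs uint64) []string {
	const bucket = 1 << 25 // ms spanned by one 5-char prefix
	var out []string
	for ms := fromMs - fromMs%bucket; ms <= toMs; ms += bucket {
		out = append(out, ulidPrefix(ms, 5))
	}
	return out
}

func main() {
	// ~Apr 11, 2023 in Unix milliseconds.
	fmt.Println(prefixesBetween(1681211260928, 1681211260928+2<<25)) // [01GXR 01GXS 01GXT]
}
```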
@cyriltovena cyriltovena marked this pull request as ready for review May 30, 2023 09:06
@cyriltovena cyriltovena requested review from simonswine and kolesnikovae and removed request for simonswine May 30, 2023 09:06
Review threads on: pkg/phlaredb/block_querier.go, pkg/querier/querier.go, pkg/querier/ingester_querier.go
@kolesnikovae (Contributor) left a comment:
LGTM

Awesome work!

@cyriltovena cyriltovena merged commit 8262108 into main Jun 1, 2023
@cyriltovena cyriltovena deleted the store-gateway-component branch June 1, 2023 08:00
simonswine added a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023

Co-authored-by: Christian Simon <simon@swine.de>