This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Add store-gateway components #701

Merged
merged 52 commits into from
Jun 1, 2023

Conversation

@cyriltovena (Collaborator) commented May 16, 2023

This PR adds a store-gateway component similar to what exists in Mimir. https://github.com/grafana/mimir/blob/main/pkg/storegateway/gateway.go

This is still very early; so far I've decided to depend on Mimir to benefit from its shuffle sharding and replication strategy.

I'm not planning to provide any block persistence for now, only in-memory. In the future we should use memcached for symbols and the TSDB index. The store-gateway opens blocks for each tenant within the window from now-24h to now-2h.

Since we don't have a compactor, a single gateway can download data duplicated by the ingesters' replication factor. This means we now need to deduplicate block data while streaming. To speed this up I've implemented a BufferedIterator that helps merge multiple iterators in parallel. In the future we should probably only download compacted blocks.
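The merge-and-dedup idea can be sketched with a heap-based k-way merge; ints stand in for sorted profile series here, and the real BufferedIterator additionally buffers reads in parallel, which this sketch omits:

```go
package main

import (
	"container/heap"
	"fmt"
)

// item tracks the current head of one source stream in the k-way merge.
type item struct {
	val, src int
}

type minHeap []item

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].val < h[j].val }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

// mergeDedup merges several sorted streams, emitting each value once.
// Replicated ingesters return the same data, so duplicates are dropped
// as the streams are combined.
func mergeDedup(streams [][]int) []int {
	h := &minHeap{}
	pos := make([]int, len(streams))
	for i, s := range streams {
		if len(s) > 0 {
			heap.Push(h, item{s[0], i})
			pos[i] = 1
		}
	}
	var out []int
	for h.Len() > 0 {
		it := heap.Pop(h).(item)
		if len(out) == 0 || out[len(out)-1] != it.val {
			out = append(out, it.val) // skip duplicates from replicas
		}
		if pos[it.src] < len(streams[it.src]) {
			heap.Push(h, item{streams[it.src][pos[it.src]], it.src})
			pos[it.src]++
		}
	}
	return out
}

func main() {
	// Three replicas holding overlapping data.
	fmt.Println(mergeDedup([][]int{{1, 3, 5}, {1, 2, 5}, {2, 3, 4}})) // [1 2 3 4 5]
}
```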

For now we always deduplicate, even in the ingester code where no duplication actually happens; the PR was big enough and I don't think it's a big concern right now.

The store-gateway also replicates data for high availability, which means more duplicates are sent to queriers. We should consider sending block ULIDs and letting the querier select which ones to consider.

To keep things simple on the query path for now, we split queries using the queryStoreAfter configuration. Ultimately, if we dedupe by blocks, we should be able to remove that configuration and select the blocks that need to be queried directly from the querier.

Ultimately I think we should have the store-gateway serve only block IDs, index, and metadata. cc @kolesnikovae

Moves forward #717

Block page on the store-gateway (screenshot omitted).

Trace example (screenshot omitted).

Remaining items left for later:

simonswine and others added 30 commits April 18, 2023 18:06
This is a POC on how to gather block metadata quickly from a bucket by listing by prefix. It will also cache the metadata in a local memcached if available.

To run it against the ops bucket you can:

```
$ docker run --name phlare-memcache -p 11211:11211 -d memcached

$ go run ./pkg/querier/bucket/
level=info msg="created remote cache"
level=info msg="block prefixes to query" prefixes="[]string{\"01GXR\", \"01GXS\", \"01GXT\", \"01GXV\", \"01GXW\", \"01GXX\", \"01GXY\", \"01GXZ\", \"01GY0\", \"01GY1\", \"01GY2\", \"01GY3\", \"01GY4\", \"01GY5\", \"01GY6\", \"01GY7\", \"01GY8\", \"01GY9\", \"01GYA\"}"
level=info msg="found block" prefix=01GXR block_count=95 sample_count=19126470198 series_count=355343
level=info msg="found block" prefix=01GXS block_count=81 sample_count=14836423557 series_count=174551
level=info msg="found block" prefix=01GXT block_count=82 sample_count=17183446330 series_count=248154
level=info msg="found block" prefix=01GXV block_count=80 sample_count=16174008761 series_count=237720
level=info msg="found block" prefix=01GXW block_count=81 sample_count=15844525155 series_count=230524
level=info msg="found block" prefix=01GXX block_count=82 sample_count=17148220671 series_count=233622
level=info msg="found block" prefix=01GXY block_count=81 sample_count=16076945767 series_count=192073
level=info msg="found block" prefix=01GXZ block_count=82 sample_count=16933257482 series_count=250945
level=info msg="found block" prefix=01GY0 block_count=84 sample_count=16886889138 series_count=246416
level=info msg="found block" prefix=01GY1 block_count=101 sample_count=18281724672 series_count=197294
level=info msg="found block" prefix=01GY2 block_count=81 sample_count=14864911834 series_count=136578
level=info msg="found block" prefix=01GY3 block_count=82 sample_count=14822618559 series_count=135249
level=info msg="found block" prefix=01GY4 block_count=82 sample_count=14315337998 series_count=141371
level=info msg="found block" prefix=01GY5 block_count=81 sample_count=15112318609 series_count=159509
level=info msg="found block" prefix=01GY6 block_count=82 sample_count=15164174250 series_count=184329
level=info msg="found block" prefix=01GY7 block_count=81 sample_count=16876956095 series_count=237297
level=info msg="found block" prefix=01GY8 block_count=82 sample_count=16662724596 series_count=223123
level=info msg="found block" prefix=01GY9 block_count=82 sample_count=17089011818 series_count=221571
level=info msg="found block" prefix=01GYA block_count=61 sample_count=15023932010 series_count=201735
```
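For reference, the prefixes in the log above are ULID timestamp prefixes: a ULID encodes its 48-bit millisecond timestamp in its first 10 Crockford base32 characters, so a 5-character prefix identifies a bucket of 2^25 ms (~9.3 hours). A minimal sketch of generating the prefixes that cover a time range follows; the POC presumably uses a ULID library, whereas this hand-rolls the encoding:

```go
package main

import "fmt"

// Crockford base32 alphabet used by ULIDs (no I, L, O, U).
const b32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// ulidPrefix encodes a Unix-millisecond timestamp as the first n
// characters of a ULID. The 48-bit timestamp occupies the first 10
// characters, so a 5-character prefix identifies a 2^25 ms bucket.
func ulidPrefix(ms uint64, n int) string {
	var buf [10]byte
	t := ms
	for i := 9; i >= 0; i-- {
		buf[i] = b32[t&31]
		t >>= 5
	}
	return string(buf[:n])
}

// prefixesBetween lists every 5-char prefix covering [fromMs, toMs],
// so a bucket listing can be restricted to the relevant time range.
func prefixesBetween(fromMs, toMs uint64) []string {
	const bucket = 1 << 25 // ms spanned by one 5-char prefix
	var out []string
	for ms := fromMs - fromMs%bucket; ms <= toMs; ms += bucket {
		out = append(out, ulidPrefix(ms, 5))
	}
	return out
}

func main() {
	// ~Apr 11, 2023 in Unix milliseconds.
	fmt.Println(prefixesBetween(1681211260928, 1681211260928+2<<25)) // [01GXR 01GXS 01GXT]
}
```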
@cyriltovena cyriltovena marked this pull request as ready for review May 30, 2023 09:06
@cyriltovena cyriltovena requested review from simonswine and kolesnikovae and removed request for simonswine May 30, 2023 09:06
Review threads on: pkg/phlaredb/block_querier.go, pkg/querier/querier.go, pkg/querier/ingester_querier.go
@kolesnikovae (Contributor) left a comment:
LGTM

Awesome work!

@cyriltovena cyriltovena merged commit 8262108 into main Jun 1, 2023
@cyriltovena cyriltovena deleted the store-gateway-component branch June 1, 2023 08:00
simonswine added a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023

Co-authored-by: Christian Simon <simon@swine.de>