Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge main into dimitar/store-gateway-async-series #3655

Conversation

dimitarvdimitrov
Copy link
Contributor

This pulls in the changes from #3645 and #3652

pracucci and others added 30 commits November 28, 2022 22:04
Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
)

* Improve networking dashboards to work with read-write deployment mode

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed panel title

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
…ilities (#3531)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
#3518)

* Hide TSDB block ranges period config from doc and mark it experimental

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Remove CLI flag mention from Helm migration doc

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
This doesn't accurately count the dropped samples. For example
if a single metric with multiple samples is faulty, we get a single error
rather than an error per sample.

But I believe its the best best-effort measurement.

Before we used to do `DatapointCount() - samplesInMap()`

The problem is the following:
1. target_info is a synthetic metric added in Prometheus, so the final samples could higher.
2. A single histogram datapoint in OTLP corresponds to many samples in Prometheus.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
…or release (#3550)

* fix up changelog in preparation for release

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* remove non-user-facing line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* reclassify some docs enhancements as bugfix

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* add new release section in changelog

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* apply PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* fixes according to PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
(cherry picked from commit ce49b04)

Co-authored-by: Marco Pracucci <marco@pracucci.com>
…yment mode (#3574)

* Improve networking panels to work with read write deployment mode (#3519)

* Improve networking dashboards to work with read-write deployment mode

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed panel title

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
(cherry picked from commit 5064c64)

* add an empty space

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "add an empty space"

This reverts commit 8136c97.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>
…ilities (#3531) (#3578)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
(cherry picked from commit b0a390b)

Co-authored-by: Marco Pracucci <marco@pracucci.com>
#3518) (#3575)

* Hide TSDB block ranges period config from doc and mark it experimental

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Remove CLI flag mention from Helm migration doc

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
(cherry picked from commit 56b6e35)

Co-authored-by: Marco Pracucci <marco@pracucci.com>
This doesn't accurately count the dropped samples. For example
if a single metric with multiple samples is faulty, we get a single error
rather than an error per sample.

But I believe its the best best-effort measurement.

Before we used to do `DatapointCount() - samplesInMap()`

The problem is the following:
1. target_info is a synthetic metric added in Prometheus, so the final samples could higher.
2. A single histogram datapoint in OTLP corresponds to many samples in Prometheus.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
(cherry picked from commit 2ad7eb4)

Co-authored-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* update screenshots (#3553)

(cherry picked from commit e2fa561)

* empty space

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "empty space"

This reverts commit c8616a8.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>
… preparation for release (#3579)

* add missing entries and release section to changelog in preparation for release (#3550)

* fix up changelog in preparation for release

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* remove non-user-facing line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* reclassify some docs enhancements as bugfix

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* add new release section in changelog

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* apply PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* fixes according to PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
(cherry picked from commit a61ea75)

* empty line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "empty line"

This reverts commit 6610a9d.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* mark version in changelog as rc (#3572)

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
(cherry picked from commit 0f29293)

* empty line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "empty line"

This reverts commit 25fc74b.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* update version to 2.5.0-rc.0

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* empty line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "empty line"

This reverts commit 2ef9237.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
…3621)

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
…ices are not yet running. (#3625)

* Fix "unsupported value type" logged when calling /ready and some services are not yet running.

Without this, we get log messages like:
level=debug ts=2022-12-02T01:52:02.411627554Z caller=mimir.go:859 msg="some services are not Running" services="unsupported value type"

* Add changelog entry.
…3630)

Fix and add documentation comments on storepb.StoreGatewayServer
implementation StoreGateway and storepb.StoreServer implementation BucketStores.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
)

This is the next step to introducing our changes from PR 3355. This
commit adds the iterators (and tests for them) used over the data
structures we added in PR 3587.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
andyasp and others added 12 commits December 5, 2022 09:58
Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
* alerts: Fix MimirCompactorSkippedBlocksWithOutOfOrderChunks

There `component` label does not exist. However, there is no need to
filter on component as
cortex_compactor_blocks_marked_for_no_compaction_total is only sent by
the compactor.

* Update CHANGELOG
* Remove 'Fingerprint' function; it was only used in tests

Replace attempt to check collision with values that actually do collide.

* Remove 'FastFingerprint' function

In QueryLimiter replace it with Labels.Hash(), which is a better
hash algorithm and avoids us maintaining extra code.

Co-authored-by: Bryan <bryan@Bryans-MacBook-Air.local>
* Distributor: add test for Push with unsorted labels

To demonstrate that they get sorted on entry.

Also clarify a comment which mentions unsorted labels.

* Validation: stop checking series have sorted labels

Distributor sorts labels before this check, so it is never triggered.

* Distributor: check result of label-sorting test
…nents (#3626)

* alerts: Update MimirMemoryMapAreasTooHigh to include read-write components

* Update changelog

Co-authored-by: Marco Pracucci <marco@pracucci.com>
in the streaming store-gateway work we need more control over what
series and samples are written to a test block. This PR makes it
possible for each test that creates a test block to write arbitrary
series and samples.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
The current e2e testing framework allows to inject a bucket
implementation. Each end-to-end test then repeats mostly the same code
to set up the testing suite.

This PR moves the testing setup further up and allows each test to
either immediately get a ready test suite with sensible defaults _or_
to customize the test suite with the testing config.

This change will be needed for the streaming store-gateway work where we
want to run the whole e2e test suite also for the streaming
implementation.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* distributor: remove labels with empty values

Empty labels are stripped when the series reaches TSDB, so we should
remove them earlier to keep things consistent.

* Added more test cases

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
…3645)

* Streaming store-gateway: add iterators for loading series from index

We are adding three iterators:
* `loadingSeriesChunkRefsSetIterator` is responsible for loading chunks
from the index
* `postingsSetsIterator` is responsible for batching postings so loading
 series from the index doesn't load all labels and chunk refs from the
 index in one go
* `limitingSeriesChunkRefsSetIterator` is responsible for limiting
 series and chunks within a request

We are also adding a function to set up these three iterators for an
incoming Series() request for a block.

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Update nolint directives

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Fix e2e testing interface used

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
# Conflicts:
#	CHANGELOG.md
#	pkg/storegateway/series_chunks.go
#	pkg/storegateway/series_chunks_test.go
#	pkg/storegateway/series_refs.go
#	pkg/storegateway/series_refs_test.go
@dimitarvdimitrov dimitarvdimitrov requested review from a team as code owners December 6, 2022 11:09
@dimitarvdimitrov dimitarvdimitrov merged this pull request into dimitar/store-gateway-async-series Dec 6, 2022
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/store-gateway-async-series-merge-main branch December 6, 2022 11:11
dimitarvdimitrov added a commit that referenced this pull request Dec 6, 2022
This is a POC and is not meant to be merged yet.

Overview

Instead of loading all series (label sets and chunks) in memory before
responding to a Series() RPC, we can batch them and load X at a time.
This gives more predictability on the memory utilization of the
store-gateway. The tradeoff is having to do one trip to the index cache
and the bucket for each batch, which will affect overall latency of requests.

How to use

This change disables batch series loading by default and adds two flags
to control this - whether it's enabled via
`-blocks-storage.bucket-store.batched-series-loading=false` and how many
series go into each batch via
`-blocks-storage.bucket-store.batch-series-size=65536`.

Limiting

Ideally we want ot put a limit on the number of bytes that we want to
load in each batch instead of the number of series. For now limiting
the number of series should still give us some resilience against "big"
requests, while still being vulnerable to a flurry of many requests.

Testing

I've changed all tests within pkg/storegateway to use this new loading
strategy. This should give confidence that it is producing correct
results. Further work should improve testing around resource utilization
(i.e. batches are indeed freed one after the other)
and should test both batched and non-batched strategies.

This commit has TODOs, which should be addressed before merging this.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>

Fix which context we use for openBlockSeriesChunkRefsSetsIterator

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Streaming store-gateway: Split up blockSeriesChunkRefsSetIterator (#3641)

Merge main into dimitar/store-gateway-async-series (#3642)

Streaming store-gateway: Move seriesHasher (#3643)

Move test limiter

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

streaming store-gateway: move and rename loadingBatchSet (#3650)

Renames `loadingBatchSet` to `loadingSeriesChunksSetIterator` and moves
it to `series_chunks.go`

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

streaming store-gateway: add preloading for series without chunks (#3649)

* streaming store-gateway: add preloading for series without chunks

This PR
* adds preloading to the seriesSet without chunks
* changes the implementation of `preloadingSeriesChunkSetIterator`
to be a generic one, so we can reuse it for preloading
`seriesChunkRefsSet`s too.
* moves some tests of `preloadingSeriesChunkSetIterator` from
`batch_series_test.go` to `series_chunks_test.go`
* moves `newSeriesSetWithChunks` to `series_chunks.go`

Question/note to reviewers: should this iterator be in a new file? I
think putting it in preload.go and tests in preload_test.go sounds ok
since it now applies to both series_chunks.go and series_refs.go.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

streaming store-gateway: Polish batchSetsForBlocks (#3651)

* Polish batchSetsForBlocks

* rename to batchedSeriesSetForBlocks
* remove unused cleanups
* move to bucket.go
* some formatting
* also remove cleanups from synchronousSeriesSet

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Rename batchedSeriesSetForBlocks to streamingSeriesSetForBlocks

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>

Merge main into dimitar/store-gateway-async-series (#3655)

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.