Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge main into dimitar/store-gateway-async-series #3642

Conversation

dimitarvdimitrov
Copy link
Contributor

merges main. Conflicts were primarily in adding more code to

  • pkg/storegateway/series_chunks.go
  • pkg/storegateway/series_chunks_test.go
  • pkg/storegateway/series_refs.go
  • pkg/storegateway/series_refs_test.go

pracucci and others added 30 commits November 28, 2022 22:04
Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
)

* Improve networking dashboards to work with read-write deployment mode

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed panel title

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
…ilities (#3531)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
#3518)

* Hide TSDB block ranges period config from doc and mark it experimental

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Remove CLI flag mention from Helm migration doc

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
This doesn't accurately count the dropped samples. For example
if a single metric with multiple samples is faulty, we get a single error
rather than an error per sample.

But I believe its the best best-effort measurement.

Before we used to do `DatapointCount() - samplesInMap()`

The problem is the following:
1. target_info is a synthetic metric added in Prometheus, so the final samples could higher.
2. A single histogram datapoint in OTLP corresponds to many samples in Prometheus.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
…or release (#3550)

* fix up changelog in preparation for release

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* remove non-user-facing line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* reclassify some docs enhancements as bugfix

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* add new release section in changelog

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* apply PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* fixes according to PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
(cherry picked from commit ce49b04)

Co-authored-by: Marco Pracucci <marco@pracucci.com>
…yment mode (#3574)

* Improve networking panels to work with read write deployment mode (#3519)

* Improve networking dashboards to work with read-write deployment mode

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed panel title

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
(cherry picked from commit 5064c64)

* add an empty space

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "add an empty space"

This reverts commit 8136c97.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>
…ilities (#3531) (#3578)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
(cherry picked from commit b0a390b)

Co-authored-by: Marco Pracucci <marco@pracucci.com>
#3518) (#3575)

* Hide TSDB block ranges period config from doc and mark it experimental

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Remove CLI flag mention from Helm migration doc

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
(cherry picked from commit 56b6e35)

Co-authored-by: Marco Pracucci <marco@pracucci.com>
This doesn't accurately count the dropped samples. For example
if a single metric with multiple samples is faulty, we get a single error
rather than an error per sample.

But I believe its the best best-effort measurement.

Before we used to do `DatapointCount() - samplesInMap()`

The problem is the following:
1. target_info is a synthetic metric added in Prometheus, so the final samples could higher.
2. A single histogram datapoint in OTLP corresponds to many samples in Prometheus.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
(cherry picked from commit 2ad7eb4)

Co-authored-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* update screenshots (#3553)

(cherry picked from commit e2fa561)

* empty space

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "empty space"

This reverts commit c8616a8.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>
… preparation for release (#3579)

* add missing entries and release section to changelog in preparation for release (#3550)

* fix up changelog in preparation for release

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* remove non-user-facing line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* reclassify some docs enhancements as bugfix

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* add new release section in changelog

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* apply PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* fixes according to PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
(cherry picked from commit a61ea75)

* empty line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "empty line"

This reverts commit 6610a9d.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* mark version in changelog as rc (#3572)

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
(cherry picked from commit 0f29293)

* empty line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "empty line"

This reverts commit 25fc74b.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* update version to 2.5.0-rc.0

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* empty line

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* Revert "empty line"

This reverts commit 2ef9237.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
…3621)

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
…ices are not yet running. (#3625)

* Fix "unsupported value type" logged when calling /ready and some services are not yet running.

Without this, we get log messages like:
level=debug ts=2022-12-02T01:52:02.411627554Z caller=mimir.go:859 msg="some services are not Running" services="unsupported value type"

* Add changelog entry.
…3630)

Fix and add documentation comments on storepb.StoreGatewayServer
implementation StoreGateway and storepb.StoreServer implementation BucketStores.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
)

This is the next step to introducing our changes from PR 3355. This
commit adds the iterators (and tests for them) used over the data
structures we added in PR 3587.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
andyasp and others added 6 commits December 5, 2022 09:58
Signed-off-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>
* alerts: Fix MimirCompactorSkippedBlocksWithOutOfOrderChunks

There `component` label does not exist. However, there is no need to
filter on component as
cortex_compactor_blocks_marked_for_no_compaction_total is only sent by
the compactor.

* Update CHANGELOG
* Remove 'Fingerprint' function; it was only used in tests

Replace attempt to check collision with values that actually do collide.

* Remove 'FastFingerprint' function

In QueryLimiter replace it with Labels.Hash(), which is a better
hash algorithm and avoids us maintaining extra code.

Co-authored-by: Bryan <bryan@Bryans-MacBook-Air.local>
* Distributor: add test for Push with unsorted labels

To demonstrate that they get sorted on entry.

Also clarify a comment which mentions unsorted labels.

* Validation: stop checking series have sorted labels

Distributor sorts labels before this check, so it is never triggered.

* Distributor: check result of label-sorting test
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

# Conflicts:
#	pkg/storegateway/series_chunks.go
#	pkg/storegateway/series_chunks_test.go
#	pkg/storegateway/series_refs.go
#	pkg/storegateway/series_refs_test.go
@dimitarvdimitrov dimitarvdimitrov requested a review from a team as a code owner December 5, 2022 12:50
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed only changes to pkg/storegateway and LGTM

@dimitarvdimitrov dimitarvdimitrov merged commit f3c733e into dimitar/store-gateway-async-series Dec 5, 2022
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/store-gateway-async-series-merge-main branch December 5, 2022 13:37
dimitarvdimitrov added a commit that referenced this pull request Dec 6, 2022
This is a POC and is not meant to be merged yet.

Overview

Instead of loading all series (label sets and chunks) in memory before
responding to a Series() RPC, we can batch them and load X at a time.
This gives more predictability on the memory utilization of the
store-gateway. The tradeoff is having to do one trip to the index cache
and the bucket for each batch, which will affect overall latency of requests.

How to use

This change disables batch series loading by default and adds two flags
to control this - whether it's enabled via
`-blocks-storage.bucket-store.batched-series-loading=false` and how many
series go into each batch via
`-blocks-storage.bucket-store.batch-series-size=65536`.

Limiting

Ideally we want ot put a limit on the number of bytes that we want to
load in each batch instead of the number of series. For now limiting
the number of series should still give us some resilience against "big"
requests, while still being vulnerable to a flurry of many requests.

Testing

I've changed all tests within pkg/storegateway to use this new loading
strategy. This should give confidence that it is producing correct
results. Further work should improve testing around resource utilization
(i.e. batches are indeed freed one after the other)
and should test both batched and non-batched strategies.

This commit has TODOs, which should be addressed before merging this.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>

Fix which context we use for openBlockSeriesChunkRefsSetsIterator

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Streaming store-gateway: Split up blockSeriesChunkRefsSetIterator (#3641)

Merge main into dimitar/store-gateway-async-series (#3642)

Streaming store-gateway: Move seriesHasher (#3643)

Move test limiter

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

streaming store-gateway: move and rename loadingBatchSet (#3650)

Renames `loadingBatchSet` to `loadingSeriesChunksSetIterator` and moves
it to `series_chunks.go`

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

streaming store-gateway: add preloading for series without chunks (#3649)

* streaming store-gateway: add preloading for series without chunks

This PR
* adds preloading to the seriesSet without chunks
* changes the implementation of `preloadingSeriesChunkSetIterator`
to be a generic one, so we can reuse it for preloading
`seriesChunkRefsSet`s too.
* moves some tests of `preloadingSeriesChunkSetIterator` from
`batch_series_test.go` to `series_chunks_test.go`
* moves `newSeriesSetWithChunks` to `series_chunks.go`

Question/note to reviewers: should this iterator be in a new file? I
think putting it in preload.go and tests in preload_test.go sounds ok
since it now applies to both series_chunks.go and series_refs.go.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

streaming store-gateway: Polish batchSetsForBlocks (#3651)

* Polish batchSetsForBlocks

* rename to batchedSeriesSetForBlocks
* remove unused cleanups
* move to bucket.go
* some formatting
* also remove cleanups from synchronousSeriesSet

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Rename batchedSeriesSetForBlocks to streamingSeriesSetForBlocks

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>

Merge main into dimitar/store-gateway-async-series (#3655)

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.