store: label_values: fetch less postings #7814
Conversation
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Thanks. Good catch.
Do we need to update the comment on L2148?
// Should never be empty since we added labelName!="" matcher to the list of matchers.
This is no longer necessarily true if we skip the non-empty matcher when a metric name matcher is present, is it?
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Thanks!
pkg/store/bucket.go
@@ -2033,7 +2042,8 @@ func (s *BucketStore) LabelValues(ctx context.Context, req *storepb.LabelValuesR
// If we have series matchers and the Label is not an external one, add <labelName> != "" matcher
// to only select series that have given label name.
if len(reqSeriesMatchersNoExtLabels) > 0 && !b.extLset.Has(req.Label) {
// We don't need such matcher if matchers already contain __name__ matcher.
Trying to understand this comment. Why is it specific to the metric name label?
It is just a tradeoff of fetching more postings or series.
From my experience, if __name__ is specified, it means the user knows that the metric contains the requested label. The method will select all series with that __name__, and 99% of them will have the requested label.
As for random queries, where there can be results with the __name__ but without the specified label, they should normally be rare. Users can still make queries like label_values({}, pod). They will work, but will fetch a lot of data.
So the point is, if the user knows what they need and specifies __name__, we don't need to save them from fetching some extra series. But we do save them from fetching all references to all kube_state_metrics from object storage, for example.
Other popular labels may be service, application, job, instance. If __name__ is specified, it is guaranteed to be less data, but still all the label values.
Not against this. I just hope we can find a better way to cover more labels, because we have that information from posting cardinality and series size.
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Thanks!
Changes
This change optimizes label_values requests to the store.
Grafana lets you define template variables. You can define a pod variable with a query like this: label_values(kube_pod_info{}, pod).
In older versions of Grafana this query leads to a series request to the store. In newer versions, and with particular datasource settings, this query leads to a label_values request. The end result is the same, but the amount of data downloaded from object storage differs considerably.
Investigation shows that in the label_values case Thanos store fetches a lot of postings. This is due to the matcher <labelName> != "" being added in the label_values request.
Analyzing the debug logs of query stats, we can see that after the change considerably less data is downloaded from object storage.
This change checks whether the matchers contain a __name__ matcher. This matcher, if present, reduces the amount of data that needs to be downloaded from storage. Because matchers are evaluated separately during the blockClient.ExpandPostings() call, the matcher <labelName> != "" will select all references to series which have the label. For popular labels such as pod in the example above, this will select a lot of references.
The change won't affect queries like label_values(pod) or label_values({some_label="some_value"}, pod), where a __name__ matcher isn't specified.
Verification
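The decision described under Changes can be illustrated with a small self-contained sketch. It is not the code from pkg/store/bucket.go: the Matcher type is a simplified stand-in for the Prometheus matcher type, withLabelPresenceMatcher and hasMetricNameMatcher are hypothetical helper names, and treating any matcher on __name__ (regardless of its type) as sufficient is an assumption.

```go
package main

import "fmt"

// MatchType and Matcher are simplified stand-ins (assumptions) for the
// Prometheus matcher types, so that the sketch has no external dependencies.
type MatchType int

const (
	MatchEqual MatchType = iota
	MatchNotEqual
)

type Matcher struct {
	Type  MatchType
	Name  string
	Value string
}

const metricNameLabel = "__name__"

// hasMetricNameMatcher reports whether any matcher targets __name__.
// Assumption: the matcher type is not inspected.
func hasMetricNameMatcher(ms []Matcher) bool {
	for _, m := range ms {
		if m.Name == metricNameLabel {
			return true
		}
	}
	return false
}

// withLabelPresenceMatcher (hypothetical helper) appends <label> != "" only
// when there are series matchers, the label is not an external one, and no
// __name__ matcher is already present.
func withLabelPresenceMatcher(ms []Matcher, label string, isExternal bool) []Matcher {
	if len(ms) == 0 || isExternal || hasMetricNameMatcher(ms) {
		return ms
	}
	out := append([]Matcher(nil), ms...) // copy so the caller's slice is untouched
	return append(out, Matcher{Type: MatchNotEqual, Name: label, Value: ""})
}

func main() {
	// label_values(kube_pod_info{}, pod): __name__ is present, nothing is added.
	withName := []Matcher{{Type: MatchEqual, Name: metricNameLabel, Value: "kube_pod_info"}}
	fmt.Println(len(withLabelPresenceMatcher(withName, "pod", false)))

	// label_values({some_label="some_value"}, pod): pod != "" is still added.
	withoutName := []Matcher{{Type: MatchEqual, Name: "some_label", Value: "some_value"}}
	fmt.Println(len(withLabelPresenceMatcher(withoutName, "pod", false)))
}
```

In this sketch the first call leaves the matcher list unchanged, so postings for every series carrying a pod label would not need to be expanded, while the second call keeps the previous behaviour.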