-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QFE: new middleware to force query statistics collection #7854
QFE: new middleware to force query statistics collection #7854
Conversation
6d956d0
to
e441d55
Compare
Can you share an example log line? Just want to see how it looks like |
Formatted for presentation:
|
6317abf
to
551f368
Compare
@yeya24 would you mind reviewing the PR, pls? We use similar log line in our production environment and would like to upstream this. This provides a single place to look for heavy queries and investigate issues coming from the read path. |
551f368
to
34ae446
Compare
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
34ae446
to
6b13da0
Compare
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
6b13da0
to
b69c9bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think it makes sense to collect the stats and log them. Just some questions about the implementation detail
func (s statsMiddleware) Do(ctx context.Context, r Request) (Response, error) { | ||
if s.forceStats { | ||
r = r.WithStats("all") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand why we need a middleware here. Why we cannot do that in query frontend ServeHTTP
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you are right. That is much simpler, will do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I started to refactor this I remembered why we have to keep this in a middleware. The QFE http handler only has access to the raw http.Request
that means that if we want to change the value of the stats
field, we will have to parse the request twice.
Same thing for the statistics, the response (which is can be quite expensive to parse), is only available in its decoded state in the tripperware, the http handler just writes back the response from the tripperware.
|
||
return atomic.LoadInt64(&s.TotalLoadedSamples) | ||
} | ||
|
||
// Merge the provide Stats into this one. | ||
func (s *Stats) Merge(other *Stats) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why we are not using this function to add stats samples. And why this funciton is not used anywhere.
This is where Cortex tracks and merges stats https://github.com/cortexproject/cortex/blob/master/pkg/querier/stats/stats.go#L346
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that is Cortex "private" way of tracking statistics, but in Thanos we chose to abide to Prometheus "simpler" stats interface using only Samples Total and Peak Samples, which is implemented all the way into the thanos promql engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see so we are not calling AddFetchedSeries
, AddFetchedChunkBytes
, etc in Thanos querier.
This is fine. But I don't think collecting this data is conflicting of collecting the standard query stats. We can still do both.
Anyway, it is not blocking this PR so we can change if we feel useful
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
1265ac9
to
457b861
Compare
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
f52f085
to
3d47cda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
||
return atomic.LoadInt64(&s.TotalLoadedSamples) | ||
} | ||
|
||
// Merge the provide Stats into this one. | ||
func (s *Stats) Merge(other *Stats) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see so we are not calling AddFetchedSeries
, AddFetchedChunkBytes
, etc in Thanos querier.
This is fine. But I don't think collecting this data is conflicting of collecting the standard query stats. We can still do both.
Anyway, it is not blocking this PR so we can change if we feel useful
I can try working on this as a follow up. |
Changes
Verification