-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ui: latency graphs reporting 0 latency incorrectly #12998
Comments
What cluster was this? |
There's only one line there, but the 99th percentile graph shows a line for every node (unless you are looking at a single node). Is it possible that this node has no requests? Edit: I took a look, yeah something might not be getting recorded correctly on the backend. |
This was a 1-node cluster so it was definitely receiving requests. I've seen this happen multiple times. |
After reading the code, i'm surprised it ever has non-zero values; the sliding histogram is only advanced when the histogram values are read. This means that, immediately before returning values, the histogram might rotate all previously recorded data out of the histogram, resulting in a zero value. Two possible fixes:
|
What's the current status on this issue? |
While a "fix" was prepared to create non-zero values, we are looking into prioritizing a much bigger change to the time series system that will allow us to handle histograms correctly (see #7896). This will allow for data which is far more intuitive (no odd "window" sizes), as well as the correct computation of quantiles across the cluster. |
Adds a derived configuration value "HistogramWindowDuration" to the server config, which specifies the approximate duration for which individual samples are retained in our sliding window histograms. Previously, histogram windows were initialized using the MetricSampleWindow value. However, this resulted in cockroachdb#12998; this is caused because the histogram window is rotated when querying a histogram, not when recording a sample. In many cases, the timing is such that all samples recorded are discarded right before the histogram is queried, resulting in a zero value. Because of this, and additionally issue cockroachdb#12835, we are lengthening the histogram window duration for all histogram metrics. This includes all existing histograms that were using MetricSampleWindow, and a few histograms which were using constant values of 1 minute. The new value is arbitrarily chosen to be six times the MetricSampleWindow configuration setting; under default settings, that equates to a window of one minute. The work here can be considered a temporary fix to alleviate the problems in a couple of issues; the histogram system is going to be greatly overhauled in issue cockroachdb#7896, part of which will be the removal of sliding-window histograms. Fixes cockroachdb#12835 Fixes cockroachdb#12998
The above is from a cluster that is performing ~2k ops/sec. It is impossible for the latency to be 0.
The text was updated successfully, but these errors were encountered: