Port udf execution metrics to be bucketed based on time (#32048)
The dashboard provides simple timeseries data for different counters (function counts, cache hit rates, and rows read or written to a table) and histograms (percentiles for function execution time). These timeseries are stored naively: we keep the original samples as-is in a circular buffer of the past 1000 samples per metric. This leads to confusing behavior on a few fronts: very active functions can end up with only a very short window of data in the dashboard, and different timeseries may have different validity windows.

This PR switches in-memory metrics to use a different approach:

- We default to storing timeseries data at 1m granularity, retaining an hour's worth of data. The data structure stores buckets sparsely: only buckets that have a sample take up memory. We could store more, but we currently reset metrics on backend restart, so only showing the past hour makes this less disruptive.
- For counters (e.g. database rows read), this is 8 bytes * 60 buckets = ~0.5KB of data per metric (see the counter sketch after this list). We log ~5 metrics per function and ~2 metrics per table => we shouldn't use more than 5MB of RAM in the worst case.
- For histograms (e.g. function latency), we use an HDR histogram configured to roughly 1.5KB per bucket, i.e. ~90KB of data per metric across the 60 buckets (see the histogram sketch after this list). With ~1000 active functions, this will be at most 90MB of memory.

Eventually, we'd like to set up a victoriametrics cluster for customers and just use that, but this will unlock a few more analyses for the insights project. (For example, we can efficiently compute the top K functions for a given metric.) It's API compatible with the old stuff, so we shouldn't need to change the dashboard.
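To make the counter layout concrete, here is a minimal sketch of a sparse minute-bucketed counter. Rust is an assumption about the backend language, and all names here (`BucketedCounter`, etc.) are hypothetical, not taken from this PR:

```rust
use std::collections::BTreeMap;
use std::time::{SystemTime, UNIX_EPOCH};

struct BucketedCounter {
    /// Minute timestamp (unix seconds / 60) -> accumulated count.
    /// Only minutes that received a sample occupy memory.
    buckets: BTreeMap<u64, u64>,
    /// Number of one-minute buckets to retain (60 => one hour).
    retention_buckets: u64,
}

impl BucketedCounter {
    fn new() -> Self {
        Self { buckets: BTreeMap::new(), retention_buckets: 60 }
    }

    fn increment(&mut self, delta: u64) {
        let now_minute = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before epoch")
            .as_secs() / 60;
        *self.buckets.entry(now_minute).or_insert(0) += delta;
        // Prune buckets that have aged out of the retention window.
        let cutoff = now_minute.saturating_sub(self.retention_buckets);
        self.buckets = self.buckets.split_off(&cutoff);
    }

    /// Total count over the retained window.
    fn total(&self) -> u64 {
        self.buckets.values().sum()
    }
}
```

An ordered map keeps pruning cheap (`split_off` at the cutoff minute) and means an idle metric holds no buckets at all, which is where the "at most 60 buckets of 8-byte counts" bound comes from.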
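For the histogram side, one shape that matches the description is an HDR histogram per minute bucket. The sketch below uses the open-source `hdrhistogram` crate with illustrative bounds (1ms..60s at 2 significant figures); the actual configuration that yields ~1.5KB per bucket may differ, and `BucketedHistogram` is again a hypothetical name:

```rust
use std::collections::BTreeMap;
use hdrhistogram::Histogram;

// Hypothetical sketch: one HDR histogram per one-minute bucket, so the
// dashboard can report latency percentiles per minute over the past hour.
struct BucketedHistogram {
    buckets: BTreeMap<u64, Histogram<u64>>,
}

impl BucketedHistogram {
    fn new() -> Self {
        Self { buckets: BTreeMap::new() }
    }

    fn record(&mut self, minute: u64, latency_ms: u64) {
        let hist = self.buckets.entry(minute).or_insert_with(|| {
            // Illustrative bounds: track 1ms..60s at 2 significant figures.
            Histogram::new_with_bounds(1, 60_000, 2).expect("valid bounds")
        });
        // saturating_record clamps out-of-range values to the trackable
        // bounds instead of returning an error.
        hist.saturating_record(latency_ms);
    }

    /// p99 latency for a given minute, if any samples were recorded.
    fn p99(&self, minute: u64) -> Option<u64> {
        self.buckets.get(&minute).map(|h| h.value_at_quantile(0.99))
    }
}
```

Pruning aged-out minutes is omitted here for brevity but would mirror the counter sketch above.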
GitOrigin-RevId: 08a9de42d4263b3b4a2b07d3366a2275a80d7df4