Need a more flexible histogram / tdigest #6440
Comments
I've learned that we have a t-digest library for Go that we are using in Flux: github.com/influxdata/tdigest. It would make a lot of sense to share this library: it is fast, it would give us completely consistent behavior with Flux, and it would be less code to maintain. I'm not sure that it is currently possible to access the centroids, but I think this could probably be added; we can work out the details with @goller.
Should be able to use
This should be done outside of the plugin, I think that it can use the
I don't entirely understand this, but it should probably be separate from this plugin. It may be possible to do this with clever use of the rename processor: if the "atom" tag is not set, then rename the "az" tag.
It appears there are two modes for the aggregator. In "local" mode a set of quantiles is requested, and in "central" mode the centroid data is sent to the server so that the t-digest algorithm can be calculated later with different quantile values? Here is how I suggest we lay out the "local" mode aggregations:
Same in line protocol:
For "central" mode metrics, something like this based on what data we need to save:
Same in line protocol:
This would align the storage model with our plans for other histogram/quantile data such as in #4415 or the histogram aggregator, and I believe it would be more friendly for frontends like Grafana/Chronograf.
Single TDigest library
The only reason I didn't use any published library was that they all seem to hide the centroids, and access to the centroids is necessary for sending the data to a central aggregator. Using the same library as Flux could benefit people sending data to InfluxDB and will likely have no adverse impact on anyone else. I think the conversation around the central output format could likely influence how that library exposes the data.
Tag manipulation outside of the plugin
The tag manipulation that is done within the plugin was implemented there to support multiple bucketing configurations. The most common example is that you want to aggregate data once for each host that is running your service and then again for the service as a whole, excluding the host info.
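As a rough illustration of the two bucketing configurations described above (once per host, once for the service as a whole), here is a minimal Python sketch. The function names `bucket_key` and `aggregate`, the `exclude` parameter, and the sample points are all hypothetical and are not taken from the plugin; the sketch only shows how dropping a tag from the bucket identity merges buckets.

```python
from collections import defaultdict


def bucket_key(measurement, tags, exclude=()):
    """Build a bucket identity from the measurement name and the tags that remain."""
    kept = tuple(sorted((k, v) for k, v in tags.items() if k not in exclude))
    return (measurement, kept)


def aggregate(points, exclude=()):
    """Group raw values into buckets; excluding a tag merges buckets that differed only by it."""
    buckets = defaultdict(list)
    for measurement, tags, value in points:
        buckets[bucket_key(measurement, tags, exclude)].append(value)
    return dict(buckets)


# Hypothetical input: (measurement, tags, value)
points = [
    ("http_latency", {"service": "api", "host": "a"}, 12.0),
    ("http_latency", {"service": "api", "host": "b"}, 30.0),
    ("http_latency", {"service": "api", "host": "a"}, 15.0),
]

per_host = aggregate(points)                        # one bucket per host
per_service = aggregate(points, exclude=("host",))  # hosts folded together
```

Both results come from a single pass definition applied with different exclusions, which is the duplication-avoiding idea; whether that belongs inside or outside the plugin is the open question in this thread.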
We did not want to go the route of multiple instances of the plugin, so as to reduce duplication of common tasks between configurations.
Local aggregation output format
The field name "quantile" is inaccurate, but that could be changed to "aggregation" or something similar.
Central aggregation output format
Breaking up the centroid objects is going to be a bad idea. The numeric pairs represent a value and a weight. These values are dynamically calculated as values are added to the histogram, and neither the weights nor the counts from any two histograms will have any logical mapping. Even in the simple example you listed, if cpu_idle had values 98 and 99 then cpu_nice would end up with values 2 and 1. In a more real-world example of HTTP timers, the odds of two histograms sharing values are quite small.
Second, and potentially more important, the more the centroid is broken up, the more pieces will have to be put back together. The recombining of the histograms from multiple hosts that must be done to create central aggregations is already a computationally intensive task. The task also has a time limit: because new histograms are generated every 60s, every batch MUST be finished in less than that time or you will get a perpetually increasing delay in metrics processing.
I am not opposed to altering the output format, but in addition to performing well, it needs to be ingestible by libraries other than the one used to encode the source histogram. Our central aggregation is not even written in Go, for example.
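To make the central recombination step concrete, the following is a simplified Python sketch of merging (value, weight) centroid lists from several hosts and reading count, sum, and an estimated quantile from the result. It is not the t-digest algorithm itself: there is no compression step or scale function, and the interpolation rule is one common simplification. The function names and the sample digests are hypothetical.

```python
def merge_centroids(*digests):
    """Combine (mean, weight) lists from several sources into one list sorted by mean."""
    merged = {}
    for digest in digests:
        for mean, weight in digest:
            merged[mean] = merged.get(mean, 0.0) + weight
    return sorted(merged.items())


def count_and_sum(centroids):
    """Total weight and weighted sum; both survive merging exactly."""
    count = sum(w for _, w in centroids)
    total = sum(m * w for m, w in centroids)
    return count, total


def estimate_quantile(centroids, q):
    """Interpolate between centroid means, placing each centroid at the
    midpoint of the cumulative weight it covers (a simplifying assumption)."""
    total = sum(w for _, w in centroids)
    target = q * total
    cumulative = 0.0
    prev_mean, prev_mid = None, None
    for mean, weight in centroids:
        mid = cumulative + weight / 2.0
        if target <= mid:
            if prev_mean is None:
                return mean  # below the first centroid's midpoint
            frac = (target - prev_mid) / (mid - prev_mid)
            return prev_mean + frac * (mean - prev_mean)
        prev_mean, prev_mid = mean, mid
        cumulative += weight
    return centroids[-1][0]  # above the last centroid's midpoint


# Hypothetical centroid lists from two hosts
host_a = [(10.0, 2), (20.0, 2)]
host_b = [(20.0, 1), (40.0, 1)]

merged = merge_centroids(host_a, host_b)
count, total = count_and_sum(merged)
median = estimate_quantile(merged, 0.5)
```

Note that only the single shared value (20.0) lines up between the two hosts; in practice almost no centroids coincide, so the merged list grows with the number of sources, which is where the processing cost comes from.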
Tag manipulation outside of the plugin
I haven't looked at the implementation, but the best and expected way to do bucketing is to create a bucket for every measurement+tagset. The user can then use tagexclude to remove tags, which by their removal will result in the buckets being merged. I can show an example if my explanation doesn't make sense.
Local aggregation output format
I must have misunderstood the way this works. Doesn't the local mode work by calculating one or more estimated quantiles? With a range from 0 to 1, the median aggregation is the 0.5 quantile, the min is the 0 quantile, and the max is the 1 quantile?
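The correspondence between min, median, max and the 0, 0.5, and 1 quantiles can be checked with a small Python sketch. It uses exact quantiles over raw samples with linear interpolation rather than a t-digest estimate, and the `quantile` helper and sample values are hypothetical.

```python
import statistics


def quantile(samples, q):
    """Exact quantile of raw samples for q in [0, 1], with linear interpolation."""
    ordered = sorted(samples)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    frac = position - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


samples = [3.0, 1.0, 4.0, 1.5, 9.0]

q_min = quantile(samples, 0.0)     # same as min(samples)
q_median = quantile(samples, 0.5)  # same as statistics.median(samples)
q_max = quantile(samples, 1.0)     # same as max(samples)
```

Aggregations such as count and sum have no quantile equivalent, so a field named purely after quantiles cannot carry them.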
This is essentially the difference between older systems like Graphite and newer tagged time series databases. By now almost all TS databases have adopted tags because they make querying easier and the style is more extensible. Our rule of thumb is to not write multiple values to a single column.
Central aggregation output format
Storing the centroid JSON as multiple fields allows InfluxDB to store the data in an efficient manner. With higher compression settings this JSON will become too long and will cause issues on many InfluxDB installs that have a limited write payload and line size. It will also be completely unwritable to a fair number of other outputs that don't support a string type (Prometheus, Graphite, etc.). I do think we need to make at least one change from my proposal above: since the centroid could have so many values, and they would presumably be changing over time, this wouldn't give proper identity to the items and would have poor cardinality:
Using the centroid number instead as the tag would probably make more sense, there would never be more than
Quantile would not include count or sum, which are necessary for me to support. I will be attending the Flux training at InfluxDays SF; perhaps there is a less processing-time-sensitive method for merging histograms from multiple sources.
Just a quick update for those watching this issue. @PhoenixRion, @goller, and I have been discussing how exactly this plugin could come together. Our main goal in Telegraf is to generate a data model for storing tdigests in a format that will allow post-collection merging and summarization. The data format should be well documented and should be possible to use with multiple output plugins. We would like to follow this up with functions for merging the digests and estimating quantiles using Flux. The next items we are planning to work on are:
Update to proposed format:
Feature Request
Groupon needed a more flexible histogram aggregation. Support for central re-aggregation is also needed. Central aggregation is currently accomplished outside of Telegraf.
Proposal: https://github.com/PhoenixRion/telegraf/tree/master/plugins/aggregators/tdigestagg
Configuration:
Local Aggregation Output:
Central Aggregation Output:
Current behavior:
Statically defined bucket boundaries
Aggregations not generated based on histogram
Desired behavior:
Dynamic histogram buckets
Ability to emit histogram for central aggregation
Arbitrary list of aggregation buckets
Use case
Mathematically accurate percentiles for metrics across multiple sources.
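A small worked example may help show why percentiles across multiple sources need the underlying distribution rather than per-host results. The Python sketch below uses raw samples and a nearest-rank percentile instead of a t-digest; the `percentile` helper and the latency figures are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds
host_a = [10.0] * 90 + [100.0] * 10  # busy host, mostly fast
host_b = [1000.0] * 10               # quiet host, uniformly slow

p90_a = percentile(host_a, 90)
p90_b = percentile(host_b, 90)
averaged_p90 = (p90_a + p90_b) / 2           # combines per-host results
pooled_p90 = percentile(host_a + host_b, 90)  # computed over all requests
```

Here the averaged value is 505.0 while the pooled value is 100.0, because averaging ignores how many requests each host served. Emitting a mergeable histogram per host is what lets a central aggregator arrive at the pooled figure.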