[Lens] Expression function to scale numbers to a specific time interval #77692
Pinging @elastic/kibana-app (Team:KibanaApp)
Feedback on the format: functional specifications for calculations would be useful (this and further comments written on request of @flash1293). These are GitHub issues in a conversational format that link to other GitHub issues, PRs etc. Neither the description nor the combined thread can be construed as a specification:
Wiki pages or some kind of live docs would be a better place than conversational GitHub issues; docs can be revised, and questions can still be asked. E.g. in Google Docs, comments in the margin directly link to the words in question. While not everything under the Sun needs a written specification, I think calculation functions do. Time series functions may have a surprising amount of subtlety. There would be multiple uses for jotting down a functional specification that's comprehensive and focuses on the what, inputs and outputs, while refraining from looping in implementation detail(*), which would be noise when reading a spec.
As there are currently no functional specs yet, it's hard to even ask clarifying questions; it would be good to have a baseline. I already posted some questions and may add more. (*) Remarks for the implementors, e.g. efficiency hints, can still be looped in, in clearly marked, separated points or callouts with different indentation, font etc., so there's total clarity as to where each implementation note begins and ends.
These are things a spec might address; many questions go away with certain simplifying conditions if they're made explicit in a spec. Also, it's totally not a functional wishlist, the purpose is merely to see the scope, maybe with the odd well-separated comment for known future expectations.
1. Would be great to specify what the time granularities can be
2. Edge cases
3. Composition
4. Interaction with other calculations
5. Temporal raster
6. Temporal normalization with gauge/level (worth separating the spec for calculations from the spec of what user/config options there may be, and how the user could interact)
7. Temporal normalization with counter
8. Temporal normalization with cumulative sum
9. Formatting
... I see #77811 is an attempt to distill the topic of this issue and might answer a couple of questions; also, clear separation of function vs. implementation notes. As that's on the verge of turning into a discussion too, I'm not commenting there.
Thanks for the questions, @monfera - some non-authoritative answers. The biggest issues I see are the bootstrapping of cumulative sum and the handling of partial buckets. IMHO we should split out the bootstrapping and solve it separately in a later iteration. Partial bucket handling is not a new problem for scaling, we already have that issue today - that's why I think we should tackle it separately in a later version as well. Detailed answers:
I agree there are a bunch of edge cases; I propose starting out simple with second, minute, hour and day. Multipliers are not super relevant in the first version IMHO. As those can be shown in a dropdown, we can handle them using the i18n process in place.
Is this about things like DST? If yes I propose starting with absolute time (e.g. day -> hour always uses a factor of 1/24)
Yes, we only consider this kind of binning (the meta data will only carry a single bucket interval)
That's a good point. In Visualize we mitigate this by indicating incomplete buckets in the chart. Also, there is an option to drop partial buckets on the search service level. One option would be adding partial bucket information to the meta data of the table together with the bucket interval - maybe like this (unit string just for illustration, it should probably use milliseconds):
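The original example is not reproduced here; the following is a purely hypothetical sketch of what such column meta data could look like (property names like sourceParams and partialBuckets are made up for illustration):

```ts
// Hypothetical shape only - not an actual Lens/Kibana structure.
// Unit strings are just for readability; real meta data would probably use milliseconds.
const columnMeta = {
  sourceParams: {
    interval: '1d',        // the bucket interval used on ES level
    partialBuckets: {
      first: '6h',         // the first bucket only covers 6 of 24 hours
      last: '18h',         // the last bucket is cut off after 18 hours
    },
  },
};
```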
That way the scaling function could take partial buckets into account when computing the factor.

No strong opinion, I would probably go with accepting this shortcoming for now (the first and last bucket being cut off is a general problem in Elasticsearch/Kibana).
Yes, this is a metric level setting.
This functionality won't influence binning at all, it's just about scaling numerical values by a factor. Mixing daily and hourly bins on the same chart is already possible.
Yes, I think that's what a user would expect.
Yes
I'm not sure I get that correctly - there is no temporal normalizer changing the interval / bin width - it's just about applying a factor to numerical values. The upstream difference op would null the value in the first row in the table and the time scaling op would just skip it.
Empty intervals can always happen and all functions will have to deal with that - in case of time scaling it's just skipping those numbers. Charts are also handling empty values. No rows will be removed from the table based on this.
For xy charts we are exposing the interpolation capabilities of elastic-charts - we don't have meta information whether a field is gauge, event attribute or counter. The user has to configure that (at least for now)
Seems like this is similar to the second point of this comment
We don't have this meta information so we can't guide the user. The best thing we can do right now IMHO is a tooltip text. We can improve on that once we have the meta information.
We already do that in TSVB by using the max for each bucket for counters. That's not perfect, but ES is working on providing a more correct approach that takes resets within the buckets into account. So it's max per bucket -> difference to previous bucket -> cap at 0 to accommodate for resets -> scale by bucket interval (if configured).
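A minimal sketch of that counter pipeline, assuming plain arrays instead of the real data table structures (the function and parameter names are hypothetical):

```ts
// Input: the per-bucket max of a monotonically increasing counter.
// Output: the scaled rate per bucket (null for the first bucket, which has no predecessor to diff against).
function counterRate(
  maxPerBucket: number[],
  bucketIntervalMs: number,
  targetIntervalMs: number
): (number | null)[] {
  return maxPerBucket.map((current, i) => {
    if (i === 0) return null;                    // no previous bucket to diff against
    const diff = current - maxPerBucket[i - 1];  // difference to the previous bucket
    const nonNegative = Math.max(diff, 0);       // cap at 0 to accommodate counter resets
    return nonNegative * (targetIntervalMs / bucketIntervalMs); // scale by bucket interval
  });
}
```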
That's a good point in itself - currently cumulative sum in Elasticsearch/Kibana is always bootstrapped with zero. I would keep it like this for now but think about how we can improve that.
Time scaling by a constant factor for each bucket doesn't make sense for cumulative sum - for now I would disallow using time scaling for cumulative sums. This is possible in the UI because we don't allow nested operations for now.
A user has a field formatter configured for each field - we can use that and add a suffix "per second", "per hour" and so on. It's also overwritable from within the Lens UI.
Formatting is preserved when switching. Field-assigned formatting information is overwritten by Lens level formatting information if provided.
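For illustration, a hypothetical sketch of wrapping an existing field formatter with such a suffix (the names below are made up, not actual Lens APIs):

```ts
type TimeScaleUnit = 's' | 'm' | 'h' | 'd';

const SUFFIXES: Record<TimeScaleUnit, string> = {
  s: ' per second',
  m: ' per minute',
  h: ' per hour',
  d: ' per day',
};

// Wrap an existing value formatter so scaled values carry the matching suffix.
function withTimeScaleSuffix(
  format: (value: number) => string,
  unit: TimeScaleUnit
): (value: number) => string {
  return (value) => format(value) + SUFFIXES[unit];
}

// Usage: withTimeScaleSuffix((v) => `${v} requests`, 's')(42) === '42 requests per second'
```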
On the expression level, this is how we plan on implementing the "time scaling" logic that is part of supporting time series functions in Lens:
Algorithm to calculate initial scaling factor
The time scale function takes an output target scale which is defined by a fixed interval that can get statically translated into milliseconds (not relying on time zones etc.).
To get the input scale (the size of the current time bucket), use the utility function getInterval (#78745) on the column meta data to get the used interval on ES level. Then use moment to expand that into a millisecond range, starting with the start of the bucket (defined by either the very beginning of the queried time range or the key of the current bucket), using the add function of the moment object, capping at the very end of the queried time range. The value of the metric column is multiplied by target scale in ms / input scale in ms for current bucket. This is repeated for each row in the current data table.
Implementation
Because of the specifics of the function and the difficulty of using it correctly outside of the Lens context, this function will be implemented within the Lens plugin for now.
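A minimal sketch of the scaling factor calculation described above, assuming fixed intervals and millisecond bucket keys (all function and type names here are hypothetical, not the actual Lens implementation; only moment's real add/valueOf APIs are used):

```ts
import moment from 'moment';

// Hypothetical types - the real Lens/expression function signatures may differ.
type TargetUnit = 's' | 'm' | 'h' | 'd';

interface TimeScaleArgs {
  targetUnit: TargetUnit;                  // output scale, statically translatable to ms
  esInterval: string;                      // e.g. '1h', as returned by getInterval (#78745) from the column meta data
  timeRange: { from: number; to: number }; // queried time range as ms timestamps
}

// The target scale is a fixed interval, so it does not depend on time zones or calendars.
const TARGET_UNIT_MS: Record<TargetUnit, number> = {
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Parse an ES-style fixed interval like '30m' into a value/unit pair usable by moment.
function parseEsInterval(interval: string): { value: number; unit: moment.unitOfTime.DurationConstructor } {
  const match = interval.match(/^(\d+)([smhd])$/);
  if (!match) {
    throw new Error(`Unsupported interval: ${interval}`);
  }
  return { value: Number(match[1]), unit: match[2] as moment.unitOfTime.DurationConstructor };
}

// Scale the metric value of a single row (bucket) of the data table.
function scaleValue(value: number, bucketKey: number, args: TimeScaleArgs): number {
  const { value: intervalValue, unit } = parseEsInterval(args.esInterval);

  // Bucket start: the bucket key, but never before the very beginning of the queried time range.
  const bucketStart = Math.max(bucketKey, args.timeRange.from);

  // Bucket end: expand the start by the interval using moment's add, capped at the end of the time range.
  const bucketEnd = Math.min(
    moment(bucketStart).add(intervalValue, unit).valueOf(),
    args.timeRange.to
  );

  const inputScaleMs = bucketEnd - bucketStart; // input scale in ms for the current bucket
  return value * (TARGET_UNIT_MS[args.targetUnit] / inputScaleMs);
}
```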