Date histograms can be very slow due to time_zone #18853
Labels: Feature:Visualizations, Meta, performance
What's the issue?
We often see issues reported about date histogram aggregations being too slow. Most of the time, the cause is the configured time zone: any time zone that observes daylight saving time (DST), whether Kibana is explicitly set to such a time zone (like Europe/Berlin) or uses the default setting, which autodetects the time zone from the browser and usually ends up with one that observes DST.
This makes the aggregation much slower because Elasticsearch now has to convert every aggregated timestamp into that time zone, and the actual UTC offset can differ from document to document, since some documents fall inside DST and some outside it.
Just for clarification: this performance issue happens in Elasticsearch, when aggregating the documents, not in Kibana itself.
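To make the per-document cost concrete, here is a conceptual sketch (not Elasticsearch's actual implementation) of rounding a timestamp down to the start of its local day. With a DST zone, the UTC offset has to be looked up for each timestamp, and again for the local midnight, because the two can differ on a switch day. With a fixed offset, it is a constant. The helper berlin_offset is a hypothetical, hand-written stand-in for a real time zone database that only encodes the current EU rule (DST from 01:00 UTC on the last Sunday of March to 01:00 UTC on the last Sunday of October).

```python
from datetime import datetime, timedelta, timezone

def last_sunday_0100_utc(year, month):
    """01:00 UTC on the last Sunday of a 31-day month (the EU DST switch time)."""
    d = datetime(year, month, 31, 1, tzinfo=timezone.utc)
    return d - timedelta(days=(d.weekday() - 6) % 7)  # weekday(): Monday=0 .. Sunday=6

def berlin_offset(ts):
    """Hypothetical helper: UTC offset of Europe/Berlin at the UTC instant ts (EU rule only)."""
    dst_start = last_sunday_0100_utc(ts.year, 3)
    dst_end = last_sunday_0100_utc(ts.year, 10)
    return timedelta(hours=2) if dst_start <= ts < dst_end else timedelta(hours=1)

def fixed_offset(hours):
    """Offset function for a fixed-offset zone: the same answer for every timestamp."""
    return lambda ts: timedelta(hours=hours)

def day_bucket(ts, offset_for):
    """Start of ts's local calendar day, expressed in UTC."""
    wall = (ts + offset_for(ts)).replace(tzinfo=None)  # local wall-clock time
    midnight = wall.replace(hour=0, minute=0, second=0, microsecond=0)
    guess = (midnight - offset_for(ts)).replace(tzinfo=timezone.utc)
    # On a DST switch day the offset at local midnight differs from the
    # document's own offset, so it has to be looked up a second time.
    return (midnight - offset_for(guess)).replace(tzinfo=timezone.utc)

utc = timezone.utc
for ts in (datetime(2018, 3, 24, 12, tzinfo=utc), datetime(2018, 3, 26, 12, tzinfo=utc)):
    print(ts, "->", day_bucket(ts, berlin_offset), "|", day_bucket(ts, fixed_offset(2)))
```

For the fixed offset, every bucket follows from one constant. For Europe/Berlin, the documents from March 24th and March 26th need different offsets (+01:00 and +02:00), which is the extra work that happens per document.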
What's the common workaround?
A common workaround is to switch Kibana to a fixed-offset time zone (like Etc/GMT-2), meaning any time zone that doesn't observe DST. That way the calculation in Elasticsearch is faster; depending on the number of documents, this can make a noticeable performance difference.
There are a couple of issues with that approach (also mentioned in this comment):
What we can't do
There is one naive solution: automatically replace the user's time zone with a fixed-offset time zone before querying Elasticsearch. For example, if the user's browser is detected to be in Europe/Berlin, replace that time zone with Etc/GMT-2 or Etc/GMT-1, depending on whether the user is currently in DST or not. That would indeed improve the performance of all requests.
Unfortunately, that solution would still trigger the third issue in the above list and, even worse, make it implicit and hide it from the user. Let's look at a detailed example:
The date is March 26th, 2018 (a Monday). A security engineer in Berlin, Germany, let's call him Hans, is auditing login logs from earlier that day and from the past week. Everything looks fine for today, but there are some strange findings in last week's logins. To investigate them further, Hans compares them to the actual working hours of the corresponding employees. This is exactly where such an implicit system would be very dangerous: all timestamps from the past week would now be displayed one hour later than when they actually happened in local time, because DST began on March 25th, 2018 in Europe/Berlin and the substituted fixed offset only matches the current DST offset.
For Hans' sake, and to avoid silently shifting some times in your data but not others, this is not a viable solution for Kibana at the moment. Of course, the same issue happens with the workaround, but at least in that case the user explicitly chose the fixed-offset time zone.
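The Hans scenario can be reproduced in a few lines. This sketch uses a hypothetical EU-rule helper, berlin_offset, as a stand-in for a real time zone database, freezes Berlin's current offset the way the naive substitution would, and renders a hypothetical login from the previous Monday both ways.

```python
from datetime import datetime, timedelta, timezone

def last_sunday_0100_utc(year, month):
    """01:00 UTC on the last Sunday of a 31-day month (the EU DST switch time)."""
    d = datetime(year, month, 31, 1, tzinfo=timezone.utc)
    return d - timedelta(days=(d.weekday() - 6) % 7)

def berlin_offset(ts):
    """Hypothetical helper: UTC offset of Europe/Berlin at the UTC instant ts (EU rule only)."""
    start, end = last_sunday_0100_utc(ts.year, 3), last_sunday_0100_utc(ts.year, 10)
    return timedelta(hours=2) if start <= ts < end else timedelta(hours=1)

now = datetime(2018, 3, 26, 9, 0, tzinfo=timezone.utc)  # Monday morning, already in DST
frozen = timezone(berlin_offset(now))                   # naive substitution: UTC+02:00 (Etc/GMT-2)

login = datetime(2018, 3, 19, 7, 30, tzinfo=timezone.utc)  # hypothetical login a week earlier
actual = login.astimezone(timezone(berlin_offset(login)))  # what the clock in Berlin showed
shown = login.astimezone(frozen)                           # what the frozen offset displays

print("actual local time:", actual.strftime("%a %H:%M"))
print("displayed time:   ", shown.strftime("%a %H:%M"))
```

Every document from before March 25th is displayed one hour later than its real local time, while today's documents are correct: exactly the silent, partial shift described above.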
What we can do
User specific timezones
One way to solve the first issue of that workaround (forcing all users into the same time zone) could be to allow user-specific time zones, e.g. via a user-specific setting or by allowing the time zone to be changed in the time picker.
That way you could switch to a fixed-offset time zone and every user would still be able to use their own appropriate fixed-offset time zone.
See #18852
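With user-specific settings, each user's fixed offset would end up in the time_zone parameter of the date histogram aggregation. A minimal sketch, assuming whole-hour offsets and an @timestamp field: etc_gmt_zone and date_histogram_body are hypothetical helpers, and per_bucket is an arbitrary aggregation name. Note the POSIX-style inverted sign of the Etc zones: Etc/GMT-2 is UTC+02:00. The body uses calendar_interval, which assumes Elasticsearch 7.2 or later; older versions, such as the 6.x releases, use interval instead.

```python
import json

def etc_gmt_zone(utc_offset_hours):
    """Hypothetical helper: IANA 'Etc/GMT...' name for a whole-hour UTC offset.

    The Etc zones use an inverted (POSIX-style) sign, so UTC+02:00 is 'Etc/GMT-2'.
    """
    if not -12 <= utc_offset_hours <= 14:
        raise ValueError("Etc/GMT zones only cover UTC-12 to UTC+14")
    if utc_offset_hours == 0:
        return "Etc/GMT"
    sign = "-" if utc_offset_hours > 0 else "+"
    return f"Etc/GMT{sign}{abs(utc_offset_hours)}"

def date_histogram_body(time_zone, field="@timestamp", interval="1d"):
    """Hypothetical helper: search body with one date_histogram aggregation."""
    return {
        "size": 0,
        "aggs": {
            "per_bucket": {
                "date_histogram": {
                    "field": field,
                    "calendar_interval": interval,  # 'interval' before Elasticsearch 7.2
                    "time_zone": time_zone,
                }
            }
        },
    }

# Two users, each with their own fixed offset instead of one global Kibana setting.
user_offsets = {"hans": 2, "alice": -5}  # hypothetical per-user settings
for user, hours in user_offsets.items():
    print(user, json.dumps(date_histogram_body(etc_gmt_zone(hours))))
```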
Optimizing timezones when within the same DST period
Update: The following behavior was introduced in Elasticsearch 6.4.0.
Another possible solution that improves performance while still producing valid output: detect whether the date range filter sent with a date histogram lies entirely within one DST period, meaning I am not viewing data that crosses a DST switch. If so, we could pass the offset the time zone had during that period as a fixed offset to the date histogram aggregation. This would improve performance when looking at data from within one DST period, and would still show valid data (with the usual, slower performance) when looking at a period that contains a DST switch.
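The detection step could look roughly like this. It is a minimal sketch, not how Elasticsearch implements it: it samples the zone's offset across the filtered range and, if the offset never changes, sends it as an ISO 8601 offset such as +02:00, which the time_zone parameter accepts; otherwise it falls back to the zone ID. berlin_offset is again a hypothetical EU-rule stand-in for a real time zone database, and the hourly sampling assumes no two DST switches happen within a single step.

```python
from datetime import datetime, timedelta, timezone

def last_sunday_0100_utc(year, month):
    """01:00 UTC on the last Sunday of a 31-day month (the EU DST switch time)."""
    d = datetime(year, month, 31, 1, tzinfo=timezone.utc)
    return d - timedelta(days=(d.weekday() - 6) % 7)

def berlin_offset(ts):
    """Hypothetical helper: UTC offset of Europe/Berlin at the UTC instant ts (EU rule only)."""
    start, end = last_sunday_0100_utc(ts.year, 3), last_sunday_0100_utc(ts.year, 10)
    return timedelta(hours=2) if start <= ts < end else timedelta(hours=1)

def single_offset(start, end, offset_for, step=timedelta(hours=1)):
    """Return the zone's offset if it is constant over [start, end], else None."""
    first = offset_for(start)
    t = start
    while t < end:
        if offset_for(t) != first:
            return None  # the range crosses a DST switch
        t += step
    return first if offset_for(end) == first else None

def as_iso_offset(offset):
    """Format a timedelta as an ISO 8601 UTC offset, e.g. '+02:00'."""
    minutes = int(offset.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"{sign}{hours:02d}:{mins:02d}"

def histogram_time_zone(start, end, zone_id, offset_for):
    """Fixed offset when the range stays in one DST period, the real zone otherwise."""
    offset = single_offset(start, end, offset_for)
    return as_iso_offset(offset) if offset is not None else zone_id

utc = timezone.utc
print(histogram_time_zone(datetime(2018, 3, 26, tzinfo=utc), datetime(2018, 4, 2, tzinfo=utc),
                          "Europe/Berlin", berlin_offset))
print(histogram_time_zone(datetime(2018, 3, 19, tzinfo=utc), datetime(2018, 3, 26, tzinfo=utc),
                          "Europe/Berlin", berlin_offset))
```

The first range lies entirely within DST and gets a fixed +02:00; the second crosses the March 25th switch and keeps Europe/Berlin, so its buckets stay correct at the usual cost.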
I think this optimization should be done in Elasticsearch rather than in Kibana, since that way all date histogram aggregations would benefit from the performance improvement. It would also prevent issues if Kibana's DST periods for a time zone ever differed from Elasticsearch's, which hopefully should never happen.
That's why I added this suggestion as a comment to elastic/elasticsearch#28727, which tracks the date histogram time zone performance issue in Elasticsearch.