Slow date_histogram after upgrading to 7.3.0 #45702
Thanks very much for your interest in Elasticsearch. This appears to be a user question, and we'd like to direct these kinds of things to the forums. If you can stop by there, we'd appreciate it. This allows us to use GitHub for verified bug reports, feature requests, and pull requests. There's an active community in the forums that should be able to help get an answer to your question. As such, I hope you don't mind that I close this. Finally note that we track performance across releases in https://elasticsearch-benchmarks.elastic.co and you can see results between 6.8 and 7.3, e.g. for date_histogram using the nyc_taxis track, in https://elasticsearch-benchmarks.elastic.co/#tracks/nyc-taxis/release.
I'm linking here to the forum post I created: https://discuss.elastic.co/t/slow-date-histogram-after-upgrading-to-7-3-0-on-dense-indexes/196475. Please note that we could not reproduce the issue with the
Reopening, as a performance issue between 6.8 and 7.3 seems to exist and needs to be evaluated. Thanks for opening @brenuart!
Pinging @elastic/es-analytics-geo
Hi, we are in the process of upgrading our ELK stack from 6.8 to 7.3 and we see exactly the same behaviour in our logs. I do not have a nice dataset like the one @brenuart created to reproduce the issue, but I ran some test queries that show the problem: 839ms in ES 6.8 versus 4683ms in ES 7.3.
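As an illustration only (these are not the actual queries from this comment), a query of the shape that shows the slowdown looks like this in the Dev Console. The index pattern, field, interval and time zone are placeholders; on 6.8 the same aggregation is written with the single `interval` parameter instead of `fixed_interval`:

```
GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h",
        "time_zone": "Europe/Paris"
      }
    }
  }
}
```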
Thanks for the extra details @vboulaye! We're actively looking into this issue right now and will update as we diagnose. Out of curiosity, would you mind trying those queries again without the `time_zone` parameter?
@polyfractal We did the test with and without timezone with both versions. 7.3 is indeed a few seconds slower when a timezone is set.
Noted, thanks for the clarification!
Hi, do you think it is worth trying with version 7.2, or is this more likely to be related to the major changes in v7?
We've attempted to reproduce the issue without success. We used the data sets you provided, and @csoulios kindly created a Rally track to help us benchmark the queries. We used a bare metal environment (a separate machine for the load driver and one dedicated node for Elasticsearch). Elasticsearch uses a 1G heap, running on Java 12, and between every experiment we cleaned up caches, slab objects, compacted memory and trimmed the SSD disks as shown in [1]. In this gist I have the comparisons for the queries.

The results from another identical experiment are here, and again we see comparable performance between 6.8 and 7.3, for example for the 50th percentile latency.

Do you think you could give it a go yourself and see if it reproduces for you? Running it should be fairly easy, just follow the instructions in the README file of the custom Rally track. Looking forward to your feedback.

[1]:
I'll have a look this evening. Stay tuned.
I'm back with the results... and of course, I could NOT reproduce the issue with your Rally challenge... Same conclusion as yours: 7.3 is faster! I looked at the differences between your test and ours and noticed that our query for 7.3 is slightly different: it makes use of `fixed_interval`.

To make sure this had an impact, I quickly made the following tests through Kibana's dev console, targeting two different test clusters running respectively 6.8.2 and 7.3.0. The timings were:

The most important thing here is not the difference between 6.8 and 7.3, as I executed the queries only a couple of times manually and took only the best result. I made the test with the 4 datasets (gaussian and uniform distributions). I got the same ranking as in my forum post: gaussian is faster, followed by uniform-sameday, 1s and 10s. However, I don't believe anymore that the date distribution in the dataset is the issue (although it obviously has some impact on the performance). I'm really sorry I missed that initially. I pushed you in the wrong direction with an incomplete and inaccurate report. Sorry about that... For your information, Kibana is using a `fixed_interval`.
Hm interesting, the fixed/calendar interval change theoretically should have been equivalent to the now-deprecated `interval` parameter. One question though @brenuart, about the interval in your example query: when you did the fixed vs calendar test, what interval did you actually use in each case?
Oh god - sorry once again. Wrong copy/paste in the example query above. PS: I updated the query above to avoid confusion.
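For reference, a minimal sketch of the calendar vs fixed comparison being discussed (illustrative placeholders for index, field, interval and time zone, not the query from the comments above). The two requests differ only in the interval parameter and can be run one after the other to compare the `took` values:

```
GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "1h",
        "time_zone": "Europe/Paris"
      }
    }
  }
}

GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h",
        "time_zone": "Europe/Paris"
      }
    }
  }
}
```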
I made some tests on my side with the various types of interval (the deprecated `interval`, `calendar_interval` and `fixed_interval`), on 7.3 and on ES 6.8. So there really seems to be a difference in the way the date histogram handles the fixed_interval in 7.3.
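For completeness, the deprecated pre-7.2 syntax mentioned above, which is still accepted by 7.3 with a deprecation warning (placeholder names and values again):

```
GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "time_zone": "Europe/Paris"
      }
    }
  }
}
```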
We have optimizations for the calendar_interval if the timezone is UTC. This was added because we spotted the slowdown in our benchmarks. However we don't benchmark fixed_interval, and it suffers from the same issue: we don't special-case UTC when using a fixed interval, while Joda uses a fast path that doesn't require any conversion. @spinscale could we just change the fixed interval rounding to special-case UTC in the same way?
Glad you could reproduce the issue!
Yes, my proposal is too simplistic; we should be able to tackle all fixed timezones (UTC, +1, -3, ...). This should handle the case where a non-fixed timezone is rewritten into a fixed timezone?
I also wonder if we should apply the same logic to the calendar interval, since only the UTC timezone is optimized there, so the same slowdown would happen with fixed time zones.
@jimczi after a quick look through the code that sounds good to me
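At the query level, the distinction discussed here is between a fixed offset such as `+01:00` and a named zone such as `Europe/Paris`, which changes offset across DST transitions; `time_zone` accepts both forms. An illustrative request (placeholder index and field) running the same aggregation under each kind of zone:

```
GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "fixed_offset_zone": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h",
        "time_zone": "+01:00"
      }
    },
    "named_zone": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h",
        "time_zone": "Europe/Paris"
      }
    }
  }
}
```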
Fixes #45702: slow date_histogram aggregation when using fixed_interval. The optimization has been implemented for both fixed and calendar intervals.
Thanks for the fix.
Is there any way to circumvent the issue temporarily while we wait for the 7.4.1 release? We are looking to do an upgrade this weekend and this bug makes the Kibana Discover tab almost unusable for long date ranges. Would love any suggestions here.
@brhardwick, try using You can find more information here: |
We deployed 7.4 and the problem is fixed: the query now takes the same time with fixed_interval as with calendar_interval.
Same here. Thx a lot for the quick response.
Elasticsearch version (`bin/elasticsearch --version`): Version: 7.3.0, Build: default/rpm/de777fa/2019-07-24T18:30:11.767338Z, JVM: 12.0.1

Plugins installed: repository-s3

JVM version (`java -version`): the JVM shipped with Elasticsearch (rpm)

OS version (`uname -a` if on a Unix-like system): Linux es-hot-03.aegaeon-it.com 4.9.120-xxxx-std-ipv6-64 #327490 SMP Thu Aug 16 10:11:35 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:

We noticed a severe degradation of `date_histogram` performance after upgrading from Elasticsearch 6.8.2 to the latest 7.3.0. The table below shows the execution time of the same query on the same cluster before and after the upgrade. The first execution happens after caches are cleared as follows: `POST _cache/clear`, plus a shell loop over the files under `/var/lib/elasticsearch` redirected to `/dev/null` (that's the best we found...). Timings are as follows:

As you can see, 7.3.0 is at least twice as slow as 6.8.2.

The dataset is made of about 650m documents spread evenly across 15 indexes. Indexes have 3 shards each and no replica. The cluster is made of three nodes, each with 7Gb RAM (of which 4Gb is allocated to the heap) and 2 vCPUs at 3.1GHz. There is no activity on the cluster besides this test query.

The query targets only half of the documents (using a date range) and builds a `date_histogram` with buckets of 3h. This query is actually what Kibana's Discover panel will do...

We are very surprised by this drop in performance... Did we forget to change some configuration parameters when doing the upgrade, or is it a regression in ES itself?
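For illustration, a Discover-style request of the shape described above could look like this in the Dev Console (index pattern, field, date range and time zone are placeholders, not taken from the original report):

```
GET logs-*/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-15d",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "3h",
        "time_zone": "Europe/Paris"
      }
    }
  }
}
```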