-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mitigate date histogram slowdowns with non-fixed timezones. #30534
Mitigate date histogram slowdowns with non-fixed timezones. #30534
Conversation
Date histograms on non-fixed timezones such as `Europe/Paris` proved much slower than histograms on fixed timezones in elastic#28727. This change mitigates the issue by using a fixed time zone instead when shard data doesn't cross a transition so that all timestamps share the same fixed offset. This should be a common case with daily indices. NOTE: Rewriting the aggregation doesn't work since the timezone is then also used on the coordinating node to create empty buckets, which might be out of the range of data that exists on the shard. NOTE: In order to be able to get a shard context in the tests, I reused code from the base query test case by creating a new parent test case for both queries and aggregations: `AbstractBuilderTestCase`. Mitigates elastic#28727
Pinging @elastic/es-search-aggs |
if (ft != null && reader != null) { | ||
Long anyInstant = null; | ||
final IndexFieldData<?> fieldData = context.getForField(ft); | ||
if (fieldData instanceof IndexNumericFieldData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not be guaranteed to be an IndexNumericFieldData
? if so maybe this should either be an assert or we should throw an exception if this condition is false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. I had it implemented in rewrite() initially where it felt wrong since validation happens afterwards, but here the values source is already resolved so an exception would have been thrown already if the histogram ran against a non-numeric field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments
assertSame(tz, builder.rewriteTimeZone(shardContextThatCrosses)); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also have a test that makes sure fixed time zones are not changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the above assertions check it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I had missed that, you are right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@colings86 I had to do some changes because of test failures in case all values were between the same transitions, but some rounded values were not. Could you have another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpountz good catch on the case where the start date and transition are in the same bucket. The solution LGTM
Date histograms on non-fixed timezones such as `Europe/Paris` proved much slower than histograms on fixed timezones in #28727. This change mitigates the issue by using a fixed time zone instead when shard data doesn't cross a transition so that all timestamps share the same fixed offset. This should be a common case with daily indices. NOTE: Rewriting the aggregation doesn't work since the timezone is then also used on the coordinating node to create empty buckets, which might be out of the range of data that exists on the shard. NOTE: In order to be able to get a shard context in the tests, I reused code from the base query test case by creating a new parent test case for both queries and aggregations: `AbstractBuilderTestCase`. Mitigates #28727
…ngs-to-true * elastic/master: (34 commits) Test: increase search logging for LicensingTests Adjust serialization version in IndicesOptions [TEST] Fix compilation Remove version argument in RangeFieldType (elastic#30411) Remove unused DirectoryUtils class. (elastic#30582) Mitigate date histogram slowdowns with non-fixed timezones. (elastic#30534) Add a MovingFunction pipeline aggregation, deprecate MovingAvg agg (elastic#29594) Removes AwaitsFix on IndicesOptionsTests Template upgrades should happen in a system context (elastic#30621) Fix bug in BucketMetrics path traversal (elastic#30632) Fixes IndiceOptionsTests to serialise correctly (elastic#30644) S3 repo plugin populate SettingsFilter (elastic#30652) mute IndicesOptionsTests.testSerialization Rest High Level client: Add List Tasks (elastic#29546) Refactors ClientHelper to combine header logic (elastic#30620) [ML] Wait for ML indices in rolling upgrade tests (elastic#30615) Watcher: Ensure secrets integration tests also run triggered watch (elastic#30478) Move allocation awareness attributes to list setting (elastic#30626) [Docs] Update code snippet in has-child-query.asciidoc (elastic#30510) Replace custom reloadable Key/TrustManager (elastic#30509) ...
* es/master: (74 commits) Preserve REST client auth despite 401 response (#30558) [test] packaging: add windows boxes (#30402) Make xpack modules instead of a meta plugin (#30589) Mute ShrinkIndexIT [ML] DeleteExpiredDataAction should use client with origin (#30646) Reindex: Fixed typo in assertion failure message (#30619) [DOCS] Fixes list of unconverted snippets in build.gradle [DOCS] Reorganizes RBAC documentation SQL: Remove dependency for server's version from JDBC driver (#30631) Test: increase search logging for LicensingTests Adjust serialization version in IndicesOptions [TEST] Fix compilation Remove version argument in RangeFieldType (#30411) Remove unused DirectoryUtils class. (#30582) Mitigate date histogram slowdowns with non-fixed timezones. (#30534) Add a MovingFunction pipeline aggregation, deprecate MovingAvg agg (#29594) Removes AwaitsFix on IndicesOptionsTests Template upgrades should happen in a system context (#30621) Fix bug in BucketMetrics path traversal (#30632) Fixes IndiceOptionsTests to serialise correctly (#30644) ...
* es/6.x: (44 commits) SQL: Remove dependency for server's version from JDBC driver (#30631) Make xpack modules instead of a meta plugin (#30589) Security: Remove SecurityLifecycleService (#30526) Build: Add task interdependencies for ssl configuration (#30633) Mute ShrinkIndexIT [ML] DeleteExpiredDataAction should use client with origin (#30646) Reindex: Fixed typo in assertion failure message (#30619) [DOCS] Fixes list of unconverted snippets in build.gradle Use readFully() to read bytes from CipherInputStream (#30640) Add Create Repository High Level REST API (#30501) [DOCS] Reorganizes RBAC documentation Test: increase search logging for LicensingTests Delay _uid field data deprecation warning (#30651) Deprecate Empty Templates (#30194) Remove unused DirectoryUtils class. (#30582) Mitigate date histogram slowdowns with non-fixed timezones. (#30534) [TEST] Remove AwaitsFix in IndicesOptionsTests#testSerialization S3 repo plugin populates SettingsFilter (#30652) Rest High Level client: Add List Tasks (#29546) Fixes IndiceOptionsTests to serialise correctly (#30644) ...
* es/ccr: (75 commits) Preserve REST client auth despite 401 response (elastic#30558) [test] packaging: add windows boxes (elastic#30402) Make xpack modules instead of a meta plugin (elastic#30589) Mute ShrinkIndexIT [ML] DeleteExpiredDataAction should use client with origin (elastic#30646) Reindex: Fixed typo in assertion failure message (elastic#30619) [DOCS] Fixes list of unconverted snippets in build.gradle [DOCS] Reorganizes RBAC documentation SQL: Remove dependency for server's version from JDBC driver (elastic#30631) Test: increase search logging for LicensingTests Adjust serialization version in IndicesOptions [TEST] Fix compilation Remove version argument in RangeFieldType (elastic#30411) Remove unused DirectoryUtils class. (elastic#30582) Mitigate date histogram slowdowns with non-fixed timezones. (elastic#30534) Add a MovingFunction pipeline aggregation, deprecate MovingAvg agg (elastic#29594) Removes AwaitsFix on IndicesOptionsTests Template upgrades should happen in a system context (elastic#30621) Fix bug in BucketMetrics path traversal (elastic#30632) Fixes IndiceOptionsTests to serialise correctly (elastic#30644) ...
…30534) Date histograms on non-fixed timezones such as `Europe/Paris` proved much slower than histograms on fixed timezones in elastic#28727. This change mitigates the issue by using a fixed time zone instead when shard data doesn't cross a transition so that all timestamps share the same fixed offset. This should be a common case with daily indices. NOTE: Rewriting the aggregation doesn't work since the timezone is then also used on the coordinating node to create empty buckets, which might be out of the range of data that exists on the shard. NOTE: In order to be able to get a shard context in the tests, I reused code from the base query test case by creating a new parent test case for both queries and aggregations: `AbstractBuilderTestCase`. Mitigates elastic#28727
Date histograms on non-fixed timezones such as
Europe/Paris
proved much slowerthan histograms on fixed timezones in #28727. This change mitigates the issue by
using a fixed time zone instead when shard data doesn't cross a transition so
that all timestamps share the same fixed offset. This should be a common case
with daily indices.
NOTE: Rewriting the aggregation doesn't work since the timezone is then also
used on the coordinating node to create empty buckets, which might be out of the
range of data that exists on the shard.
NOTE: In order to be able to get a shard context in the tests, I reused code
from the base query test case by creating a new parent test case for both
queries and aggregations:
AbstractBuilderTestCase
.Mitigates #28727