Implement rounding optimization for fixed offset timezones #46670

Merged: 4 commits, Sep 16, 2019


csoulios
Contributor

Fixes #45702, a performance regression in the date_histogram aggregation when using fixed_interval.

Implements the code change discussed at:
#45702 (comment)

The optimization has been implemented for both fixed and calendar intervals.


TimeUnitRounding(DateTimeUnit unit, ZoneId timeZone) {
    this.unit = unit;
    this.timeZone = timeZone;
    this.unitRoundsToMidnight = this.unit.field.getBaseUnit().getDuration().toMillis() > 3600000L;
    this.isUtcTimeZone = timeZone.normalized().equals(ZoneOffset.UTC);
Contributor

I remember adding this because of performance problems with timeZone.getRules().getOffset(Instant.EPOCH) in the UTC case. Did you benchmark to confirm that this is no longer a problem?

Contributor

Why would the performance differ? UTC is a fixed-offset timezone, so this change just covers more cases; we're only changing the way we check whether we can apply a fixed offset.

Contributor

I misunderstood your comment, sorry. There is a TODO in the Java code saying that getRules should be optimized, so +1 to resolving the offset once in the constructor and using it to determine whether the fast rounding should be used.

Contributor

The previous implementation did not call timeZone.getRules().getOffset(Instant.EPOCH).getTotalSeconds() in the UTC case.

Contributor

+1 to resolving the fixed offset here and calling timeZone.getRules().getOffset(Instant.EPOCH) only once.

Contributor

yep agreed, this should be changed
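
The suggestion in this thread (resolve the offset once in the constructor, then reuse it) could look roughly like the sketch below. It is only an illustration under assumptions: the class and method names are hypothetical, and only `TZ_OFFSET_NON_FIXED` and the idea of a millisecond offset field are taken from the diff in this PR. The merged code may differ.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

/** Hypothetical sketch, not the actual Elasticsearch Rounding class. */
final class FixedOffsetSketch {
    /** Sentinel for zones whose offset changes over time (name taken from the PR diff). */
    static final long TZ_OFFSET_NON_FIXED = -1;

    /** Offset in milliseconds for fixed-offset zones, otherwise TZ_OFFSET_NON_FIXED. */
    private final long fixedOffsetMillis;

    FixedOffsetSketch(ZoneId timeZone) {
        // getRules() is called exactly once, here, instead of on every rounding call.
        this.fixedOffsetMillis = resolveFixedOffsetMillis(timeZone);
    }

    static long resolveFixedOffsetMillis(ZoneId timeZone) {
        ZoneRules rules = timeZone.getRules();
        if (rules.isFixedOffset()) {
            // For a fixed-offset zone the offset is the same at every instant,
            // so the instant used for the lookup does not matter.
            return rules.getOffset(Instant.EPOCH).getTotalSeconds() * 1000L;
        }
        return TZ_OFFSET_NON_FIXED;
    }

    boolean hasFixedOffset() {
        return fixedOffsetMillis != TZ_OFFSET_NON_FIXED;
    }

    long fixedOffsetMillis() {
        return fixedOffsetMillis;
    }

    public static void main(String[] args) {
        System.out.println(new FixedOffsetSketch(ZoneOffset.UTC).hasFixedOffset());
        System.out.println(new FixedOffsetSketch(ZoneOffset.ofHours(-1)).fixedOffsetMillis());
        System.out.println(new FixedOffsetSketch(ZoneId.of("Europe/Berlin")).hasFixedOffset());
    }
}
```

With this shape, UTC is just one more fixed-offset zone (offset 0), so a separate UTC flag would no longer be needed for choosing the fast path.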

// This works as long as the tz offset doesn't change. It is worth getting this case out of the way first,
// as the calculations for fixing things near to offset changes are a little expensive and unnecessary
// in the common case of working with fixed offset timezones (such as UTC).
if (timeZone.getRules().isFixedOffset() == true) {
Contributor

See above. Can you benchmark this against the UTC case and see if the performance differs?

Contributor Author

@csoulios csoulios Sep 12, 2019

These are the results of the esrally benchmark used for reproducing the bug.

Dataset is the timestamps-gaussian-sameday
Query is:

    {
      "aggs": {
        "2": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "24h",
            "min_doc_count": 1
          }
        }
      },
      "query": {
        "match_all": {}
      }
    }

baseline: master branch (8.x)
contender: branch with fix

| Metric                        | Task                     | Baseline    |   Contender |     Diff |   Unit |
|-------------------------------+--------------------------+-------------+-------------+----------+--------|
|             Median Throughput | query-agg-date_histogram |   0.0896863 |    0.202938 |  0.11325 |  ops/s |
|                Max Throughput | query-agg-date_histogram |   0.0903743 |    0.207398 |  0.11702 |  ops/s |
|       50th percentile latency | query-agg-date_histogram |      161809 |     3268.02 |  -158541 |     ms |
|       90th percentile latency | query-agg-date_histogram |      230431 |     3435.84 |  -226995 |     ms |
|      100th percentile latency | query-agg-date_histogram |      247591 |     3502.15 |  -244089 |     ms |
|  50th percentile service time | query-agg-date_histogram |     10808.3 |     3263.49 | -7544.81 |     ms |
|  90th percentile service time | query-agg-date_histogram |     11114.6 |     3431.34 | -7683.28 |     ms |
| 100th percentile service time | query-agg-date_histogram |     11688.9 |     3499.13 | -8189.74 |     ms |

I am going to work on the micro benchmarks

Contributor

Can you also compare the patch with and without the check for the UTC timezone in the ctor?

Contributor

This is really a regression and we're just copying what we used to do with Joda. Let's set up a quick micro benchmark for this if you want, but the most important part here is correctness ;).

Contributor

Same here, we should resolve the fixed offset once in the ctor.
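
For the fixed-interval case, the fast path guarded by the fixed-offset check in this thread can be sketched as follows. This is a minimal illustration, assuming the offset has already been resolved to milliseconds; the class and method names are hypothetical, and calendar intervals need field-aware rounding that is not shown here.

```java
/** Hypothetical sketch of fixed-interval rounding in a fixed-offset timezone. */
final class FixedIntervalRoundingSketch {

    /**
     * Rounds a UTC timestamp down to the start of its interval, measured in local time.
     * Valid only when the zone offset never changes, which is the case this thread is about.
     */
    static long roundWithFixedOffset(long utcMillis, long intervalMillis, long fixedOffsetMillis) {
        // Shift into local time, floor to a multiple of the interval, shift back to UTC.
        long localMillis = utcMillis + fixedOffsetMillis;
        // floorDiv rounds toward negative infinity, so timestamps before 1970 round down too.
        long roundedLocal = Math.floorDiv(localMillis, intervalMillis) * intervalMillis;
        return roundedLocal - fixedOffsetMillis;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;      // the "24h" fixed_interval from the benchmark query
        long minusOneHour = -60L * 60 * 1000; // the -01:00 timezone
        System.out.println(roundWithFixedOffset(0L, day, minusOneHour));
    }
}
```

No timezone rule lookup happens per document in this path, which is why it is worth taking before the more expensive handling of offset transitions.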

Contributor Author

@csoulios csoulios Sep 13, 2019

Moved this to the constructor as discussed. Benchmark results improved by ~12%.

See results below (contender is the newer revision):

|-------------------------------+--------------------------+-------------+-------------+----------+--------|
| Metric                        | Task                     | Baseline    |  Contender  |   Diff   |   Unit |
|-------------------------------+--------------------------+-------------+-------------+----------+--------|
|       50th percentile latency | query-agg-date_histogram |     2892.92 |     2539.31 | -353.609 |     ms |
|       90th percentile latency | query-agg-date_histogram |     3086.98 |     2762.28 | -324.694 |     ms |
|      100th percentile latency | query-agg-date_histogram |     3182.51 |     2865.38 | -317.137 |     ms |
|  50th percentile service time | query-agg-date_histogram |     2885.19 |     2532.11 | -353.078 |     ms |
|  90th percentile service time | query-agg-date_histogram |     3078.69 |     2755.39 | -323.297 |     ms |
| 100th percentile service time | query-agg-date_histogram |     3174.46 |     2857.74 | -316.713 |     ms |
|-------------------------------+--------------------------+-------------+-------------+----------+--------|
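
As a quick check of the ~12% figure against the table above, the relative improvement is (baseline - contender) / baseline. The helper below is only a sketch with a hypothetical name; the input values are copied from the 50th percentile latency row.

```java
/** Hypothetical helper for checking the relative improvement reported above. */
final class ImprovementSketch {

    /** Fraction by which the contender is faster than the baseline (0.10 means 10% lower). */
    static double relativeImprovement(double baselineMs, double contenderMs) {
        return (baselineMs - contenderMs) / baselineMs;
    }

    public static void main(String[] args) {
        // 50th percentile latency row: baseline 2892.92 ms, contender 2539.31 ms.
        double median = relativeImprovement(2892.92, 2539.31);
        System.out.println(median);
    }
}
```

The median rows come out at about 12%, while the 90th and 100th percentile rows are closer to 10%.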

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

Contributor

@spinscale spinscale left a comment

As I am unsure about the performance implications, maybe resurrecting the RoundingBenchmark for testing from the 7.x branch makes sense here (or just test on the 7.x branch then port)

Contributor

@jimczi jimczi left a comment

Thanks @csoulios, the changes look good. I left some comments.


@csoulios csoulios changed the title from "Implement rounding optimization for fixed offset timezones." to "Implement rounding optimization for fixed offset timezones" Sep 13, 2019
@csoulios
Contributor Author

csoulios commented Sep 13, 2019

I ran the ES Rally benchmark (https://github.com/csoulios/date_histogram-benchmark) for the following scenarios (on my laptop):

  • ES 6.8.0, UTC tz
  • ES 8.x (with fix), UTC tz
  • ES 8.x (with fix), -01:00 tz
  • ES 8.x (with fix), Europe/Berlin tz

All produced similar results (latencies in ms):

|-------------------------------+--------------------------+-------------+-----------+--------------+---------------------|
| Metric                        | Task                     | 6.8.0 (UTC) | 8.x (UTC) | 8.x (-01:00) | 8.x (Europe/Berlin) |
|-------------------------------+--------------------------+-------------+-----------+--------------+---------------------|
|       50th percentile latency | query-agg-date_histogram |     2570.75 |   2539.31 |      2687.07 |             2583.18 |
|       90th percentile latency | query-agg-date_histogram |     2666.16 |   2762.28 |      2764.50 |             2674.80 |
|      100th percentile latency | query-agg-date_histogram |     2780.84 |   2865.38 |      2786.68 |             2898.41 |
|-------------------------------+--------------------------+-------------+-----------+--------------+---------------------|

Contributor

@jimczi jimczi left a comment

LGTM, thanks @csoulios

Contributor

@spinscale spinscale left a comment

left a few minor nits, feel free to ignore them

LGTM otherwise, thanks for fixing!


private final DateTimeUnit unit;
private final ZoneId timeZone;
private final boolean unitRoundsToMidnight;
private final boolean isUtcTimeZone;
/** For fixed offset timezones, this is the offset in milliseconds, otherwise TZ_OFFSET_NON_FIXED */
private final long fixedOffset;
Contributor

ignorable nit: maybe name this fixedOffsetMs or something similar to indicate its contents

@@ -218,17 +218,20 @@ public Rounding build() {
static class TimeUnitRounding extends Rounding {

static final byte ID = 1;
static final long TZ_OFFSET_NON_FIXED = -1;
Contributor

comment nit: maybe include a small comment on what -1 indicates, and why a regular offset can never be -1
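
One way to justify the sentinel, sketched here under the assumption that the field holds the offset in milliseconds (as its Javadoc in this PR says): `ZoneOffset` has whole-second precision, so a real offset converted to milliseconds is always a multiple of 1000 and can never equal -1. The class and method names below are hypothetical.

```java
import java.time.ZoneOffset;

/** Hypothetical sketch showing why -1 is safe as a "not fixed" marker. */
final class SentinelSketch {
    /** Marker meaning "this zone has no fixed offset" (name taken from the PR diff). */
    static final long TZ_OFFSET_NON_FIXED = -1;

    /** A ZoneOffset has second precision, so this is always a multiple of 1000. */
    static long offsetMillis(ZoneOffset offset) {
        return offset.getTotalSeconds() * 1000L;
    }

    public static void main(String[] args) {
        // Walk every offset java.time can represent (-18:00 to +18:00, in 1 s steps).
        for (int s = ZoneOffset.MIN.getTotalSeconds(); s <= ZoneOffset.MAX.getTotalSeconds(); s++) {
            long millis = offsetMillis(ZoneOffset.ofTotalSeconds(s));
            if (millis == TZ_OFFSET_NON_FIXED) {
                throw new IllegalStateException("sentinel collides with offset " + s + "s");
            }
        }
        System.out.println("no representable offset equals the sentinel");
    }
}
```

The closest negative offset, -00:00:01, is -1000 ms, so the marker sits strictly between two representable values.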

@@ -432,20 +436,25 @@ public String toString() {
}

static final byte ID = 2;
static final long TZ_OFFSET_NON_FIXED = -1;
Contributor

minor nit: would one instance of this be enough?

@@ -218,17 +218,20 @@ public Rounding build() {
static class TimeUnitRounding extends Rounding {

static final byte ID = 1;
static final long TZ_OFFSET_NON_FIXED = -1;
Contributor

minor nit: can be private?

@matriv
Contributor

matriv commented Sep 16, 2019

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/default-distro

@csoulios csoulios merged commit f7a34c8 into elastic:master Sep 16, 2019
@csoulios csoulios deleted the fixed_interval_rounding branch September 16, 2019 20:17
@colings86 colings86 added v7.4.0 and removed v7.4.1 labels Sep 17, 2019
@colings86 colings86 added v7.4.1 and removed v7.4.0 labels Sep 17, 2019
csoulios added a commit to csoulios/elasticsearch that referenced this pull request Sep 18, 2019
…6670)

Fixes elastic#45702 with date_histogram aggregation when using fixed_interval.
Optimization has been implemented for both fixed and calendar intervals
csoulios added a commit to csoulios/elasticsearch that referenced this pull request Sep 18, 2019
…6670)

Fixes elastic#45702 with date_histogram aggregation when using fixed_interval.
Optimization has been implemented for both fixed and calendar intervals
@colings86 colings86 added v7.4.0 and removed v7.4.1 labels Sep 20, 2019
@jimczi jimczi added v7.5.0 and removed v7.5.0 labels Oct 16, 2019

Successfully merging this pull request may close these issues.

Slow date_histogram after upgrading to 7.3.0