[Metrics UI] Drop partial buckets from ALL Metrics UI queries #104784

simianhacker · 2021-07-07T23:47:31Z

Summary

This PR fixes #104699 by adding functions to drop partial buckets, including "micro buckets". This PR also changes the resolution of the histogram offset from seconds to milliseconds; in turn this will further eliminate the possibility of "micro buckets".

Checklist

Unit or functional tests were updated or added to match the most common scenarios

- Change offset calculation to millisecond precision
- Change dropLastBucket to dropPartialBuckets
- Implement partial bucket filter
- Adding partial bucket filter to metric threshold alerts

spalger · 2021-07-08T20:46:41Z

jenkins, test this

(had to abort for Jenkins upgrade)

elasticmachine · 2021-07-13T13:51:44Z

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

estermv

I took a first quick look, I have some small questions, but I think I need a better understanding of the problem we are trying to solve.

I don't really understand how this scenario is possible. Shouldn't the query limit the documents that are returned? If the query filters documents with "lt": "2021-07-07T19:37:00.000Z", it seems really unintuitive to me that documents >=2021-07-07T19:37:00.000Z are returned.

Also, from what I understood reviewing the PR, this could be a precision problem. Would it work if, instead of setting the upper limit to 2021-07-07T19:37:00.000Z, we used something like 2021-07-07T19:36:59.999Z?
This came to mind while looking at the example in the linked issue. In that case, we get an extra bucket with documents >=2021-07-07T19:37:00.000Z to <=2021-07-07T19:38:00.000Z, because documents like 2021-07-07T19:37:xx.xxxZ go into this extra bucket. Since we discard this data anyway, a slightly lower limit shouldn't return any 2021-07-07T19:37:xx.xxxZ documents. Does this make any sense? I could be missing a million things here 😅

x-pack/plugins/infra/server/lib/alerting/metric_threshold/lib/evaluate_alert.ts

estermv · 2021-07-15T11:30:11Z

x-pack/plugins/infra/server/lib/alerting/metric_threshold/lib/metric_query.test.ts

-      undefined,
-      timerange
-    );
-    test('by rounding timestamps to the nearest timeUnit', () => {


Not sure if this was removed on purpose. If it was, I think the describe that contains it can also be removed, as it has no other tests.

Whoops... good catch on the describe. This part of the function was moved up a level.

simianhacker · 2021-07-15T14:58:59Z

@estermv Good questions, I'm not sure changing the time range will help. Here is my attempt at distilling what's happening and what this change is trying to mitigate.

When date_histogram creates the buckets, it rounds each bucket's start down to an interval boundary, so the first bucket can start before the requested time range. Let's look at an example:

POST metricbeat-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": "2021-07-15T14:14:14.771Z",
              "lte": "2021-07-15T14:19:14.771Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "timeseries": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1m",
        "extended_bounds": {
          "min": "2021-07-15T14:14:14.771Z",
          "max": "2021-07-15T14:19:14.771Z"
        }
      }
    }
  }
}

This produces this response:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "timeseries" : {
      "buckets" : [
        {
          "key_as_string" : "2021-07-15T14:14:00.000Z",
          "key" : 1626358440000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:15:00.000Z",
          "key" : 1626358500000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:16:00.000Z",
          "key" : 1626358560000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:17:00.000Z",
          "key" : 1626358620000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:18:00.000Z",
          "key" : 1626358680000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:19:00.000Z",
          "key" : 1626358740000,
          "doc_count" : 0
        }
      ]
    }
  }
}

If the from (or gte) timestamp is 2021-07-15T14:14:14.771Z, then the first bucket will be keyed 2021-07-15T14:14:00.000Z, which starts 14771ms before the actual time range, so only the last 45229ms of that bucket contain data; we call this a partial bucket. If we took an average of this bucket, it might contain only 2 events where a normal bucket might contain 6, so the average is not representative of a whole bucket and we should toss it.
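The arithmetic behind the partial first bucket can be sketched in TypeScript. This is only an illustration: `bucketKey` is a hypothetical helper that mimics flooring a timestamp to a fixed interval with no offset, and it is not taken from Elasticsearch or from this PR.

```typescript
// Sketch only: mimics how a fixed-interval histogram keys a timestamp
// (floor to the interval, no offset). `bucketKey` is a hypothetical helper.
const MINUTE_MS = 60 * 1000;

function bucketKey(timestampMs: number, intervalMs: number): number {
  return Math.floor(timestampMs / intervalMs) * intervalMs;
}

const fromMs = Date.parse('2021-07-15T14:14:14.771Z');
const firstKey = bucketKey(fromMs, MINUTE_MS);

// How far the first bucket starts before the requested range.
const leadInMs = fromMs - firstKey;
// How much of the first bucket actually overlaps the requested range.
const coveredMs = MINUTE_MS - leadInMs;

console.log(new Date(firstKey).toISOString()); // 2021-07-15T14:14:00.000Z
console.log(leadInMs, coveredMs); // 14771 45229
```

The first key matches the `1626358440000` key in the response above, and only 45229ms of that one-minute bucket fall inside the queried range.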

The same thing happens on the other end: the last bucket's starting timestamp is 2021-07-15T14:19:00.000Z and its ending timestamp would be 2021-07-15T14:20:00.000Z, which extends past the end of the time range. To correct for this, we add an offset of -45229ms to the date_histogram to shift the buckets so that the last value comes from a "whole" bucket.
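A minimal sketch of where the -45229ms figure comes from, and why sub-second precision matters. The helper names (`calculateOffsetMs`, `bucketKeyWithOffset`) are hypothetical and are not the functions in this PR. The whole-second truncation at the end is an assumption made purely for illustration, not a description of the previous code.

```typescript
// Sketch only; helper names are hypothetical, not taken from the Kibana source.
const INTERVAL_MS = 60 * 1000;

// Offset (zero or negative, in ms) that moves bucket boundaries onto `toMs`.
function calculateOffsetMs(toMs: number, intervalMs: number): number {
  const remainder = toMs % intervalMs;
  return remainder === 0 ? 0 : remainder - intervalMs;
}

// Bucket key for a timestamp when the histogram is shifted by `offsetMs`.
function bucketKeyWithOffset(tsMs: number, intervalMs: number, offsetMs: number): number {
  return Math.floor((tsMs - offsetMs) / intervalMs) * intervalMs + offsetMs;
}

const rangeFrom = Date.parse('2021-07-15T14:14:14.771Z');
const rangeTo = Date.parse('2021-07-15T14:19:14.771Z');

// Millisecond resolution: boundaries line up exactly with the range.
const offsetMs = calculateOffsetMs(rangeTo, INTERVAL_MS); // -45229
const firstKeyMs = bucketKeyWithOffset(rangeFrom, INTERVAL_MS, offsetMs); // equals rangeFrom

// ASSUMPTION for illustration: an offset truncated to whole seconds (-45s).
const offsetSec = Math.trunc(offsetMs / 1000) * 1000; // -45000
const firstKeySec = bucketKeyWithOffset(rangeFrom, INTERVAL_MS, offsetSec);
// Only a sliver of that first bucket lies inside the range.
const sliverMs = firstKeySec + INTERVAL_MS - rangeFrom; // 229

console.log(offsetMs, firstKeyMs === rangeFrom, sliverMs);
```

Under that assumption, the first bucket would hold only 229ms of in-range data, which is one way a "micro bucket" could arise.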

This PR changes the resolution of the offset to milliseconds which should eliminate the need to prune the last bucket. There is still an edge case where sometimes instead of returning 6 buckets for a 5 minute time range, it returns 7 due to how the rounding works (I think).

For most scenarios, changing the offset to millisecond resolution fixes the issue with the last bucket being partial. The dropPartialBuckets filter will trim off the first bucket for us and will trim off the last bucket when we hit the edge case. The dropPartialBuckets filter is really our safety net for edge cases and weirdness.
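To make the safety-net idea concrete, here is a minimal sketch of a partial-bucket filter. It is not the implementation in this PR: the name `dropPartialBucketsSketch`, its signature, and the `HistogramBucket` type are assumptions for illustration. It keeps only buckets that lie entirely inside the requested range.

```typescript
// Hypothetical sketch of a partial-bucket filter; not the actual Kibana code.
interface HistogramBucket {
  key: number; // bucket start, epoch milliseconds
  doc_count: number;
}

function dropPartialBucketsSketch<T extends HistogramBucket>(
  buckets: T[],
  fromTs: number,
  toTs: number,
  bucketSizeMs: number
): T[] {
  // Keep a bucket only if it starts at or after `fromTs`
  // and ends at or before `toTs`.
  return buckets.filter((b) => b.key >= fromTs && b.key + bucketSizeMs <= toTs);
}

// Keys copied from the example response earlier in this comment.
const sampleBuckets: HistogramBucket[] = [
  1626358440000, 1626358500000, 1626358560000,
  1626358620000, 1626358680000, 1626358740000,
].map((key) => ({ key, doc_count: 0 }));

const keptBuckets = dropPartialBucketsSketch(
  sampleBuckets,
  Date.parse('2021-07-15T14:14:14.771Z'),
  Date.parse('2021-07-15T14:19:14.771Z'),
  60 * 1000
);

console.log(keptBuckets.map((b) => new Date(b.key).toISOString()));
```

Applied to the un-offset response above, this drops the 14:14:00 bucket (it starts before the range) and the 14:19:00 bucket (it ends after the range), leaving the four buckets keyed 14:15:00 through 14:18:00.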

estermv · 2021-07-16T11:06:40Z

x-pack/plugins/infra/server/lib/alerting/metric_threshold/metric_threshold_executor.test.ts

+  const from = params?.body.query.bool.filter[0]?.range['@timestamp'].gte;
+  if (params.index === 'alternatebeat-*') return mocks.changedSourceIdResponse(from);


Just commenting here, as I think this is totally unrelated to the PR. I would like to understand the reason why we have this logic here (the whole implementation of the mock). This is a place for potential bugs and makes it hard to understand what each test is testing.

estermv

LGTM 🚀

@simianhacker thanks for the detailed explanation, it helped a lot 🙌

The only thing I notice is that in the alerts preview charts we display one less column but I don't expect anyone to count them 😅

Please, don't forget to remove the describe mentioned here https://github.com/elastic/kibana/pull/104784/files#r670377393 before merging it!

simianhacker · 2021-07-19T16:59:07Z

@elasticmachine merge upstream

kibanamachine · 2021-07-19T19:10:05Z

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`infra`	1.7MB	1.7MB	+8.0B

History

💚 Build #139107 succeeded 1be7c48
💚 Build #138636 succeeded bb6e240
💚 Build #138036 succeeded 2e9fee7
💔 Build #137995 failed 9624e5a
💔 Build #137978 failed b085acf

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

…c#104784)

* [Metrics UI] Change dropLastBucket to dropPartialBuckets
  - Change offset calculation to millisecond precision
  - Change dropLastBucket to dropPartialBuckets
  - Implement partial bucket filter
  - Adding partial bucket filter to metric threshold alerts
* Cleaning up getElasticsearchMetricQuery
* Change timestamp to from_as_string to align to how date_histogram works
* Fixing tests to be more realistic
* fixing types; removing extra imports
* Fixing new mock data to work with previews
* Removing value checks since they don't really provide much value
* Removing test for refactored functionality
* Change value to match millisecond resolution
* Fixing values for new partial bucket scheme
* removing unused var
* Fixing lookback since drops more than last buckets
* Changing results count
* fixing more tests
* Removing empty describe

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>


…ations inside of metrics threshold alerts (#106947) (#107167)

* [Metrics UI] Correct inaccurate offsetting for non-rate aggregations inside of metrics threshold alerts (#106947)
* Don't skip last bucket for most aggs
* Allow alerting on partial buckets for certain aggs
* Fix test, PR feedback, and some comments
* Remove all offset logic for date_range aggs
* Remove code comment
* Add delivery delay
* Fix the date range for query
* Add TODO
* Port over changes from PR on master #104784
* Add missing change

simianhacker added 2 commits July 7, 2021 17:11

[Metrics UI] Change dropLastBucket to dropPartialBuckets

999eed4

- Change offset calculation to millisecond precision
- Change dropLastBucket to dropPartialBuckets
- Implement partial bucket filter
- Adding partial bucket filter to metric threshold alerts

Cleaning up getElasticsearchMetricQuery

ff598db

simianhacker added the release_note:fix, Feature:Metrics UI, v8.0.0, Team:Infra Monitoring UI - DEPRECATED, and v7.15.0 labels Jul 7, 2021

simianhacker added 2 commits July 8, 2021 13:06

Change timestamp to from_as_string to align to how date_histogram works

a708973

Fixing tests to be more realistic

e493f0b

simianhacker added 12 commits July 8, 2021 15:51

fixing types; removing extra imports

a127bda

Fixing new mock data to work with previews

4d40c1e

Removing value checks since they don't really provide much value

cfa1236

Removing test for refactored functionality

1f04f75

Change value to match millisecond resolution

bba8782

Fixing values for new partial bucket scheme

5d0da2e

removing unused var

c89fdf6

Fixing lookback since drops more than last buckets

509403c

Changing results count

9ea49e6

Merge branch 'master' of github.com:elastic/kibana into issue-104699-…

b085acf

…offset-ms-percission-trin-buckets

Merge branch 'master' of github.com:elastic/kibana into issue-104699-…

9624e5a

…offset-ms-percission-trin-buckets

fixing more tests

2e9fee7

simianhacker marked this pull request as ready for review July 13, 2021 13:51

simianhacker requested a review from a team as a code owner July 13, 2021 13:51

Merge branch 'master' of github.com:elastic/kibana into issue-104699-…

bb6e240

…offset-ms-percission-trin-buckets

estermv self-requested a review July 15, 2021 10:29

estermv reviewed Jul 15, 2021

estermv reviewed Jul 16, 2021

estermv approved these changes Jul 16, 2021

simianhacker added 2 commits July 16, 2021 09:27

Removing empty describe

14bc50c

Merge branch 'master' of github.com:elastic/kibana into issue-104699-…

1be7c48

…offset-ms-percission-trin-buckets

Merge branch 'master' into issue-104699-offset-ms-percission-trin-buc…

508a94f

…kets

simianhacker merged commit cb7187f into elastic:master Jul 19, 2021

simianhacker mentioned this pull request Jul 19, 2021

[7.x] [Metrics UI] Drop partial buckets from ALL Metrics UI queries (#104784) #106153

Merged

phillipb added a commit to phillipb/kibana that referenced this pull request Jul 29, 2021

Port over changes from PR on master

4de206c

elastic#104784

This was referenced Aug 6, 2021

[7.14] [Metrics UI] Drop partial buckets from ALL Metrics UI queries (#104784) #107838

Merged

[7.14] [Metrics UI] [REDO] Correct inaccurate offsetting for non-rate aggregations inside of metrics threshold alerts (#106947) #107820

Merged

simianhacker deleted the issue-104699-offset-ms-percission-trin-buckets branch April 17, 2024 15:34
