Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metrics UI] Drop partial buckets from ALL Metrics UI queries #104784

Conversation

simianhacker
Copy link
Member

@simianhacker simianhacker commented Jul 7, 2021

Summary

This PR fixes #104699 by adding functions to drop partial buckets, including "micro buckets". This PR also changes the resolution of the histogram offset from seconds to milliseconds; in turn this will further eliminate the possibility of "micro buckets".

Checklist

- Change offset calculation to millisecond percission
- Change dropLastBucket to dropPartialBuckets
- Impliment partial bucket filter
- Adding partial bucket filter to metric threshold alerts
@simianhacker simianhacker added release_note:fix Feature:Metrics UI Metrics UI feature v8.0.0 Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.15.0 labels Jul 7, 2021
@spalger
Copy link
Contributor

spalger commented Jul 8, 2021

jenkins, test this

(had to abort for Jenkins upgrade)

@simianhacker simianhacker marked this pull request as ready for review July 13, 2021 13:51
@simianhacker simianhacker requested a review from a team as a code owner July 13, 2021 13:51
@elasticmachine
Copy link
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@estermv estermv self-requested a review July 15, 2021 10:29
Copy link
Contributor

@estermv estermv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a first quick look, I have some small questions, but I think I need a better understanding of the problem we are trying to solve.

I don't really understand how this scenario can be possible. Shouldn't the query limit the documents that are returned? If in the query I filter documents "lt": "2021-07-07T19:37:00.000Z" it seems to me really unintuitive that are returned documents >=2021-07-07T19:37:00.000Z.

Also, from what I understood reviewing the PR, this could be a problem of precision, would it work if, instead of putting the upper limit to 2021-07-07T19:37:00.000Z, we put something like 2021-07-07T19:36:59.999Z?
This came to my mind looking at the example you put in the linked issue. In that case, we get an extra bucket with documents >=2021-07-07T19:37:00.000Z to <=2021-07-07T19:38:00.000Z, because documents like 2021-07-07T19:37:xx.xxxZ goes to this extra bucket. Since we get rid of this data, setting a slightly lower limit, shouldn't return any 2021-07-07T19:37:xx.xxxZ documents. Does this make any sense? I could be missing a million things here 😅

undefined,
timerange
);
test('by rounding timestamps to the nearest timeUnit', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this was removed on purpose. If it's on purpose I think the describe that contains it, can also be removed as there is no other test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops... good catch on the describe. This part of the function was moved up a level.

@simianhacker
Copy link
Member Author

simianhacker commented Jul 15, 2021

@estermv Good questions, I'm not sure changing the time range will help. Here is my attempt at distilling what's happening and what this change is trying to mitigate.

When date_histogram creates the buckets it will round the first bucket to the nearest interval. Let's look at an example:

POST metricbeat-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": "2021-07-15T14:14:14.771Z",
              "lte": "2021-07-15T14:19:14.771Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "timeseries": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1m",
        "extended_bounds": {
          "min": "2021-07-15T14:14:14.771Z",
          "max": "2021-07-15T14:19:14.771Z"
        }
      }
    }
  }
}

This produces this response:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "timeseries" : {
      "buckets" : [
        {
          "key_as_string" : "2021-07-15T14:14:00.000Z",
          "key" : 1626358440000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:15:00.000Z",
          "key" : 1626358500000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:16:00.000Z",
          "key" : 1626358560000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:17:00.000Z",
          "key" : 1626358620000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:18:00.000Z",
          "key" : 1626358680000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2021-07-15T14:19:00.000Z",
          "key" : 1626358740000,
          "doc_count" : 0
        }
      ]
    }
  }
}

If the from (or gte) timestamp is 2021-07-15T14:14:14.771Z then the first bucket will be 2021-07-15T14:14:00.000Z which actually covers an extra 45229ms of data; we call this a partial bucket because it starts before the actual time range. If we took an average of this bucket, it might only contain 2 events where a normal bucket might contain 6 events, so the average is not representative of the whole bucket so we should toss it.

The same thing happens on the other end, the last bucket's starting timestamp is 2021-07-15T14:19:00.000Z and the ending timestamp would be 2021-07-15T14:20:00.000Z which also overlaps the end of the time range. To try and correct for this we add an offset of -45229ms to the date_histogram to shift the buckets to try and get a "whole" bucket for the last value.

This PR changes the resolution of the offset to milliseconds which should eliminate the need to prune the last bucket. There is still an edge case where sometimes instead of returning 6 buckets for a 5 minute time range, it returns 7 due to how the rounding works (I think).

For most scenarios, the changing the offset to have millisecond resolution fixes the issue with the last bucket being partial. The dropPartialBuckets filter will trim off the first bucket for us and will trim off the last backet when we hit the edge case. The dropPartialBuckets filter is really our safety net for edge cases and weirdness.

Comment on lines +459 to +460
const from = params?.body.query.bool.filter[0]?.range['@timestamp'].gte;
if (params.index === 'alternatebeat-*') return mocks.changedSourceIdResponse(from);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just commenting here, as I think this is totally unrelated to the PR. I would like to understand the reason why we have this logic here (the whole implementation of the mock). This is a place for potential bugs and makes it hard to understand what each test is testing.

Copy link
Contributor

@estermv estermv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@simianhacker thanks for the detailed explanation, it helped a lot 🙌

The only thing I notice is that in the alerts preview charts we display one less column but I don't expect anyone to count them 😅

Please, don't forget to remove the describe mentioned here https://github.com/elastic/kibana/pull/104784/files#r670377393 before merging it!

@simianhacker
Copy link
Member Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
infra 1.7MB 1.7MB +8.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@simianhacker simianhacker merged commit cb7187f into elastic:master Jul 19, 2021
simianhacker added a commit to simianhacker/kibana that referenced this pull request Jul 19, 2021
…c#104784)

* [Metrics UI] Change dropLastBucket to dropPartialBuckets

- Change offset calculation to millisecond percission
- Change dropLastBucket to dropPartialBuckets
- Impliment partial bucket filter
- Adding partial bucket filter to metric threshold alerts

* Cleaning up getElasticsearchMetricQuery

* Change timestamp to from_as_string to align to how date_histgram works

* Fixing tests to be more realistic

* fixing types; removing extra imports

* Fixing new mock data to work with previews

* Removing value checks since they don't really provide much value

* Removing test for refactored functinality

* Change value to match millisecond resolution

* Fixing values for new partial bucket scheme

* removing unused var

* Fixing lookback since drops more than last buckets

* Changing results count

* fixing more tests

* Removing empty describe

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
simianhacker added a commit that referenced this pull request Jul 19, 2021
… (#106153)

* [Metrics UI] Change dropLastBucket to dropPartialBuckets

- Change offset calculation to millisecond percission
- Change dropLastBucket to dropPartialBuckets
- Impliment partial bucket filter
- Adding partial bucket filter to metric threshold alerts

* Cleaning up getElasticsearchMetricQuery

* Change timestamp to from_as_string to align to how date_histgram works

* Fixing tests to be more realistic

* fixing types; removing extra imports

* Fixing new mock data to work with previews

* Removing value checks since they don't really provide much value

* Removing test for refactored functinality

* Change value to match millisecond resolution

* Fixing values for new partial bucket scheme

* removing unused var

* Fixing lookback since drops more than last buckets

* Changing results count

* fixing more tests

* Removing empty describe

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jul 20, 2021
…y-show-migrate-to-authzd-users

* 'master' of github.com:elastic/kibana: (187 commits)
  Space management page UX improvements (elastic#100448)
  [Reporting] Unskip flaky test when downloading CSV with "no data" (elastic#105252)
  Update dependency @elastic/charts to v33 (master) (elastic#105633)
  [Observability RAC] Improve alerts table columns (elastic#105446)
  Introduce `preboot` lifecycle stage (elastic#103636)
  [Security Solution] Invalid kql query timeline refresh bug (elastic#105525)
  skip flaky suite (elastic#106121)
  [Security Solution][Endpoint] Fix UI inconsistency between isolation forms and remove display of Pending isolation statuses (elastic#106118)
  docs: APM RUM Source map API (elastic#105332)
  [CTI] Adds indicator match rule improvements (elastic#97310)
  [Security Solution] update text for Isolation action submissions (elastic#105956)
  EP Meta Telemetry Perf (elastic#104396)
  [Metrics UI] Drop partial buckets from ALL Metrics UI queries (elastic#104784)
  Remove beta admonitions for Fleet docs (elastic#106010)
  [Observability RAC] Remove indexing of rule evaluation documents (elastic#104970)
  Parameterize migration test for kibana version (elastic#105417)
  [Alerting] Allow rule to execute if the value is 0 and that mets the condition (elastic#105626)
  [ML] Fix Index data visualizer sometimes shows wrong doc count for saved searches (elastic#106007)
  [Security Solution] UX fixes for Policy page and Case Host Isolation comment (elastic#106027)
  [Security Solution]Memory protection configuration card for policies integration. (elastic#101365)
  ...

# Conflicts:
#	x-pack/plugins/reporting/public/management/report_listing.test.tsx
#	x-pack/plugins/reporting/public/management/report_listing.tsx
phillipb added a commit to phillipb/kibana that referenced this pull request Jul 29, 2021
phillipb added a commit that referenced this pull request Jul 29, 2021
…ations inside of metrics threshold alerts (#106947) (#107167)

* [Metrics UI] Correct inaccurate offsetting for non-rate aggregations inside of metrics threshold alerts (#106947)

* Don't skip last bucket for most aggs

* Allow alerting on partial buckets for certain aggs

* Fix test, PR feedback, and some comments

* Remove all offset logic for date_range aggs

* Remove code comment

* Add delivery delay

* Fix the date range for query

* Add TODO

* Port over changes from PR on master

 #104784

* Add missing change
Zacqary pushed a commit to Zacqary/kibana that referenced this pull request Aug 6, 2021
…c#104784)

* [Metrics UI] Change dropLastBucket to dropPartialBuckets

- Change offset calculation to millisecond percission
- Change dropLastBucket to dropPartialBuckets
- Impliment partial bucket filter
- Adding partial bucket filter to metric threshold alerts

* Cleaning up getElasticsearchMetricQuery

* Change timestamp to from_as_string to align to how date_histgram works

* Fixing tests to be more realistic

* fixing types; removing extra imports

* Fixing new mock data to work with previews

* Removing value checks since they don't really provide much value

* Removing test for refactored functinality

* Change value to match millisecond resolution

* Fixing values for new partial bucket scheme

* removing unused var

* Fixing lookback since drops more than last buckets

* Changing results count

* fixing more tests

* Removing empty describe

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
# Conflicts:
#	x-pack/plugins/infra/server/lib/alerting/metric_threshold/lib/metric_query.ts
Zacqary added a commit that referenced this pull request Aug 6, 2021
…104784) (#107838)

* [Metrics UI] Drop partial buckets from ALL Metrics UI queries (#104784)

* [Metrics UI] Change dropLastBucket to dropPartialBuckets

- Change offset calculation to millisecond percission
- Change dropLastBucket to dropPartialBuckets
- Impliment partial bucket filter
- Adding partial bucket filter to metric threshold alerts

* Cleaning up getElasticsearchMetricQuery

* Change timestamp to from_as_string to align to how date_histgram works

* Fixing tests to be more realistic

* fixing types; removing extra imports

* Fixing new mock data to work with previews

* Removing value checks since they don't really provide much value

* Removing test for refactored functinality

* Change value to match millisecond resolution

* Fixing values for new partial bucket scheme

* removing unused var

* Fixing lookback since drops more than last buckets

* Changing results count

* fixing more tests

* Removing empty describe

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
# Conflicts:
#	x-pack/plugins/infra/server/lib/alerting/metric_threshold/lib/metric_query.ts

* Fix bad merge

Co-authored-by: Chris Cowan <chris@chriscowan.us>
@simianhacker simianhacker deleted the issue-104699-offset-ms-percission-trin-buckets branch April 17, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Metrics UI Metrics UI feature release_note:fix Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.15.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Metrics UI] Drop buckets that DO NOT strictly fall within the specified time range.
5 participants