Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Monitoring] Migrate data source for legacy alerts to monitoring data directly #87377

Merged

Conversation

chrisronline
Copy link
Contributor

@chrisronline chrisronline commented Jan 5, 2021

Resolves #81020

This PR reintroduces the logic discussed in #68805 where our legacy alerts (alerts which still exist as watches) need to fetch their data from .monitoring-{stack_product}-* indices instead of .monitoring-alerts-* indices.

This change is necessary because, due to #85047, we will be removing all cluster alert watches so the current logic to read from the output of these watches (which is .monitoring-alerts-* indices) needs to change. Luckily, we already wrote most of this code so it's just a matter of porting it over and making any adjustments to account for changes in the base alerting code.

To test, we just need to ensure the alerts fire properly. I recommend using the following gists (which contain code you can copy/paste into Dev Tools) to help trigger a specific alert:

Cluster health is fairly easy to replicate - just create a document in an index without a replica (which is the default behavior):

PUT twitter/_doc/1
{
  "name": "chris"
}

Keep in mind that legacy alerts required a gold+ license but the changes here will mean basic+ can leverage them.

@chrisronline chrisronline changed the title [Monitoring] Migrate legacy alerts to actual alerts [Monitoring] Migrate data source for legacy alerts to monitoring data directly Jan 20, 2021
@chrisronline chrisronline self-assigned this Jan 20, 2021
@chrisronline chrisronline added 8.0.0 release_note:skip Skip the PR/issue when compiling release notes review Team:Monitoring Stack Monitoring team v7.12.0 labels Jan 20, 2021
@chrisronline chrisronline marked this pull request as ready for review January 20, 2021 17:04
@chrisronline chrisronline requested a review from a team January 20, 2021 17:04
@elasticmachine
Copy link
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)

@wylieconlon wylieconlon added v8.0.0 and removed 8.0.0 labels Jan 20, 2021
@chrisronline
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

merge conflict between base and head

@igoristic
Copy link
Contributor

Alright, I'm done with the functional testing under multiple environments/conditions, and everything seems to check out. Awesome job! and Thank you for the very detailed and easy to follow instructions!

Going to continue code review now

Copy link
Contributor

@igoristic igoristic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job overall! 🏆

Please review the comments in the code, and some of these other issues:


  1. I think the following alerts:
  • ALERT_LICENSE_EXPIRATION
  • ALERT_ELASTICSEARCH_VERSION_MISMATCH
  • ALERT_KIBANA_VERSION_MISMATCH
  • ALERT_LOGSTASH_VERSION_MISMATCH

Should have the default interval of 1d instead of 1m. Since, these don't change intermittently like the other metrics do


  1. We should remove:
    x-pack/plugins/monitoring/server/lib/alerts/fetch_legacy_alerts.ts
    Doesn't look like it's used anymore

  1. Shouldn't we also add nextSteps to these alerts? Or, maybe at least in a separate issue

  1. Not sure I understand:

This alert does not feature grouping

Is this because these alerts are only supported for a single cluster?


NIT: We should probably replace the "legacy" terminology to something like "default alerts", since this PR now makes them true Kibana Alerts

@chrisronline
Copy link
Contributor Author

@igoristic

I think the following alerts:

I think they all do, like [here for example](https://github.com/elastic/kibana/pull/87377/files#diff-60cf9dc990401c054b21b18964f436f24739eb0b7667fa75e8b0897e9bfef9cbL38

We should remove:

Done!

Shouldn't we also add nextSteps to these alerts? Or, maybe at least in a separate issue

Agreed on both fronts. I didn't want to complicate this work. I'm just going for straight parity

Is this because these alerts are only supported for a single cluster?

I changed the comment and added more details. LMK if this helps

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@igoristic igoristic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good 👍

Thank you for addressing some of my confusion!

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@chrisronline chrisronline merged commit 231610c into elastic:master Feb 9, 2021
@chrisronline chrisronline deleted the monitoring/restore_legacy_alerts branch February 9, 2021 02:50
chrisronline added a commit that referenced this pull request Feb 9, 2021
… directly (#87377) (#90720)

* License expiration

* Fetch legacy alert data from the source

* Add back in the one test file

* Remove deprecated code

* Fix up tests

* Add test files

* Fix i18n

* Update tests

* PR feedback

* Fix types and tests

* Fix license headers

* Remove unused function

* Fix faulty license expiration logic
jloleysens added a commit to jloleysens/kibana that referenced this pull request Feb 9, 2021
…timeline-and-rollover-info

* 'master' of github.com:elastic/kibana: (47 commits)
  [Fleet] Use TS project references (elastic#87574)
  before/beforeEach clean up (elastic#90663)
  [Vega] user should be able to set a specific tilemap service using the mapStyle property (elastic#88440)
  [Security Solution][Case] ServiceNow SIR Connector (elastic#88655)
  [Search Sessions] Enable extend from management (elastic#90558)
  [ILM] Delete phase redesign (rework) (elastic#90291)
  [APM-UI][E2E] use withGithubStatus step (elastic#90651)
  Add folding in kb-monaco and update some viewers (elastic#90152)
  [Grok Debugger] Changed test to wait for grok debugger container to exist to fix test flakiness (elastic#90543)
  Strongly typed EUI theme for styled-components (elastic#90106)
  Fix vega renovate label (elastic#90591)
  [Uptime] Migrate to TypeScript project references (elastic#90510)
  [Monitoring] Migrate data source for legacy alerts to monitoring data directly (elastic#87377)
  [Upgrade Assistant] Add A11y Tests (elastic#90265)
  [Time to Visualize] Adds functional tests for linking/unlinking panel from embeddable library (elastic#89612)
  [dev-utils/ship-ci-stats] fail when CI stats is down (elastic#90678)
  chore(NA): remove write permissions on Bazel remote cache for PRs (elastic#90652)
  chore(NA): move bazel workspace status from bash script into nodejs executable (elastic#90560)
  Use default ES distribution for functional tests (elastic#88737)
  [Alerts] Jira: Disallow labels with spaces (elastic#90548)
  ...

# Conflicts:
#	x-pack/plugins/index_lifecycle_management/public/application/sections/edit_policy/components/timeline/timeline.tsx
#	x-pack/plugins/index_lifecycle_management/public/application/sections/edit_policy/lib/absolute_timing_to_relative_timing.test.ts
#	x-pack/plugins/index_lifecycle_management/public/application/sections/edit_policy/lib/absolute_timing_to_relative_timing.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes review Team:Monitoring Stack Monitoring team v7.12.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Monitoring] Migrate cluster alerts from watcher to Kibana alerting
5 participants