Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Response Ops][Alerting] Reusable functions for FAAD resource installation #152849

Merged
merged 4 commits into from
Mar 20, 2023

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Mar 7, 2023

Resolves #152490

Summary

This PR refactors the resource installation methods in AlertsService to be reusable library functions. It exports them from the alerting plugin and changes the rule registry resource installer to use them as well.

To Verify

  1. Run this branch with enableFrameworkAlerts: true. Verify that we can create a detection rule in the default space & a different space and generate a rule preview. The logs should show that the rule registry creates the resources for the preview indices and for indices in the non-default space.
  2. Verify that when running this branch on rule registry rules from main or a previous version, the rules continue to run successfully.

@ymao1 ymao1 force-pushed the alerting/faad-resources-preview branch from 9cf3a9e to 60ef606 Compare March 8, 2023 15:16
@ymao1 ymao1 changed the title Alerting/faad resources preview [Response Ops][Alerting] Reusable functions for FAAD resource installation Mar 9, 2023
@ymao1 ymao1 self-assigned this Mar 10, 2023
@ymao1 ymao1 added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:Alerting/Alerts-as-Data Issues related to Alerts-as-data and RuleRegistry v8.8.0 release_note:skip Skip the PR/issue when compiling release notes labels Mar 10, 2023
@ymao1 ymao1 marked this pull request as ready for review March 10, 2023 03:18
@ymao1 ymao1 requested review from a team as code owners March 10, 2023 03:18
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

Copy link
Contributor

@CoenWarmer CoenWarmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ymao1,

I'm using a CCS setup. I've attempted to follow the testing instructions provided. I have added xpack.alerting.enableFrameworkAlerts: true to kibana.dev.yml.

I then proceeded to create an APM Latency Threshold rule, which I have configured like this:

Screenshot 2023-03-10 at 11 15 02

When I clicked save, I got an error:

[2023-03-10T11:16:09.488+01:00][ERROR][plugins.alerting.apm.transaction_duration] Executing Rule default:apm.transaction_duration:8433ce40-bf2b-11ed-a8ca-e57eefdc2c75 has resulted in Error: Alert instance execution has already been scheduled, cannot schedule twice - Error: Alert instance execution has already been scheduled, cannot schedule twice
    at Alert.ensureHasNoScheduledActions (alert.ts:174:13)
    at Alert.scheduleActions (alert.ts:157:10)
    at executor (register_transaction_duration_rule_type.ts:281:12)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at Object.executor (create_lifecycle_executor.ts:215:35)
    at task_runner.ts:340:28
    at TaskRunnerTimer.runWithTimer (task_runner_timer.ts:49:20)
    at TaskRunner.runRule (task_runner.ts:304:38)
    at TaskRunner.run (task_runner.ts:671:31)
    at TaskManagerRunner.run (task_runner.ts:304:22)

I am unsure if this relates to your PR or not, but as it stands I couldn't successfully test your PR.

@mikecote mikecote self-requested a review March 10, 2023 14:16
@ymao1
Copy link
Contributor Author

ymao1 commented Mar 10, 2023

@CoenWarmer, thanks for taking a look. That error looks to be unrelated to this PR as it's coming from the APM transaction duration rule executor. The framework throws that error if an alert with the same ID is reported twice during the same execution. Looking at the coding inside register_transaction_duration_rule_type.ts, it looks like alerts are being reported in a for loop that loops over buckets that contains values for serviceName, environment, transactionType, transactionDuration, sourceFields

for (const {
serviceName,
environment,
transactionType,
transactionDuration,
sourceFields,
} of triggeredBuckets) {
const environmentLabel = getEnvironmentLabel(environment);

However inside the for loop, you are creating and reporting an alertId based only on the environment value:

const id = `${ApmRuleType.TransactionDuration}_${environmentLabel}`;

This means if you have buckets with the same value for environment but different values for serviceName or transactionType (which seems likely because of the multi-terms aggregation being performed), they will get reported back to the framework with the same alertID, causing this error.

This seems like a bug within the rule type executor, would you like me to open an issue for this?

Copy link
Contributor

@mikecote mikecote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM! Tested locally and worked as expected 👍

@ymao1
Copy link
Contributor Author

ymao1 commented Mar 20, 2023

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
alerting 518 531 +13

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
alerting 41 42 +1
Unknown metric groups

API count

id before after diff
alerting 534 550 +16

ESLint disabled line counts

id before after diff
securitySolution 433 436 +3

Total ESLint disabled count

id before after diff
securitySolution 513 516 +3

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@CoenWarmer
Copy link
Contributor

Tested and works as expected! LGTM!

@ymao1 ymao1 merged commit 49a0996 into elastic:main Mar 20, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Mar 20, 2023
@ymao1 ymao1 deleted the alerting/faad-resources-preview branch March 20, 2023 16:40
v1v added a commit to v1v/kibana that referenced this pull request Mar 20, 2023
…loy-my-kibana-oblt

* upstream/main: (727 commits)
  Upgrade caniuse-lite db (elastic#153318)
  [Security Solution] expanded flyout - right section - json tab implementation (elastic#152935)
  chore(slo): Make APM indicator's index required (elastic#153311)
  skip failing test suite (elastic#136688)
  [Security Solution] Fix security-solution storybook package codeowners (elastic#153307)
  [EUI] Add `scrollLock` workaround CSS to Kibana's `body` (elastic#153227)
  [Cloud Security] Show coming soon deployments of vulnerability management (elastic#153249)
  [Cloud Security] fixed onboarding link directs to cspm integration (elastic#153268)
  [Response Ops][Alerting] Reusable functions for FAAD resource installation (elastic#152849)
  remove geohash_grid aggregation support (elastic#152952)
  [Tech Debt] Reorder Rules page (elastic#152897)
  [Saved Object Finder] Add help text & left button (elastic#152742)
  [Transform] Replace SavedObjectsFinder component (elastic#153128)
  Make pipeline creation endpoint accept a full pipeline definition (elastic#153133)
  [Fleet] Displaying policy changes in Agent activity (elastic#153237)
  skip flaky suite (elastic#152852)
  [Security Solution][Endpoint] Add tests to cover RBAC entries in the Role Kibana Privileges flyout (elastic#153068)
  [Security Solution][Endpoint] Additional tests for Response Console History Log page (covers TestRail manual tests) (elastic#153042)
  [Monitoring] Display node roles in Nodes table (elastic#152127)
  Rename getEditAlertFlyout to getEditRuleFlyout (elastic#153243)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Alerting/Alerts-as-Data Issues related to Alerts-as-data and RuleRegistry Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Response Ops][Alerting] Migrate installation of preview resources to framework alerts-as-data
6 participants