Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft for socialization at this point] Migration side car actions 2 #112043

Conversation

FrankHassanabad
Copy link
Contributor

Summary

Summarize your PR. If it involves visual changes include a screenshot or gif.

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk Probability Severity Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space. Low High Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. High Low Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled. Medium High Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@mikecote
Copy link
Contributor

Hey @FrankHassanabad, after discussing with the team, we can’t recommend other plugins manipulating the internals of the alerting plugin without going through an API. We made the alerting saved objects hidden for this purpose. Otherwise, this can become hard to support and put our users in a bad position as we change the platform workflows by causing unintended side effects by the external code. The concern comes as we need to support customers upgrading from any 7.x version to any 8.x version, and it can get pretty complex without going through the standard migrations.

Would it be possible to have a write-up somewhere of what you’re trying to accomplish? Then, I can engage the @elastic/kibana-alerting-services team to help address the concerns. It’s crucial to keep resiliency in mind given the challenges our customers face, and we need to be cautious by looping in other Kibana teams on changes like this.

@FrankHassanabad
Copy link
Contributor Author

FrankHassanabad commented Sep 14, 2021

Hi @mikecote,

We want to further align with kibana-alerting and kibana-core and that is what we are attempting here.

We engaged with Kibana core here:
#109169

under the section, "Add ability to query during saved object migrations" and the sub-ticket listed from it:
#109188

I updated the sub-ticket to include these words to outline this secondary use case to try and be clearer:

Second use case is that we have saved objects such as the "siem-detection-engine-rule-actions" which is a saved object that has a reference back to the original alert and was put into place by developers a long time ago before the maturity of notifyWhen and throttle options. For us to move away from our legacy model, we need a type of "join" where we can query the alerting from it when we migrate.

The conclusion from kibana-core so far is this comment:
#109188 (comment)

Which indicated to us that hooks for doing queries during migrations such as a "join" was not going to happen during startup time or any time soonish if at all.

Would it be possible to have a write-up somewhere of what you’re trying to accomplish?

Please let me know if my write up above does not make sense. I can re-state it once more below if this helps:

We have a legacy notification system and a legacy notification rule. This legacy notification system used a "side car" object called siem-detection-engine-rule-actions and we have already removed it for 7.16.0 since all of the saved object id's are going to be changed and this legacy notification system will no longer work. We really cannot use this legacy notification system starting with 7.16.0. Also this legacy notification system is not aligned with the current alerting throttle notification system and we want to align with what the current and future kibana alerting system's notification and throttle is going to be doing (as well as anything else -- you all write great stuff thank you!). This legacy notification does not allow other operations from kibana-alerting to operate as well such as muteAll and impedes us from utilizing SOM. So outsie-daisy it has to go.

The good news is that we already re-did the code to use the newer notification system with great success, however we now need to migrate away and delete this legacy notification system on next upgrade which has a type of "join" from it to the alerting system which is the pain point.

Other notes to hopefully be more clear on what is going on in our thinking:

  • We would ❤️ love to use an API to do this rather than going down to the SO level to do this at startup but do not have one currently that we know of. If provided we would prefer to use the alerting client at startup time. We thought since migrations happen at an SO level this wouldn't be so bad? Maybe that's where we are incorrect.
  • We would ❤️ love to use a way if provided for the core of Kibana to do joins for migrations to help migrate siem-detection-engine-rule-actions which as stated above needs to be migrated by 7.16.0. The siem-detection-engine-rule-actions will not operate with the SO id's being re-generated and the time/effort to support this side car model is not a possibility.

If it matters we discussed during our team meetings alternatives such as:

  • We did discuss using a REST API to implement a migration with a warning banner to click a button to migrate but the current thinking was that we no longer want to be the "outlier" and to focus our best abilities on writing code that will eventually be adopted or molded into features that will be adopted by other team members and Kibana core or make it simpler to migrate and align towards their implementations (In short we want to align, align, align). For example, if we move to a REST API and then do the migration (UI side) then our team has to maintain that code for just as long if not longer and then are still forced to re-write it all from using the alerting client API to the SO migration way for joins if they are supported. Also, we did not want to re-provision the API keys from 1 user to another user during a banner click (so far).
  • It's far worse for us to not align with teams. We want to align (if I didn't state that earlier).
  • We could actually do nothing and just notify users they have to manually re-adjust any and all notifications through the UI if had a "hourly", "daily", or "weekly" schedule. That seemed bad, obviously but still an option.

We have added this to our agenda for today's meeting. If you see other options please list them below and if you have a particular option you want us to perform please tell us directly as well.

@FrankHassanabad FrankHassanabad changed the title Migration side car actions 2 [Draft for socialization at this point] Migration side car actions 2 Sep 14, 2021
@FrankHassanabad FrankHassanabad self-assigned this Sep 14, 2021
}

export const getThrottle = ({ actionSideCar, logger }: GetThrottleOptions): string => {
if (actionSideCar.attributes.alertThrottle != null) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check "rule" first intead of "alert" first since terminology changed and it's more than likely rule that is being stored as a more first class than "alert".

return [];
}

if (actionSideCar.attributes.actions.length !== resolvedActions.length) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From @yctercero during in-person review with @dhurley14 these might not be 1-1. So I need to remove this check here and write tests (if this goes off of draft mode)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think these will be 1-1 and they will just have duplicates. I think we are ok here. More testing will be utilized for this point.

@oatkiller
Copy link
Contributor

Related issue: #112327

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants