Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML Rule Suppression UI Improvements #9

Merged
merged 21 commits into from
Jun 18, 2024

Conversation

rylnd
Copy link
Owner

@rylnd rylnd commented Jun 6, 2024

Summary

  1. Re-enables and adds additional ML cypress tests

  2. Adds ML fields to Define Step
    Screenshot 2024-06-06 at 5 14 17 PM

  3. Disables suppression UI when no relevant ML jobs are enabled
    Screenshot 2024-06-17 at 11 26 01 PM

  4. Adds warning text when some relevant ML jobs are not enabled
    Screenshot 2024-06-17 at 11 26 16 PM

@rylnd rylnd changed the title ML Rule Suppression Improvements ML Rule Suppression UI Improvements Jun 10, 2024
@rylnd rylnd force-pushed the ml_rule_suppression_warnings branch from e59a189 to 5d0f0b3 Compare June 11, 2024 19:14
rylnd added 5 commits June 11, 2024 14:19
* Disables suppression fields if no relevant ML jobs are running (as we
  cannot retrieve field info)
* Adds a warning message if not all relevant ML jobs are running (as we
  may be missing some field info)

Next step is testing this; we don't currently have a way to run ML rules
in cypress, but I'm going to attempt to copy the logic in our FTR to
accomplish this.
This was previously only available on the About step, via the
useRuleIndices hook in combination with the useFetchIndex hook.

Add new composite hook that encapsulates the same logic, and provides it
to the define step. Unlike on the About step, we are currently only
using this for ML fields as other situations derive their field list
from a passed prop (which might be a performance optimization, or a bug,
or both).
To do this we need to get the ML API to recognize our jobs as installed
and running. They are currently _not_ recognizing this (although there
are anomalies in the index).

Still troubleshooting to see what's missing, here. This logic was
cribbed from the analogous FTR tests, but those also aren't working so
*shrug*.
@rylnd rylnd force-pushed the ml_rule_suppression_warnings branch from 5d0f0b3 to 4955831 Compare June 11, 2024 21:58
rylnd added 4 commits June 12, 2024 17:04
Specifying the `groups` parameter when using the "setup module" API
causes the corresponding jobs to be installed _with only the specified
group_. This meant that in our FTR tests, we have been installing jobs
with the `auditbeat` group.

However, part of the contract between ML and Detection Engine is that we
use the `group` parameter to determine relevance: if it doesn't belong
to either the `security` or (legacy) `siem` group(s), it effectively
does not exist to the Detection Engine.

This fixes the (very confusing) issue of jobs being installed but not
recognized, by specifying a recognized group id (and using our shared
constant for it), both in the FTR and cypress utilities.
I have seen the _ecs prefix in a few places, but I'm not quite sure if
it's actually part of official ML naming or not. Regardless, using the
incorrect name caused the "start datafeed" request to fail with a "no
datafeed for job ID" error.
The existing one also references the shared constants for our group IDs,
so 👍.
* Cleans up debugging logs
* Adds helper for ensuring that jobs are not started at beginning of
  suite
* Fixes form filling utility to support single values for
  machine_learning_job_id
* Updates suppression fields now that we're actually using real fields
  from anomaly indices
@@ -22,7 +23,7 @@ export const executeSetupModuleRequest = async ({
.set(getCommonRequestHeader('1'))
.send({
prefix: '',
groups: ['auditbeat'],
groups: [ML_GROUP_ID],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing job sticking with it to figure this out. Woohoo to finally having our ML tests able to run.

rylnd and others added 12 commits June 17, 2024 17:00
Co-authored-by: Nastasha Solomon <79124755+nastasha-solomon@users.noreply.github.com>
Co-authored-by: Nastasha Solomon <79124755+nastasha-solomon@users.noreply.github.com>
The way the cypress helper that consumes these is written, both of these
forms work, but we're never going to encounter a rule with the display
name params, and knowing the display name but not the ID is not useful
for investigative purposes.

I can see how this might have been done to prevent needing to change
these jobs as their IDs change, but I think it's more likely that those
will change than their IDs.
There's not a lot here, but I feel bad for adding anything to
step_define_rule so this is an attempt to minimize that.

In the course of refactoring I also caught a bug (perhaps just a test
environment one) where the form fields are temporarily `undefined` when
the hooks are run. I updated the form type to reflect this; hopefully
that doesn't have broader impact (but if it does, those are probably
also uncaught bugs).
The combination of shared state and retry logic means that asserting
exactly 1 rule exists will never work if rule creation succeeds in a
previous step. If we instead assert that there is _at least_ the
expected number of rules, we have a chance of the retry working.
* Stop datafeeds before creating rule
* Simplify jobId logic
Turns out the reason the "Job IDs" were persisted as human-readable text
was so that they could be reused for assertions.

I still think these should be separate, so I'm adding them back for this
specific assertion.
* Adds necessary setup/teardown for ML integration
1. Use "proper" combobox text, and capture it within a helper method
I swear I saw this working when I was doing the same stuff for the ML
Job picker, but I must have only been dealing with one item, or the
items I was selecting were somehow different. Downarrow is _required_ on
the first option (a simple "enter" will select nothing), but using
downarrow on subsequent options will cause the _second_ suggested item
to be selected. E.g. if I type "by_field_value", it suggests both
"by_field_value" and "client.by_field_value," and {downarrow}{enter}
would cause the latter to be selected.

2. I also ensure that our new ML validations have run (which causes
  suppression fields to be disabled) before attempting to interact with
  the suppression fields, as this was causing some flakiness now that
  these checks are done async

3. I also fixed the broken `clearAlertSuppressionFields` task, which had
  never work but also had never been exercised since the relevant test
  was skipped.
There were no less than four assertions in this test that relied on
there being no other rules present in the environment, but nothing was
being done to ensure that was the case. I can't imagine why these were
skipped!
I want to run these in the flaky runner to get a sense of how/where
they're still failing, for now.
We were over-eagerly disabling these fields when the ML checks were not
relevant.
@rylnd rylnd marked this pull request as ready for review June 18, 2024 04:29
@rylnd rylnd merged commit e6aae21 into ml_rule_alert_suppression Jun 18, 2024
1 check passed
@rylnd rylnd deleted the ml_rule_suppression_warnings branch June 18, 2024 04:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants