Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[outputs] adding support for required alerting outputs #643

Merged
merged 6 commits into from
Mar 20, 2018

Conversation

ryandeivert
Copy link
Contributor

to: @austinbyers
cc: @airbnb/streamalert-maintainers
size: medium

Background

A previous change was made to enable athena as part of the default deployment. Due to this change, StreamAlert must enforce sending alerts to required outputs even if they are not defined in the actual rule.

Changes

  • Required alerting outputs will be hard-coded in the the stream_alert/shared/resource.py module.
    • Currently, this only includes the default aws-firehose output for alerts
  • The rule processor will combine the default output names with the user-defined outputs for a rule.
  • The alert processor config will now be passed to the resources function to merge in required outputs.
  • Updating rules as necessary.
  • Adding suppression for noisy log statement during tests.

Testing

Updating unit tests that broke because of this change. Adding tests for the shared functions that return the required outputs.

@ryandeivert ryandeivert force-pushed the ryandeivert-default-output branch 2 times, most recently from c1d7a61 to 6c7e296 Compare March 20, 2018 20:50
@ryandeivert
Copy link
Contributor Author

@austinbyers just fixed the integration testing mocks, PTAL

@coveralls
Copy link

coveralls commented Mar 20, 2018

Coverage Status

Coverage increased (+0.08%) to 95.727% when pulling da8192a on ryandeivert-default-output into a97fb60 on master.

Copy link
Contributor

@austinbyers austinbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how rules don't need explicit outputs now! A few comments

@@ -19,7 +19,7 @@

from stream_alert.rule_processor import LOGGER
from stream_alert.rule_processor.threat_intel import StreamThreatIntel
from stream_alert.shared import NORMALIZATION_KEY
from stream_alert.shared import resources, NORMALIZATION_KEY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the alert processor, these are imported in the other order. Not sure which way is the "proper" alphabetization, but let's make it the same either way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch - this was a lazy find/replace since I used to have resources named common but switch the naming convention

@@ -53,6 +53,7 @@ class StreamRules(object):
def __init__(self, config):
"""Initialize a StreamRules instance to cache a StreamThreatIntel instance."""
self._threat_intel = StreamThreatIntel.load_from_config(config)
self._required_outputs = set(resources.get_required_outputs())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_required_outputs() already returns a set, it doesn't need to be recreated as such. However, for clarity, you could rename self._required_outputs to self._required_output_set if you wanted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch - the initial implementation of get_required_outputs actually returned a dict, which needed to be cast to set. updating now

outputs = set()
for service, value in REQUIRED_OUTPUTS.iteritems():
if not isinstance(value, dict):
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since REQUIRED_OUTPUTS is hardcoded, I don't think we need any error-checking logic. I would rather fail with an exception here than just silently skip what is likely a problem with the REQUIRED_OUTPUTS definition

"""
for service, value in REQUIRED_OUTPUTS.iteritems():
if not isinstance(value, dict):
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto - I don't think we need this

"""Iterates through the required outputs and merges them with the user outputs

Args:
user_config (dict): Loaded user outputs dictionary from conf/outputs.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a line explaining that the incoming config is modified? Just reading the docstring I was under the impression a new dictionary was returned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will actually change this to return a copy, not modify in place

# Combine the required alert outputs with the ones for this rule
all_outputs = self._required_outputs
if rule.outputs:
all_outputs = all_outputs.union(set(rule.outputs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I saw this, I thought it was a bug because it looks like modifying all_outputs should modify self._required_outputs (they are initially the same reference). But this line actually rebinds all_outputs to reference a new set created by the union.

For clarity, and to avoid possible future bugs with this, can we just do

all_outputs = self._required_outputs | set(rule.outputs)

Ideally, rule.outputs is always a list or set, and that will work fine. But if not, you can do

all_outputs = self._required_outputs | set(rule.outputs or [])

Copy link
Contributor Author

@ryandeivert ryandeivert Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thank you! I was actually trying to figure out the best way to ensure rule outputs wasn't None. going to use set.union vs | for clarity

alert = {
'id': alert_id,
'record': record,
'rule_name': rule.rule_name,
'rule_description': rule.rule_function.__doc__ or DEFAULT_RULE_DESCRIPTION,
'log_source': str(payload.log_source),
'log_type': payload.type,
'outputs': rule.outputs,
'outputs': list(all_outputs),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually are saving the outputs as a StringSet in Dynamo, so I'd say let's keep this as a set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding a TODO to change this :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you still need JSON-compatibility for the alert processor (but not for much longer!)

@@ -1,6 +1,7 @@
"""Define some shared resources."""
import logging
import os
import string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised pylint didn't complain about this! Just of curiosity, where did it come from?

Copy link
Contributor Author

@ryandeivert ryandeivert Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh interesting - I think pylint won't complain about imports within __init__ since it probable assumes you're doing package imports like from . import *

removed it now :)

@ryandeivert ryandeivert force-pushed the ryandeivert-default-output branch from 3e53ef7 to da8192a Compare March 20, 2018 22:42
@ryandeivert
Copy link
Contributor Author

hey @austinbyers just addressed your feedback. PTAL when you get a chance

return outputs
return {'{}:{}'.format(service, output)
for service, value in REQUIRED_OUTPUTS.iteritems()
for output in value.keys()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Pythonic and elegant! 🐍

Copy link
Contributor

@austinbyers austinbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants