config consolidation/removal of tech debt #707

ryandeivert · 2018-04-26T05:58:23Z

to: @austinbyers
cc: @airbnb/streamalert-maintainers
size: large

Background

We currently have JSON config loading logic in various places, each implemented for a specific use case (for the most part). Some of these are untested, and most are largely redundant. These changes are part of an effort to consolidate and simplify how we handle config loading.

Changes

Adding a new function (named simply load_config) to a new stream_alert.shared.config module.
This function is flexible and can handle all of the following:
- Loading the entire config (default)
- Loading certain files only (through the use of an include keyword argument)
- Loading the entire config minus specific items (through the use of an exclude keyword argument)
- Optionally 'validating' certain aspects of the config via a validate keyword argument.
Updating all current config loading logic (that I was able to find) to use this new function.
- Threat Intel Downloader function
- Athena Partition Refresh function
- Rule Processor
- Alert Processor
- CLIConfig
The CLIConfig class has been refactored slightly, since it can use the new load_config function for loading instead of duplicating this effort internally.
Updating some code related to configuring outputs that was doing its own config magic instead of leveraging code that currently exists.
Standardizing lambda function ARN parsing to use the name qualifier instead of lambda_alias.
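The include/exclude/validate behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the config is a flat directory of `*.json` files keyed by file stem, uses a hypothetical stand-in `ConfigError`, treats `logs` and `sources` as the required sections, and omits the clusters folder handling entirely.

```python
import json
import os


class ConfigError(Exception):
    """Hypothetical stand-in for the shared module's ConfigError."""


def load_config(conf_dir='conf/', exclude=None, include=None, validate=False):
    """Load top-level *.json files from conf_dir into a dict keyed by file stem.

    Args:
        conf_dir (str): Directory holding the JSON config files
        exclude (set): File names (e.g. 'logs.json') that should not be loaded
        include (set): If provided, load only these file names
        validate (bool): Check that required sections were actually loaded
    """
    available = {name for name in os.listdir(conf_dir) if name.endswith('.json')}
    # Narrow to the include set (if any), then drop anything excluded
    to_load = (available & set(include)) if include else available
    to_load = to_load - set(exclude or ())
    if not to_load:
        raise ConfigError('No config files to load')

    config = {}
    for name in sorted(to_load):
        with open(os.path.join(conf_dir, name)) as json_file:
            config[name[:-len('.json')]] = json.load(json_file)

    # Catch callers that excluded files the rest of the code depends on
    if validate:
        missing = {'logs', 'sources'} - set(config)
        if missing:
            raise ConfigError('Missing required config: {}'.format(', '.join(sorted(missing))))
    return config
```

Under these assumptions, a caller such as the alert processor, which only reads outputs.json today, could call `load_config(include={'outputs.json'})`, while the rule processor would call `load_config()` to get everything.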

Future Work

Ensure all places where Lambda function ARNs are parsed use the new parse_lambda_arn function in the stream_alert.shared.config module.
Move our config to a class-based model that CLIConfig would inherit from in order to implement the 'write' functionality for writing the config back to disk.
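One possible shape for that class-based config is sketched below. The `Config` base class, its `__getitem__`, and the `write` method are all hypothetical; this `CLIConfig` is an illustration of the inheritance idea, not the existing CLI class. The base class only reads, and the CLI subclass adds the ability to persist changes.

```python
import json
import os


class Config(object):
    """Hypothetical read-only base: loads each top-level *.json file in conf_dir."""

    def __init__(self, conf_dir='conf/'):
        self.conf_dir = conf_dir
        self.config = {}
        for name in sorted(os.listdir(conf_dir)):
            if name.endswith('.json'):
                with open(os.path.join(conf_dir, name)) as json_file:
                    self.config[name[:-len('.json')]] = json.load(json_file)

    def __getitem__(self, key):
        return self.config[key]


class CLIConfig(Config):
    """Hypothetical CLI subclass that adds writing the config back to disk."""

    def write(self):
        # One file per top-level section, mirroring how it was loaded
        for section, data in self.config.items():
            path = os.path.join(self.conf_dir, '{}.json'.format(section))
            with open(path, 'w') as json_file:
                json.dump(data, json_file, indent=2, sort_keys=True)
```

Keeping `write` on the subclass means the Lambda functions only ever get the read-only base class.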

Testing

Updating all unit tests to use the new load_config function and adding new unit tests for the stream_alert.shared.config module.

coveralls · 2018-04-26T06:03:15Z

Coverage increased (+0.06%) to 96.524% when pulling f2e1419 on ryandeivert-config-consolidation into 0e43516 on master.

jacknagz

👏 Great change, we really needed this!!

jacknagz · 2018-04-26T15:53:05Z

stream_alert/shared/config.py

+            }
+    """
+    split_arn = function_arn.split(':')
+    return {


Optionally, we could also load function_name and qualifier from other attributes of the context:

- function_name: Name of the Lambda function that is executing.
- function_version: The Lambda function version that is executing. If an alias is used to invoke the function, then function_version will be the version the alias points to.
- invoked_function_arn: The ARN used to invoke this function. It can be function ARN or alias ARN. An unqualified ARN executes the $LATEST version and aliases execute the function version it is pointing to.

From: https://docs.aws.amazon.com/lambda/latest/dg/python-context-object.html
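A rough sketch of both approaches follows. It relies on the standard Lambda ARN layout `arn:aws:lambda:<region>:<account-id>:function:<name>[:<qualifier>]`; the returned dict keys and the `parse_lambda_context` helper are hypothetical and may not match what the PR's `parse_lambda_arn` actually returns. Note the difference the docs point out: the qualifier from `invoked_function_arn` is the alias name (e.g. `production`), whereas `function_version` is the version number that alias resolves to.

```python
def parse_lambda_arn(function_arn):
    """Split a Lambda function ARN into its parts (hypothetical key names).

    Qualified:   arn:aws:lambda:us-east-1:123456789012:function:my_func:production
    Unqualified: arn:aws:lambda:us-east-1:123456789012:function:my_func
    """
    split_arn = function_arn.split(':')
    return {
        'region': split_arn[3],
        'account_id': split_arn[4],
        'function_name': split_arn[6],
        # An unqualified ARN has no 8th field; AWS then runs $LATEST
        'qualifier': split_arn[7] if len(split_arn) > 7 else None,
    }


def parse_lambda_context(context):
    """Alternative from review: prefer the context object's own attributes."""
    parsed = parse_lambda_arn(context.invoked_function_arn)
    parsed['function_name'] = context.function_name
    # Resolved version number, even when invoked through an alias
    parsed['function_version'] = context.function_version
    return parsed
```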

jacknagz · 2018-04-26T15:55:34Z

stream_alert/shared/config.py

+    return config
+
+def _load_json_file(path, ordered=False):
+    """Helper to return the loaded json from a given path"""


Nit: add kwargs/return/raise values in the docstring

didn't we say no more 'nits'? 🤣

jacknagz · 2018-04-26T15:57:23Z

stream_alert/shared/config.py

+            raise ConfigError('Invalid JSON format for {}'.format(path))
+
+
+def _validate_config(config):


We should probably refactor this at some point, since 'logs' and 'sources' should never be optional. It would be great to check against a set of required/optional keys to ensure other arbitrary values aren't added to the config as well.

The load_config function above allows anything to be optional, meaning logs.json or sources.json could be excluded by the caller using exclude={'logs.json'}, etc. These checks just ensure that the user did not do something like:

load_config(exclude={'logs.json'}, validate=True)

jacknagz · 2018-04-26T16:20:27Z

stream_alert_cli/config.py

-        self.load()
+    def __init__(self, config_path=DEFAULT_CONFIG_PATH):
+        self.config_path = config_path
+        self.config = config.load_config(config_path)


jacknagz · 2018-04-26T16:24:37Z

stream_alert/shared/config.py

+
+    conf_files.intersection_update(default_files)
+    if not (conf_files or include_clusters):
+        raise ConfigError('No config files to load')


Could we add more context into this error?

austinbyers

More red than green!! 🔻 ❌ 🔻
Awesome, thanks for doing this, it was much needed.

Since this changes how the config is loaded for multiple Lambda functions, I'd be most comfortable with a test deploy if you haven't done one already.

austinbyers · 2018-04-26T21:42:03Z

stream_alert/shared/config.py

+        exclude (set): Names of config files or folders that should not be loaded
+        include (set): Names of specific config files to only load
+        validate (bool): Validate aspects of the config to check for user error
+    """


If there are only 3 keyword arguments, can we list them explicitly instead of using **kwargs? I'm nearly always a fan of explicit kwargs - it makes it easier for IDEs to autocomplete, etc.

no real reason not to - I can make the change :)
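Besides autocompletion, explicit keyword arguments make a misspelled argument fail loudly. The two hypothetical functions below are stripped down to just the signature difference: with `**kwargs`, a typo such as `exlude` is silently swallowed and nothing gets excluded, while the explicit signature raises `TypeError` at the call site.

```python
def load_config_kwargs(conf_dir='conf/', **kwargs):
    """Hypothetical **kwargs style: unknown names are silently accepted."""
    return kwargs.get('exclude', set())


def load_config_explicit(conf_dir='conf/', exclude=None, include=None, validate=False):
    """Hypothetical explicit style: IDEs can autocomplete, and typos fail fast."""
    return exclude or set()


# load_config_kwargs(exlude={'logs.json'})    -> set(), the typo goes unnoticed
# load_config_explicit(exlude={'logs.json'})  -> TypeError: unexpected keyword argument
```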

austinbyers · 2018-04-26T21:45:26Z

stream_alert/shared/config.py

+
+    Keyword Arguemnts:
+        exclude (set): Names of config files or folders that should not be loaded
+        include (set): Names of specific config files to only load


Out of curiosity, what was the motivation for selectively loading parts of the config? Performance?

A potential downside to this approach is having to reason about which sections of the config are available to different Lambda functions instead of being able to reason about a single config object.

A potential future iteration of this (once we have a Config class) would be to lazy-load attributes. That way, callers don't need to explicitly say which config they need

I suppose the motivation was largely to support what we currently do. For instance, the alert processor only loads outputs.json, so I thought supporting a similar model was ideal. While I agree it could be confusing to have to reason about which files are loaded and which are not, the caller is the one who restricts the loaded files (and thus should be responsible for knowing this); otherwise, all of them will be loaded.

After this initial iteration I'm open to building it out more (and anticipate doing so). I like the idea of lazy loading and think that we can benefit a lot from that approach!
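The lazy-loading idea could look something like the sketch below; the `LazyConfig` class is hypothetical. Python only calls `__getattr__` when normal attribute lookup fails, so each section's file is read on first access and then cached as a regular instance attribute. A function like the alert processor would then only ever read outputs.json simply by touching `config.outputs`, without having to declare an include list.

```python
import json
import os


class LazyConfig(object):
    """Hypothetical config whose sections are read from disk on first access."""

    def __init__(self, conf_dir='conf/'):
        self._conf_dir = conf_dir

    def __getattr__(self, section):
        # Only reached when normal lookup fails, i.e. for sections not loaded yet
        if section.startswith('_'):
            raise AttributeError(section)
        path = os.path.join(self._conf_dir, '{}.json'.format(section))
        if not os.path.isfile(path):
            raise AttributeError('No config section named {!r}'.format(section))
        with open(path) as json_file:
            value = json.load(json_file)
        setattr(self, section, value)  # cache so later lookups skip __getattr__
        return value
```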

ryandeivert · 2018-04-26T23:50:04Z

Just deployed this in the test account and all is calm! Merging.

ryandeivert added 14 commits April 25, 2018 22:58

adding new shared config loading functionality

75479ec

updating alert processor to use new config loading func

e60dc26

updates to some alert processor unit tests

98c653e

updates to threat intel downloader to use new config loading func

cba9029

updates to threat intel downloader unit tests

a82ff25

adding tests for new config loading logic

03c6fc6

updates to athena partition refresh to use new config loading func

2771ad2

updates to athena partition refresh unit tests

5a02931

updating unit test lambda.json config file

dff31a9

updates to rule processor to use new config loading func

bac158a

updates to rule processor unit tests

14a2def

removing unnecessary logic related to config from CLI

b67d305

updating the CLIConfig class to use the shared config loading logic

7c25589

fixing test failures

b29a6eb

ryandeivert force-pushed the ryandeivert-config-consolidation branch from 9c4a9bc to b29a6eb April 26, 2018 05:59

ryandeivert changed the title ~~Ryandeivert config consolidation~~ config consolidation/removal of tech debt Apr 26, 2018

jacknagz approved these changes Apr 26, 2018


ryandeivert added config improvement labels Apr 26, 2018

ryandeivert added this to the 2.0.0 milestone Apr 26, 2018

austinbyers approved these changes Apr 26, 2018


addressing pr feedback

f2e1419

ryandeivert merged commit e59eb90 into master Apr 26, 2018

ryandeivert deleted the ryandeivert-config-consolidation branch April 26, 2018 23:50

ryandeivert added the tech debt label Jul 10, 2018
