Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config consolidation/removal of tech debt #707

Merged
merged 15 commits into from
Apr 26, 2018

Conversation

ryandeivert
Copy link
Contributor

to: @austinbyers
cc: @airbnb/streamalert-maintainers
size: large

Background

We currently have JSON config loading logic in various places, each implemented for specific use-cases (for the most part). Some of these are untested but most are largely redundant. These changes are part of an effort to consolidate and simplify how we handle config loading.

Changes

  • Adding a new function (named simply load_config) to a new stream_alert.shared.config module.
  • This function is flexible and can handle all of the following:
    • Loading the entire config (default)
    • Loading certain files only (through the use of an include keyword argument)
    • Loading the entire config minus specific items (through the use of an exclude keyword argument)
    • Optionally 'validating' certain aspects of the config via a validate keyword argument.
  • Updating all current config loading logic (that I was able to find) to use this new function.
    • Threat Intel Downloader function
    • Athena Partition Refresh function
    • Rule Processor
    • Alert Processor
    • CLIConfig
  • The CLIConfig class has been refactored slightly, since it can use the new load_config function for loading instead of duplicating this effort internally.
  • Updating some code related to configuring outputs that was doing its own config magic instead of leveraging code that currently exists.
  • Standardizing lambda function ARN parsing to use the name qualifier instead of lambda_alias.

Future Work

  • Ensure all places where lambda function arns are parsed use the new parse_lambda_arn function in the stream_alert.shared.config module.
  • Move our config to be class-based, in which CLIConfig would inherit from in order to implement the 'write' functionality for writing the config back to disk.

Testing

Updating all unit tests to utilize to the new load_config function and adding new unit tests for the stream_alert.shared.config module.

@ryandeivert ryandeivert force-pushed the ryandeivert-config-consolidation branch from 9c4a9bc to b29a6eb Compare April 26, 2018 05:59
@coveralls
Copy link

coveralls commented Apr 26, 2018

Coverage Status

Coverage increased (+0.06%) to 96.524% when pulling f2e1419 on ryandeivert-config-consolidation into 0e43516 on master.

@ryandeivert ryandeivert changed the title Ryandeivert config consolidation config consolidation/removal of tech debt Apr 26, 2018
Copy link
Contributor

@jacknagz jacknagz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 Great change, we really needed this!!

}
"""
split_arn = function_arn.split(':')
return {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optionally, we could also load function_name and qualifier from other attributes of the context:

function_name
Name of the Lambda function that is executing.

function_version
The Lambda function version that is executing. If an alias is used to invoke the function, then function_version will be the version the alias points to.

invoked_function_arn
The ARN used to invoke this function. It can be function ARN or alias ARN. An unqualified ARN executes the $LATEST version and aliases execute the function version it is pointing to.

From: https://docs.aws.amazon.com/lambda/latest/dg/python-context-object.html

return config

def _load_json_file(path, ordered=False):
"""Helper to return the loaded json from a given path"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add kwargs/return/raise values in the docstring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't we say no more 'nits'? 🤣

raise ConfigError('Invalid JSON format for {}'.format(path))


def _validate_config(config):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably refactor this at some point, since 'logs' and 'sources's shouldn't ever be optional. It would be great to check a set of required/optional keys to ensure other arbitrary values aren't added as well into the conf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above load_config function allows anything to be optional.. meaning logs.json or sources.json could be excluded by the caller using exclude={'logs.json'}, etc. These checks just ensure that the user did not do something like:

load_config(exclude={'logs.json'}, validate=True)

self.load()
def __init__(self, config_path=DEFAULT_CONFIG_PATH):
self.config_path = config_path
self.config = config.load_config(config_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏


conf_files.intersection_update(default_files)
if not (conf_files or include_clusters):
raise ConfigError('No config files to load')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add more context into this error?

Copy link
Contributor

@austinbyers austinbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More red than green!! 🔻 ❌ 🔻
Awesome, thanks for doing this, it was much needed.

Since this changes how the config is loaded for multiple Lambda functions, I'd be most comfortable with a test deploy if you haven't already

exclude (set): Names of config files or folders that should not be loaded
include (set): Names of specific config files to only load
validate (bool): Validate aspects of the config to check for user error
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are only 3 keyword arguments, can we list them explicitly instead of using **kwargs? I'm nearly always a fan of explicit kwargs - it makes it easier for IDEs to autocomplete, etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no real reason not to - I can make the change :)


Keyword Arguemnts:
exclude (set): Names of config files or folders that should not be loaded
include (set): Names of specific config files to only load
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what was the motivation for selectively loading parts of the config? Performance?

A potential downside to this approach is having to reason about which sections of the config are available to different Lambda functions instead of being able to reason about a single config object.

A potential future iteration of this (once we have a Config class) would be to lazy-load attributes. That way, callers don't need to explicitly say which config they need

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the motivation was largely to allow for supporting what we currently do. For instance, the alert processor only loads the outputs.json, so I thought supporting a similar model was ideal. While I agree it could be confusing to have to reason about which files are loaded and which are not, the caller is the one who restricts the loaded files (and thus should be responsible for knowing this..) otherwise all of them will be loaded.

After this initial iteration I'm open to building it out more (and anticipate doing so). I like the idea of lazy loading and think that we can benefit a lot from that approach!

@ryandeivert
Copy link
Contributor Author

ryandeivert commented Apr 26, 2018

Just deployed this in test account and all is calm! merging

@ryandeivert ryandeivert merged commit e59eb90 into master Apr 26, 2018
@ryandeivert ryandeivert deleted the ryandeivert-config-consolidation branch April 26, 2018 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants