-
Notifications
You must be signed in to change notification settings - Fork 334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cli][metrics] metric alarms configurable through the cli #302
Conversation
22bcdaa
to
99eb49c
Compare
@@ -29,6 +29,7 @@ | |||
}, | |||
"rule_processor": { | |||
"current_version": "$LATEST", | |||
"enable_metrics": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In doc (docs/source/athena-deploy.rst
), it says default value is false
. So which one if outdate? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs definitely - I do not have time to add those to this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also that's related to the athena processor not the rule processor or alert processor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but this looks good! The extended validation of the user input will make for a much better user experience instead of trying to deploy and waiting for Terraform to fail
"""Subclass of argparse.Action to avoid multiple of the same choice from a list""" | ||
def __call__(self, parser, namespace, values, option_string=None): | ||
unique_items = set(values) | ||
setattr(namespace, self.dest, unique_items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Some advanced argparsing I haven't seen before
GreaterThanOrEqualToThreshold | ||
GreaterThanThreshold | ||
LessThanThreshold | ||
LessThanOrEqualToThreshold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more user-friendly if you allow >=, >, <, <=
and convert to the longer form, like you do for the function names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's smart - I was just using aws terms as a template but I like that approach better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought I think I'll leave it as is - using $ python manage.py create-alarm --help
will show the acceptable options.
Also if an invalid param is used, it will print the options like: manage.py [command] [subcommand] [options] create-alarm: error: argument -co/--comparison-operator: invalid choice: 'GreaterThanOrEqualToThresho' (choose from 'GreaterThanOrEqualToThreshold', 'GreaterThanThreshold', 'LessThanThreshold', 'LessThanOrEqualToThreshold')
A good thought but I think I'd just like to mirror the AWS options :)
manage.py
Outdated
) | ||
|
||
# Set the name of this parser to 'validate-schemas' | ||
metric_alarm_parser.set_defaults(command='create-alarm') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment does not match the code
// CloudWatch metric alarms that are created per-cluster | ||
// The split list is our way around poor tf support for lists of maps and is made up of: | ||
// <alarm_name>, <alarm_description>, <comparison_operator>, <evaluation_periods>, | ||
// <metric>, <period>, <statistic>, <threshold> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a pretty good workaround. Maybe add a TODO so we can come back and update this once Terraform has better support for complex types
* Fixing bug in help string for validate-schemas command
* Adding some flags to the `manage.py metrics` command to accept a cluster and a function name * By default, function name is required while cluster is not (will default to all clusters)
…ter files * Metric alarms related to aggregate metrics get saved in the conf/global.json file * Metric alarms related to the athena function get saved in the conf/global.json file * Metric alarms related to the rule or alert processor functions get save to each cluster file. This enables us to better organize metrics and allows the user to set different alarms for different clusters.
* Terraform is awful and supporting both statistic and extended-statistic programatically is much too difficult right now. * Updating the property name for `metric` to be `metric_name` to align with what tf expects
… make it simpler * The terraform generate code now completely supports writing cloudwatch alarms for both aggregate metric alarms and alarms per cluster/fucntion * Aggregate metric alarms are written to the `main.tf` file to be published. * Per-cluster/function metric alarms are done via the `stream_alert` tf module.
* Updating unit test * Migarting a constant dict to metrics pacakge to be shared * Various linting
d8ae5f4
to
e427d16
Compare
to @austinbyers or @chunyong-lin
cc @airbnb/streamalert-maintainers
size: large
Background & Why
stream_alert/shared/metrics.py
package to be used throughout the project.FailedParses
(aka incoming logs that do not match a defined schema) rises above a certain threshold, CloudWatch can fire an alarm that then notifies any service connected to the alarm.Changes
manage.py
cli.--metric-target cluster
):--metric-target aggregate
):conf/global.json
(or the default SNS topic ofstream_alert_monitoring
if one is not defined).Other changes
TOTAL_PROCESSED_SIZE
that will log the processed bytes for each rule processor invocation.manage.py
cli also now supports turning on or off metrics for a given cluster/function:manage.py metrics --enable --functions rule
--clusters
can be added to only enable for specific clusters. By default without--clusters
this will enable metrics for all clusters for the given functionTesting