Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Data Normalization] Make logs optional when datatypes is defined #307

Merged
3 commits merged into from
Sep 12, 2017

Conversation

chunyong-lin
Copy link
Contributor

@chunyong-lin chunyong-lin commented Sep 11, 2017

to @ryandeivert @mime-frame
cc @airbnb/streamalert-maintainers
size: tiny
contributes to: #304

Summary

Remove log_source constrain which means logs keyword is optional when we define a rule if datatypes keyword present. Otherwise logs keyword is required.
Apply Data Normalization feature to a rule, we can define a rule in this format

from helpers.base import fetch_values_by_datatype

@rule(datatypes=['command'])
def alert_suspicious_wget(rec):
    results = fetch_values_by_datatype(rec, 'command')
    for result in results:
        if fnmatch(result, "wget *"):
            return true
    return false

Changes

Testing

  • Integration test is passed, it covers three cases
    • logs is present, datatypes is not
    • logs is present, datatypes is
    • logs is not present, datatypes is
  • Unit test is passed, and it covers four cases
    • logs is present, datatypes is not - working as expected
    • logs is present, datatypes is - working as expected
    • logs is not present, datatypes is not - log an ERROR
    • logs is not present, datatypes is - working as expected.
  • Testing in AWS testing account is passed

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 95.298% when pulling 2f16953 on remove_log_source_constrain into d3ac7f3 on master.

payload.normalized_types,
rule.datatypes)
record['normalized_types'] = types_result
if rule.datatypes is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: The is not None here is implicit with just if rule.datatypes:

return False

for datatype in datatypes:
if not datatype in normalized_types:
LOGGER.error('The datatype [%s] is not defined!', datatype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this logging statement no longer helpful? I understand why the one above was removed but can you explain why this one was also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we removed log_sources constrain, one rule will matched to multiple records. Normalized types dictionaries are defined based on each log source though. It is not an error if normalized type foo doesn't defined in Record A, it just doesn't match and return False is enough. Does it make sense to you?

@ghost
Copy link

ghost commented Sep 11, 2017

@chunyong-lin - For the testing, please ensure there is a test for all possible permutations and update the PR description:

  1. logs is present, datatypes is not
  2. logs is present, datatypes is
  3. logs is not present, datatypes is not - should log an ERROR
  4. logs is not present, datatypes is

@ghost ghost changed the title [Data Normalization] Remove log_source constrain. [Data Normalization] Make logs optional when datatypes is defined Sep 11, 2017
@chunyong-lin chunyong-lin force-pushed the remove_log_source_constrain branch from 2f16953 to c725c7c Compare September 11, 2017 20:49
@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 95.298% when pulling c725c7c on remove_log_source_constrain into 81044e1 on master.

@chunyong-lin
Copy link
Contributor Author

@mime-frame @ryandeivert PTAL, I also updated my test status to PR description. Thanks 😸

Chunyong Lin added 2 commits September 11, 2017 16:01
* Add unit tests to test `logs` and `datatypes` in rule header. They cover 4 cases:
  * logs is present, datatypes is not - working as expected
  * logs is present, datatypes is - working as expected
  * logs is not present, datatypes is not - log an ERROR
  * logs is not present, datatypes is - working as expected.
@chunyong-lin chunyong-lin force-pushed the remove_log_source_constrain branch from b6c4741 to 3dd0e15 Compare September 11, 2017 23:01
@coveralls
Copy link

Coverage Status

Coverage remained the same at 95.295% when pulling 3dd0e15 on remove_log_source_constrain into 6610ff6 on master.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 95.295% when pulling 3dd0e15 on remove_log_source_constrain into 6610ff6 on master.

* Remove this field from record before sending record to outputs (slack, PD, etc).
@chunyong-lin chunyong-lin force-pushed the remove_log_source_constrain branch from 3dd0e15 to 201aa1e Compare September 12, 2017 00:11
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.06%) to 95.231% when pulling 201aa1e on remove_log_source_constrain into 6610ff6 on master.

@ghost
Copy link

ghost commented Sep 12, 2017

@chunyong-lin - to confirm, this is ready for review?

@chunyong-lin
Copy link
Contributor Author

@mime-frame Yes, it is ready for review.

Copy link
Contributor

@ryandeivert ryandeivert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ghost ghost merged commit 4a330f6 into master Sep 12, 2017
@ghost ghost deleted the remove_log_source_constrain branch September 12, 2017 16:42
@ryandeivert ryandeivert added this to the 1.5.0 milestone Sep 12, 2017
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants