[Amazon Security Lake] - OCSF v1.1 update with major refactor & adding support for dynamic template and mappings & system tests #10405
base: main
Conversation
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
💚 Build Succeeded
cc @ShourieG
Quality Gate failed: Failed conditions
Couple of general comments.
I looked at the README and dashboard diffs.
@@ -19,6 +19,8 @@ The Amazon Security Lake integration collects logs from both [Third-party servic
### **NOTE**:
- The Amazon Security Lake integration supports events collected from [AWS services](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html) and [third-party services](https://docs.aws.amazon.com/security-lake/latest/userguide/custom-sources.html).

- Due to the nature and structure of the OCSF schema, this integration has limitations on how deep the mappings run. Some important objects like 'Actor', 'User' and 'Product' have more fleshed-out mappings compared to others which get flattened after the initial 2-3 levels of nesting to keep them maintainable in a YAML format. This will evolve on a need-by-need basis going forward.
- Due to the nature and structure of the OCSF schema, this integration has limitations on how deep the mappings run. Some important objects like 'Actor', 'User' and 'Product' have more fleshed-out mappings compared to others which get flattened after the initial 2-3 levels of nesting to keep them maintainable in a YAML format. This will evolve on a need-by-need basis going forward.
- Due to the nature and structure of the OCSF schema, this integration has limitations on how deep the mappings run. Some important objects like 'Actor', 'User' and 'Product' have more fleshed-out mappings compared to others which get flattened after the initial 2-3 levels of nesting to keep them maintainable and stay within field mapping [limits](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html). This will evolve as needed going forward.
... or even, s/This will evolve as needed going forward./This will evolve as needed./
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "2.0.0"
  changes:
    - description: Updated to support OCSF v1.1.0. with major pipeline rework and dynamic template support.
- description: Updated to support OCSF v1.1.0. with major pipeline rework and dynamic template support.
- description: Updated to support OCSF v1.1.0. with major pipeline rework and dynamic mapping support.
I think you're using dynamic field mapping, but not yet dynamic templates.
Would be good to correct this terminology in the proposed commit message as well.
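For context on the distinction: dynamic field mapping lets Elasticsearch infer a type for any previously unseen field, while a dynamic template is an explicit rule in the index mappings, keyed on field name or detected type. A minimal sketch of a dynamic template (the template name and path are illustrative, not taken from this package):

```json
{
  "mappings": {
    "dynamic_templates": [
      {
        "ocsf_org_fields": {
          "path_match": "ocsf.*.org.*",
          "mapping": {
            "type": "keyword",
            "ignore_above": 1024
          }
        }
      }
    ]
  }
}
```

With only dynamic field mapping and no such rule, the same fields would get whatever type Elasticsearch infers from the first document indexed.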
There is a lot of colour and movement in this change, which makes it difficult to be confident of the review. I have looked at:
- updated root org templates
- reworked 'org' object mapping as dynamic template for all data streams
- segregated process fields in 'findings', added 'actor' fields for new class support, ignore _dev folder
- added vulnerability findings support and segregated 'resource' group into its own file
- added ntp activity event class, deprecated proxy event class, added proxy_endpoint field, updated network activity class and segregated endpoint event mappings into separate files across all data streams. updated ocsf object as necessary across respective data streams
- added os patch state event class, segregated device fields across all data streams, added new fields to support newly added event class
- added datastore activity event class, segregated actor, user & metadata fields across all data streams, flattened ldap fields in event data stream to make room for more fields
- added support for detection finding event class, segregated and mapped finding_info in findings data stream
- added support of compliance finding event class, segregated and updated resources object group, added new objects as required
- segregated and expanded api object across all data streams, added support for incident findings event class
- added support for Device Config State Change event class, updated schema version in comment and dashboard links to 1.1.0
- added support for scan activity event class
- segregated file fields across required data streams, added support for file hosting activity class
- added cwe & epss objects as flattened to cve object
- converted feature object to follow dynamic mapping rules across all data streams
- added some missing fields after locally running system tests for discovery datastream
- reworked terraform deployer to support multi-bucket based system tests
- updated docs and changelog
- fixed timestamp issues across all data streams, added all system tests and updated missing mappings accordingly
There are a variety of comments and suggestions. I'll take another look again tomorrow.
output "bucket_arn" {
  value       = aws_s3_bucket.security_lake_logs.arn
  description = "The ARN of the S3 bucket"
}
Missing final newline.
@@ -14,7 +14,7 @@
  type: keyword
  description: The account type, normalized to the caption of 'account_type_id'. In the case of 'Other', it is defined by the event source.
- name: type_id
  type: keyword
  type: integer
Why is this being changed to an integer? In general, IDs are not semantically orderable, so a keyword is usually what is wanted, even if the underlying type is a number. If OCSF specifies that it is orderable, ignore this. Also below.
The OCSF schema defines type_id as an integer. If we define it as a keyword, we need to manually convert it from int to string on our end for all its occurrences, or use a recursive script to do so, which would be expensive. The OCSF JSON payload also has it as a number, so explicit conversion would be required on our end for every instance.
Keywords can be numeric.
I remember having it as a keyword in the beginning and then encountering type errors while running tests. I'll revisit and check.
You will need to annotate the fields in numeric_keyword_fields in the test configs.
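For reference, that annotation lives in the data stream's pipeline test config; a sketch, with illustrative field names and path:

```yaml
# data_stream/<name>/_dev/test/pipeline/test-<name>-config.yml (illustrative path)
numeric_keyword_fields:
  - ocsf.type_id
  - ocsf.activity_id
```

This tells elastic-package that these keyword-mapped fields legitimately carry numeric values in the source JSON, so the type check does not fail on them.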
description: The type of FTP network connection (e.g. active, passive).
description: The type the event.
- name: type_id
  type: integer
keyword
type: keyword
description: The type of the user. For example, System, AWS IAM User, etc.
- name: type_id
  type: integer
keyword (throughout)
packages/amazon_security_lake/data_stream/findings/fields/_dev/fields.yml
type: keyword
description: The type of scan.
- name: type_id
  type: integer
keyword
type: integer
description: The number of items that were skipped.
- name: num_trusted_items
- name: num_*
Does this work? TIL.
Yup, this works.
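So a single wildcard leaf in fields.yml can cover all the count fields with one definition; a sketch, with illustrative sibling names:

```yaml
- name: num_*
  type: integer
  description: Count fields, e.g. num_skipped_items, num_trusted_items.
```

The wildcard is translated into a dynamic mapping rule in the built index template, so any matching field gets the integer type without an explicit entry.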
@@ -583,6 +583,9 @@
- name: raw_data
  type: flattened
  description: The event data as received from the event source.
- name: raw_data_keyword
  type: keyword
match_only_text? (throughout)
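i.e., something along these lines in fields.yml (a sketch of the suggested change, not the committed mapping):

```yaml
- name: raw_data_keyword
  type: match_only_text
  description: The event data as received from the event source.
```

match_only_text supports full-text queries at a fraction of the disk and memory cost of text, which tends to suit large raw-payload fields that are searched but not scored or aggregated.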
int digits = ("" + timestamp).length();
if (digits > 16 && digits <= 19) {
    return timestamp / 1000000; // Convert nanoseconds to milliseconds
} else if (digits > 13 && digits <= 16) {
    return timestamp / 1000; // Convert microseconds to milliseconds
} else if (digits > 10 && digits <= 13) {
    return timestamp; // Already in milliseconds, no conversion needed
} else if (digits <= 10) {
    return timestamp * 1000; // Convert seconds to milliseconds
This is more expensive than necessary.
def convertToMilliseconds(long timestamp) {
if ((long)1e19 - 1 < timestamp) {
throw new IllegalArgumentException("Timestamp format not recognized: " + timestamp);
} else if ((long)1e16 - 1 < timestamp) {
return timestamp / 1000000; // Convert nanoseconds to milliseconds
} else if ((long)1e13 - 1 < timestamp) {
return timestamp / 1000; // Convert microseconds to milliseconds
} else if ((long)1e10 - 1 < timestamp) {
return timestamp; // Already in milliseconds, no conversion needed
} else {
return timestamp * 1000; // Convert seconds to milliseconds
}
}
or alternatively (and cleaner IMO)
def convertToMilliseconds(long timestamp) {
if (timestamp < (long)1e10) {
return timestamp * (long)1e3; // Convert seconds to milliseconds
}
long t = timestamp;
// Find first milli-, micro- or nano second-sane value in multiple-steps
// of 1000.
for (int i = 0; i < 3; i++) {
if ((long)1e13 - 1 < t) {
return t;
}
t /= (long)1e3;
}
throw new IllegalArgumentException("Timestamp format not recognized: " + timestamp);
}
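The threshold logic is easy to sanity-check outside Painless; a rough Python equivalent of the first sketch above (function name hypothetical):

```python
def convert_to_milliseconds(timestamp: int) -> int:
    """Normalize an epoch timestamp in s, ms, us, or ns to milliseconds.

    Same magnitude thresholds as the Painless sketch above: values below
    1e10 are seconds, below 1e13 milliseconds, below 1e16 microseconds,
    below 1e19 nanoseconds.
    """
    if timestamp >= 10**19:
        raise ValueError("Timestamp format not recognized: %d" % timestamp)
    if timestamp >= 10**16:
        return timestamp // 1_000_000  # nanoseconds -> milliseconds
    if timestamp >= 10**13:
        return timestamp // 1_000      # microseconds -> milliseconds
    if timestamp >= 10**10:
        return timestamp               # already milliseconds
    return timestamp * 1_000           # seconds -> milliseconds
```

The magnitude comparisons replace the string-length trick in the integration's current script, avoiding the per-document string allocation.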
Type of change
Proposed commit message
With the upgrade of the OCSF schema, we are enhancing our support to meet compatibility requirements for OCSF v1.1. We are also reworking the ingest pipeline to incorporate dynamic templates and mappings to enable faster OCSF upgrades in the future.
Checklist
changelog.yml file.
Author's Checklist
- [ ] Update dashboards wherever required. (Dashboards are at the category level and, after inspection, no changes are required at the moment since they operate on shared values.)
NOTE
Due to the nature and structure of the OCSF schema, this integration has limitations on how deep the mappings run. Some important objects like 'Actor', 'User' and 'Product' have more fleshed-out mappings compared to others which get flattened after the initial 2-3 levels of nesting to keep them maintainable in a YAML format. This will evolve on a need-by-need basis going forward.
The CI tests will pass once the respective elastic-package changes are implemented as defined here.
Approach
How to review this PR
Due to the scale of the changes, intermittent merges with main to resolve conflicts, and reworks across the board, rewriting the git history and consolidating the commits with git rebase is proving to be really challenging, so I suggest the following approach to review this PR:
Some commits have elements that stand out but may have been reworked or removed later downstream. In those scenarios, feel free to review in the complete context or reach out to me in case of any confusion.
How to test this PR locally
Related issues
System Tests
Screenshots