
[Amazon Security Lake] - OCSF v1.1 update with major refactor & adding support for dynamic template and mappings & system tests #10405

Open · wants to merge 34 commits into base: main
Conversation

ShourieG
Contributor

@ShourieG ShourieG commented Jul 8, 2024

Type of change

  • Enhancement

Proposed commit message

With the upgrade of OCSF schemas, we are enhancing our support to meet compatibility requirements for OCSF v1.1. We are also reworking the ingest pipeline to incorporate dynamic templates and mappings, enabling faster OCSF upgrades in the future.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • Added support for User Inventory Info class events.
  • Added Terraform based system tests (requires elastic package change to work)
  • Rework ingestion pipelines to incorporate dynamic templates and dynamic mappings.
  • Add support for more event classes introduced in OCSF v1.1.
  • Add new profiles and objects as required based on OCSF v1.1 updates.
  • Update dashboards wherever required (dashboards are at the category level and, after inspection, no changes are currently required since they operate on shared values).
  • Updated documentation
  • Removed system test configs

NOTE

  • Due to the nature and structure of the OCSF schema, this integration has limitations on how deep the mappings run. Some important objects, like 'Actor', 'User' and 'Product', have more fleshed-out mappings than others, which are flattened after the initial 2-3 levels of nesting to keep them maintainable in a YAML format. This will evolve on a need-by-need basis going forward.

  • The CI tests will pass once the respective elastic-package changes are implemented as defined here.
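For illustration, the flattening approach described in the first note can be sketched in a fields definition like this (a hypothetical fragment, not the integration's actual field files):

```yaml
- name: actor
  type: group
  fields:
    - name: user
      type: group
      fields:
        - name: name
          type: keyword
        - name: uid
          type: keyword
    - name: process
      # Nesting deeper than the first few levels is collapsed into a
      # single flattened field to keep the YAML maintainable.
      type: flattened
```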

Approach

  • Segregated objects like user, actor, process, device, file, network, and more into separate files across all data streams.
  • Implemented dynamic templates on a few objects that align with current pipeline capabilities.
  • Implemented Terraform-based multi-bucket system tests.
  • Added all new OCSF v1.1 classes and objects across data streams.
  • Fixed existing errors and issues in mappings and timestamp parsing across all data streams.
  • Cleaned up the existing codebase and made it more maintainable.

How to review this PR

Due to the scale of the changes, intermittent merges with main to resolve conflicts, and reworks across the board, rewriting the git history and consolidating the commits with git rebase is proving really challenging. I therefore suggest the following approach to reviewing this PR:

  1. Complete review of the terraform based deployer, which will later be used in system tests after elastic-package changes are available.
  2. Prioritize commits with the "dynamic" keyword, which is specific to some dynamic template implementations.
  3. Prioritize commits with the keywords "updated", "converted", "segregated", and "fixed".
  4. Prioritize reviewing the pipeline changes, as they contain the core logic.
  5. Commits with the "added" keyword signify the addition of new mappings for OCSF v1.1. These are quite large and often redundant to review completely due to the nature of OCSF. That said, if you feel any of the mappings are worth reviewing, please go ahead.
  6. Ignore commits with the keywords "resolved", "merged", "test", "initial", and "trying".

Some commits have elements that stand out but may have been reworked or removed later downstream. In those scenarios, feel free to review in the complete context or reach out to me in case of any confusion.

How to test this PR locally

Related issues

System Tests

--- Test results for package: amazon_security_lake - START ---
╭──────────────────────┬─────────────┬───────────┬──────────────────────┬────────┬─────────────────╮
│ PACKAGE              │ DATA STREAM │ TEST TYPE │ TEST NAME            │ RESULT │    TIME ELAPSED │
├──────────────────────┼─────────────┼───────────┼──────────────────────┼────────┼─────────────────┤
│ amazon_security_lake │ event       │ system    │ application-activity │ PASS   │ 3m12.321247667s │
│ amazon_security_lake │ event       │ system    │ discovery            │ PASS   │ 3m10.478033084s │
│ amazon_security_lake │ event       │ system    │ findings             │ PASS   │ 3m13.906812333s │
│ amazon_security_lake │ event       │ system    │ iam                  │ PASS   │ 3m13.214461166s │
│ amazon_security_lake │ event       │ system    │ network-activity     │ PASS   │ 3m10.608428458s │
│ amazon_security_lake │ event       │ system    │ system-activity      │ PASS   │ 3m12.344728625s │
╰──────────────────────┴─────────────┴───────────┴──────────────────────┴────────┴─────────────────╯

Screenshots

@ShourieG ShourieG self-assigned this Jul 8, 2024
@andrewkroh andrewkroh added Integration:amazon_security_lake Amazon Security Lake Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations] labels Jul 19, 2024
…oxy_endpoint field, updated network activity class and segregated endpoint event mappings into separate files across all data streams. updated ocsf object as necessary across respective data streams
… data streams, added new fields to support newly added event class
…ta fields across all data streams, flattened ldap fields in event data stream to make room for more fields
…ed resources object group, added new objects as required
…ema version in comment and dashboard links to 1.1.0
@ShourieG ShourieG changed the title [Amazon Security Lake] - OCSF v1.1 update with adding support for dynamic template and mappings [Amazon Security Lake] - OCSF v1.1 update with refactor & adding support for dynamic template and mappings Aug 9, 2024
@ShourieG ShourieG changed the title [Amazon Security Lake] - OCSF v1.1 update with refactor & adding support for dynamic template and mappings [Amazon Security Lake] - OCSF v1.1 update with major refactor & adding support for dynamic template and mappings Aug 9, 2024
@ShourieG ShourieG changed the title [Amazon Security Lake] - OCSF v1.1 update with major refactor & adding support for dynamic template and mappings [Amazon Security Lake] - OCSF v1.1 update with major refactor & adding support for dynamic template and mappings & system tests Aug 9, 2024
@ShourieG ShourieG marked this pull request as ready for review August 9, 2024 16:30
@ShourieG ShourieG requested a review from a team as a code owner August 9, 2024 16:30
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@elasticmachine

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@andrewkroh andrewkroh added the enhancement New feature or request label Aug 19, 2024
@elasticmachine

💚 Build Succeeded

History

cc @ShourieG


Quality Gate failed

Failed conditions
1.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@chrisberkhout chrisberkhout self-requested a review August 28, 2024 08:22
@andrewkroh andrewkroh added the dashboard Relates to a Kibana dashboard bug, enhancement, or modification. label Aug 30, 2024
Contributor

@chrisberkhout chrisberkhout left a comment

Couple of general comments.

I looked at the README and dashboard diffs.

@@ -19,6 +19,8 @@ The Amazon Security Lake integration collects logs from both [Third-party servic
### **NOTE**:
- The Amazon Security Lake integration supports events collected from [AWS services](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html) and [third-party services](https://docs.aws.amazon.com/security-lake/latest/userguide/custom-sources.html).

- Due to the nature and structure of the OCSF schema, this integration has limitations on how deep the mappings run. Some important objects like 'Actor', 'User' and 'Product' have more fleshed-out mappings compared to others which get flattened after the initial 2-3 levels of nesting to keep them maintainable in a YAML format. This will evolve on a need-by-need basis going forward.
Contributor

Suggested change
- Due to the nature and structure of the OCSF schema, this integration has limitations on how deep the mappings run. Some important objects like 'Actor', 'User' and 'Product' have more fleshed-out mappings compared to others which get flattened after the initial 2-3 levels of nesting to keep them maintainable in a YAML format. This will evolve on a need-by-need basis going forward.
- Due to the nature and structure of the OCSF schema, this integration has limitations on how deep the mappings run. Some important objects like 'Actor', 'User' and 'Product' have more fleshed-out mappings compared to others which get flattened after the initial 2-3 levels of nesting to keep them maintainable and stay within field mapping [limits](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html). This will evolve as needed going forward.

Contributor

... or even, s/This will evolve as needed going forward./This will evolve as needed./

@@ -1,4 +1,9 @@
# newer versions go on top
- version: "2.0.0"
changes:
- description: Updated to support OCSF v1.1.0. with major pipeline rework and dynamic template support.
Contributor

Suggested change
- description: Updated to support OCSF v1.1.0. with major pipeline rework and dynamic template support.
- description: Updated to support OCSF v1.1.0. with major pipeline rework and dynamic mapping support.

I think you're using dynamic field mapping, but not yet dynamic templates.
Would be good to correct this terminology in the proposed commit message as well.
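For background, dynamic field mapping is Elasticsearch's built-in type inference for unmapped fields, while a dynamic template is an explicit rule in the mapping that controls how such fields get mapped. A minimal dynamic template looks roughly like this (the template name and match pattern here are illustrative only):

```json
{
  "mappings": {
    "dynamic_templates": [
      {
        "ids_as_keyword": {
          "match": "*_uid",
          "mapping": { "type": "keyword" }
        }
      }
    ]
  }
}
```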

Contributor

@efd6 efd6 left a comment

There is a lot of colour and movement in this change which makes it difficult to be confident of the review. I have looked at:

There are a variety of comments and suggestions. I'll take another look again tomorrow.

output "bucket_arn" {
value = aws_s3_bucket.security_lake_logs.arn
description = "The ARN of the S3 bucket"
}
Contributor

Final new line.

@@ -19,6 +19,8 @@ The Amazon Security Lake integration collects logs from both [Third-party servic
### **NOTE**:
- The Amazon Security Lake integration supports events collected from [AWS services](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html) and [third-party services](https://docs.aws.amazon.com/security-lake/latest/userguide/custom-sources.html).

- Due to the nature and structure of the OCSF schema, this integration has limitations on how deep the mappings run. Some important objects like 'Actor', 'User' and 'Product' have more fleshed-out mappings compared to others which get flattened after the initial 2-3 levels of nesting to keep them maintainable in a YAML format. This will evolve on a need-by-need basis going forward.
Contributor

... or even, s/This will evolve as needed going forward./This will evolve as needed./

@@ -14,7 +14,7 @@
type: keyword
description: The account type, normalized to the caption of 'account_type_id'. In the case of 'Other', it is defined by the event source.
- name: type_id
type: keyword
type: integer
Contributor

Why is this being changed to an integer? In general, IDs are not semantically orderable, so a keyword is usually what is wanted, even if the underlying type is a number. If OCSF specifies that it is orderable, ignore this. Also below.

Contributor Author

@ShourieG ShourieG Sep 26, 2024

The OCSF schema defines type_id as an integer; if we define it as a keyword, we need to convert manually from int to string on our end for all its occurrences, or use a recursive script to do so, which would be expensive. The OCSF JSON payload also has it as a number, so explicit conversion would be required on our end for every instance.

Contributor

Keywords can be numeric.

Contributor Author

I remember having it as a keyword in the beginning and then encountering type errors while running tests. I'll revisit and check once again.

Contributor

You will need to annotate the fields in numeric_keyword_fields in the test configs.
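For example, a test config could list such fields under `numeric_keyword_fields` so the test runner accepts numeric JSON values in keyword-mapped fields (the field paths below are illustrative, assuming an `ocsf.*` prefix):

```yaml
# Illustrative elastic-package test config fragment.
numeric_keyword_fields:
  - ocsf.type_id
  - ocsf.activity_id
```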

description: The type of FTP network connection (e.g. active, passive).
description: The type of the event.
- name: type_id
type: integer
Contributor

keyword

type: keyword
description: The type of the user. For example, System, AWS IAM User, etc.
- name: type_id
type: integer
Contributor

keyword (throughout)

type: keyword
description: The type of scan.
- name: type_id
type: integer
Contributor

keyword

type: integer
description: The number of items that were skipped.
- name: num_trusted_items
- name: num_*
Contributor

Does this work? til

Contributor Author

yup this works

@@ -583,6 +583,9 @@
- name: raw_data
type: flattened
description: The event data as received from the event source.
- name: raw_data_keyword
type: keyword
Contributor

match_only_text? (throughout)
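For reference, the suggestion would amount to something like the following fragment (a sketch only; the field name is taken from the diff above):

```yaml
- name: raw_data_keyword
  type: match_only_text
  description: The event data as received from the event source.
```

`match_only_text` trades scoring and positional features for a much smaller index footprint, which can suit large raw payloads.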

Comment on lines +33 to +41
int digits = ("" + timestamp).length();
if (digits > 16 && digits <= 19) {
return timestamp / 1000000; // Convert nanoseconds to milliseconds
} else if (digits > 13 && digits <= 16) {
return timestamp / 1000; // Convert microseconds to milliseconds
} else if (digits > 10 && digits <= 13) {
return timestamp; // Already in milliseconds, no conversion needed
} else if (digits <= 10) {
return timestamp * 1000; // Convert seconds to milliseconds
Contributor

This is more expensive than necessary.

        def convertToMilliseconds(long timestamp) {
              if ((long)1e19 - 1 < timestamp) {
                  throw new IllegalArgumentException("Timestamp format not recognized: " + timestamp);
              } else if ((long)1e16 - 1 < timestamp) {
                  return timestamp / 1000000;  // Convert nanoseconds to milliseconds
              } else if ((long)1e13 - 1 < timestamp) {
                  return timestamp / 1000;  // Convert microseconds to milliseconds
              } else if ((long)1e10 - 1 < timestamp) {
                  return timestamp;  // Already in milliseconds, no conversion needed
              } else {
                  return timestamp * 1000;  // Convert seconds to milliseconds
              }
        }

or alternatively (and cleaner IMO)

        def convertToMilliseconds(long timestamp) {
              if (timestamp < (long)1e10) {
                  return timestamp * (long)1e3;  // Convert seconds to milliseconds
              }
              long t = timestamp;
              // Step down in multiples of 1000 until the value is in the
              // millisecond range.
              for (int i = 0; i < 3; i++) {
                  if (t < (long)1e13) {
                      return t;
                  }
                  t /= (long)1e3;
              }
              throw new IllegalArgumentException("Timestamp format not recognized: " + timestamp);
        }
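A quick way to sanity-check the threshold logic in the suggestions above is to mirror it outside Painless; here is a Python sketch of the same unit detection (illustrative only, not part of the pipeline):

```python
def convert_to_milliseconds(timestamp: int) -> int:
    """Normalize a Unix timestamp in s, ms, us, or ns to milliseconds.

    Mirrors the threshold approach suggested above: values below 10^10
    are treated as seconds; above that, each factor of 1000 over the
    millisecond range is divided away.
    """
    if timestamp >= 10**19:
        raise ValueError(f"Timestamp format not recognized: {timestamp}")
    if timestamp < 10**10:
        return timestamp * 1000       # seconds -> milliseconds
    if timestamp < 10**13:
        return timestamp              # already milliseconds
    if timestamp < 10**16:
        return timestamp // 1000      # microseconds -> milliseconds
    return timestamp // 1_000_000     # nanoseconds -> milliseconds
```

All four representations of the same instant normalize to the same value, e.g. `convert_to_milliseconds(1700000000)` and `convert_to_milliseconds(1700000000000000000)` both yield `1700000000000`.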

Labels
8.16 candidate · dashboard (Relates to a Kibana dashboard bug, enhancement, or modification.) · enhancement (New feature or request) · Integration:amazon_security_lake (Amazon Security Lake) · Team:Security-Service Integrations (Security Service Integrations Team [elastic/security-service-integrations])
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Amazon Security Lake] Add support for new objects and event classes, profiles and update schemas accordingly
5 participants