[Automatic Import] Better recognize (ND)JSON formats and send samplesFormat to the backend #190588

Merged
ilyannn merged 38 commits into elastic:main from auto-import/json-or-ndjson
Aug 22, 2024

Conversation

ilyannn
Contributor

@ilyannn ilyannn commented Aug 15, 2024

Summary

This adds a samplesFormat group to the API. This group is filled out by the frontend when parsing the provided samples and used to set the log parsing specification for the produced integration.

We check this parameter to add a toggle that enables support for multiline newline-delimited JSON in the filestream input.

Closes https://github.com/elastic/security-team/issues/10277

Release note

Automatic Import now supports the 'multiline newline-delimited JSON' log sample format for the Filestream input.

Detailed Explanation

We add the optional samplesFormat group to the API, consisting of

  • name,
  • (optional) multiline,
  • and (optional) json_path.

Example values of this parameter:

  • { name: 'ndjson', multiline: false } for newline-delimited JSON, known as NDJSON (where each entry takes exactly one line)
  • { name: 'ndjson', multiline: true } for newline-delimited JSON where each entry can span multiple lines
  • { name: 'json', json_path: [] } for valid JSON with the structure [{"key": "message1"}, {"key": "message2"}]
  • { name: 'json', json_path: ['events'] } for valid JSON with the structure {"events": [{"key": "message1"}, {"key": "message2"}]}
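
For illustration, the group described above could be typed roughly as follows. This is a sketch only: the `SamplesFormat` interface name, the `describeFormat` helper and the choice of `string` for `name` are assumptions made for this example, not the actual schema in the PR.

```typescript
// Hypothetical type sketch; the field names follow the PR description,
// but the real schema in the codebase may differ.
interface SamplesFormat {
  name: string; // e.g. 'ndjson' or 'json'
  multiline?: boolean; // whether one entry can span several lines
  json_path?: string[]; // path to the events array, only relevant for 'json'
}

// The four example values listed above.
const examples: SamplesFormat[] = [
  { name: 'ndjson', multiline: false },
  { name: 'ndjson', multiline: true },
  { name: 'json', json_path: [] },
  { name: 'json', json_path: ['events'] },
];

// Hypothetical helper: produce a short human-readable label for a format value.
function describeFormat(format: SamplesFormat): string {
  if (format.name === 'ndjson') {
    return format.multiline ? 'multiline NDJSON' : 'NDJSON';
  }
  if (format.name === 'json') {
    const path = format.json_path ?? [];
    return path.length === 0
      ? 'JSON array at the root'
      : `JSON array at ${path.join('.')}`;
  }
  return format.name;
}
```

Calling `describeFormat` on each element of `examples` gives one label per supported shape, which is a quick way to check that every combination is handled.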

The json_path parameter is only relevant for name: 'json' and refers to the path in the original JSON to the array representing the events to ingest. Currently only one level is recognized:

$ echo '{"events": [{"key": "message1"}, {"key": "message2"}]}' | jq '.events[]'
{
  "key": "message1"
}
{
  "key": "message2"
}
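
The same lookup can be sketched in TypeScript. `extractEvents` is a hypothetical helper written for this explanation and is not code from the PR; it walks `json_path` into the parsed document and expects to end up at an array.

```typescript
// Hypothetical helper: follow `jsonPath` into parsed JSON and return the events array.
function extractEvents(parsed: unknown, jsonPath: string[]): unknown[] {
  let current: unknown = parsed;
  for (const key of jsonPath) {
    if (typeof current !== 'object' || current === null || Array.isArray(current)) {
      throw new Error(`Cannot descend into "${key}": not an object`);
    }
    current = (current as Record<string, unknown>)[key];
  }
  if (!Array.isArray(current)) {
    throw new Error('json_path does not point to an array');
  }
  return current;
}

// Same input as the jq example above.
const sample = '{"events": [{"key": "message1"}, {"key": "message2"}]}';
const events = extractEvents(JSON.parse(sample), ['events']);
```

With an empty `json_path` the helper returns the root value itself, which matches the `[{"key": "message1"}, {"key": "message2"}]` case.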

Not all combinations of a log format with input type will work; more supported combinations as well as better user feedback on unsupported combinations will come later (see https://github.com/elastic/security-team/issues/10290).

In this PR we add support for the multiline NDJSON format for the filestream input type. This support comes in the form of a user-changeable toggle under "Advanced Settings" that is set to on when we detect the multiline NDJSON format.
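
To make the multiline NDJSON case concrete, here is a minimal sketch of how such input could be split into entries. `splitMultilineNdjson` is a hypothetical helper written for this explanation; it is not the parser used by the frontend or by the filestream input, and it simply accumulates lines until the buffer parses as JSON.

```typescript
// Hypothetical helper: split text into JSON entries, where each entry may span several lines.
function splitMultilineNdjson(text: string): unknown[] {
  const entries: unknown[] = [];
  let buffer = '';
  for (const line of text.split(/\r?\n/)) {
    if (buffer === '' && line.trim() === '') {
      continue; // skip blank lines between entries
    }
    buffer = buffer === '' ? line : `${buffer}\n${line}`;
    try {
      entries.push(JSON.parse(buffer));
      buffer = ''; // the entry is complete, start a new one
    } catch {
      // The entry is not complete yet; keep accumulating lines.
    }
  }
  if (buffer.trim() !== '') {
    throw new Error('Trailing text is not valid JSON');
  }
  return entries;
}

const multilineSample = '{\n  "key": "message1"\n}\n{\n  "key": "message2"\n}\n';
const entries = splitMultilineNdjson(multilineSample);
```

Single-line NDJSON is a special case of the same loop, since each line parses on its own. Re-parsing the growing buffer is quadratic in the size of an entry, which is acceptable for a sketch over small samples but not for large files.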

Example with the Crowdstrike Falcon Events integration (toggle is present, and defaults to on):

Log sample file

image

Checklist

  • Update the pictures for the new wording.
  • Unit or functional tests were updated or added to match the most common scenarios

Note that we do not provide i18n support for text within the integration.

Risk Matrix

None identified.

For maintainers

@ilyannn ilyannn added the Team:Security-Scalability and backport:prev-minor labels Aug 15, 2024
@ilyannn ilyannn changed the title Send the log type [Automatic Import] Send the log type Aug 15, 2024
@ilyannn ilyannn changed the title [Automatic Import] Send the log type [Automatic Import] Provide the log format to the backend Aug 15, 2024
@ilyannn ilyannn marked this pull request as ready for review August 16, 2024 01:19
@ilyannn ilyannn requested a review from a team as a code owner August 16, 2024 01:19
@elasticmachine
Contributor

Pinging @elastic/security-scalability (Team:Security-Scalability)

@ilyannn ilyannn added the release_note:skip label Aug 16, 2024
@bhapas bhapas added the v8.16.0 label Aug 19, 2024
Member

@P1llus P1llus left a comment

This is looking great! Awesome to see you already added in the path detection as well, will be very useful when we start adding conditions for some of the other inputs!

Just one minor nit, and will give it a test in the UI before adding the last approval 👍

@ilyannn
Contributor Author

ilyannn commented Aug 20, 2024

@elasticmachine merge upstream

Contributor Author

@ilyannn ilyannn left a comment

I think I've addressed all comments

@ilyannn ilyannn changed the title [Automatic Import] Provide the log sample format to the backend [Automatic Import] Provide the log samples format to the backend Aug 20, 2024
@ilyannn ilyannn added the release_note:enhancement label and removed the release_note:skip label Aug 20, 2024
@ilyannn ilyannn changed the title [Automatic Import] Provide the log samples format to the backend [Automatic Import] Better recognize (ND)JSON formats and send samplesFormat to the backend Aug 20, 2024
Contributor

@bhapas bhapas left a comment

Tested locally with JSON, NDJSON, and multiline samples. LGTM

A few questions before we can merge this.

@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| integrationAssistant | 40 | 41 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| integrationAssistant | 937.3KB | 938.3KB | +978.0B |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| integrationAssistant | 47 | 49 | +2 |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ilyannn

Contributor

@bhapas bhapas left a comment

LGTM. Tested locally

@ilyannn ilyannn merged commit 2a8b6d0 into elastic:main Aug 22, 2024
22 checks passed
@ilyannn ilyannn deleted the auto-import/json-or-ndjson branch August 22, 2024 01:25
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Aug 22, 2024
…Format to the backend (elastic#190588)

Co-authored-by: Marius Iversen <marius.iversen@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit 2a8b6d0)
@kibanamachine
Contributor

💚 All backports created successfully

Status Branch Result
8.15

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Aug 22, 2024
…samplesFormat to the backend (#190588) (#191040)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Automatic Import] Better recognize (ND)JSON formats and send
samplesFormat to the backend
(#190588)](#190588)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

Co-authored-by: Ilya Nikokoshev <ilya.nikokoshev@elastic.co>
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) release_note:enhancement Team:Security-Scalability Team label for Security Integrations Scalability Team v8.15.1 v8.16.0