
Ingest ES structured audit logs #10352

Merged: 9 commits into elastic:master from fb-es-structured-audit-log on Jan 29, 2019

Conversation

@ycombinator (Contributor) commented Jan 27, 2019

This is a "forward port" of #8852.

In #8852, we taught Filebeat to ingest either structured or unstructured ES audit logs, but the resulting fields conformed to the 6.x mapping structure.

In this PR we again teach Filebeat to ingest either structured or unstructured ES audit logs, but this time the resulting fields conform to the 7.0 (ECS-based) mapping structure.

@ycombinator ycombinator added in progress Pull request is currently in progress. module Filebeat Filebeat v7.0.0 Feature:Stack Monitoring labels Jan 27, 2019
@ycombinator ycombinator requested review from a team as code owners January 27, 2019 14:09
@elasticmachine (Collaborator):

Pinging @elastic/stack-monitoring

@ycombinator ycombinator force-pushed the fb-es-structured-audit-log branch from 2c0ff8b to e9e01be on January 27, 2019 15:49
@ycombinator ycombinator changed the title from "[WIP] Ingest ES structured audit logs" to "Ingest ES structured audit logs" on Jan 27, 2019
@ycombinator ycombinator added review and removed in progress Pull request is currently in progress. labels Jan 27, 2019
@ycombinator ycombinator requested a review from ruflin January 27, 2019 16:34
@ycombinator (Contributor, Author) commented Jan 27, 2019

While working on this PR I realized that we don't have sample lines for the structured Elasticsearch audit log containing a request body (which is supposed to be parsed into the http.request.body.content field). I'm working with @albertzaharovits to get such a sample and will incorporate it into follow-up PRs (for master and 6.x).
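
For reference, a purely hypothetical example of what such a structured audit log line with a request body might look like (one JSON object per line; every field name and value below is invented for illustration and is not taken from a real Elasticsearch audit log):

    {"@timestamp": "2019-01-27T14:09:00,116+0000", "user.name": "example_user", "origin.address": "127.0.0.1:54321", "request.body": "{\"query\": {\"match_all\": {}}}"}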

@ycombinator (Contributor, Author):

jenkins, test this

@ruflin (Contributor) left a comment:

Can you also update the ecs-migration.yml file?

@ycombinator (Contributor, Author):

> Can you also update the ecs-migration.yml file?

Updated now. Do I need to run any make/mage commands as well?

@ycombinator (Contributor, Author):

@ruflin I'm not sure what you meant by #10352 (review):

> Can you also update the ecs-migration.yml file?

Could you please clarify? Otherwise this PR is ready for review, IMHO.

@ycombinator (Contributor, Author):

jenkins, test this

@webmat (Contributor) left a comment:

Noticed very minor things. Looking pretty good!

},
{
  "dot_expander": {
    "field": "origin.address",
Contributor:

I would instead rename this field to source.address (and keep it around).
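
A sketch of what that suggested rename could look like (illustrative only; the field path follows the snippet above and options such as ignore_missing are assumptions, not the final pipeline code):

    {
      "rename": {
        "field": "origin.address",
        "target_field": "source.address",
        "ignore_missing": true
      }
    }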

{
  "rename": {
    "field": "elasticsearch.audit.user.name",
    "target_field": "user.name"
Contributor:

Can't we dot_expand in place? If the original key is "user.name", I would think that expanding it to "user": { "name": "..." } wouldn't conflict.

It would simplify the code in a few places where you have the same pattern happening.

Contributor (Author):

I'm not sure I follow what you mean by doing "dot_expand in place"? The dot_expander processor right above this one is necessary to go from:

{
  "elasticsearch.audit.user.name": "foo"
}

to:

{
  "elasticsearch.audit.user": {
    "name": "foo"
  }
}

That then allows us to call the rename processor as we are doing over here.
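
For concreteness, the two-step pattern being described is roughly the following (a sketch based on the snippets above, assuming the dotted key sits under elasticsearch.audit and using dot_expander's path option; the actual pipeline may differ in details):

    {
      "dot_expander": {
        "field": "user.name",
        "path": "elasticsearch.audit"
      }
    },
    {
      "rename": {
        "field": "elasticsearch.audit.user.name",
        "target_field": "user.name"
      }
    }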

Contributor:

I haven't used dot_expander yet, so perhaps I'm misunderstanding it. But I was under the impression that the following did the equivalent of the 2 processors above:

{ "dot_expander": { "field": "user.name" } }

And in cases where the output isn't the object equivalent of the dotted notation, you would use path this way, to get the equivalent of the two node.name processors above:

{ "dot_expander": { "field": "node.name", "path": "elasticsearch.node" } }

If that's not the case, you can ignore this ;-)

  }
},
{
  "date": {
-   "field": "elasticsearch.audit.timestamp",
+   "field": "elasticsearch.audit.@timestamp",
Contributor:

Nit: prior to grabbing the real timestamp from the log, could you populate event.created with Beat's @timestamp?

Contributor (Author):

This is already being done in the very first processor in this pipeline. It's collapsed in the diff since nothing changed there:

https://github.com/elastic/beats/pull/10352/files#diff-a007340015953dcf31f9d48f74358ac3R4
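
For illustration, such a first processor could be as simple as the following (a sketch only; the actual processor is collapsed in the diff and may differ):

    {
      "rename": {
        "field": "@timestamp",
        "target_field": "event.created"
      }
    }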

Contributor:

Missed that 👍

@webmat (Contributor) commented Jan 28, 2019

Also, the dev-tools/ecs-migration.yml file is where we document the field migrations as you're doing here.

It's not the same format as the various fields.yml files Beats uses to generate templates. Rather, it's optimized for us to automate some of this migration (e.g. renaming fields in dashboards).

But in the ecs-migration.yml file, you should document all of the fields for which you have migration: true in the fields.yml file. This file can indicate:

  • whether the field could get a forward-compatible alias in 6.7, provided the mapping is 1:1, with alias6: true
  • whether the field in 7.x will alias to the old field. Most of the time this is true. Use alias: true.
  • in case of duration scale differences (ms to ns), you can use scale:
  • if your field is only present in one beat, you identify it with beat: filebeat, as should be the case here.

@ycombinator (Contributor, Author):

@webmat Thanks for the detailed explanation of how to use the ecs-migration.yml file; I really appreciate having it written down so I can go back and look at it in the future as well!

I've addressed all your comments in the review. Only one of them resulted in a code change, however. So you might want to look at my replies to your other two comments.

Thanks!

ycombinator added a commit that referenced this pull request Jan 29, 2019
Follow up to #10352 per #10352 (comment):

> While working on this PR I realized that we don't have sample lines for the **structured** elasticsearch audit log containing a request body (which is supposed to be parsed into the `http.request.body.content` field). I'm working with `@albertzaharovits` to get such a sample and will incorporate it into follow up PRs (for `master` and `6.x`).

Accordingly, this PR adds sample lines to the structured and unstructured log file test fixtures for the `elasticsearch/audit` fileset and teaches the fileset to parse any new fields encountered in these sample lines.
@webmat (Contributor) left a comment:

LGTM

My understanding of dot_expander may be flawed; I haven't used it before. So feel free to ignore this if what I said in my response above doesn't make sense ;-)

@ycombinator ycombinator merged commit 5e460ed into elastic:master Jan 29, 2019
ycombinator added a commit that referenced this pull request Jan 30, 2019
ycombinator added a commit that referenced this pull request Feb 1, 2019
This PR teaches the `elasticsearch/slowlog` fileset to ingest structured Elasticsearch search and indexing slow logs.

This PR takes the same approach as #10352, in that it creates an entrypoint pipeline, `pipeline.json`, that delegates further processing of a log entry depending on what it sees as the first character of the entry:
- If the first character is `{`, it assumes the log line is structured as JSON and delegates further processing to `pipeline-json.json`.
- Else, it assumes the log line is plaintext and delegates further processing to `pipeline-plaintext.json`.
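
As an illustration of that delegation pattern, an entrypoint pipeline could be structured roughly as follows; the grok pattern, conditions, and pipeline names are assumptions for this sketch rather than the exact contents of pipeline.json:

    {
      "description": "Entrypoint pipeline: routes each log line by its first character",
      "processors": [
        {
          "grok": {
            "field": "message",
            "patterns": ["^%{CHAR:first_char}"],
            "pattern_definitions": { "CHAR": "." }
          }
        },
        {
          "pipeline": {
            "if": "ctx.first_char == '{'",
            "name": "pipeline-json"
          }
        },
        {
          "pipeline": {
            "if": "ctx.first_char != '{'",
            "name": "pipeline-plaintext"
          }
        },
        {
          "remove": {
            "field": "first_char"
          }
        }
      ]
    }
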
@ycombinator ycombinator deleted the fb-es-structured-audit-log branch December 25, 2019 11:15