
Fix filebeat elasticsearch module ingest timezone #13367

Merged
6 commits merged on Sep 4, 2019

@pragkent (Contributor) commented Aug 28, 2019

This pull request fixes timezone parsing in the Elasticsearch module's ingest pipelines, just as #13308 fixed ingest timezone parsing for the system module.

@pragkent requested a review from a team as a code owner on August 28, 2019 09:26

@elasticmachine (Collaborator)

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?


@jsoriano added the labels Team:Integrations (Label for the Integrations team), Filebeat (Filebeat module), needs_backport (PR is waiting to be backported to other branches), review, v7.3.2 and v7.4.0 on Aug 29, 2019

@jsoriano (Member)

ok to test

@jsoriano previously approved these changes Aug 29, 2019
@jsoriano dismissed their stale review August 29, 2019 11:07

I think it doesn't fix the issue for plain text logs :(

@jsoriano (Member)

@pragkent I was testing this branch with some of our test files, and I saw that for this log line:

[2018-05-17T08:19:35,939][INFO ][o.e.n.Node               ] [] initializing ...

The timestamp with timezone Europe/Berlin is parsed as:

"@timestamp": "2018-05-17T10:19:35.939+02:00"

But it should be:

"@timestamp": "2018-05-17T08:19:35.939+02:00"

I think the same happens for any log line whose timestamp has no timezone.

Something we could evaluate is having different date processors for JSON and plain-text files, as there are separate pipelines for each of them.
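The arithmetic behind the two-hour shift can be reproduced in a few lines of Python. This sketches the interpretation only, not the ingest pipeline. It assumes the timestamp carries no offset and, for simplicity, models Europe/Berlin on that date as a fixed UTC+02:00 offset (CEST) instead of looking the zone up in a time-zone database.

```python
from datetime import datetime, timedelta, timezone

# Plain-text log timestamp: wall-clock time with milliseconds, no UTC offset.
RAW = "2018-05-17T08:19:35,939"

# Simplification: Europe/Berlin on 2018-05-17 is CEST, a fixed UTC+02:00 offset.
# A real implementation would use a tz database to handle DST correctly.
BERLIN_CEST = timezone(timedelta(hours=2))

# %f accepts the three millisecond digits after the comma.
naive = datetime.strptime(RAW, "%Y-%m-%dT%H:%M:%S,%f")

# Wrong: read the wall-clock time as UTC, then render it in Berlin time.
wrong = naive.replace(tzinfo=timezone.utc).astimezone(BERLIN_CEST)

# Right: the wall-clock time already is Berlin local time.
right = naive.replace(tzinfo=BERLIN_CEST)

print(wrong.isoformat(timespec="milliseconds"))  # 2018-05-17T10:19:35.939+02:00
print(right.isoformat(timespec="milliseconds"))  # 2018-05-17T08:19:35.939+02:00
```

The two results describe instants exactly two hours apart, which matches the offset between UTC and CEST on that date.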

@pragkent (Contributor Author)

I'm confused about the behavior of the date processor. I checked the source code and found that it uses the last pattern that matches the time string instead of the first one. This is why the timezone is messed up. JSON logs work fine only because the timezone is included in the log line.

If we remove the ISO8601 format from the second date processor, then the timezone will be handled correctly.

See: ES DateProcessor (source)
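The "last matching format wins" behaviour can be modelled with a small Python toy. It is not the Elasticsearch implementation: `strptime` patterns stand in for Java date formats, and an ambiguous day/month string stands in for the timezone case. It only shows why trying every format without stopping at the first success gives a different answer when two formats accept the same input.

```python
from datetime import datetime

def parse_last_match(value, formats):
    """Toy model: try every format and keep the LAST one that succeeds."""
    result = None
    for fmt in formats:
        try:
            result = datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format does not match; try the next one
    if result is None:
        raise ValueError(f"no format matched {value!r}")
    return result

def parse_first_match(value, formats):
    """Toy model: stop at the FIRST format that succeeds."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"no format matched {value!r}")

# "03/04/2019" is accepted by both formats, but each reads it differently.
FORMATS = ["%d/%m/%Y", "%m/%d/%Y"]
print(parse_first_match("03/04/2019", FORMATS).date())  # 2019-04-03
print(parse_last_match("03/04/2019", FORMATS).date())   # 2019-03-04
```

When only one format matches (for example "25/12/2019"), both strategies agree, which is why the problem only shows up for inputs that several formats can parse.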

@jsoriano (Member) commented Sep 2, 2019

> If we remove the ISO8601 format from the second date processor, then the timezone will be handled correctly.

That could be an option, but then I guess we should ignore errors in this processor, because timestamps that the previous date processor parsed as ISO8601 will fail here.

In any case, as the data formats of JSON and plain-text logs differ, what do you think about moving date parsing to the specific pipelines, pipeline-json.json and pipeline-plaintext.json?
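The concern about errors can be illustrated with a hypothetical single-format date processor in Python: if the second processor only knows the plain-text pattern, a value that is already ISO8601 does not match, and processing fails unless the failure is ignored. The function name, the `timestamp` field, and the +02:00 offset are illustrative assumptions; this is not the Elasticsearch `date` processor.

```python
from datetime import datetime, timedelta, timezone

# strptime stand-in for the plain-text pattern yyyy-MM-dd'T'HH:mm:ss,SSS
PLAINTEXT_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"

def toy_date_processor(doc, field, fmt, tz, ignore_failure):
    """Hypothetical single-format date processor.

    Parses doc[field] with fmt, attaches tz and writes "@timestamp".
    A non-matching value raises ValueError unless ignore_failure is True,
    in which case the document is returned unchanged.
    """
    try:
        parsed = datetime.strptime(doc[field], fmt)
    except ValueError:
        if ignore_failure:
            return doc  # skip this processor, keep the pipeline going
        raise
    doc["@timestamp"] = parsed.replace(tzinfo=tz).isoformat(timespec="milliseconds")
    return doc

cest = timezone(timedelta(hours=2))  # example event.timezone offset
iso_doc = {"timestamp": "2018-05-17T08:19:35.939+02:00"}  # already ISO8601

# Without ignore_failure, the ISO8601 value aborts processing.
try:
    toy_date_processor(dict(iso_doc), "timestamp", PLAINTEXT_FORMAT, cest, ignore_failure=False)
    failed = False
except ValueError:
    failed = True

# With ignore_failure, the document passes through untouched.
passed = toy_date_processor(dict(iso_doc), "timestamp", PLAINTEXT_FORMAT, cest, ignore_failure=True)

print(failed)  # True
print(passed)  # {'timestamp': '2018-05-17T08:19:35.939+02:00'}
```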

@pragkent (Contributor Author) commented Sep 3, 2019

Yes, I think you are right.
Moving date parsing to the specific pipelines should make this better.
Let me try moving it, and then let's see the result.

@pragkent (Contributor Author) commented Sep 3, 2019

I've moved the date processors into the format-specific pipelines. JSON logs already include the timezone, so I deleted the event.timezone-related date processor from the JSON pipeline.

Please have a look.
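The resulting layout can be sketched as two pipeline bodies built as Python dicts and serialized with `json.dumps`. This is a hedged reconstruction, not the merged files. The field name `timestamp` is a hypothetical placeholder, the `ISO8601` format on the JSON side and the `if` condition on the second plain-text processor are assumptions, and only the pattern `yyyy-MM-dd'T'HH:mm:ss,SSS` with `ignore_failure` comes from this thread. The `timezone` value uses an ingest template snippet, and `?.` is the Painless null-safe operator.

```python
import json

# Hypothetical placeholder; the module's real field names are not shown here.
TIMESTAMP_FIELD = "timestamp"
PLAINTEXT_PATTERN = "yyyy-MM-dd'T'HH:mm:ss,SSS"

# JSON logs already carry an offset, so a single date processor is enough.
pipeline_json = {
    "processors": [
        {"date": {"field": TIMESTAMP_FIELD, "target_field": "@timestamp",
                  "formats": ["ISO8601"]}},
    ]
}

# Plain-text logs carry no offset: parse the date, then parse it again
# applying event.timezone when one is available.
pipeline_plaintext = {
    "processors": [
        {"date": {"field": TIMESTAMP_FIELD, "target_field": "@timestamp",
                  "formats": [PLAINTEXT_PATTERN],
                  "ignore_failure": True}},
        {"date": {"if": "ctx.event?.timezone != null",
                  "field": TIMESTAMP_FIELD, "target_field": "@timestamp",
                  "formats": [PLAINTEXT_PATTERN],
                  "timezone": "{{ event.timezone }}",
                  "ignore_failure": True}},
    ]
}

print(json.dumps(pipeline_plaintext, indent=2))
```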

@jsoriano (Member) left a comment


Thanks!

@@ -294,6 +295,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Update PAN-OS fileset to use the ECS NAT fields. {issue}13320[13320] {pull}13330[13330]
- Add fields to the Zeek DNS fileset for ECS DNS. {issue}13320[13320] {pull}13324[13324]
- Add container image in Kubernetes metadata {pull}13356[13356] {issue}12688[12688]
- Add timezone information to apache error fileset. {issue}12772[12772] {pull}13304[13304]
Contributor:


Should this line be removed from this PR?

Member:


Oops, my fault, I did the merge, let me fix it.

Member:


Oh I see, I wanted to move the changelog entry 🙂

"formats": [
"yyyy-MM-dd'T'HH:mm:ss,SSS"
],
"ignore_failure": true
Contributor:

Now that this processor and the next one both act on the same field, perhaps we could add an "if": "ctx.event.timezone == null" property to this processor for a bit of extra clarity?

@pragkent (Contributor Author), Sep 4, 2019:

I think it's better not to add "if": "ctx.event.timezone == null", because:

  1. other modules don't have "if": "ctx.event.timezone == null", so we'd better keep this convention
  2. in most cases event.timezone is not set; without "if": "ctx.event.timezone == null", the second processor looks like an optional path instead of an alternative path

Member:

I think @ycombinator is right: we could add this "if", as only one of the processors is needed.

> other modules don't have "if": "ctx.event.timezone == null", so we'd better keep this convention

In previous versions both processors were meant to be executed: one to parse the date, and the next one to apply the timezone. We (well, you @pragkent 🙂) found that this approach is not correct in many cases, so now we are duplicating the date processor: the first option only parses the date, and the second parses the date with a timezone, if one is available.

> in most cases event.timezone is not set; without "if": "ctx.event.timezone == null", the second processor looks like an optional path instead of an alternative path

Actually, we have been setting event.timezone by default in all Filebeat events since ~7.2, so in most cases both date processors are executed even though only the last one is needed.

That said, as both @pragkent and I have tested this quite a bit, and it solves an existing issue, I'd go ahead and merge this change as is, and open a follow-up PR to review the conditions in the pipelines we recently changed to fix this same issue. This way we keep a common convention.
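The trade-off in this thread can be modelled with a toy pipeline runner in Python. The processor names and the `has_timezone` predicate are hypothetical stand-ins for the two date processors and their ingest `if` conditions, and the sketch assumes the second processor runs only when `event.timezone` is present. It only records which processors execute; it does not parse any dates.

```python
def run_pipeline(doc, processors):
    """Toy pipeline runner: return the names of the processors that execute.

    Each processor is (name, condition), where condition is None (always run)
    or a predicate on the document standing in for an ingest "if".
    """
    executed = []
    for name, condition in processors:
        if condition is None or condition(doc):
            executed.append(name)
    return executed

def has_timezone(doc):
    """Stand-in for the condition ctx.event?.timezone != null."""
    return doc.get("event", {}).get("timezone") is not None

# Layout as merged: the first date processor has no "if".
without_if = [
    ("parse_date", None),
    ("parse_date_with_timezone", has_timezone),
]

# Suggested layout: "if": "ctx.event.timezone == null" on the first processor.
with_if = [
    ("parse_date", lambda doc: not has_timezone(doc)),
    ("parse_date_with_timezone", has_timezone),
]

# Filebeat sets event.timezone by default, so this is the common case.
doc = {"event": {"timezone": "+02:00"}}  # example value

print(run_pipeline(doc, without_if))  # ['parse_date', 'parse_date_with_timezone']
print(run_pipeline(doc, with_if))     # ['parse_date_with_timezone']
```

With the condition, exactly one date processor runs per event. Without it, the common case pays for both, although the final result is the same.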

@ycombinator (Contributor) left a comment

LGTM.

@jsoriano merged commit 31e9a1e into elastic:master on Sep 4, 2019
@jsoriano removed the needs_backport label (PR is waiting to be backported to other branches) on Sep 4, 2019
jsoriano pushed a commit to jsoriano/beats that referenced this pull request Sep 4, 2019
jsoriano pushed a commit to jsoriano/beats that referenced this pull request Sep 4, 2019
jsoriano added a commit that referenced this pull request Sep 5, 2019
(cherry picked from commit 31e9a1e)

Co-authored-by: Kent Wang <pragkent@gmail.com>
jsoriano added a commit that referenced this pull request Sep 5, 2019
(cherry picked from commit 31e9a1e)

Co-authored-by: Kent Wang <pragkent@gmail.com>
@lucabelluccini (Contributor)

Hello,
The audit JSON timestamps are not ISO8601, so they need the same extra treatment as the plain-text logs.
For example:

{"@timestamp":"2019-09-05T14:02:37,921", "node.id":"UwRu4mReRtyJO1-FWAPvIQ", "event.type":"transport", "event.action":"authentication_success", "user.name":"_system", "origin.type":"local_node", "origin.address":"127.0.0.1:9300", "realm":"__fallback", "request.id":"474ZciqtQteOhjLO3OdZIw", "action":"indices:monitor/stats", "request.name":"IndicesStatsRequest"}
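A short Python sketch of that "same extra treatment". The audit value matches the plain-text pattern `yyyy-MM-dd'T'HH:mm:ss,SSS` (written here as a `strptime` pattern) and has no UTC offset of its own, so the wall-clock time has to be anchored to a timezone supplied from elsewhere, such as `event.timezone`. The function name `audit_timestamp` and the +02:00 example offset are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the audit log line quoted above.
AUDIT_LINE = ('{"@timestamp":"2019-09-05T14:02:37,921", '
              '"event.type":"transport", "event.action":"authentication_success"}')

def audit_timestamp(line, event_timezone):
    """Parse the audit @timestamp with the plain-text pattern and anchor it
    to event_timezone, since the value itself carries no UTC offset."""
    raw = json.loads(line)["@timestamp"]
    naive = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S,%f")
    return naive.replace(tzinfo=event_timezone)

anchored = audit_timestamp(AUDIT_LINE, timezone(timedelta(hours=2)))
print(anchored.isoformat(timespec="milliseconds"))  # 2019-09-05T14:02:37.921+02:00
```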

leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…astic#13494)

(cherry picked from commit 871ce17)

Co-authored-by: Kent Wang <pragkent@gmail.com>
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…astic#13495)

(cherry picked from commit 871ce17)

Co-authored-by: Kent Wang <pragkent@gmail.com>

6 participants