Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Filebeat Traefik module to handle mixed Common and Combined Log Format #8768

Merged
merged 10 commits into from
Dec 6, 2018

Conversation

sayden
Copy link
Contributor

@sayden sayden commented Oct 26, 2018

This is a continuation of PRs #6488 and #6136

Previous PR's were misleading because there's no difference between Traefik 1.5 and 1.7 log formats but as you can see in this issue Traefik could miss some quote which would make our parser to fail. So those PR's were to address that issue.

Traefik uses by default since 1.4 version (at least) the Combined Log Format https://httpd.apache.org/docs/trunk/logs.html#combined also used in Apache. In 1.7 it still uses this format but you can also setup Common Log Format or JSON. Maybe we should open an issue to support JSON in the future.

Solves #8015 and #6111

  • frontend_name field was added so Traefik frontend_name is missing #8015 is solved.
  • The Grok pattern have been improved to handle the case of Traefik module for Filebeat is not working as expected #6111 and the log lines described there have been added to the test cases successfully so the issue is also solved.
  • Added user_identifier field which was hardcoded to '-' before
  • Added duration field which was missing too
  • Added request_count field
  • Numeric fields added as string have been converted to long like response_code

@sayden sayden added in progress Pull request is currently in progress. module Filebeat Filebeat enhancement labels Oct 26, 2018
@sayden
Copy link
Contributor Author

sayden commented Oct 26, 2018

GeoIP plugin error, as usual

@ruflin
Copy link
Contributor

ruflin commented Oct 29, 2018

The geoip problem should be solve for quite some time now. Make sure to run the most recent master version locally and run the test command with GENERATE=1.

For the IP address: I would prefer to keep the "original" IP addresses instead of defaulting to 1.2.3.4 in this PR.

@ruflin
Copy link
Contributor

ruflin commented Oct 29, 2018

Can you add a CHANGELOG entry?

I also added the needs_backport label. I'm thinking we should backport this to 6.x and potentially event 6.5. @jsoriano WDYT?

@ruflin ruflin added the needs_backport PR is waiting to be backported to other branches. label Oct 29, 2018
@jsoriano
Copy link
Member

@ruflin it'd be great if it could make it to 6.5, yes

@sayden
Copy link
Contributor Author

sayden commented Oct 29, 2018

@ruflin about the 1.7 format that you mention here. Do you think it should be added to this PR already or we open a new one once this is merged?

@ruflin
Copy link
Contributor

ruflin commented Oct 30, 2018

@sayden Let's keep this focused on 1.5 for now and follow up on 1.7 afterwards. Let me know if you need help with the generated data.

@ruflin ruflin added the Team:Integrations Label for the Integrations team label Nov 23, 2018
@jsoriano
Copy link
Member

This seems to need an update.

@sayden
Copy link
Contributor Author

sayden commented Nov 27, 2018

Apart from the fields.go file, I'm carefully looking at the rebase as there were some updated in the expected JSON, I think coming from the ECS migration

@ruflin ruflin added Team:Integrations Label for the Integrations team and removed Team:Integrations Label for the Integrations team labels Nov 27, 2018
@ruflin
Copy link
Contributor

ruflin commented Nov 27, 2018

This will soon conflict with this PR. @webmat for awareness: #9005 I would suggest to get this one in and then ECS to make sure we cover all the fields but please coordinate together.

@webmat
Copy link
Contributor

webmat commented Nov 27, 2018

The race is on, to merge first! ;-)

Kidding, I don't mind if this gets in first. Looks like this PR needs a rebase, and has test failures, though.

I can keep my Traefik PR for last to give this time to get in, but note that I'd like to merge mine today.

@sayden did you expect to be finishing this up today?

@sayden sayden force-pushed the traefik-additional-logs-parsing branch from 461872a to 100baa6 Compare November 27, 2018 15:12
@sayden
Copy link
Contributor Author

sayden commented Nov 27, 2018

@webmat If you want and you are close to finish just go on and I'll resolve conflicts later. I prefer to be careful with small details and take my time 👍

@webmat
Copy link
Contributor

webmat commented Nov 27, 2018

Perfect.

Note I may not get to my own Traefik PR today, as I need to work on something else before this.

}
},
{
"rename": {
"remove": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason we remove this one now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After spending some minutes looking at it without understanding very well why either, I think that in a first look it made me think that it was a typo, because the field is set in the previous rule. Then, in a second look (maybe I couple of days later) I probably saw the "target_field": "traefik.access.user_agent.original" and I also thought it was some kind of typo that shouldn't be there and I delete it.

I guess that the approach here would be to not leave it again as a rename to preserve the original user_agent extracted from the Grok pattern, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please keep this field around

],
"ignore_missing": true
"(?:%{NUMBER:traefik.access.body_sent.bytes}|-)( \"%{DATA:traefik.access.referrer}\")?( \"%{DATA:traefik.access.agent}\")?(?:%{NUMBER:traefik.access.request_count}|-)?( \"%{DATA:traefik.access.frontend_name}\")?( \"%{DATA:traefik.access.backend_url}\")?",
"%{IPORHOST:traefik.access.remote_ip} - %{DATA:traefik.access.user_name} \\[%{HTTPDATE:traefik.access.time}\\] \"%{WORD:traefik.access.method} %{DATA:traefik.access.url} HTTP/%{NUMBER:traefik.access.http_version}\" (?:%{NUMBER:traefik.access.response_code}|-) (?:%{NUMBER:traefik.access.body_sent.bytes}|-)( \"%{DATA:traefik.access.referrer}\")?( \"%{DATA:traefik.access.agent}\")?(?: %{NUMBER:traefik.access.request_count}| -)?( \"%{DATA:traefik.access.frontend_name}\"| -)?( \"%{DATA:traefik.access.backend_url}\"| -)?( %{NUMBER:traefik.access.response_time}ms)?",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we add 2 new grok patterns: How are they different? Below we only add 1 new log line to test. Would be good to have at least 1 for each new pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also think so. I'll add a new input test line.

@sayden sayden force-pushed the traefik-additional-logs-parsing branch from a68ab30 to 17361bb Compare November 30, 2018 08:52
@sayden
Copy link
Contributor Author

sayden commented Nov 30, 2018

After digging deeply into this issue I discovered that there's nothing like Traefik logs format but that they have always used CLF - Common Log Format or Combined Log Format.

The problem is that Traefik had a bug prior 1.5 where if the request_referrer field was empty it didn't match the requirements of CLF because quotes were missing. Fix is merged here.

Current version of Traefik (1.7) uses Combined Log Format as default output format.

I suggest to leave the pattern with Combined Log Format to "force" Traefik users to update. It doesn't make sense that we have a modified Grok pattern just to cover some bug in a different product that we will have track to eventually delete it. Option 2 is to try to cover both: combined and common log format although it seems that Common is not used in Traefik right now.

WDYT @ruflin?

@sayden sayden force-pushed the traefik-additional-logs-parsing branch from ed9f4ea to 6737664 Compare December 5, 2018 16:09
@@ -0,0 +1,3171 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sayden Is this file a leftover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting on CI to go green.

@sayden sayden merged commit 0e9054c into elastic:master Dec 6, 2018
sayden added a commit to sayden/beats that referenced this pull request Dec 6, 2018
…g Format (elastic#8768)

* Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now.
* Added more log lines, including one in Common Log Format
* Added user_identifier field which was hardcoded to '-' before
* Added log test inputs where the user_name and user_identifier were set and not set.
* Added duration field which was missing
* Numeric fields added as string have been converted to long
* Added request count field
* Added two log lines more that covers issue elastic#6111
* Make bytes_sent field of type long. Most Grok expressions are optional now
* Added CHANGELOG entry

(cherry picked from commit 0e9054c)
@sayden sayden added v6.6.0 and removed needs_backport PR is waiting to be backported to other branches. labels Dec 6, 2018
@ruflin ruflin added review and removed in progress Pull request is currently in progress. labels Dec 7, 2018
@ruflin
Copy link
Contributor

ruflin commented Dec 7, 2018

@sayden I suggest we follow up this PR with moving the dissect processing to the ingest pipeline. It should still be dissect as it's more efficient but like this we have everything in one place if we want to do reprocessing for example. We can do this change for 7.0 but not backport it.

sayden added a commit that referenced this pull request Dec 7, 2018
…xed Common and Combined Log Format (#9419)

* Update Filebeat Traefik module to handle mixed Common and Combined Log Format (#8768)

* Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now.
* Added more log lines, including one in Common Log Format
* Added user_identifier field which was hardcoded to '-' before
* Added log test inputs where the user_name and user_identifier were set and not set.
* Added duration field which was missing
* Numeric fields added as string have been converted to long
* Added request count field
* Added two log lines more that covers issue #6111
* Make bytes_sent field of type long. Most Grok expressions are optional now
* Added CHANGELOG entry

(cherry picked from commit 0e9054c)

* Updated fields.go file
* Upload pre-ECS generated JSON expected file which differs from master
sayden added a commit to sayden/beats that referenced this pull request Dec 7, 2018
…g Format (elastic#8768)

 * Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now.
* Added more log lines, including one in Common Log Format
* Added user_identifier field which was hardcoded to '-' before
* Added log test inputs where the user_name and user_identifier were set and not set.
* Added duration field which was missing
* Numeric fields added as string have been converted to long
* Added request count field
* Added two log lines more that covers issue elastic#6111
* Make bytes_sent field of type long. Most Grok expressions are optional now
* Added CHANGELOG entry

(cherry picked from commit 0e9054c)

fields.go file is updated and pre-ECS generated JSON expected file which differs from master is used

# Conflicts:
#	filebeat/include/fields.go
#	filebeat/module/traefik/access/test/test.log-expected.json
@sayden sayden added the v6.5.3 label Dec 7, 2018
sayden added a commit that referenced this pull request Dec 10, 2018
…xed Common and Combined Log Format (#9439)

* Update Filebeat Traefik module to handle mixed Common and Combined Log Format (#8768)

 * Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now.
* Added more log lines, including one in Common Log Format
* Added user_identifier field which was hardcoded to '-' before
* Added log test inputs where the user_name and user_identifier were set and not set.
* Added duration field which was missing
* Numeric fields added as string have been converted to long
* Added request count field
* Added two log lines more that covers issue #6111
* Make bytes_sent field of type long. Most Grok expressions are optional now
* Added CHANGELOG entry

(cherry picked from commit 0e9054c)

fields.go file is updated and pre-ECS generated JSON expected file which differs from master is used

# Conflicts:
#	filebeat/include/fields.go
#	filebeat/module/traefik/access/test/test.log-expected.json
@sayden sayden deleted the traefik-additional-logs-parsing branch July 30, 2022 05:52
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…ndle mixed Common and Combined Log Format (elastic#9439)

* Update Filebeat Traefik module to handle mixed Common and Combined Log Format (elastic#8768)

 * Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now.
* Added more log lines, including one in Common Log Format
* Added user_identifier field which was hardcoded to '-' before
* Added log test inputs where the user_name and user_identifier were set and not set.
* Added duration field which was missing
* Numeric fields added as string have been converted to long
* Added request count field
* Added two log lines more that covers issue elastic#6111
* Make bytes_sent field of type long. Most Grok expressions are optional now
* Added CHANGELOG entry

(cherry picked from commit 5db2afb)

fields.go file is updated and pre-ECS generated JSON expected file which differs from master is used

# Conflicts:
#	filebeat/include/fields.go
#	filebeat/module/traefik/access/test/test.log-expected.json
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants