-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Filebeat Traefik module to handle mixed Common and Combined Log Format #8768
Update Filebeat Traefik module to handle mixed Common and Combined Log Format #8768
Conversation
GeoIP plugin error, as usual |
The geoip problem should be solve for quite some time now. Make sure to run the most recent master version locally and run the test command with For the IP address: I would prefer to keep the "original" IP addresses instead of defaulting to 1.2.3.4 in this PR. |
Can you add a CHANGELOG entry? I also added the |
@ruflin it'd be great if it could make it to 6.5, yes |
@sayden Let's keep this focused on 1.5 for now and follow up on 1.7 afterwards. Let me know if you need help with the generated data. |
This seems to need an update. |
Apart from the |
The race is on, to merge first! ;-) Kidding, I don't mind if this gets in first. Looks like this PR needs a rebase, and has test failures, though. I can keep my Traefik PR for last to give this time to get in, but note that I'd like to merge mine today. @sayden did you expect to be finishing this up today? |
461872a
to
100baa6
Compare
@webmat If you want and you are close to finish just go on and I'll resolve conflicts later. I prefer to be careful with small details and take my time 👍 |
Perfect. Note I may not get to my own Traefik PR today, as I need to work on something else before this. |
} | ||
}, | ||
{ | ||
"rename": { | ||
"remove": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason we remove this one now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After spending some minutes looking at it without understanding very well why either, I think that in a first look it made me think that it was a typo, because the field is set in the previous rule. Then, in a second look (maybe I couple of days later) I probably saw the "target_field": "traefik.access.user_agent.original"
and I also thought it was some kind of typo that shouldn't be there and I delete it.
I guess that the approach here would be to not leave it again as a rename to preserve the original user_agent
extracted from the Grok pattern, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please keep this field around
], | ||
"ignore_missing": true | ||
"(?:%{NUMBER:traefik.access.body_sent.bytes}|-)( \"%{DATA:traefik.access.referrer}\")?( \"%{DATA:traefik.access.agent}\")?(?:%{NUMBER:traefik.access.request_count}|-)?( \"%{DATA:traefik.access.frontend_name}\")?( \"%{DATA:traefik.access.backend_url}\")?", | ||
"%{IPORHOST:traefik.access.remote_ip} - %{DATA:traefik.access.user_name} \\[%{HTTPDATE:traefik.access.time}\\] \"%{WORD:traefik.access.method} %{DATA:traefik.access.url} HTTP/%{NUMBER:traefik.access.http_version}\" (?:%{NUMBER:traefik.access.response_code}|-) (?:%{NUMBER:traefik.access.body_sent.bytes}|-)( \"%{DATA:traefik.access.referrer}\")?( \"%{DATA:traefik.access.agent}\")?(?: %{NUMBER:traefik.access.request_count}| -)?( \"%{DATA:traefik.access.frontend_name}\"| -)?( \"%{DATA:traefik.access.backend_url}\"| -)?( %{NUMBER:traefik.access.response_time}ms)?", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we add 2 new grok patterns: How are they different? Below we only add 1 new log line to test. Would be good to have at least 1 for each new pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also think so. I'll add a new input test line.
a68ab30
to
17361bb
Compare
After digging deeply into this issue I discovered that there's nothing like Traefik logs format but that they have always used CLF - Common Log Format or Combined Log Format. The problem is that Traefik had a bug prior 1.5 where if the Current version of Traefik (1.7) uses Combined Log Format as default output format. I suggest to leave the pattern with Combined Log Format to "force" Traefik users to update. It doesn't make sense that we have a modified Grok pattern just to cover some bug in a different product that we will have track to eventually delete it. Option 2 is to try to cover both: combined and common log format although it seems that Common is not used in Traefik right now. WDYT @ruflin? |
…d in dissect and in the ingest node again (instead of only the ingest node)
ed9f4ea
to
6737664
Compare
filebeat/test.json
Outdated
@@ -0,0 +1,3171 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sayden Is this file a leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting on CI to go green.
…g Format (elastic#8768) * Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now. * Added more log lines, including one in Common Log Format * Added user_identifier field which was hardcoded to '-' before * Added log test inputs where the user_name and user_identifier were set and not set. * Added duration field which was missing * Numeric fields added as string have been converted to long * Added request count field * Added two log lines more that covers issue elastic#6111 * Make bytes_sent field of type long. Most Grok expressions are optional now * Added CHANGELOG entry (cherry picked from commit 0e9054c)
@sayden I suggest we follow up this PR with moving the dissect processing to the ingest pipeline. It should still be dissect as it's more efficient but like this we have everything in one place if we want to do reprocessing for example. We can do this change for 7.0 but not backport it. |
…xed Common and Combined Log Format (#9419) * Update Filebeat Traefik module to handle mixed Common and Combined Log Format (#8768) * Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now. * Added more log lines, including one in Common Log Format * Added user_identifier field which was hardcoded to '-' before * Added log test inputs where the user_name and user_identifier were set and not set. * Added duration field which was missing * Numeric fields added as string have been converted to long * Added request count field * Added two log lines more that covers issue #6111 * Make bytes_sent field of type long. Most Grok expressions are optional now * Added CHANGELOG entry (cherry picked from commit 0e9054c) * Updated fields.go file * Upload pre-ECS generated JSON expected file which differs from master
…g Format (elastic#8768) * Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now. * Added more log lines, including one in Common Log Format * Added user_identifier field which was hardcoded to '-' before * Added log test inputs where the user_name and user_identifier were set and not set. * Added duration field which was missing * Numeric fields added as string have been converted to long * Added request count field * Added two log lines more that covers issue elastic#6111 * Make bytes_sent field of type long. Most Grok expressions are optional now * Added CHANGELOG entry (cherry picked from commit 0e9054c) fields.go file is updated and pre-ECS generated JSON expected file which differs from master is used # Conflicts: # filebeat/include/fields.go # filebeat/module/traefik/access/test/test.log-expected.json
…xed Common and Combined Log Format (#9439) * Update Filebeat Traefik module to handle mixed Common and Combined Log Format (#8768) * Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now. * Added more log lines, including one in Common Log Format * Added user_identifier field which was hardcoded to '-' before * Added log test inputs where the user_name and user_identifier were set and not set. * Added duration field which was missing * Numeric fields added as string have been converted to long * Added request count field * Added two log lines more that covers issue #6111 * Make bytes_sent field of type long. Most Grok expressions are optional now * Added CHANGELOG entry (cherry picked from commit 0e9054c) fields.go file is updated and pre-ECS generated JSON expected file which differs from master is used # Conflicts: # filebeat/include/fields.go # filebeat/module/traefik/access/test/test.log-expected.json
…ndle mixed Common and Combined Log Format (elastic#9439) * Update Filebeat Traefik module to handle mixed Common and Combined Log Format (elastic#8768) * Added support for Common Log Format and Combined Log Format in Traefik which is the default format until now. * Added more log lines, including one in Common Log Format * Added user_identifier field which was hardcoded to '-' before * Added log test inputs where the user_name and user_identifier were set and not set. * Added duration field which was missing * Numeric fields added as string have been converted to long * Added request count field * Added two log lines more that covers issue elastic#6111 * Make bytes_sent field of type long. Most Grok expressions are optional now * Added CHANGELOG entry (cherry picked from commit 5db2afb) fields.go file is updated and pre-ECS generated JSON expected file which differs from master is used # Conflicts: # filebeat/include/fields.go # filebeat/module/traefik/access/test/test.log-expected.json
This is a continuation of PRs #6488 and #6136
Previous PR's were misleading because there's no difference between Traefik 1.5 and 1.7 log formats but as you can see in this issue Traefik could miss some quote which would make our parser to fail. So those PR's were to address that issue.
Traefik uses by default since 1.4 version (at least) the Combined Log Format https://httpd.apache.org/docs/trunk/logs.html#combined also used in Apache. In 1.7 it still uses this format but you can also setup Common Log Format or JSON. Maybe we should open an issue to support JSON in the future.
Solves #8015 and #6111
frontend_name
field was added so Traefik frontend_name is missing #8015 is solved.user_identifier field
which was hardcoded to '-' beforeduration
field which was missing toorequest_count
fieldresponse_code