why is logstash parsing the year only as 2-digit? #67

Open
csamsel opened this issue Jul 10, 2015 · 8 comments

Comments


csamsel commented Jul 10, 2015

I was using the following snippet to parse a customized postgres logfile:

                grok {
                        match => {
                                "message" => [
                                  "%{DATESTAMP:timestamp_psql} %{TZ:tz} ...

which worked very well. As it turned out, postgres sometimes writes multiline entries, so my first shot was:

            multiline {
                    pattern => "^%{DATESTAMP}.*"
                    what => previous
                    negate => true
            }

which did not work. Looking at the JSON, I found:

"timestamp_psql": "15-07-10 09:31:57.030 UTC",

so the leading 20 is discarded. For most logfiles this should be totally fine, but for me it was very confusing. I guess grok somehow ignores leading and trailing data when matching patterns.
I'm now using

            multiline {
                    pattern => "^20%{DATESTAMP}.*"
                    what => previous
                    negate => true
            }

as the multiline filter (it works), but that's still weird.
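The behaviour described above can be reproduced outside Logstash. The following is a minimal Python sketch, not grok itself: the sub-patterns are simplified approximations of the grok-patterns definitions (assumptions written for illustration), `TIME` is a reduced stand-in, and `YEAR` uses a plain group instead of an atomic one. It suggests that an unanchored search finds a day-first date starting at the third character, while an anchored match fails unless the literal `20` is prepended.

```python
import re

# Simplified approximations of the grok-patterns definitions (assumptions,
# not the exact shipped regexes).
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"
YEAR = r"(?:\d\d){1,2}"
TIME = r"\d{2}:\d{2}:\d{2}(?:\.\d+)?"  # reduced stand-in for grok's TIME
DATE_US = rf"{MONTHNUM}[/-]{MONTHDAY}[/-]{YEAR}"
DATE_EU = rf"{MONTHDAY}[./-]{MONTHNUM}[./-]{YEAR}"
DATESTAMP = rf"(?:{DATE_US}|{DATE_EU})[- ]{TIME}"

line = "2015-07-10 09:31:57.030 UTC"

# Unanchored, like a grok match: the engine may start anywhere in the line.
unanchored = re.search(DATESTAMP, line)

# Anchored, like the multiline pattern "^%{DATESTAMP}.*".
anchored = re.match(DATESTAMP, line)

# The workaround from the post: consume the literal "20" first.
workaround = re.match("20" + DATESTAMP, line)

print(unanchored.start(), unanchored.group(0))
print(anchored)
print(workaround.group(0))
```

In this sketch nothing can match at the start of the line, because neither the month-first nor the day-first alternative accepts `2015` as its first field, which would explain why the `^`-anchored multiline pattern never matched.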

@markwalkom

Please join us in #logstash on Freenode or at https://discuss.elastic.co/ for troubleshooting help; we reserve GitHub for confirmed bugs and feature requests :)


csamsel commented Jul 10, 2015

Thanks for your response. Well, I figured it out by constructing the workaround mentioned above, so I'm not really seeking troubleshooting help.
I'm wondering whether this is intentional by design. If not, I'd consider it worth looking into (I'd call it a bug).

@markwalkom

Ok, apologies if I misunderstood! :)


purbon commented Sep 8, 2015

@csamsel This looks to me like a bug. I need to run a few more tests, but a quick run of the YEAR expression at rubular.com matched all 4 digits, so I would call it a bug for now.

Thanks a lot for your time and report!


purbon commented Sep 8, 2015

Hi @csamsel, I ran more tests on your issue, especially with DATESTAMP, and looking at the grok output, everything worked as expected for me. Would you be able to provide a sample log line? That would be super useful for validating whether this is a grok error or a multiline one.

Thanks


csamsel commented Sep 9, 2015

Hi,
Yes, this issue only arises with multiline parsing, because in the standard scenario grok discards the leading 20 of the date and the pattern still matches with a 2-digit year. I was wondering whether there is any reason for grokking and storing only a 2-digit year instead of the full year? By the way, this will also break if old logfiles from 19XX are parsed.

Here is an (anonymized) log excerpt. All lines relate to the same query and are therefore candidates for multiline. I'm correlating them by their shared timestamp, which did not work initially because the leading 20 is not parsed by grok.

2015-08-31 10:01:03.600 UTC [22567]: [46354-2/228947] user=XXX_user@127.0.0.1,db=XXX_db LOG:  duration: 0.085 ms  parse <unnamed>: insert into "ixsi"."booking_target_status_place" ("booking_target_id", "provider_id", "place_id") values ($1, $2, $3)
2015-08-31 10:01:03.600 UTC [22567]: [46354-2/228947] user=XXX_user@127.0.0.1,db=XXX_db LOG:  duration: 0.067 ms  bind <unnamed>: insert into "ixsi"."booking_target_status_place" ("booking_target_id", "provider_id", "place_id") values ($1, $2, $3)
2015-08-31 10:01:03.600 UTC [22567]: [46354-2/228947] user=XXX_user@127.0.0.1,db=XXX_db ERROR:  insert or update on table "booking_target_status_place" violates foreign key constraint "FK_booking_target_status_place_plid"
2015-08-31 10:01:03.600 UTC [22567]: [46354-2/228947] user=XXX_user@127.0.0.1,db=XXX_db STATEMENT:  insert into "ixsi"."booking_target_status_place" ("booking_target_id", "provider_id", "place_id") values ($1, $2, $3)

Looking at the pattern

YEAR (?>\d\d){1,2}

it should allow either 2 or 4 digits, but it only parses 2. Just check in your own data whether the timestamp is saved as e.g. 15-09-09 08:45:49.644 UTC or 2015-09-09 08:45:49.644 UTC (the latter being preferred).
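For what it's worth, YEAR on its own does seem to accept four digits. A minimal Python sketch, assuming a plain non-capturing group is an acceptable stand-in for the atomic group `(?>\d\d)` used in the pattern above:

```python
import re

# Assumption: a plain group stands in for grok's atomic group (?>\d\d){1,2}.
YEAR = re.compile(r"(?:\d\d){1,2}")

two_digit = YEAR.fullmatch("15")      # a two-digit year is accepted
four_digit = YEAR.fullmatch("2015")   # a four-digit year is accepted too
three_digit = YEAR.fullmatch("201")   # an odd number of digits is rejected

# Matched at the start of a timestamp, YEAR greedily takes all four digits.
leading = YEAR.match("2015-09-09 08:45:49.644 UTC")
print(leading.group(0))
```

If that holds for grok as well, the two-digit result comes from where YEAR sits inside DATESTAMP rather than from the YEAR expression itself.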

@xNinjaKittyx

I noticed that DATESTAMP, for some reason, tries the first number as MONTH first and as DAY second (because of DATE_US/DATE_EU), but never as YEAR.

If you pass 2002/01/14, it parses it as 02 MONTH, 01 DAY, 14 YEAR.

If you pass 2015/01/14 it parses as 15 DAY, 01 MONTH, 14 YEAR.

but it'll never parse it as YEAR, MONTH, DAY.

It would be nice to have a built-in that parses YEAR/MONTH/DAY, as the comment in grok-patterns suggests:

# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)

Right now, DATESTAMP only works with MM/DD/YYYY or DD/MM/YYYY patterns.
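The two examples above can be traced with a small Python sketch. It models only the date part (no time), uses simplified approximations of MONTHNUM, MONTHDAY and YEAR (assumptions, not the exact grok-patterns regexes), and gives the two alternatives prefixed group names (`us_`, `eu_`, both hypothetical) so the winning branch is visible.

```python
import re

# Simplified approximations of the grok-patterns definitions (assumptions).
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"
YEAR = r"(?:\d\d){1,2}"

# Month-first alternative is tried before day-first at every start position.
DATE = re.compile(
    rf"(?P<us_month>{MONTHNUM})[/-](?P<us_day>{MONTHDAY})[/-](?P<us_year>{YEAR})"
    rf"|(?P<eu_day>{MONTHDAY})[./-](?P<eu_month>{MONTHNUM})[./-](?P<eu_year>{YEAR})"
)


def parse_date(text):
    """Return the named fields of the first DATE match, or None."""
    match = DATE.search(text)
    if match is None:
        return None
    # Keep only the groups of the alternative that actually matched.
    return {k: v for k, v in match.groupdict().items() if v is not None}


first = parse_date("2002/01/14")
second = parse_date("2015/01/14")
print(first)
print(second)
```

In both cases the match starts at the third character, so the century digits are skipped and the trailing `14` ends up in the year field.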


tedder commented May 18, 2020

Confirming what @xNinjaKittyx says. The comment about accepted datestamp formats doesn't include year-first. In fact, there's no YYYY/mm/dd format (with slashes). This is the cause of #112.
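A year-first date pattern could look roughly like the following Python sketch. The name `DATE_YMD` is hypothetical and not part of grok-patterns, the sub-patterns are simplified approximations, and requiring exactly four year digits is an assumption made here to avoid the ambiguity discussed in this thread.

```python
import re

# Simplified approximations of the grok-patterns definitions (assumptions).
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"

# Hypothetical year-first pattern; accepts "/" or "-" as the separator.
DATE_YMD = re.compile(
    rf"(?P<year>\d{{4}})[/-](?P<month>{MONTHNUM})[/-](?P<day>{MONTHDAY})"
)

slashes = DATE_YMD.match("2015/01/14")
dashes = DATE_YMD.match("2015-08-31 10:01:03.600 UTC")
print(slashes.groupdict())
print(dashes.groupdict())
```

Because the year is consumed first, this sketch also matches when anchored at the start of the line, which is what the multiline use case in the original report needs.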
