-
Notifications
You must be signed in to change notification settings - Fork 41
Ideal parser settings #431
Comments
@djaglowski FYI, I run the survey across several filelog receiver users and they support the changes |
Thank you @pmm-sumo, I'm glad to have many eyes on this! |
Just want to make sure I'm keeping up. With
Input message
Json parser pseudo output
Move operator output
|
That is the idea, though there's nothing requiring that As a possible future feature, we could add a type: json_parser
timestamp:
layout: ...
parse_from: time
severity:
parse_from: sev
body: msg |
I would be a fan of the body option. Not destructive by default, but allows us to skip the move operator. My example is a common workflow for myself, where I put everything into attributes and replace body with the message field's value. Popular loggers usually have a message field, severity, timestamp. Generally I would want to promote timestamp and severity and only preserve the message field under body. Input
Output
For something like nginx, this would not make sense because there is technically no message field, preserving body as is would make a lot of sense Input
Output
|
I created open-telemetry/opentelemetry-collector-contrib#10274 to capture the idea of adding a |
I agree with the principle. Question: if the parser takes and input and replaces it with a different representation that is reversible do we consider it non-destructive? So for example a JSON parser detects that Body is a well formed JSON string and replaces it with a structured Body (writes to the same field) that represents the same values but in structured form. Is this an acceptable behavior or we consider this a destruction? |
My position on this probably depends on the answer to my previous question. |
I agree with behaviors 2-4. |
I see your point, and realistically I would describe this as non-destructive. From a practical standpoint though, I'm not sure the distinction really matters. Assuming behaviors 2-4 stand, we have nearly guaranteed already that parsing is non-destructive by default. The only other way in which the parsing operation could still be destructive by default is if the original value is removed, which is where behavior 5 suggests that it not be. That said, I think behavior 5 is less important than 2-4, and may be worth reconsidering. Timestamps and severity values would typically not be missed, once parsed. Perhaps the same can be said of encoded json. On the other hand, I think regex should probably not consume the original value. CSV and key/value are less clear because in some sense "nothing is lost", but the original value may be useful for troubleshooting or just because a string representation is expected. Determining this on a case-by-case is too nuanced in my opinion, so I'd prefer we go one way or the other across the board. So I think the other option is to reject behavior 5, consume the |
Actually I was imagining a hypothetical scenario that is impossible, so ignore my question/objection. I am good with your entire proposal. |
Parsing behavior in this library was originally designed for a different data model than the one we now have. The expected usage of
body
andattributes
have been redefined. As such, the behavior of parsers should be reevaluated.This is not my first attempt to propose changes to the way parsers behave. I believe my previous attempts have been too focused on proposing changes, which is difficult to do without lengthy explanations of the current behavior. The following is instead meant to be read as a standalone proposal for how parsers should behave. Please read accordingly.
I suggest the following would be ideal behavior:
parse_from
andparse_to
.parse_from
setting should default tobody
.parse_to
setting should default toattributes
.preserve_to
setting should not exist.Rationale
The primary purpose of parsing operations is to isolate information. Where the extracted information should be placed is a secondary, but obvious, concern. However, what happens to the original value is easily overlooked. As a general principle, destruction of the original value should require intentional configuration.
parse_from
andparse_to
.A somewhat common use case involves applying structure to an unstructured log. For example, when a user extracts information from a line of text, they may wish keep only the key value pairs they have isolated. When this is the case, an intuitive choice is to overwrite the original value. This should be supported but should require explicit configuration.
parse_from
setting should default tobody
.The body typically contains an unstructured log. Parsing this value is the most common parsing use case. Therefore, parsers should default to parsing the body.
parse_to
setting should default toattributes
.The result of a parsing operation is a structured object. This is true of all parsers that have a
parse_to
field.The data model strongly encourages that structured data belong in
attributes
, so this is a natural default value forparse_to
.Additionally, the default value for
parse_to
should be different than that ofparse_from
, else we have a situation where parsing operations are destructive by default.preserve_to
setting should not exist.If parsing is non-destructive by default, there is no need to provide a special setting for preservation. In the case where a user wishes to "back up" a value and destroy the original, they can do so by copying or moving the value before parsing.
The text was updated successfully, but these errors were encountered: