Better multiline handling: s-expressions #1559
Comments
@MOZGIII this seems somewhat related to the work you're doing. I'm curious if what you're doing will solve this, or if it's closely related? |
Yes, I guess this should be possible to implement using the merge transform I'm currently working on and a custom It doesn't look like it'd be possible to parse with a regexp to do partial message marking, but with custom logic it's doable. |
Thanks for this @FungusHumungus. We have a discussion brewing about how to best solve this and will follow up as we get more clarity on our solution. |
@MOZGIII I'm assigning this to you as part of your overall merge work. It probably makes sense to see how we can modify the existing |
On second thought, this case is very tricky. First of all, s-expressions form a context-free grammar, so we can't use regexps to parse them. Then, our Good news is it's about to be changed in such a way that it'll (presumably) support this use case. That said, to solve this issue with a |
This case generalizes to the problem Vector currently has with parsing multiline messages. For instance, if the case involved multi-line JSON rather than s-expressions, the problem would be the same. The However, the parsing of the multi-line messages is, in fact, a different thing.
Merge transform helps with the first scenario, but it's useless with the second. The first scenario is also a special case of the second scenario: the one where the value we extract is a top-level string. To sum up, the most flexible way of implementing this whole thing would be a streaming tokenizer/parser with pluggable grammars: JSON / top-level string / user-supplied pattern / user-supplied grammar. This way we'll be able to properly support incoming data streams without providing workarounds to handle "read framing". |
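To make the "pluggable grammars" idea a bit more concrete, here is a minimal Python sketch. It is not Vector code: `StreamParser`, `line_grammar` and `json_grammar` are hypothetical names invented for illustration. It assumes a grammar can be modelled as a function that looks at the buffered input and returns the end offset of the first complete message, or `None` if more input is needed.

```python
import json

def line_grammar(buf):
    """Top-level string: a message ends at the first newline."""
    end = buf.find("\n")
    return None if end < 0 else end + 1

_decoder = json.JSONDecoder()

def json_grammar(buf):
    """JSON: a message ends where the first complete value ends.

    Assumes top-level values are objects or arrays; a bare number split
    across chunks would be cut short. Invalid input is treated the same
    as incomplete input, so a real implementation would need a size limit.
    """
    try:
        _, end = _decoder.raw_decode(buf)
    except json.JSONDecodeError:
        return None  # nothing complete yet; wait for more input
    return end

class StreamParser:
    """Buffers incoming chunks and cuts them into messages using a grammar."""

    def __init__(self, grammar):
        self.grammar = grammar
        self.buf = ""

    def feed(self, chunk):
        """Add a chunk; return the list of messages completed by it."""
        self.buf += chunk
        out = []
        while True:
            self.buf = self.buf.lstrip()  # skip whitespace between messages
            end = self.grammar(self.buf) if self.buf else None
            if end is None:
                return out
            out.append(self.buf[:end].strip())
            self.buf = self.buf[end:]

# Multi-line JSON arriving in arbitrary chunks.
chunks = ['{\n  "level": "info",\n', '  "msg": "hi"\n}\n{"level":', ' "warn"}']
parser = StreamParser(json_grammar)
values = [json.loads(m) for c in chunks for m in parser.feed(c)]

# The same parser with the "top-level string" grammar.
line_parser = StreamParser(line_grammar)
line_messages = line_parser.feed("first\nsecond\npart")
```

The point of the design is that framing and parsing happen in one place: the grammar decides where a message ends, so the source needs no separate "merge" step. An s-expression grammar would plug in the same way.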
Makes sense. Could there be a case for calling into Lua to do this parsing? |
We've improved multi-line handling support, please check out #1852. |
After our |
Here's a tutorial on merging multi-line logs with Lua: https://vector.dev/guides/advanced/merge-multiline-logs-with-lua/ |
We delimit our log messages using s-expressions:
The file source is currently unable to parse this message as it spans multiple lines, and there is no set character to determine the start of the next log message. Using `(` won't work as this is sometimes used within the log message itself.
It would be handy if there was a way to incorporate a more advanced parser that can keep track of all the quotes and parens (including taking into account when they are escaped) to determine when a message has been fully loaded.
It would probably be handy to share this logic with the socket source as well.
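As a rough illustration of the kind of parser described above, here is a minimal Python sketch. `SexpFramer` is a hypothetical name and the sample log lines are made up. It assumes double-quoted strings with backslash escapes, and that a message always ends at the end of a line. It tracks paren depth and string state across lines and emits a message once the parens balance outside of any string.

```python
class SexpFramer:
    """Accumulates lines until parentheses balance outside string literals."""

    def __init__(self):
        self.depth = 0          # current paren nesting depth
        self.in_string = False  # inside a "..." literal
        self.escaped = False    # previous char was a backslash in a string
        self.buffer = []        # lines of the message in progress

    def feed(self, line):
        """Consume one line (without newline); return a full message or None."""
        for ch in line:
            if self.in_string:
                if self.escaped:
                    self.escaped = False      # this char is escaped; skip it
                elif ch == "\\":
                    self.escaped = True
                elif ch == '"':
                    self.in_string = False
            elif ch == '"':
                self.in_string = True
            elif ch == "(":
                self.depth += 1
            elif ch == ")":
                self.depth -= 1
        self.escaped = False  # a trailing backslash escapes the line break
        self.buffer.append(line)
        if self.depth <= 0 and not self.in_string:
            message = "\n".join(self.buffer)
            self.buffer = []
            self.depth = 0
            return message
        return None

# Hypothetical log lines: the second contains "(" and an escaped quote
# inside a string, neither of which should affect the framing.
lines = [
    '(log (level "info")',
    '     (msg "unbalanced ( and \\") inside a string"))',
    '(log (level "warn") (msg "single line"))',
]
framer = SexpFramer()
messages = [m for m in (framer.feed(l) for l in lines) if m is not None]
```

Here `messages` holds two entries: the first two lines joined into one message, and the third line on its own. Because the framer only consumes lines, the same logic could sit behind either a file or a socket source.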