Tail input plugin does not read line-by-line and sends partial line data #11372
Comments
Hi, Some questions for you:
Thanks!
Sure. I'd rather not provide the file because it contains a good amount of sensitive data, and the logs I mentioned have rotated out, but here's another example. Telegraf error log:
Lines from the metric log file (4th line):
Yes, I checked weeks of data to make sure, because I was paranoid that it was a writer issue, but the log files are correct: no hidden characters or stray line breaks.
Thanks for the example data! However, copying that data and using it with the tail and/or file plugins did not reproduce the issue. You do not have
Yeah, unfortunately it's not clear what causes this. It is totally random, but I feel it should not read and flush metrics unless it is collecting the log line by line. I do not have It was suggested by @srebhan in the InfluxDB Community Slack that an explicit reading-mode setting (line-by-line vs. at-once) might be worth implementing to try to resolve this, similar to what was done here:
A few more notes: it seems to happen far more often on RHEL 6 than on RHEL 7, and I'm not sure why. For reference, the largest single batch of writes to the file that I could find was approximately 621 lines, 1863 words, 88710 bytes. I'm guessing the only way to emulate this is to set up a simulated InfluxDB line protocol metric generator that writes to a file (maybe even have Telegraf itself do it with a file output), then configure the tail plugin to read that file and output to InfluxDB. That way one process generates line-protocol metrics and writes them to the file while another reads the file and ingests it into InfluxDB.
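The generator proposed above could be sketched roughly as follows. This is a minimal sketch, not a confirmed reproducer: the measurement, tag, and field names, the default file path, and the one-second interval are all made up for illustration. Only the batch size of 621 lines comes from the numbers reported in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// buildBatch renders n line-protocol points, each terminated by a newline.
// The measurement, tag, and field names are hypothetical.
func buildBatch(n int, tsNanos int64) []byte {
	var sb strings.Builder
	for i := 0; i < n; i++ {
		fmt.Fprintf(&sb, "repro,host=test seq=%di,value=%di %d\n", i, i, tsNanos+int64(i))
	}
	return []byte(sb.String())
}

func main() {
	// Hypothetical default path; point the tail input at the same file.
	path := "repro_metrics.log"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	// Three batches keep the sketch short; raise the count for a soak test.
	for i := 0; i < 3; i++ {
		// Each batch is handed to the OS in a single Write call.
		if _, err := f.Write(buildBatch(621, time.Now().UnixNano())); err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		time.Sleep(time.Second)
	}
}
```

Running this alongside a Telegraf instance that tails the same file would show whether large appended batches alone are enough to trigger partial reads on a given OS.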
Did this suddenly start happening recently, or is this a new problem after some sort of upgrade? We have plenty of users who are using the tail plugin with influx protocol. While we could start generating a bunch of metrics in a VM, this is sounding more and more like a system-specific issue if you are seeing differences between OSes. Are these systems under heavy load, or do they have lots of disk writes going on at the same time? edit: more questions:
OK, thanks for those metrics. So it doesn't sound like a load issue. I guess I could try to reproduce in a CentOS VM next.
I think the OS does not flush line-wise; it just flushes the buffer, and that buffer might end in a partial line (a line fragment). The only way I see to prevent this is to enforce waiting for a complete line (i.e. wait for the newline to appear) and hold back all partial writes... To do this, I suggest implementing #11234 here... Maybe we should have a common place for this kind of "scanner"...
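The idea of holding back partial writes until the newline arrives can be sketched as a small buffer. This is an illustration only: `lineBuffer` and `Feed` are hypothetical names and are not part of Telegraf, #11234, or the tail library.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineBuffer accumulates raw chunks and only releases newline-terminated
// lines. Hypothetical helper for illustration.
type lineBuffer struct {
	pending []byte
}

// Feed appends a chunk and returns every complete line now available.
// A trailing fragment without a newline is held back for the next call.
func (b *lineBuffer) Feed(chunk []byte) []string {
	b.pending = append(b.pending, chunk...)
	var lines []string
	for {
		i := bytes.IndexByte(b.pending, '\n')
		if i < 0 {
			break // no complete line left; keep the fragment
		}
		line := bytes.TrimSuffix(b.pending[:i], []byte("\r"))
		lines = append(lines, string(line))
		b.pending = b.pending[i+1:]
	}
	return lines
}

func main() {
	var buf lineBuffer
	// The writer's flush splits the second metric in the middle.
	fmt.Println(buf.Feed([]byte("cpu,host=a usage=1 1\ncpu,host=a us")))
	// The rest of the line arrives with the next flush.
	fmt.Println(buf.Feed([]byte("age=2 2\n")))
}
```

With this approach the fragment `cpu,host=a us` is never handed to the parser on its own; it is only released once the second chunk completes the line.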
Do we know how the scanner decides that it has finished reading a line? Does it look for a specific line-ending character? It could be good to enforce terminating the read on a newline instead of an EOF, if the EOF may be affecting the current read position in the file. Just another data point, though: I added namepass under the tail input to test whether it prevents bad partial-line writes, and so far so good, though it may take some more soak time to really prove it out.
@dr3amville, the scanner takes a split function that decides when to return a chunk. #11234 uses Does that help?
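For readers unfamiliar with Go, here is a minimal sketch of what a split function looks like with the standard library's `bufio.Scanner`. The `completeLines` and `scanComplete` names are hypothetical, and this is not necessarily the split function used in #11234. It differs from the built-in `bufio.ScanLines` in one respect: an unterminated fragment at EOF is dropped instead of being returned as a final line.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// completeLines is a bufio.SplitFunc that only emits newline-terminated
// lines. Hypothetical example, not taken from Telegraf.
func completeLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		// A full line is available: consume it including the newline.
		return i + 1, bytes.TrimSuffix(data[:i], []byte("\r")), nil
	}
	if atEOF {
		// Skip the unterminated fragment without emitting a token.
		return len(data), nil, nil
	}
	// No newline yet: ask the scanner to read more data.
	return 0, nil, nil
}

// scanComplete returns all complete lines found in input.
func scanComplete(input string) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(completeLines)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	// The last metric is cut off and has no trailing newline.
	fmt.Println(scanComplete("cpu usage=1 1\ncpu usage=2 2\ncpu usa"))
}
```

A real tailing reader would keep the fragment and retry once more data is appended rather than discard it; dropping it here just keeps the sketch short.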
Just give it a shot! ;-P
I will have to pick up some Go 😔
@srebhan maybe I'm not following, but tail seems to be a different animal. For example, we aren't using a scanner; instead we wait for the external tail library to send us data via our receiver function. In the upstream tail library, which is our fork of hpcloud/tail, data is already read newline-delimited. Did I follow the right path, or did I miss something? Thanks!
@powersj OK, I didn't look into tail when I should have... :-( If the underlying library already has a delimiter, we should use it.
I had the same issue,
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.18.3 (git: HEAD 6a94f65), RHEL 7.9 (Maipo) 3.10.0-1160.31.1.el7.x86_64 and RHEL 6.10 (Santiago) 2.6.32-754.35.1.el6.x86_64
Docker
No response
Steps to reproduce
Expected behavior
Full lines should be sent instead of partially reading lines and sending them.
Actual behavior
Lines are read and sent only partially, leading to a large number of bad measurements in InfluxDB.
Additional info
No response