-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
in_tail: add open_on_every_update
setting / support for UTF-16 and UTF-32
#1409
in_tail: add open_on_every_update
setting / support for UTF-16 and UTF-32
#1409
Conversation
open_on_every_update
setting / support for UTF-16 and UTF-32open_on_every_update
setting / support for UTF-16 and UTF-32
…sive sharing mode.
394c8ad
to
cdf7ee1
Compare
It seems some builds on travis-ci were stuck. Could anyone please rebuild them? I haven't got a modification permission on CI. |
I cannot understand why Another question: What's happen when someone opens the file tailed by |
Generally, if a file is opened by a process with neither of sharing modes ( FileWrapper opens the given file R/W shared ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature added looks reasonable.
Please add some test cases which includes newlines (2 or more events in file) to test non-"\n" EOLs.
|
||
def next_line | ||
idx = @buffer.index(@eol) | ||
convert(@buffer.slice!(0, idx + 1)) unless idx.nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should be @buffer.slice!(0, idx + @eol.bytesize)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is correct. @buffer
is searched character-wise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this code:
"\n".encode("utf-16").size #=> 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When encoded in Encoding::UTF_16, a BOM is prepended to the resulting string.
irb(main):006:0> "\n".encode("utf-16").size
=> 2
irb(main):007:0> "\n".encode("utf-16le").size
=> 1
irb(main):008:0> "\n".encode("utf-16")[0]
=> "\uFEFF"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I've forgotten about BOMs. Sorry.
pe # This pe will be updated in on_rotate after TailWatcher is initialized | ||
end | ||
|
||
class StatWatcher < Coolio::StatWatcher | ||
def initialize(path, log, &callback) | ||
def initialize(watcher, &callback) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class just refers watcher.path
and watcher.log
, and doesn't refer any other attributes of watcher
.
What is this change for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other helper classes needed a similar change for constructor parameters, and it made sense for me to apply the same policy to StatWatcher
as well.
convert(@buffer.slice!(0, idx + 1)) unless idx.nil? | ||
end | ||
|
||
def size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use bytesize
for bytesize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix this.
if !io.nil? && @lines.empty? | ||
begin | ||
while true | ||
@fifo << io.readpartial(2048, @iobuf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any reasons to use @iobuf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's exactly what I was wondering about. I just did so as the original code does.
end | ||
|
||
def on_change(prev, cur) | ||
@callback.call | ||
rescue | ||
# TODO log? | ||
@log.error $!.to_s | ||
@log.error_backtrace | ||
@watcher.log.error $!.to_s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add any error messages to make us known where the error occurs, and use error "message", error: e
style logging instead of $!
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just an adoption for the changes to the instance attributes. Totally agreed but I'd open the separate issue for this.
|
||
attr_reader :from_encoding, :encoding, :buffer | ||
|
||
def <<(chunk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the argument chunk object reused in anywhere? (it's related with my comment about using @iobuf
).
If no, there seems no reason to re-do force_encoding(orig_encoding)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, this is the weirdest part. As IO.readpartial(len, buf)
returns the same object as the second argument, the encoding of the reused buffer (@iobuf
) eventually becomes the same encoding as that specfied to chunk.force_encoding()
. Reverting back to the original encoding is necessary because for unknown reasons an assertion failure occurs for older versions of Ruby when a buffer with any double-byte or quad-byte encoding (UTF-16 / UCS-2 / UTF-32) is given to readpartial()
as the second argument.
Try this:
a = open('/dev/urandom')
buf = ''.force_encoding(Encoding::UTF_16BE)
a.readpartial(2048, buf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay - we need code comment for that here :)
yield @io | ||
end | ||
rescue | ||
@watcher.log.error $!.to_s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error message to find where the error occurs (and use rescue => e
and error: e
to dump error objects).
end | ||
@inode = inode | ||
@fsize = fsize | ||
end | ||
|
||
rescue | ||
@log.error $!.to_s | ||
@log.error_backtrace | ||
@watcher.log.error $!.to_s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
@@ -410,7 +398,7 @@ def parse_multilines(lines, tail_watcher) | |||
end | |||
|
|||
class TailWatcher | |||
def initialize(path, rotate_wait, pe, log, read_from_head, enable_watch_timer, read_lines_limit, update_watcher, line_buffer_timer_flusher, &receive_lines) | |||
def initialize(path, rotate_wait, pe, log, read_from_head, enable_watch_timer, read_lines_limit, update_watcher, line_buffer_timer_flusher, from_encoding, encoding, open_on_every_update, &receive_lines) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think time to add TailWatcherSetting like object to pass parameters.
Hard to maintain these long arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, I hate these long arguments!
Points were fixed and newlines are not problem.
LGTM. |
if @from_encoding == @encoding | ||
s | ||
else | ||
c = Encoding::Converter.new(@from_encoding, @encoding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot pre-create this converter in initialize
?
Converter#finish
doesn't reset internal state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be no such descriptions in the document that indicate the method resets the internal state of a converter. It would definitely be better if we could do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, in_tail
sometimes consumes lots of CPU and it causes high pressure on application server.
So I want to avoid more CPU usage with typical cases if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nurse How about this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Encoding::Converter#finish
just finishes the converter; it doesn't reset the state. And there's no way to recycle the converter.
For this code, it should be s.encode(@encoding, @from_encoding)
inode = stat.ino | ||
if inode == @pe.read_inode # truncated | ||
@pe.update_pos(stat.size) | ||
io_handler = IOHandler.new(io, @pe, @log, @read_lines_limit, &method(:wrap_receive_lines)) | ||
@pe.update_pos(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0
instead of stat.size
is correct?
What happen when file is truncated to non-zero size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current behavior is somewhat strange. Because the file position of io
, which must have been newly opened by the caller, isn't updated while the position is updated to stat.size
here, IOHandler later starts reading from the head anyway.
The new code tries to update the file handle's position to that stored in @pe
before reading from the target file, so the position has to be set to 0 in order to preserve the behavior.
If this is a bug, it needs to be corrected to @pe.update_pos(stat.size)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
…use the explicit converter, which I don't remember clearly.
LGTM |
Sorry, I overlooked the notice. Thank you all for looking into this! |
When do we find this useful?
open_on_every_update
, which enables the plugin to tail such a target by letting the plugin open it only when it gets updated.Known problem(s)
Why did you end up with a single patch addressing two different issues?