Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail: Ensure to discard TailWatcher with missing target when follow_inodes #4239

Merged

Conversation

daipom
Copy link
Contributor

@daipom daipom commented Jul 13, 2023

Which issue(s) this PR fixes:
Not related to a particular issue, but it can have something to do with reported log duplication or resource leak phenomenon about in_tail.

What this PR does / why we need it:
Ensure to discard TailWatcher with missing target when follow_inodes.

For example, when a rotation process is slow, there is a small time lag between moving and adding files.
There is a possibility that StatWatcher notifies too quickly to the TailWatcher before the new file is moved to that target path. From the TailWatcher, it appears as if the file is resurrected once it disappeared.

In this case, refresh_watcher can't recognize it, so TailWatcher needs to discard self correctly.
In the previous implementation, it was not done.
So it caused the handle leak and log duplication.
(Please check the added test case)

Because of the problem #4237 (comment), log duplication can still occur during rotate_wait.
We need to set rotate_wait 0 for a workaround.
(Duplication will occur if refresh_watcher is called during the rotate_wait. We need the mechanism not to add duplicated TailWathcer with detaching TailWatcher.)

Docs Changes:
Not needed.

Release Note:
Same as the title.

How to Reproduce:

<source>
  @type tail
  tag test
  path /path/to/test.log*
  pos_file /test/fluentd/pos/pos
  read_from_head true
  follow_inodes true
  refresh_interval 15s
  <parse>
    @type none
  </parse>
</source>

<match test.**>
  @type stdout
</match>
  • A: initial start
test.log
  • B: move the file
test.log1
  • C: add the new current file
test.log
test.log1
  • D: wait rerfresh_watcher TWICE
    • first: add TailWatcher for test.log (new inode)
      • this overwrites @tails[test.log] so old inode information is lost from @tails at this timing
    • second: add TailWatcher for test.log1 (old inode)
  • E: append logs to test.log1
    • Because now there are 2 TailWatchers for test.log1, log duplication occurs.
      • TailWatcher(test.log, old inode) (orphan) (This actually opens the file handle of test.log1)
      • TailWatcher(test.log1, old inode)

When StatWatcher (or TimeTrigger) notifies between B and C, then the TailWatcher becomes an orphan, and a handle leak happens.

@daipom
Copy link
Contributor Author

daipom commented Jul 13, 2023

@daipom
Copy link
Contributor Author

daipom commented Jul 13, 2023

I will check this

* https://github.com/fluent/fluentd/actions/runs/5539804706/jobs/10111142203?pr=4239

I think this is just a list order problem. I will improve the test.

@daipom
Copy link
Contributor Author

daipom commented Jul 13, 2023

It seems that log duplication and handle leak still can occur on macOS...
I will check it 😢

https://github.com/fluent/fluentd/actions/runs/5539804708/jobs/10111141849?pr=4239
https://github.com/fluent/fluentd/actions/runs/5539804708/jobs/10111141988?pr=4239

@daipom
Copy link
Contributor Author

daipom commented Jul 13, 2023

It seems that log duplication and handle leak still can occur on macOS... I will check it 😢

https://github.com/fluent/fluentd/actions/runs/5539804708/jobs/10111141849?pr=4239 https://github.com/fluent/fluentd/actions/runs/5539804708/jobs/10111141988?pr=4239

It seems that this can happen only on macOS.
I can reproduce this on my macOS (Ventura, Apple M1) too, but I can't reproduce it on my Windows 10.

@daipom
Copy link
Contributor Author

daipom commented Jul 13, 2023

It seems that log duplication and handle leak still can occur on macOS... I will check it 😢
https://github.com/fluent/fluentd/actions/runs/5539804708/jobs/10111141849?pr=4239 https://github.com/fluent/fluentd/actions/runs/5539804708/jobs/10111141988?pr=4239

It seems that this can happen only on macOS. I can reproduce this on my macOS (Ventura, Apple M1) too, but I can't reproduce it on my Windows 10.

It can happen on Windows and macOS because StatWatcher does not work on those platforms (specification).
So this is merely a matter of the interval time between moving and adding files.
I will change the sleep time a bit longer for those platform to make the test valid for those platforms.

@daipom daipom force-pushed the in_tail-ensure-discard-watcher-with-missing-target branch 2 times, most recently from 9e812a5 to 5a4c9b3 Compare July 13, 2023 08:19
@daipom
Copy link
Contributor Author

daipom commented Jul 13, 2023

The above problems are fixed.

@daipom
Copy link
Contributor Author

daipom commented Jul 13, 2023

@daipom daipom force-pushed the in_tail-ensure-discard-watcher-with-missing-target branch 2 times, most recently from 0c13c9f to 7ff9dd3 Compare July 13, 2023 08:56
@daipom
Copy link
Contributor Author

daipom commented Jul 13, 2023

Rebased to the latest master.

@daipom daipom force-pushed the in_tail-ensure-discard-watcher-with-missing-target branch from 7ff9dd3 to f87ab7b Compare July 13, 2023 23:42
lib/fluent/plugin/in_tail.rb Outdated Show resolved Hide resolved
…w_inodes

For example, when a rotation process is slow, there is a small time lag
between moving and adding files.
There is a possibility that StatWatcher notifies too quickly to the
TailWatcher before the new file is moved to that target path.
From the TailWatcher, it appears as if the file is resurrected once
it disappeared.

In this case, `refresh_watcher` can't recognize it, so TailWatcher needs
to discard self correctly.
In the previous implementation, it was not done.
So it caused the handle leak and log duplication.
(Please check the added test-case)

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
@daipom daipom force-pushed the in_tail-ensure-discard-watcher-with-missing-target branch from f87ab7b to 781e313 Compare July 14, 2023 06:56
@ashie ashie merged commit 32082ea into fluent:master Jul 14, 2023
@ashie
Copy link
Member

ashie commented Jul 14, 2023

Thanks!

@daipom daipom deleted the in_tail-ensure-discard-watcher-with-missing-target branch July 14, 2023 07:48
@daipom
Copy link
Contributor Author

daipom commented Jul 14, 2023

Thanks for your review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants