Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail: Check detaching inode when follow_inodes #4191

Merged
merged 2 commits into from
Jun 30, 2023

Conversation

garyzjq
Copy link
Contributor

@garyzjq garyzjq commented May 31, 2023

Which issue(s) this PR fixes:
Fixes #4190

What this PR does / why we need it:
Add validation to make sure detach_watcher is detaching expected watcher. This can avoid unexpectedly detach new watcher created for new log file and lead to log stuck transiently.

Add log to check that detaching inode is the same as the detaching TailWatcher's inode when enabling follow_inodes.

Note: If they do not match, canceling the detach (by adding return) may prevent an incorrect detach.
Since #4208 will prevent an incorrect detach, we will only add the warning log in this PR for now.

Docs Changes:
N/A

Release Note:
Fix transient log stuck in in_tail when log file rotated and follow_inodes is enabled
Same as the title.

@ashie ashie self-requested a review May 31, 2023 06:20
@garyzjq garyzjq force-pushed the jiaqz/inodestuck branch 2 times, most recently from 6e8bbde to fbc3c2d Compare May 31, 2023 06:30
daipom added a commit to daipom/fluentd that referenced this pull request Jun 20, 2023
If `refresh_watchers` run before `update_watcher`,
the old implementation of `update_watcher` detach wrongly the
new TailWatcher which is added in `refresh_watcher`.
This causes the problem of stopping tailing log and handle leak.

The test case `test_updateTW_after_refreshTW` reproduces this
problem.
This fix solves it.

There are another BUG about unwatching.
I adjusted some expected values of the tests for this BUG.
When `refresh_watcher` find the rotated old file AFTER
unwatching it (`rotate_wait`), then the logs will be collected in
duplicate.
If `refresh_watcher` find it BEFORE unwatching it (`rotate_wait`),
this problem doesn't occur because the position entry is still
alive.
We need to fix this and fix the adjusted expected values.

This fix is based on the content and discussion of the following
issue and PR:

* fluent#3614
* fluent#4185
* fluent#4191

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
Co-authored-by: Katuya Kawakami <kkawakami-je@nec.com>
Co-authored-by: Masaki Hatada <mas-hatada@nec.com>
Co-authored-by: Gary Zhu <garyzjq@outlook.com>
Co-authored-by: Takuro Ashie <ashie@clear-code.com>
daipom added a commit to daipom/fluentd that referenced this pull request Jun 23, 2023
If `refresh_watchers` run before `update_watcher`,
the old implementation of `update_watcher` detach wrongly the
new TailWatcher which is added in `refresh_watcher`.
This causes the problem of stopping tailing log and handle leak.

The test case `test_updateTW_after_refreshTW` reproduces this
problem.
This fix solves it.

There are another BUG about unwatching.
I adjusted some expected values of the tests for this BUG.
When `refresh_watcher` find the rotated old file AFTER
unwatching it (`rotate_wait`), then the logs will be collected in
duplicate.
If `refresh_watcher` find it BEFORE unwatching it (`rotate_wait`),
this problem doesn't occur because the position entry is still
alive.
We need to fix this and fix the adjusted expected values.

This fix is based on the content and discussion of the following
issue and PR:

* fluent#3614
* fluent#4185
* fluent#4191

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
Co-authored-by: Katuya Kawakami <kkawakami-je@nec.com>
Co-authored-by: Masaki Hatada <mas-hatada@nec.com>
Co-authored-by: Gary Zhu <garyzjq@outlook.com>
Co-authored-by: Takuro Ashie <ashie@clear-code.com>
@ashie
Copy link
Member

ashie commented Jun 26, 2023

@garyzjq Sorry for my late response.
We are going to fix this issue with #4208, please check it.
I believe it resolves both stuck issue & leak issue.

daipom added a commit to daipom/fluentd that referenced this pull request Jun 26, 2023
If `refresh_watchers` run before `update_watcher`,
the old implementation of `update_watcher` detach wrongly the
new TailWatcher which is added in `refresh_watcher`.
This causes the problem of stopping tailing log and handle leak.

The test case `test_updateTW_after_refreshTW` reproduces this
problem.
This fix solves it.

There are another BUG about unwatching.
I adjusted some expected values of the tests for this BUG.
When `refresh_watcher` find the rotated old file AFTER
unwatching it (`rotate_wait`), then the logs will be collected in
duplicate.
If `refresh_watcher` find it BEFORE unwatching it (`rotate_wait`),
this problem doesn't occur because the position entry is still
alive.
We need to fix this and fix the adjusted expected values.

This fix is based on the content and discussion of the following
issue and PR:

* fluent#3614
* fluent#4185
* fluent#4191

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
Co-authored-by: Katuya Kawakami <kkawakami-je@nec.com>
Co-authored-by: Masaki Hatada <mas-hatada@nec.com>
Co-authored-by: Gary Zhu <garyzjq@outlook.com>
Co-authored-by: Takuro Ashie <ashie@clear-code.com>
@ashie
Copy link
Member

ashie commented Jun 27, 2023

Although the issue will be fixed by #4208, this fix still might be useful as a last guard.
Especially, it's might better to add the log even if we don't return here.
So we'll keep open this for a while to consider about it.

@garyzjq
Copy link
Contributor Author

garyzjq commented Jun 28, 2023

Thanks @ashie . I also checked #4208 and looks good. The watcher to detach is directly passed to detach_watcher method instead of find from tail map, so should be correct by design.
I think I can change my PR to add a log but won't return (for safety), to see whether there's any other corner case which may lead to detach unexpected watcher.

Copy link
Member

@ashie ashie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can change my PR to add a log but won't return (for safety), to see whether there's any other corner case which may lead to detach unexpected watcher.

Thanks, it's worth to merge to check other cases.

@ashie ashie requested a review from daipom June 29, 2023 00:18
Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@garyzjq
LGTM! Thanks!

Could you please do the following?

garyzjq added 2 commits June 29, 2023 12:05
Signed-off-by: Gary Zhu <garyzjq@outlook.com>
Signed-off-by: garyzjq <garyzjq@outlook.com>
Signed-off-by: garyzjq <garyzjq@outlook.com>
@garyzjq garyzjq force-pushed the jiaqz/inodestuck branch from da55678 to bcd9a1a Compare June 29, 2023 04:05
@garyzjq
Copy link
Contributor Author

garyzjq commented Jun 29, 2023

@garyzjq LGTM! Thanks!

Could you please do the following?

sure, done for DCO

@daipom daipom changed the title Check inode expectation to detach correct watcher when follow_inodes in_tail: Check detaching inode when follow_inodes Jun 30, 2023
@daipom daipom merged commit f436211 into fluent:master Jun 30, 2023
@daipom
Copy link
Contributor

daipom commented Jun 30, 2023

@garyzjq Thanks! I fixed the title and some issue comments and merge this!
#4190 and this PR have given us much insight to improve in_tail!
Thanks so much!

@garyzjq
Copy link
Contributor Author

garyzjq commented Jun 30, 2023

thanks @daipom and @ashie, really a nice experience for me to contribute to fluentd :)

@garyzjq garyzjq deleted the jiaqz/inodestuck branch July 2, 2023 16:20
@daipom daipom added this to the v1.16.2 milestone Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants