Add support for copytruncate method when rotating input logs with an external tool in `filestream` input #23457
Conversation
Force-pushed from `0be2e9b` to `68d8934`.
Pinging @elastic/agent (Team:Agent)

jenkins run tests
Maybe we can clarify the PR description. FB will not rotate the files, but this change will make it possible to correctly follow log files that are rotated via copytruncate. Limitation (for docs): copytruncate via logrotate (or others) can lead to data loss, as the file is copied while it is still written to. This can lead to log lines that cannot be parsed correctly...
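For reference, a minimal logrotate stanza that produces the copytruncate behavior discussed here might look like the following (illustrative only; the path and rotation count are made up):

```
/var/log/boot.log {
    daily
    rotate 7
    copytruncate
    compress
    delaycompress
}
```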
Force-pushed from `f634df5` to `235e17a`.
> `count` is the number of rotated files. {beatname_uc} needs this information to clean the
> registry when the rotated file has been purged.
What happens if we keep this `0`? Will files be removed based on ttl? Shall we automatically set `clean_removed` for rotated files?
Regex and count seem to be helpful for other log rotation strategies (move files) as well, as they allow us to close files early and remove state from the registry early.

Closing files on EOF reduces the number of FDs we need to keep open, while removing state from the registry earlier helps with inode reuse issues.

Apropos inode reuse: if we find a state being reused where we know for sure we already reached EOF (rotated log file), then we can tell for sure that the inode was reused and the state must be deleted in favor of the new state.
> What happens if we keep this `0`? Will files be removed based on ttl? Shall we automatically set `clean_removed` for rotated files?
`0` is an invalid setting, as I use this number when creating a slice for `rotatedFiles`.

Rotated files that are removed are immediately stopped and cleaned up, so we keep the number of open files as low as possible and the size of the registry as small as possible. However, I am open to keeping the files open until they stop themselves and letting the registry clean the state if you think that is better.
> Regex and count seem to be helpful for other log rotation strategies (move files) as well, as it allows us to close files early and remove state from the registry early. Closing files on EOF reduces the number of FDs we need to keep + while removing state from registry more early helps with inode reuse issues.

Indeed, that's why I enable `close_eof` and remove the state of rotated files when they are deleted.
> `0` is an invalid setting as I use this number when creating a slice for `rotatedFiles`.
Hm, I wonder if we really need the `count` setting at all. For slices we should use `append`, but we should always be ready to allow slices to grow. If someone updates the application settings to produce more logs when trying to hunt down an issue, the user should not have to remember to update/change the Beats setting as well.
> Rotated files that are removed are immediately stopped and cleaned up, so we keep the number of open files as low as possible and the size of the registry as small as possible. However, I am open to keeping the files open until they stop themselves and let the registry clean the state if you think that is better.
This behavior should be configurable. Normally this is what `close_removed` and `clean_removed` have been used for. The number of file descriptors can indeed grow due to bandwidth limitations, but we don't want to drop logs just because a file was rotated and we have to wait 1 min for an ACK from Elasticsearch when the whole environment is low EPS only. A more sophisticated alternative is to configure a limit on the number of removed files that should be kept open. By default we would keep up to 2 removed files open per log stream, for example. Once the limit is reached, we close and clean the files we have kept open the longest (oldest files).
I have kept removing and cleaning configurable. Also, removed `count`.
Do you still need to update the docs? I still see `count` being mentioned.
```yaml
rotation.external.strategy.copytruncate:
  suffix_regex: \-\d{6}$
  count: 7
  dateformat: -20060504
```
What is `dateformat` used for? Shouldn't the regex already find the file? Do I still need `count` in that case?

Given this example format, is it `{year}{month}{day}` or `{year}{day}{month}`? Depending on the logger used we might also see formats like `{day}-{month}-{year}`; even time might be included. If we want ordering, we do require more 'intelligent' timestamp parsing, so we can sort based on actual time.
This reminds me, we have a dtfmt package, but unfortunately we never added parsing support. Another use-case for timestamp parsing would be JSON. We have 2 hard coded timestamp formats for JSON parsing, but it would be nice to allow users to configure a custom pattern.
> What is `dateformat` used for? Shouldn't the regex already find the file? Do I still need `count` in that case?
`dateformat` is used to tell Filebeat how to parse the date at the end of the file name so the input can sort the rotated files.
> Given this example format, is it `{year}{month}{day}` or `{year}{day}{month}`? Depending on the logger used we might also see formats like `{day}-{month}-{year}`; even time might be included. If we want ordering, we do require a more 'intelligent' timestamp parsing, thus we can sort based on actual time.
Yes, that is why we need `dateformat`. These are the example settings for the different suffixes you had listed above:
| suffix format | `dateformat` |
|---|---|
| `{year}{month}{day}` | `20060102` |
| `{year}{day}{month}` | `20060201` |
| `{day}-{month}-{year}` | `02-01-2006` |
| `{year}{month}{day}{hour}{minute}` | `200601021504` |
> This reminds me, we have a dtfmt package, but unfortunately we never added parsing support. Another use-case for timestamp parsing would be JSON. We have 2 hard coded timestamp formats for JSON parsing, but it would be nice to allow users to configure a custom pattern.
I agree, I am happy to add it in a follow-up PR.
👍

> `200601021504`

Wow... I wouldn't call this intuitive :)

> `{year}{month}{day}{hour}{minute}`

I kinda like this more human-friendly, readable format. `dtfmt` splits parsing and executing date format patterns. Maybe (as another follow-up) we want to add this to `dtfmt` as well... :)
```go
if err != nil {
	log.Errorf("Cannot continue file, error while getting the information of the original file: %+v", err)
	log.Debugf("Starting possibly rotated file from the beginning: %s", fe.NewPath)
	hg.Start(ctx, src)
```
Hm, shall we really try to collect if `Stat` failed?
I chose to collect it to make sure we are not losing any logs before the user notices the error message above. However, I am open to skipping the file; I am just worried about the implications.
What are the reasons for `err != nil`? Most often I would assume that the file has been removed, or we just don't have permissions. Can you think of others? In general, the harvester would also fail to collect, unless we run into some other issue.

I wonder about network shares. Maybe we just failed because of network errors. Shall we really try to restart from offset 0?
In general, removing those copytruncated log files is a mistake. But I see where you are coming from. Then what happens when the file shows up in the end? Shall we scan for rotated files again to try to `Continue` again?
Tricky. I would lean towards not collecting and seeing what happens (if users run into issues).

On the other hand, the harvester will fail as well, and if the harvester does not fail we still have the inode check (and in the future maybe some fingerprint?). Given that the harvester should detect if something is odd, we can keep the start here. => Let's keep the code you have written.
This pull request is now in conflicts. Could you fix it? 🙏
Force-pushed from `e11374c` to `a7775b9`.
@urso The PR is ready for another round of review. I would like to address these things in a follow-up PR:
This pull request is now in conflicts. Could you fix it? 🙏
Force-pushed from `52ce161` to `63e8211`.
The jenkins errors suggest that your recent changes broke the build.
```go
switch fe.Op {
case loginp.OpCreate, loginp.OpWrite:
	if fe.Op == loginp.OpCreate {
		log.Debugf("A new file %s has been found", fe.NewPath)
```
nit: structured logging. We have quite a few debug messages here that are quite short and maybe difficult to grep for if the path is interspersed. In order to ease grepping, and even correlating events for a single source, it would be helpful to create a temporary logger for the current source. The logger would contain the path and, if available, the inode + internal file ID as context that is always printed. The log message would become `A new file has been found`.

For example: https://github.com/elastic/beats/blob/master/filebeat/input/log/input.go#L242

That approach did help us understand some bugs, as it enabled us to correlate log messages to a single inode.
`filebeat/input/filestream/input.go` (outdated)
```diff
 identifier, err := newFileIdentifier(config.FileIdentity)
 if err != nil {
-	return nil, nil, fmt.Errorf("error while creating file identifier: %v", err)
+	return nil, nil, fmt.Errorf("cannot create prospector: %+v", err)
```
Use `%w` in order to capture the error.
Force-pushed from `e45d20d` to `8a8a73c`.
LGTM. Please mark the feature as experimental.
…external tool in `filestream` input (#23457) ## What does this PR do? The PR makes the `filestream` log rotation aware to make sure Filebeat can cooperate better with external log rotation tools. The first supported strategy is `copytruncate`. When `logrotate` rotates e.g. `boot.log` with `copytruncate` the following things happen: 1. all archived files are renamed e.g. `boot.log.2` is renamed `boot.log.3` until `boot.log.1` no longer exists 2. `boot.log` is copied to `boot.log.1` 3. `boot.log` is truncated You can see my tests on my machine: Before rotation: ``` root@sleipnir:/home/n# ls -lisaht /var/log/boot.log* 130476 30K -rw------- 1 root root 28K Jan 29 08:59 /var/log/boot.log 130577 36K -rw------- 1 root root 34K Jan 29 08:59 /var/log/boot.log.1 130657 60K -rw------- 1 root root 57K Jan 7 09:51 /var/log/boot.log.2 ``` After rotation: ``` root@sleipnir:/home/n# ls -lisaht /var/log/boot.log* 130476 0 -rw------- 1 root root 0 May 25 12:41 /var/log/boot.log 130430 30K -rw------- 1 root root 28K May 25 12:41 /var/log/boot.log.1 130577 36K -rw------- 1 root root 34K Jan 29 08:59 /var/log/boot.log.2 130657 60K -rw------- 1 root root 57K Jan 7 09:51 /var/log/boot.log.3 ``` On rotation, the active file is continued and archived files are kept open until EOF is reached. ### Configuration ```yaml rotation.external.strategy.copytruncate: suffix_regex: \.\d$ count: 10 ``` Note: when Filebeat will be able to rotate input logs, its configuration will be under `rotation.internal.*`. ## Why is it important? Previously, Filebeat was not able to cooperate with external log rotation tools that used `copytruncate` method. (cherry picked from commit 1f5198e)
* master: (32 commits) [Metricbeat] Change Account ID to Project ID in `gcp.billing` module (elastic#26412) update libbeat fields.ecs.yml file and ecsVersion to 1.10.0 (elastic#26121) [Filebeat] Update AWS ELB ingest pipeline (elastic#26441) [FIlebeat] add strict_date_optional_time_nanos date format to PanOS module (elastic#26158) Fix the irregular and typo on prometheus module. (elastic#25726) [Filebeat] Parse additonal debug data fields for Okta module (elastic#25818) fix: update MSSQL Server linux image's Docker registry (elastic#26440) Update indexing.go godocs (elastic#26408) Do not close filestream harvester if an unexpected error is returned when close.on_state_change.* is enabled (elastic#26411) Add support for copytruncate method when rotating input logs with an external tool in `filestream` input (elastic#23457) Allow fields with ip_range datatype (elastic#26444) Add Anomali ThreatStream support to threatintel module (elastic#26350) fix: use the right param type (elastic#26469) [Automation] Update elastic stack version to 8.0.0-7640093f for testing (elastic#26460) Set SM Filebeat modules as GA (elastic#26226) Fix rfc5464 date parsing in the syslog input (elastic#26419) Add linked account information into billing metricset (elastic#26285) [Filebeat] Update HA Proxy log grok patterns (elastic#25835) disable metricbeat logstash test_node_stats (elastic#26436) chore: pass BEAT_VERSION when running E2E tests (elastic#26291) ...
What does this PR do?

The PR makes the `filestream` input log rotation aware, to make sure Filebeat can cooperate better with external log rotation tools. The first supported strategy is `copytruncate`.

When `logrotate` rotates e.g. `boot.log` with `copytruncate`, the following things happen:

1. All archived files are renamed, e.g. `boot.log.2` is renamed `boot.log.3`, until `boot.log.1` no longer exists
2. `boot.log` is copied to `boot.log.1`
3. `boot.log` is truncated

You can see my tests on my machine.

Before rotation:

```
root@sleipnir:/home/n# ls -lisaht /var/log/boot.log*
130476 30K -rw------- 1 root root 28K Jan 29 08:59 /var/log/boot.log
130577 36K -rw------- 1 root root 34K Jan 29 08:59 /var/log/boot.log.1
130657 60K -rw------- 1 root root 57K Jan  7 09:51 /var/log/boot.log.2
```

After rotation:

```
root@sleipnir:/home/n# ls -lisaht /var/log/boot.log*
130476   0 -rw------- 1 root root   0 May 25 12:41 /var/log/boot.log
130430 30K -rw------- 1 root root 28K May 25 12:41 /var/log/boot.log.1
130577 36K -rw------- 1 root root 34K Jan 29 08:59 /var/log/boot.log.2
130657 60K -rw------- 1 root root 57K Jan  7 09:51 /var/log/boot.log.3
```

On rotation, the active file is continued and archived files are kept open until EOF is reached.

Configuration

```yaml
rotation.external.strategy.copytruncate:
  suffix_regex: \.\d$
  count: 10
```

Note: when Filebeat becomes able to rotate input logs itself, that configuration will live under `rotation.internal.*`.

Why is it important?

Previously, Filebeat was not able to cooperate with external log rotation tools that use the `copytruncate` method.

Checklist

- My changes are added to `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.