[receiver/filelog] Support option to ignore files older than certain age #31053

Closed
ycombinator opened this issue Feb 5, 2024 · 8 comments

@ycombinator
Contributor

Component(s)

receiver/filelog

Is your feature request related to a problem? Please describe.

For users who keep log files around for long periods of time, it might be useful to tell the filelog receiver to only consider files that are younger than a certain age or, in other words, to ignore files that are older than a certain age.

Describe the solution you'd like

A new, optional, configuration option for the filelog receiver, something like exclude_older_than, which accepts a duration as its value.

Describe alternatives you've considered

No response

Additional context

Filebeat, Elastic's log shipper, supports such an option in its filestream input, which is conceptually the same as OTel Collector's filelog receiver.

@ycombinator ycombinator added enhancement New feature or request needs triage New item requiring triage labels Feb 5, 2024

@djaglowski
Member

@ycombinator, have you looked into whether filelog's ordering_criteria.sort_by.sort_type = mtime works for your use case?

@ycombinator
Contributor Author

Thanks @djaglowski. I did look into that option in combination with ordering_criteria.top_n but the way I understood it, that combination would let me say something to the effect of "only consider the top (latest) N files, sorted by mtime". I couldn't figure out a way to say "only consider all files newer than a certain mtime", which is what I'm proposing in this issue. If I missed some combination of the ordering_criteria.* options that could satisfy the latter use case, please let me know.
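To make the distinction concrete, here is a minimal Go sketch of the two selection strategies. It is illustrative only: the `file` type and the `newestN`, `newerThan`, and `demo` helpers are hypothetical names made up for this example and are not part of the filelog receiver or the stanza matcher.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// file is a hypothetical stand-in for a matched log file.
type file struct {
	name  string
	mtime time.Time
}

// newestN sorts by mtime (newest first) and keeps at most n files.
// The result size is capped at n no matter how old or new the files are.
func newestN(files []file, n int) []file {
	sorted := append([]file(nil), files...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].mtime.After(sorted[j].mtime)
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

// newerThan keeps every file modified within maxAge of now,
// however many files that turns out to be.
func newerThan(files []file, now time.Time, maxAge time.Duration) []file {
	cutoff := now.Add(-maxAge)
	var kept []file
	for _, f := range files {
		if !f.mtime.Before(cutoff) {
			kept = append(kept, f)
		}
	}
	return kept
}

// names extracts the file names for display.
func names(files []file) []string {
	out := make([]string, 0, len(files))
	for _, f := range files {
		out = append(out, f.name)
	}
	return out
}

// demo runs both strategies on the same four sample files.
func demo() (topN, recent []string) {
	now := time.Date(2024, 2, 5, 12, 0, 0, 0, time.UTC)
	files := []file{
		{"a.log", now.Add(-1 * time.Hour)},
		{"b.log", now.Add(-2 * time.Hour)},
		{"c.log", now.Add(-3 * time.Hour)},
		{"d.log", now.Add(-72 * time.Hour)},
	}
	return names(newestN(files, 2)), names(newerThan(files, now, 24*time.Hour))
}

func main() {
	topN, recent := demo()
	fmt.Println(topN)   // [a.log b.log]
	fmt.Println(recent) // [a.log b.log c.log]
}
```

With a fixed N of 2 the third recent file is dropped, and raising N to 4 would pull in the three-day-old file. The age cutoff adapts to however many files are actually recent.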

@djaglowski
Member

Thanks for confirming @ycombinator. I see the difference and agree it is a sensible feature. I think it would make sense to add it in the matcher package.

@crobert-1 crobert-1 added pkg/stanza and removed needs triage New item requiring triage labels Feb 6, 2024
Contributor

github-actions bot commented Feb 6, 2024

Pinging code owners for pkg/stanza: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@thomasriera-akkodis

Hello

This feature would indeed be very useful. I have been trying to set up a similar behaviour using ordering by date and top_n. It works well enough for now but a way to ignore old files would be better.

@ycombinator
Contributor Author

ycombinator commented Mar 22, 2024

Hi @djaglowski, I'd like to work on this issue if that's okay.

[UPDATE] I've put up a PR to try and resolve this issue: #31916.

djaglowski added a commit that referenced this issue Apr 15, 2024
…1916)

**Description:** 
This PR implements a new matcher criterion in the Stanza fileconsumer
matcher:

```
ExcludeOlderThan time.Duration        `mapstructure:"exclude_older_than"`
```

and the corresponding setting in the `filelog` receiver configuration:

| Field | Default | Description |
|----------------------|---------|-------------------------------------------------------------------------|
| `exclude_older_than` |         | Exclude files whose modification time is older than the specified age. |


When specified, the matcher will exclude files whose modification times
are older than the specified age.
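As a rough illustration of the behaviour described above, the following Go sketch filters a list of paths by modification time. It is not the code from the PR: `excludeOlderThan` and `demo` are hypothetical helpers, and treating a non-positive duration as "disabled" is an assumption made for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// excludeOlderThan returns the paths whose modification time is within
// maxAge of now. Hypothetical helper, not the matcher's actual code.
func excludeOlderThan(paths []string, now time.Time, maxAge time.Duration) []string {
	if maxAge <= 0 {
		return paths // assumption: a non-positive value disables the filter
	}
	cutoff := now.Add(-maxAge)
	kept := make([]string, 0, len(paths))
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil {
			continue // skip files that cannot be inspected
		}
		if info.ModTime().Before(cutoff) {
			continue // modified before the cutoff, so too old
		}
		kept = append(kept, p)
	}
	return kept
}

// demo creates two temporary files, backdates one by 48 hours, and
// returns the base names that survive a 24-hour age limit.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "filelog-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	fresh := filepath.Join(dir, "fresh.log")
	stale := filepath.Join(dir, "stale.log")
	for _, p := range []string{fresh, stale} {
		if err := os.WriteFile(p, []byte("line\n"), 0o600); err != nil {
			return nil, err
		}
	}
	old := time.Now().Add(-48 * time.Hour)
	if err := os.Chtimes(stale, old, old); err != nil {
		return nil, err
	}

	var names []string
	for _, p := range excludeOlderThan([]string{fresh, stale}, time.Now(), 24*time.Hour) {
		names = append(names, filepath.Base(p))
	}
	return names, nil
}

func main() {
	kept, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(kept)
}
```

The sketch compares each file's `ModTime` against a cutoff computed once per call, so every file in a given pass is judged against the same instant.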

**Link to tracking Issue:** #31053

**Testing:** Added unit tests.

**Documentation:** Documented `exclude_older_than` configuration setting
in the `filelogreceiver`'s README.

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
@ycombinator
Contributor Author

Implemented via #31916

ghost pushed a commit to opsramp/opentelemetry-collector-contrib that referenced this issue May 5, 2024
…en-telemetry#31916)