[receiver/filelog] Intelligent File Detection and Reading for Rotated Files #22998

Closed
Mrod1598 opened this issue Jun 1, 2023 · 4 comments
Labels
enhancement New feature or request receiver/filelog

Comments

@Mrod1598
Contributor

Mrod1598 commented Jun 1, 2023

Component(s)

receiver/filelog

Is your feature request related to a problem? Please describe.

Yes. The problem is file detection and reading in an environment that has no concept of a current file with a fixed name. Instead, there is a large pool of timestamped files that rotates continuously, which makes it hard to accurately identify and read the "current" file within the pool. Because these files cannot be filtered effectively, the receiver has to check every matching file for updates rather than read only the current one, which leads to excessive CPU usage.

Describe the solution you'd like

We propose using a sequence of ordering and filtering rules to determine the most recent file. Where multiple groups of files are needed, it would be more effective to use multiple receivers.

We are also making some assumptions:

  • There may be only one group, which would simplify the process, provided the user specifies a matching pattern that matches a single group.
  • The most recent file can be determined by an integer in the filename.
  • The filename format could be year, month, day, then sequence number.

EX:

err.2023053001.log
err.2023053002.log
err.2023053003.log
err.2023053101.log
err.2023053102.log
err.2023053103.log
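
Under the assumption above that the most recent file carries the largest integer in its name, selecting it could look roughly like the following Python sketch. The pattern and the helper name `newest_by_integer` are illustrative only and are not part of the receiver.

```python
import re

# Illustrative pattern: captures the ten-digit run (YYYYMMDD plus a
# two-digit sequence number) from names such as err.2023053001.log.
PATTERN = re.compile(r"err\.(?P<value>\d{10})\.log$")


def newest_by_integer(filenames):
    """Return the filename whose captured integer is largest, or None."""
    candidates = []
    for name in filenames:
        match = PATTERN.search(name)
        if match:  # names that do not match the pattern are ignored
            candidates.append((int(match.group("value")), name))
    return max(candidates)[1] if candidates else None


files = [
    "err.2023053001.log",
    "err.2023053002.log",
    "err.2023053003.log",
    "err.2023053101.log",
    "err.2023053102.log",
    "err.2023053103.log",
]
print(newest_by_integer(files))  # err.2023053103.log
```

With this filename layout a single integer comparison is enough, because the date digits come before the sequence digits.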

The solution should provide the capability to define alternate ordering strategies with different parsing/sorting techniques
such as:

  • Timestamp only
  • Integer only
  • Timestamp & integer, with primary sort based on timestamp and secondary sort based on integer.
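
As a rough illustration of the third strategy, the sketch below sorts by a parsed date first and a sequence number second. It is written in Python, with `strptime` standing in for whatever timestamp parser the receiver would use; the pattern, the group names `date` and `seq`, and the function `sort_key` are assumptions made for this example.

```python
import re
from datetime import datetime

# Illustrative pattern: an eight-digit date followed by a two-digit sequence.
PATTERN = re.compile(r"err\.(?P<date>\d{8})(?P<seq>\d{2})\.log$")


def sort_key(name):
    """Build a (timestamp, integer) key: primary sort on date, secondary on sequence."""
    match = PATTERN.search(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name}")
    day = datetime.strptime(match.group("date"), "%Y%m%d")
    return (day, int(match.group("seq")))


files = ["err.2023053102.log", "err.2023053003.log", "err.2023053101.log"]
ordered = sorted(files, key=sort_key)
print(ordered[-1])  # err.2023053102.log
```

Tuples compare element by element, so the sequence number only matters when two files share the same date.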

Lastly, we suggest creating a configuration section that applies these sorting methods in order of priority.
In the proposed solution, we will introduce a new top-level key, tentatively named file_name_filtering_rules. This key will
have a list of filtering rules as its value, and these rules will be applied in sequence.

A single rule will comprise the following fields:

regex: A regular expression with a single capture group called value. This will be used against each filename, and the
contents of value will be used for the rule.

sort_type: Determines how the values of value are compared and sorted. Valid entries are timestamp, integer, and
alphabetical.

format: If sort_type is timestamp, this field determines how to parse the timestamp. The stanza timestamp parsing logic can likely be applied here.

ascending: A boolean value which, if true, signals to sort in ascending order. If false, it sorts in descending order.
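
To make the shape of a rule concrete, here is a minimal Python sketch of the four fields with some validation. The class name `FilterRule`, the `key` method, and the validation choices are hypothetical, and Python's `strptime` is only a stand-in for the stanza timestamp parser.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

SORT_TYPES = ("timestamp", "integer", "alphabetical")


@dataclass
class FilterRule:
    regex: str                    # must contain a capture group named "value"
    sort_type: str                # one of SORT_TYPES
    ascending: bool = True
    format: Optional[str] = None  # only used when sort_type is "timestamp"

    def __post_init__(self):
        if self.sort_type not in SORT_TYPES:
            raise ValueError(f"invalid sort_type: {self.sort_type}")
        if "value" not in re.compile(self.regex).groupindex:
            raise ValueError("regex must define a capture group named 'value'")
        if self.sort_type == "timestamp" and not self.format:
            raise ValueError("format is required when sort_type is timestamp")

    def key(self, filename):
        """Return the comparable value for a filename, or None if it does not match."""
        match = re.search(self.regex, filename)
        if match is None:
            return None
        raw = match.group("value")
        if self.sort_type == "timestamp":
            return datetime.strptime(raw, self.format)
        if self.sort_type == "integer":
            return int(raw)
        return raw  # alphabetical: compare the captured string as-is


rule = FilterRule(regex=r"err\.\d{8}(?P<value>\d{2})\.log", sort_type="integer")
print(rule.key("err.2023053102.log"))  # 2
```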

Example Config:

filelog:
  include: [dir/Error.*.log]
  file_name_filtering_rules:
    - regex: '/dir/Error\.(?P<value>\d{8}).*'
      sort_type: timestamp
      format: '%Y%m%d'
      ascending: true
    - regex: '/dir/Error\.\d{8}(?P<value>\d{2}).*'
      sort_type: integer
      ascending: true
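
The following Python sketch shows one way the two rules in this example could be applied, reading the timestamp layout as year, month, day (`%Y%m%d` in `strptime` terms). It assumes that the first rule is the primary sort key and that later rules only break ties; the proposal does not spell out that precedence. The dict representation and the function names are hypothetical.

```python
import re
from datetime import datetime

# The two example rules as plain dicts (hypothetical in-memory form).
RULES = [
    {"regex": r"/dir/Error\.(?P<value>\d{8}).*", "sort_type": "timestamp",
     "format": "%Y%m%d", "ascending": True},
    {"regex": r"/dir/Error\.\d{8}(?P<value>\d{2}).*", "sort_type": "integer",
     "ascending": True},
]


def rule_key(rule, path):
    """Extract and convert the 'value' capture group for one rule."""
    match = re.match(rule["regex"], path)
    if match is None:
        raise ValueError(f"{path} does not match {rule['regex']}")
    value = match.group("value")
    if rule["sort_type"] == "timestamp":
        return datetime.strptime(value, rule["format"])
    if rule["sort_type"] == "integer":
        return int(value)
    return value


def order_files(paths, rules):
    """Sort paths so that the first rule is the primary key (assumed precedence)."""
    ordered = list(paths)
    # Python's sort is stable, so sorting by the lowest-priority rule first
    # and the highest-priority rule last produces a multi-key ordering.
    for rule in reversed(rules):
        ordered.sort(key=lambda p: rule_key(rule, p), reverse=not rule["ascending"])
    return ordered


paths = [
    "/dir/Error.2023053101.log",
    "/dir/Error.2023053003.log",
    "/dir/Error.2023053102.log",
    "/dir/Error.2023053001.log",
]
print(order_files(paths, RULES)[-1])  # /dir/Error.2023053102.log
```

With both rules ascending, the last element of the ordered list is the candidate for the "current" file.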

Describe alternatives you've considered

No response

Additional context

No response

@Mrod1598 Mrod1598 added enhancement New feature or request needs triage New item requiring triage labels Jun 1, 2023
@github-actions
Contributor

github-actions bot commented Jun 1, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Member

The first step will involve creating an algorithm to group files based on a sequence of rotations, effectively sorting matching filenames into their respective groups.

It's not clear to me whether this proposal is attempting to address this in any way. Am I missing it? Let's say I have the following files - how does one group these into two groups?

  • group1-20230601.log
  • group1-20230602.log
  • group2-20230601.log
  • group2-20230602.log

@Mrod1598
Contributor Author

Mrod1598 commented Jun 1, 2023

For now grouping should be done through two different receivers.
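
As a rough sketch of what that split means for the four files listed above, the Python snippet below partitions them using one glob pattern per receiver. The receiver names `filelog/group1` and `filelog/group2` and the patterns are hypothetical, and `fnmatch` only approximates the receiver's own glob matching.

```python
from fnmatch import fnmatch

# Hypothetical include patterns, one per receiver instance.
RECEIVER_INCLUDES = {
    "filelog/group1": "group1-*.log",
    "filelog/group2": "group2-*.log",
}


def partition(filenames):
    """Assign each filename to every receiver whose include pattern matches it."""
    groups = {receiver: [] for receiver in RECEIVER_INCLUDES}
    for filename in filenames:
        for receiver, pattern in RECEIVER_INCLUDES.items():
            if fnmatch(filename, pattern):
                groups[receiver].append(filename)
    return groups


files = [
    "group1-20230601.log",
    "group1-20230602.log",
    "group2-20230601.log",
    "group2-20230602.log",
]
for receiver, matched in partition(files).items():
    print(receiver, matched)
```

Each receiver then only has to order the files within its own group.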

@djaglowski
Member

Closed by #23633


3 participants