Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIBBEAT: Enhancement replace_string processor for replacing strings values of fields. #17342

Merged
merged 23 commits into from
Apr 28, 2020

Conversation

premendrasingh
Copy link
Contributor

@premendrasingh premendrasingh commented Mar 30, 2020

What does this PR do?

This PR is to add a replace processor. This processor takes in a field name, search string and replacement string. Searches field value for pattern and replaces it with replacement string.

Why is it important?

This PR will help remove extra strings or add additional string to values

How to test this PR locally

Added unit test cases.

Use cases

While using auditbeat we get full path to file inside the pod on Kubernetes
"/run/containerd/io.containerd.runtime.v1.linux/k8s.io/${data.kubernetes.container.id}/rootfs/etc/runit/runsvdir/default/mcelog/supervise/pid.new"
This PR helps trim the beginning part of the string to get
/etc/runit/runsvdir/default/mcelog/supervise/pid.new"

Using config below

      processors:
        - replace:
            fields:
            - field: "file.path"
              pattern: "/run/containerd/io.containerd.runtime.v1.linux/k8s.io/${data.kubernetes.container.id}/rootfs/"
              replacement: "/"

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@ChrsMark ChrsMark added review Team:Services (Deprecated) Label for the former Integrations-Services team labels Mar 31, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@exekias exekias added the Team:Integrations Label for the Integrations team label Apr 1, 2020
@exekias
Copy link
Contributor

exekias commented Apr 1, 2020

This seems very similar to https://www.elastic.co/guide/en/elasticsearch/reference/master/gsub-processor.html, could you please use the same signature? I think having field, pattern and replacement would be enough

ReplaceWith string `config:"replace_with"`
type replaceConfig struct {
Field string `config:"field"`
Pattern string `config:"pattern"`
Copy link
Contributor

@vjsamuel vjsamuel Apr 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso @exekias should https://github.com/elastic/beats/blob/master/libbeat/common/match/matchers.go be enhanced to do replacing strings and just use matcher.Match here?

using plain old regex could be slower as compared to the optimized one in libbeat.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for Pattern in the config use Patter *regexp.Regexp. Config unpacking will automatically try to compile the regex and fail with error + setting name if this has failed.

matcher.Match is an optimization for some custom cases only, but falls back to regexp if the case becomes more 'complicated'. Given the optimizations we have in matcher, I only see the case for a constant string match being helpful (which would become a sub-string search, or in some cases string-prefix/suffix comparison).

The matcher package also replaces capturing-group-matches with non-capturing-groups (greatly reduces allocations). Having patterns and replacement like gsub, do we want to allow users to use capturing group in the replacement in the future? E.g.

pattern: 'some (?P<important>[a-zA-Z]) string'
replace: 'found: {{important}}'

For now I would not enhance the matcher package. Only if we figure this is indeed a common problem. When doing so we might have to remove some of the optimizations. In case we find we really need to optimize another type (e.g. matcher.Replacer) might give us better flexibility in applying the kind of optimizations we need for the use-case, while not un-optimizing matcher.Matcher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso Changed to pattern, as suggested.

@urso
Copy link

urso commented Apr 16, 2020

The implementation itself LGTM.

Please fix the code format. The intake CI job fails already. See: https://travis-ci.org/github/elastic/beats/jobs/670699207

Please add some reference documentation as well. See docs sub-directory for examples.

Thank you!

fields:
- field: "file.path"
pattern: "/run/containerd/io.containerd.runtime.v1.linux/k8s.io/${data.kubernetes.container.id}/rootfs/"
replacement: "/"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the implementation patterns can not reference events contents. A more correct regexp would be /run/containerd/io.containerd.runtime.v1.linux/k8s.io/.+/rootfs/.

Maybe we can have a simpler example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso I have updated the documentation to a simpler example. Can you please help fix the CI issue? It keeps failing with this error.

Error: copy failed: cannot stat source file ../../vendor/github.com/elastic/beats/libbeat/common/file: stat ../../vendor/github.com/elastic/beats/libbeat/common/file: no such file or directory

Thanks

@urso
Copy link

urso commented Apr 22, 2020

Do not import github.com/elastic/beats/libbeat directly. Beats has switched to use go modules. When using the import like this the go build tools try to fetch libbeat from github, which is not the version/branch you are developing against. Replace the imports to say github.com/elastic/beats/v7/libbeat. Also check the go.mod and go.sum files. If they mention github.com/elastic/beats you will need to clean this up as well.

@urso
Copy link

urso commented Apr 23, 2020

jenkins run the tests please

1 similar comment
@urso
Copy link

urso commented Apr 24, 2020

jenkins run the tests please

@urso
Copy link

urso commented Apr 27, 2020

jenkins run the tests please

@urso
Copy link

urso commented Apr 27, 2020

Appropriate tests have been run through on Travis. I triggered a (hopefully) final test run on Jenkins.

@urso urso added the needs_backport PR is waiting to be backported to other branches. label Apr 28, 2020
@urso urso merged commit 09fd4df into elastic:master Apr 28, 2020
@urso urso added v7.8.0 and removed needs_backport PR is waiting to be backported to other branches. labels Apr 28, 2020
urso pushed a commit to urso/beats that referenced this pull request Apr 28, 2020
…alues of fields. (elastic#17342)

This PR is to add a replace processor. This processor takes in a field name, search string and replacement string. Searches field value for pattern and replaces it with replacement string.

(cherry picked from commit 09fd4df)
v1v added a commit to v1v/beats that referenced this pull request Apr 28, 2020
…unbld

* upstream/master:
  ci: comment PRs with the build status (elastic#17971)
  Add domain state metricset to kvm module (elastic#17673)
  [Agent] Allow CLI paths override (elastic#17781)
  Fix generated metricbeat so create-metricset works. (elastic#18020)
  LIBBEAT: Enhancement replace_string processor for replacing strings values of fields. (elastic#17342)
  Update stale references to _xpack to refer to _license instead (elastic#18030)
  Review dependency patterns collection in Jenkins (elastic#18004)
@premendrasingh
Copy link
Contributor Author

Thanks a lot @urso

urso pushed a commit that referenced this pull request Apr 29, 2020
…sor for replacing strings values of fields. (#18047)
@urso
Copy link

urso commented Apr 29, 2020

Changes have been backported and will be released in 7.8.0. Thank you for contributing.

@urso urso added the test-plan Add this PR to be manual test plan label Apr 29, 2020
@andresrc andresrc added test-plan-added This PR has been added to the test plan and removed [zube]: Done labels May 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement :Processors review Team:Integrations Label for the Integrations team Team:Services (Deprecated) Label for the former Integrations-Services team test-plan Add this PR to be manual test plan test-plan-added This PR has been added to the test plan v7.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants