Add regex processor plugin #3839

Merged: 4 commits, May 21, 2018

@44px 44px commented Feb 26, 2018

Based on a proposal from @danielnelson in #2667 (comment)

The regex processor plugin lets you change tag and field values, or create new tags and fields from existing ones.

My use case for this plugin: it lets you use a simple predefined pattern such as %{COMBINED_LOG_FORMAT} in the logparser plugin, then do some further processing to extract additional data from the request field.

Configuration:

[[processors.regex]]
  namepass = ["nginx_requests"]

  [[processors.regex.tags]]
    key = "resp_code"
    pattern = "^(\\d)\\d\\d$"
    replacement = "${1}xx"

  [[processors.regex.fields]]
    key = "request"
    pattern = "^/api(?P<method>/[\\w/]+)\\S*"
    replacement = "${method}"
    result_key = "method"

  [[processors.regex.fields]]
    key = "request"
    pattern = ".*category=(\\w+).*"
    replacement = "${1}"
    result_key = "search_category"

Source Metric (from logparser):

nginx_requests,verb=GET,resp_code=200 request="/api/search/?category=plugins&q=regex&sort=asc",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000

Example Output:

nginx_requests,verb=GET,resp_code=2xx request="/api/search/?category=plugins&q=regex&sort=asc",method="/search/",search_category="plugins",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000
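
The three rules in the configuration can be reproduced with Go's standard `regexp` package. The following is only a minimal sketch, not the plugin's code: the `rule` type and `applyRule` helper are hypothetical names, and it assumes each rule boils down to a single `ReplaceAllString` call on the source value. Note that the TOML config doubles its backslashes, while Go raw strings do not need to.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical stand-in for one [[processors.regex.tags]] or
// [[processors.regex.fields]] entry; it is not the plugin's real type.
type rule struct {
	Pattern     string
	Replacement string
}

// applyRule rewrites value using the rule's pattern and replacement template.
// If the pattern does not match, value is returned unchanged.
// The pattern is compiled on every call to keep the sketch short.
func applyRule(r rule, value string) string {
	re := regexp.MustCompile(r.Pattern)
	return re.ReplaceAllString(value, r.Replacement)
}

func main() {
	request := "/api/search/?category=plugins&q=regex&sort=asc"

	// resp_code tag: 200 -> 2xx
	fmt.Println(applyRule(rule{`^(\d)\d\d$`, "${1}xx"}, "200"))
	// method field, taken from the named capture group
	fmt.Println(applyRule(rule{`^/api(?P<method>/[\w/]+)\S*`, "${method}"}, request))
	// search_category field, taken from the first capture group
	fmt.Println(applyRule(rule{`.*category=(\w+).*`, "${1}"}, request))
}
```

Run against the source metric's values, this prints `2xx`, `/search/`, and `plugins`, matching the tag and fields in the example output.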

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@danielnelson danielnelson added this to the 1.7.0 milestone Mar 7, 2018
@ada-foss ada-foss mentioned this pull request Mar 15, 2018

syepes commented Apr 7, 2018

very nice

@voiprodrigo
Contributor

Would it be possible to use this to convert a field k/v pair to a tag, or extract tag name and value from partial field name (while renaming the field name too)?

44px commented Apr 8, 2018

Thanks for your interest!

For now, this plugin operates only on values, so it can't convert a tag to a field or change its name.

I thought about adding an option to convert between tags and fields, but later decided it would be better to have a separate plugin for this. Such a plugin could have a more obvious name ("tag2field" instead of the overly general "regex"), which would make it easier to find.

extract tag name and value from partial field name (while renaming the field name too)

I'm not sure I properly understand your use case here. Can you give an example of a field name you have and the tag you want to create from it?

voiprodrigo commented Apr 8, 2018

extract tag name and value from partial field name (while renaming the field name too)

I'm not sure I properly understand your use case here. Can you give an example of a field name you have and the tag you want to create from it?

Actually, what I was thinking would require splitting the metric point into multiple ones, which should probably be handled by the input plugin itself. Or by a stand-alone processor maybe (I have no idea if a processor can create metric points).

Let's say a metric has field names which denote a thread index in a multi-threaded process:

measurement,pname=whatever thread0_valueA=0,thread1_valueA=0,thread0_valueB=1,thread1_valueB=2

This would translate to:

measurement,pname=whatever,thread=0 valueA=0,valueB=1
measurement,pname=whatever,thread=1 valueA=0,valueB=2

So this would be something more or less like:

  • Match all fields named ^(thread)(\d+)_(.+)$
  • Create metric point with new tag \1=\2 for each distinct \2
  • Add matching fields to the new point, renamed to \3
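
The three steps above can be sketched in Go as follows. This is only an illustration of the grouping logic under stated assumptions: `splitByThread` is a hypothetical helper, fields are modeled as a plain `map[string]float64`, and nothing here uses Telegraf's processor API or says whether a processor may emit extra metric points.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// threadField matches names such as "thread0_valueA": group 1 is the tag
// name, group 2 the tag value, and group 3 the renamed field.
var threadField = regexp.MustCompile(`^(thread)(\d+)_(.+)$`)

// splitByThread groups fields by thread index. The result maps each distinct
// index (the new "thread" tag value) to its renamed fields. Fields that do
// not match the pattern are ignored in this sketch.
func splitByThread(fields map[string]float64) map[string]map[string]float64 {
	points := make(map[string]map[string]float64)
	for name, value := range fields {
		m := threadField.FindStringSubmatch(name)
		if m == nil {
			continue
		}
		index, newName := m[2], m[3]
		if points[index] == nil {
			points[index] = make(map[string]float64)
		}
		points[index][newName] = value
	}
	return points
}

func main() {
	fields := map[string]float64{
		"thread0_valueA": 0, "thread1_valueA": 0,
		"thread0_valueB": 1, "thread1_valueB": 2,
	}
	points := splitByThread(fields)

	// Sort the indexes so the output order is stable.
	indexes := make([]string, 0, len(points))
	for index := range points {
		indexes = append(indexes, index)
	}
	sort.Strings(indexes)
	for _, index := range indexes {
		p := points[index]
		fmt.Printf("measurement,pname=whatever,thread=%s valueA=%v,valueB=%v\n",
			index, p["valueA"], p["valueB"])
	}
}
```

With the example metric as input, this prints the two lines shown under "This would translate to".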

I implemented something like this as an option to an existing input plugin (targeting specific field names created by the plugin), so I was wondering if your plugin could be used as a generic way to do the same kind of thing.

@danielnelson
Contributor

@voiprodrigo This seems like a fairly specialized operation, and I'm not sure it can be generalized nicely. For this, I think it would make sense to have a processor that could run a user script to filter messages. To avoid slowdown, we would start the script once and feed data in and out via stdin/stdout.

@danielnelson danielnelson left a comment

Looks good, I just have a few performance suggestions since this will be run so frequently.

}

func getValue(c converter, value string) string {
	regex := regexp.MustCompile(c.Pattern)

We should compile the regex pattern only once on startup, you could do this in Apply with a boolean field on the Regex type so it is only executed once.
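
One way to avoid recompiling is a per-processor cache keyed by pattern, sketched below. This is a simplified illustration rather than the merged code: the `cache` field and `regexFor` method are hypothetical names, `converter` is reduced to the fields visible in this thread plus an assumed `Replacement`, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"regexp"
)

// converter is a reduced stand-in for one configured rule.
type converter struct {
	Key         string
	Pattern     string
	Replacement string
}

// Regex is a simplified stand-in for the processor type.
type Regex struct {
	Tags []converter

	// cache holds compiled patterns so each one is compiled only once.
	// Hypothetical field name; not safe for concurrent use.
	cache map[string]*regexp.Regexp
}

// regexFor returns the compiled form of pattern, compiling it on first use.
func (r *Regex) regexFor(pattern string) *regexp.Regexp {
	if r.cache == nil {
		r.cache = make(map[string]*regexp.Regexp)
	}
	re, ok := r.cache[pattern]
	if !ok {
		re = regexp.MustCompile(pattern)
		r.cache[pattern] = re
	}
	return re
}

// getValue applies the converter to value using the cached regex.
func (r *Regex) getValue(c converter, value string) string {
	return r.regexFor(c.Pattern).ReplaceAllString(value, c.Replacement)
}

func main() {
	r := &Regex{Tags: []converter{
		{Key: "resp_code", Pattern: `^(\d)\d\d$`, Replacement: "${1}xx"},
	}}
	for _, code := range []string{"200", "404", "503"} {
		fmt.Println(r.getValue(r.Tags[0], code))
	}
	fmt.Println("compiled patterns:", len(r.cache))
}
```

Compiling lazily on first use keeps the sketch free of a separate setup step; the boolean-flag approach suggested in the review would instead compile every configured pattern up front on the first `Apply` call.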

func (r *Regex) Apply(in ...telegraf.Metric) []telegraf.Metric {
	for _, metric := range in {
		for _, converter := range r.Tags {
			if value, ok := metric.Tags()[converter.Key]; ok {

Use metric.GetTag to avoid allocating.

}

		for _, converter := range r.Fields {
			if value, ok := metric.Fields()[converter.Key]; ok {

Use metric.GetField here as well.
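
To illustrate why a direct lookup can be cheaper than indexing into `Tags()` or `Fields()`, here is a toy model. It assumes a metric that stores its tags in a slice and builds a fresh map on every `Tags()` call; that storage layout is an assumption made for this sketch, and the `metric` and `tag` types are hypothetical stand-ins, not Telegraf's types. The `GetTag` signature shown is likewise assumed.

```go
package main

import "fmt"

// tag is one key/value pair.
type tag struct{ Key, Value string }

// metric is a hypothetical, simplified metric that keeps tags in a slice.
type metric struct{ tags []tag }

// Tags builds and returns a fresh map on every call, so reading a single
// tag through it pays for copying all of them.
func (m *metric) Tags() map[string]string {
	out := make(map[string]string, len(m.tags))
	for _, t := range m.tags {
		out[t.Key] = t.Value
	}
	return out
}

// GetTag scans the slice directly without building a map.
func (m *metric) GetTag(key string) (string, bool) {
	for _, t := range m.tags {
		if t.Key == key {
			return t.Value, true
		}
	}
	return "", false
}

func main() {
	m := &metric{tags: []tag{{"verb", "GET"}, {"resp_code", "200"}}}

	// Both lookups return the same value; only the cost differs.
	viaMap, okMap := m.Tags()["resp_code"]
	direct, okDirect := m.GetTag("resp_code")
	fmt.Println(viaMap, okMap, direct, okDirect)
}
```

The same reasoning would apply to fields, with an `interface{}` value in place of the string.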

44px commented May 20, 2018

Thanks for the tips! I switched to GetTag/GetField and added a cache for compiled regex patterns.

Here are the results of running go test -bench=. -benchmem for this package:

Before:

pkg: github.com/influxdata/telegraf/plugins/processors/regex
BenchmarkConversions-4            100000             14945 ns/op           11192 B/op        147 allocs/op
PASS
ok      github.com/influxdata/telegraf/plugins/processors/regex 1.676s

After:

pkg: github.com/influxdata/telegraf/plugins/processors/regex
BenchmarkConversions-4            500000              2326 ns/op             632 B/op         25 allocs/op
PASS
ok      github.com/influxdata/telegraf/plugins/processors/regex 1.209s

@danielnelson danielnelson merged commit ccc4a85 into influxdata:master May 21, 2018
@44px 44px deleted the processor_regex branch May 22, 2018 07:28
leodido pushed a commit that referenced this pull request May 22, 2018
maxunt pushed a commit that referenced this pull request Jun 26, 2018
otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019