Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send metric tags as riemann tags and attributes #1845

Closed
wants to merge 3 commits into from
Closed

Send metric tags as riemann tags and attributes #1845

wants to merge 3 commits into from

Conversation

JamesClonk
Copy link
Contributor

  • send Telegraf metric tags as Riemann tags and attributes
  • config option for sending measurement name as a Riemann tag and attribute
  • config option for overwriting of Telegraf tags with Riemann output specific tags

Signed-off-by: Fabio Berchtold fabio.berchtold@swisscom.com

allow sending measurement name as tag

Signed-off-by: Fabio Berchtold <fabio.berchtold@swisscom.com>
@JamesClonk
Copy link
Contributor Author

Fixes #1501.
Since Riemann has its own concept and understanding of tags and attributes, they should be sent accordingly rather than concatenating them into the service name.
It makes also sense to be able to overwrite the tags for Riemann and instead configure a set of tags specifically in the Riemann output configuration.
Telegraf tags will always also be sent as Riemann attributes. Both are of type map[string]string, their conceptual use is a direct match between the two systems.

## set measurement name as a Riemann tag instead of prepending it to the Riemann service name
measurement_as_tag = false
## list of Riemann tags, if specified use these instead of any Telegraf tags
tags = ["telegraf","custom_tag"]
Copy link
Contributor

@sparrc sparrc Oct 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain the use-case of this? why do you want to completely overwrite all telegraf tags?

Many tags in telegraf are used to differentiate metrics that are otherwise identical, so using this option has the potential to ruin many collected metrics. For example, cpu metrics typically look like this:

cpu,cpu=cpu0 usage=99
cpu,cpu=cpu1 usage=88

if the user then used this option, all they would see in Riemann is:

cpu,telegraf=custom_tag usage=88

which is basically a useless metric because there is no context on which cpu it has collected, and you've also lost the data of cpu0.

Copy link
Contributor Author

@JamesClonk JamesClonk Oct 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sparrc The reasoning behind this is that in Riemann tags are an entirely different thing than in Telegraf.

In Riemann tags are just an array of strings. Tags in Telegraf though are key=value pairs and its more appropriate to send these as Riemann attributes, which are also key=value pairs of type map[string]string.

It actually does not make too much sense to try and squeeze Telegraf tags into Riemann tags as the old code did, or even the fallback code in this PR.
Maybe it might even be better to always only send Riemann tags if they are specifically configured in the outputs.riemann configuration, and the option renamed to riemann_tags to make it more clear that they are a different thing?


If you check the Riemann library event struct you can see that attributes in Riemann are what tags are to Telegraf/Influxdb, while Riemann tags are not:
https://github.com/amir/raidman/blob/master/raidman.go#L39-L49

Here's an example Telegraf configuration and the resulting Riemann event from one of our systems, without setting outputs.riemann.tags:

[global_tags]
  deployment = "postgresql-dev-2"
  id = "1e612b44-e92f-4d27-9f30-5e2f53947870"
  az = "z1"

[[outputs.riemann]]
  url = "localhost:5555"
  measurement_as_tag = true

...

#riemann.codec.Event{
:host "postgresql-1e612b44-e92f-4d27-9f30-5e2f53947870", :state nil, :description nil, :ttl nil,
:measurement "disk", :service "used_percent", :metric 73.90288667552354, :path "/boot", :fstype "ext4",
:tags ["disk" "z1" "postgresql-dev-2" "ext4" "1e612b44-e92f-4d27-9f30-5e2f53947870" "postgresql" "/boot"], 
:deployment "postgresql-dev-2", :id "1e612b44-e92f-4d27-9f30-5e2f53947870", :az "z1", :time 1475605281}

And the same example and resulting event with setting outputs.riemann.tags:

[global_tags]
  deployment = "postgresql-dev-2"
  id = "1e612b44-e92f-4d27-9f30-5e2f53947870"
  az = "z1"

[[outputs.riemann]]
  url = "localhost:5555"
  measurement_as_tag = true
  tags = ["telegraf","postgres_cluster"]

...

#riemann.codec.Event{
:host "postgresql-1e612b44-e92f-4d27-9f30-5e2f53947870", :state nil, :description nil, :ttl nil,
:measurement "disk", :service "used_percent", :metric 73.16736001949994, :path "/boot", :fstype "ext4",
:tags ["telegraf" "postgres_cluster"],
:deployment "postgresql-dev-2", :id "1e612b44-e92f-4d27-9f30-5e2f53947870", :az "z1", :time 1475605021}

As you can see in the first example most :tags are almost nonsensical and not really useful.
While in the second example they have a specific meaning and Riemann can be more easily configured to filter for those.
The actual Telegraf tags on the other hand are always present as additional event attributes.

I hope my explanation makes sense 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that that may make sense for your use-case, but the vast majority of telegraf users are collecting metrics from a diverse array of sources, not just a single type of source (postgres_cluster) that they're going to want to hardcode into their config file.

What if the riemann output allowed selecting particular telegraf tags to have their values applied as riemann tags? so your config could look something like this?:

[global_tags]
  deployment = "postgresql-dev-2"
  id = "1e612b44-e92f-4d27-9f30-5e2f53947870"
  az = "z1"
  description = "postgres_cluster"

[[outputs.riemann]]
  url = "localhost:5555"
  measurement_as_tag = true
  rieman_tag_keys = ["description"]

...

#riemann.codec.Event{
:host "postgresql-1e612b44-e92f-4d27-9f30-5e2f53947870", :state nil, :description nil, :ttl nil,
:measurement "disk", :service "used_percent", :metric 73.16736001949994, :path "/boot", :fstype "ext4",
:tags ["telegraf" "postgres_cluster"],
:deployment "postgresql-dev-2", :id "1e612b44-e92f-4d27-9f30-5e2f53947870", :az "z1", :time 1475605021}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sparrc hmm ok, makes sense. I modified the code to match your proposal. If riemann_tag_keys are specified / non-empty it will send only their tag values as Riemann tags, otherwise it will send all tag values.

Signed-off-by: Fabio Berchtold <fabio.berchtold@swisscom.com>
@dgnorton dgnorton added this to the 1.1.0 milestone Oct 6, 2016

# ## set measurement name as a Riemann tag instead of prepending it to the Riemann service name
# measurement_as_tag = false
# ## list of tag keys to specify, whose values get sent as Riemann tags. If empty, all Telegraf tag values will be sent to Riemann as tags.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines shouldn't be more than 80 characters

@sparrc sparrc modified the milestones: Future Milestone, 1.1.0 Oct 11, 2016
@sparrc
Copy link
Contributor

sparrc commented Oct 11, 2016

@JamesClonk Overall I think the changes look OK, but I'm concerned that the riemann plugin isn't in a very good state, and also lacks unit tests.

Currently the Riemann output plugin appears to have a strange behavior that isn't generally very useful. Rather than tacking on features and boolean switches to fix the behavior, I'd prefer if we instead had a full rewrite of the plugin. Given it's current state, I think it's safe to phase out the current plugin as a "legacy" version and phase in a newer one.

@JamesClonk would you be willing to work on a rewrite of the plugin? I'd like to see a short writeup on how telegraf data being written to Riemann should be structured, a README giving examples of how to query the data in Riemann, and unit tests specifying exactly how the outputted Riemann data is intended to look.

@sparrc
Copy link
Contributor

sparrc commented Oct 11, 2016

@JamesClonk see #1878, I'm going to close this PR for now, let's keep the discussion on the design of the plugin in #1878 until a PR for a new plugin is ready.

@sparrc sparrc closed this Oct 11, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants