Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetry output plugin #9228

Merged
merged 14 commits into from
Aug 4, 2021
Merged

OpenTelemetry output plugin #9228

merged 14 commits into from
Aug 4, 2021

Conversation

jacobmarble
Copy link
Member

@jacobmarble jacobmarble commented May 2, 2021

  • Updated associated README.md.
  • Wrote appropriate unit tests.

Related:

This change adds an output plugin opentelemetry, which sends OTLP signals via gRPC.

OpenTelemetry is an open-source telemetry ecosystem, supporting traces, metrics and logs. The mapping of the OpenTelemetry data model to line protocol is a work-in-progress, and I welcome any feedback to that document.

The core conversion logic lives in influxdata/influxdb-observability/influx2otel, intended to stand as a canonical implementation of line protocol->OTel.

Note that, unlike the OpenTelemetry input plugin, this output plugin only supports metrics, and not traces or logs. That may be added later.

@telegraf-tiger telegraf-tiger bot added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins labels May 2, 2021
@codeboten codeboten mentioned this pull request May 3, 2021
3 tasks
Copy link
Contributor

@ssoroka ssoroka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to distance this a bit from prometheus if possible (otel has no direct relationship). Could also use tests.

plugins/outputs/opentelemetry/README.md Outdated Show resolved Hide resolved
@jacobmarble jacobmarble requested a review from ssoroka May 10, 2021 23:19
@jacobmarble
Copy link
Member Author

@ssoroka friendly reminder

tigrannajaryan pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request May 11, 2021
This change adds a receiver for [InfluxData line protocol](https://docs.influxdata.com/influxdb/v2.0/reference/syntax/line-protocol/), the native wire transfer protocol for [Telegraf](https://www.influxdata.com/time-series-platform/telegraf/) and [InfluxDB](https://www.influxdata.com/products/influxdb/).

The conversion logic lives in [package `influx2otel`](https://github.com/influxdata/influxdb-observability/tree/main/influx2otel), and is also imported by the [proposed Telegraf OpenTelemetry output plugin](influxdata/telegraf#9228). The objective of this strategy is to create a canonical mapping, and an open-source implementation, of OpenTelemetry <-> Telegraf/InfluxDB translation.

**Link to tracking Issue:** #2951

**Testing:** The translation logic is tested in [package `influx2otel`](https://github.com/influxdata/influxdb-observability/tree/main/influx2otel).

**Documentation:** README.md included in new receiver.
@sspaink
Copy link
Contributor

sspaink commented May 11, 2021

@jacobmarble this output plugins looks good, but the library it is using is marked experimental. Do you think this plugin is ready for end users and be planned to ship with the next Telegraf feature release?

@codeboten
Copy link
Contributor

codeboten commented May 12, 2021

I ran a manual test using telegraf with this plugin to export data to an opentelemetry collector locally and didn't see any data being generated.

here's the output from the telegraf logs:

2021-05-12T21:12:35Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-05-12T21:12:35Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-05-12T21:12:35Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-05-12T21:12:35Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-05-12T21:12:35Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-05-12T21:12:35Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-05-12T21:12:35Z W! [outputs.opentelemetry] failed to add point: gauge field not found

and the data as seen on the collector side:

otel-collector              | InstrumentationLibraryMetrics #0
otel-collector              | InstrumentationLibrary
otel-collector              | Metric #0
otel-collector              | Descriptor:
otel-collector              |      -> Name: swap
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #1
otel-collector              | Descriptor:
otel-collector              |      -> Name: disk
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #2
otel-collector              | Descriptor:
otel-collector              |      -> Name: cpu
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #3
otel-collector              | Descriptor:
otel-collector              |      -> Name: mem
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #4
otel-collector              | Descriptor:
otel-collector              |      -> Name: system
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #5
otel-collector              | Descriptor:
otel-collector              |      -> Name: diskio
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleSum
otel-collector              |      -> IsMonotonic: true
otel-collector              |      -> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE
otel-collector              | Metric #6
otel-collector              | Descriptor:
otel-collector              |      -> Name: processes
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge

Haven't dug into the code to understand what is happening yet though. Please let me know if other details would be helpful.

@jacobmarble
Copy link
Member Author

I ran a manual test using telegraf with this plugin to export data to an opentelemetry collector locally and didn't see any data being generated.

I'll work on tests to reproduce and prevent future regression.

@jacobmarble
Copy link
Member Author

@jacobmarble this output plugins looks good, but the library it is using is marked experimental. Do you think this plugin is ready for end users and be planned to ship with the next Telegraf feature release?

Fair point. I have added integration tests to the library, removed "experimental" markers, and released v0.1.0. That exposed one bug and a fix (testing FTW!).

@jacobmarble
Copy link
Member Author

@codeboten I wrote two sets of tests:

All of the above tests work as intended. Your error "gauge field not found" is a schema issue. I suspect you are consuming metrics with a line protocol schema that is incompatible with your chosen metrics_schema = "prometheus-v1" or metrics_schema = "prometheus-v2".

If you are using a schema different to these two, then we need to agree on a strategy for converting other schemata. Not a hard thing, but could we punt it to follow-on issues and PRs against this plugin?

@jacobmarble
Copy link
Member Author

@ssoroka @codeboten PTAL

@ssoroka
Copy link
Contributor

ssoroka commented Jun 17, 2021

All of the above tests work as intended. Your error "gauge field not found" is a schema issue. I suspect you are consuming metrics with a line protocol schema that is incompatible with your chosen metrics_schema = "prometheus-v1" or metrics_schema = "prometheus-v2".

should this be possible? shouldn't all line protocol be convertible to valid OTEL ?

Copy link
Contributor

@ssoroka ssoroka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes look fine. are there supposed to be test files now?

@jacobmarble
Copy link
Member Author

All of the above tests work as intended. Your error "gauge field not found" is a schema issue. I suspect you are consuming metrics with a line protocol schema that is incompatible with your chosen metrics_schema = "prometheus-v1" or metrics_schema = "prometheus-v2".

should this be possible? shouldn't all line protocol be convertible to valid OTEL ?

I agree, any valid line protocol should find its way into some kind of Otel struct. In the example shared by @codeboten , influx2otel has failed to detect the metric type (gauge, sum/counter, histogram) and so has fallen back on gauge. Then it assumes that a field gauge exists. Obviously this is incorrect.

I would like to punt this to influxdata/influxdb-observability#2

@jacobmarble
Copy link
Member Author

changes look fine. are there supposed to be test files now?

I have added various tests to the library responsible for conversion.
https://github.com/influxdata/influxdb-observability/tree/main/tests-integration
https://github.com/influxdata/influxdb-observability/tree/main/tests-synthetic

This plugin adapts the conversion library for Telegraf, so there isn't a lot to test. Share your thoughts?

@codeboten
Copy link
Contributor

All of the above tests work as intended. Your error "gauge field not found" is a schema issue. I suspect you are consuming metrics with a line protocol schema that is incompatible with your chosen metrics_schema = "prometheus-v1" or metrics_schema = "prometheus-v2".

If you are using a schema different to these two, then we need to agree on a strategy for converting other schemata. Not a hard thing, but could we punt it to follow-on issues and PRs against this plugin?

I tested locally again with both schema options and both are giving me errors. With prometheus-v1 i see the following:

2021-06-21T22:57:31Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-06-21T22:57:31Z W! [outputs.opentelemetry] failed to add point: value type conflict for metric 'swap'; expected 'gauge' or 'untyped', got 'sum
2021-06-21T22:57:31Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-06-21T22:57:31Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-06-21T22:57:31Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-06-21T22:57:31Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-06-21T22:57:31Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-06-21T22:57:31Z W! [outputs.opentelemetry] failed to add point: gauge field not found
2021-06-21T22:57:31Z W! [outputs.opentelemetry] failed to add point: gauge field not found

With prometheus-v2 i see the following:

2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'swap'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'swap'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'disk'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'disk'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'disk'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'disk'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'disk'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'disk'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'disk'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'disk'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'mem'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'cpu'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'cpu'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'cpu'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'cpu'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'cpu'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'cpu'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'cpu'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'cpu'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'cpu'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'system'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'system'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'system'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'diskio'
2021-06-21T23:00:31Z W! [outputs.opentelemetry] failed to add point: unexpected measurement name 'processes'

This is testing with a default configuration generated by running ./telegraf config > config.toml and enabling the OpenTelemetry output plugin. The values for the data doesn't appear in the opentelemetry collector logs:

otel-collector              | 2021-06-21T22:57:31.504Z	DEBUG	loggingexporter/logging_exporter.go:402	ResourceMetrics #0
otel-collector              | InstrumentationLibraryMetrics #0
otel-collector              | InstrumentationLibrary
otel-collector              | Metric #0
otel-collector              | Descriptor:
otel-collector              |      -> Name: swap
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #1
otel-collector              | Descriptor:
otel-collector              |      -> Name: mem
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #2
otel-collector              | Descriptor:
otel-collector              |      -> Name: disk
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #3
otel-collector              | Descriptor:
otel-collector              |      -> Name: cpu
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #4
otel-collector              | Descriptor:
otel-collector              |      -> Name: system
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleGauge
otel-collector              | Metric #5
otel-collector              | Descriptor:
otel-collector              |      -> Name: diskio
otel-collector              |      -> Description:
otel-collector              |      -> Unit:
otel-collector              |      -> DataType: DoubleSum
otel-collector              |      -> IsMonotonic: true
otel-collector              |      -> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE

@codeboten
Copy link
Contributor

I should add that I recompiled telegraf using the following commit: b5b71662b4df09e3965dbd6129b4a2fb6ce0f324

@jacobmarble
Copy link
Member Author

This PR is taking a long time. I'm sorry about that.

I have reproduced the bug that @codeboten has shared. It's clear that requiring a Telegraf/Prometheus schema in this plugin is unreasonable. To properly fix this issue, I'll do a refactor to influxdb-observability such that the schema will be inferred from the data, rather than a config.

@jacobmarble jacobmarble requested a review from ssoroka July 15, 2021 19:34
@jacobmarble
Copy link
Member Author

jacobmarble commented Jul 15, 2021

@codeboten PTAL

Turns out that many input plugins type their output (gauge, counter, etc) without any particular schema, so that needed to be accounted for. The swap input plugin gave concise examples to work with.

@ssoroka
Copy link
Contributor

ssoroka commented Jul 21, 2021

I agree, any valid line protocol should find its way into some kind of Otel struct. In the example shared by @codeboten , influx2otel has failed to detect the metric type (gauge, sum/counter, histogram) and so has fallen back on gauge. Then it assumes that a field gauge exists. Obviously this is incorrect.

I would like to punt this to influxdata/influxdb-observability#2

We're pretty bound by backwards-compatibility; which can make it hard to change later without breaking things. Is there a better option than gauge by default? Is this a case of missing configuration?

Edit: I see you're making a change for it to infer the schema from the metric. Seems like a good call.

@jacobmarble
Copy link
Member Author

@ssoroka there is no "untyped" metric in OpenTelemetry, unfortunately.

@jacobmarble
Copy link
Member Author

The failing tests appear to be flakey. Let me know if I'm wrong about that.

Copy link
Contributor

@codeboten codeboten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've confirmed that metrics are now being generated, thanks for addressing that. Only question i have remaining is regarding the headers configuration option.

tls.ClientConfig
Timeout config.Duration `toml:"timeout"`
Compression string `toml:"compression"`
Headers map[string]string `toml:"headers"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing where this is currently used, these headers should be setting grpc metadata in the client connections. See https://github.com/influxdata/telegraf/pull/9109/files#diff-2702903f71c7896e7e1d358e50075b5dc48150603438ee5eaea24948b8230d18R111

@ssoroka ssoroka merged commit bfcd0f6 into influxdata:master Aug 4, 2021
@ssoroka
Copy link
Contributor

ssoroka commented Aug 4, 2021

Merged! thanks @jacobmarble. @codeboten let's fix that in a separate PR

@codeboten
Copy link
Contributor

@ssoroka 👍, followed up with the fix in this PR: #9587

phemmer added a commit to phemmer/telegraf that referenced this pull request Aug 13, 2021
* origin/master: (183 commits)
  fix: CrateDB replace dots in tag keys with underscores (influxdata#9566)
  feat: Pull metrics from multiple AWS CloudWatch namespaces (influxdata#9386)
  fix: improve Clickhouse corner cases for empty recordset in aggregation queries, fix dictionaries behavior (influxdata#9401)
  fix(opcua): clean client on disconnect so that connect works cleanly (influxdata#9583)
  fix: Refactor ec2 init for config-api (influxdata#9576)
  fix: sort logs by timestamp before writing to Loki (influxdata#9571)
  fix: muting tests for udp_listener (influxdata#9578)
  fix: Do not return on disconnect to avoid breaking reconnect (influxdata#9524)
  fix: Fixing k8s nodes and pods parsing error (influxdata#9581)
  feat: OpenTelemetry output plugin (influxdata#9228)
  feat: Support AWS Web Identity Provider (influxdata#9411)
  fix: upgraded sensu/go to v2.9.0 (influxdata#9577)
  fix: Normalize unix socket path (influxdata#9554)
  docs: fix aws ec2 readme inconsistency (influxdata#9567)
  feat: Modbus Rtu over tcp enhancement (influxdata#9570)
  docs: information on new conventional commit format (influxdata#9573)
  docs: Add logo (influxdata#9574)
  docs: Adding links to net_irtt and dht_sensor external plugins (influxdata#9569)
  Upgrade hashicorp/consul/api to 1.9.1 (influxdata#9565)
  Update vmware/govmomi to v0.26.0 (influxdata#9552)
  Do not skip good quality nodes after a bad quality node is encountered (influxdata#9550)
  fix test so it hits a fake service (influxdata#9564)
  Update changelog
  Fix procstat plugin README to match sample config (influxdata#9553)
  Fix metrics reported as written but not actually written  (influxdata#9526)
  Prevent segfault in persistent volume claims (influxdata#9549)
  Update procstat to support cgroup globs & include systemd unit children (Copy of influxdata#7890) (influxdata#9488)
  Fix attempt to connect to an empty list of servers. (influxdata#9503)
  Fix handling bool in sql input plugin (influxdata#9540)
  Suricata alerts (influxdata#9322)
  Linter fixes for plugins/inputs/[fg]* (influxdata#9387)
  For Prometheus Input add ability to query Consul Service catalog (influxdata#5464)
  Support Landing page on Prometheus landing page (influxdata#8641)
  [Docs] Clarify tagging behavior (influxdata#9461)
  Change the timeout from all queries to per query (influxdata#9471)
  Attach the pod labels to the `kubernetes_pod_volume` & `kubernetes_pod_network` metrics. (influxdata#9438)
  feat(http_listener_v2): allows multiple paths and add path_tag (influxdata#9529)
  Bug Fix Snmp empty metric name (influxdata#9519)
  Worktable workfile stats (influxdata#8587)
  Update Go to v1.16.6 (influxdata#9542)
  ...
@jacobmarble jacobmarble deleted the jgm-opentelemetry branch October 27, 2021 23:17
@shruthikudlur
Copy link

Is there any plugin that converts Influx Line Protocol Metrix to OTEL Traces.

@sjwang90
Copy link
Contributor

@shruthikudlur There is the OpenTelemetry output plugin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants