
[prometheusremotewrite] TFC agent metrics are dropped before being forwarded to Grafana Cloud due to invalid temporality and type combination for metric error #30435

Closed
karvounis-form3 opened this issue Jan 11, 2024 · 9 comments
Labels: bug, exporter/prometheusremotewrite, needs triage

Comments

@karvounis-form3

karvounis-form3 commented Jan 11, 2024

Component(s)

prometheusremotewrite exporter

What happened?

Description

The goal is to send Terraform Cloud (TFC) Agent metrics to Grafana Cloud. To do this, the agent Docker container is configured to send metrics to an OTLP receiver container running on the same host, where the prometheusremotewrite exporter is supposed to forward them to Grafana Cloud. HashiCorp suggests using v0.73.0 of the OpenTelemetry Collector.
However, not all of the TFC Agent metrics end up in Grafana Cloud, due to numerous errors. The strange thing is that with the datadog and fileexporter exporters, the metrics and their values show up in both Datadog and the output file; only the Grafana Cloud / prometheusremotewrite exporter combination errors.
Also, when I tried v0.89.0 of the collector (the latest at the time), the errors above were gone, but the metrics still did not appear in Grafana Cloud.
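
For context, the agent-to-collector wiring looks roughly like the compose sketch below. The service names, the TFC_AGENT_OTLP_ADDRESS variable, and the image tags are illustrative assumptions; check HashiCorp's tfc-agent documentation for the exact option names.

services:
  tfc-agent:
    image: hashicorp/tfc-agent:latest
    environment:
      TFC_AGENT_TOKEN: ${TFC_AGENT_TOKEN}
      # Assumption: points the agent's OTLP metrics emission at the collector below
      TFC_AGENT_OTLP_ADDRESS: tfe-agent-otel-monitoring:4317
  tfe-agent-otel-monitoring:
    image: otel/opentelemetry-collector-contrib:0.73.0
    volumes:
      # Collector configuration from this issue, mounted at the contrib image's default path
      - ./otel-config.yaml:/etc/otelcol-contrib/config.yaml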

Expected Result

All TFC Agent metrics appear in Grafana Cloud.

Actual Result

tfe-agent-otel-monitoring  | 2024-01-11T13:31:29.163Z	error	exporterhelper/queued_retry.go:401	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"tfc_agent.core.register.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.update_status.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.fetch_job.milliseconds\"", "dropped_items": 4}
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:401
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/metrics.go:136
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:205
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/internal/bounded_memory_queue.go:60
tfe-agent-otel-monitoring  | 2024-01-11T13:31:59.036Z	error	exporterhelper/queued_retry.go:401	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"tfc_agent.core.update_status.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.fetch_job.milliseconds\"", "dropped_items": 16}
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:401
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/metrics.go:136
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:205
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/internal/bounded_memory_queue.go:60
tfe-agent-otel-monitoring  | 2024-01-11T13:32:28.911Z	error	exporterhelper/queued_retry.go:401	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"tfc_agent.core.update_status.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.fetch_job.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.output_stream.upload_chunk.bytes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.setup_terraform_variables.write_file.bytes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.setup_terraform_variables.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.configure_terraform_cli.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.output_stream.upload_chunk.milliseconds\"", "dropped_items": 22}
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:401
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/metrics.go:136
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:205
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/internal/bounded_memory_queue.go:60
tfe-agent-otel-monitoring  | 2024-01-11T13:32:38.961Z	error	exporterhelper/queued_retry.go:401	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"tfc_agent.core.terraform.output_stream.upload_chunk.bytes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.output_stream.upload_chunk.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.run_meta.additions\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.run_meta.changes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.run_meta.destructions\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.run_meta.imports\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.terraform_plan.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.plan_json.generate.bytes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.plan_json.generate.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.plan_json.upload.milliseconds\"", "dropped_items": 25}
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:401
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/metrics.go:136
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:205
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/internal/bounded_memory_queue.go:60
tfe-agent-otel-monitoring  | 2024-01-11T13:33:14.055Z	error	exporterhelper/queued_retry.go:401	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"tfc_agent.core.update_status.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.fetch_job.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.output_stream.upload_chunk.bytes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.setup_terraform_variables.write_file.bytes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.setup_terraform_variables.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.configure_terraform_cli.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.output_stream.upload_chunk.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.setup_terraform_binary.download.bytes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.setup_terraform_binary.download.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.setup_terraform_binary.unpack.bytes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.setup_terraform_binary.unpack.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.setup_terraform_binary.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.terraform_version.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.terraform_init.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.restore_filesystem.download.bytes\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.restore_filesystem.download.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.restore_filesystem.unpack.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.terraform.restore_filesystem.milliseconds\"", "dropped_items": 35}
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:401
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/metrics.go:136
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:205
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/internal/bounded_memory_queue.go:60
tfe-agent-otel-monitoring  | 2024-01-11T13:34:53.920Z	error	exporterhelper/queued_retry.go:401	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"tfc_agent.core.update_status.milliseconds\"; invalid temporality and type combination for metric \"tfc_agent.core.fetch_job.milliseconds\"", "dropped_items": 17}
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:401
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/metrics.go:136
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:205
tfe-agent-otel-monitoring  | go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
tfe-agent-otel-monitoring  | 	go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/internal/bounded_memory_queue.go:60

Collector version

v0.73.0

Environment information

Environment

OS: Amazon Linux 2

OpenTelemetry Collector configuration

extensions:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.73.0/extension/healthcheckextension
  health_check:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.73.0/extension/basicauthextension
  basicauth/prw:
    client_auth:
      username: ${env:GRAFANA_USERNAME}
      password: ${env:GRAFANA_API_KEY}

receivers:
  # Receiver that scrapes self telemetry metrics. See telemetry service at the bottom
  prometheus/otel_monitoring_telemetry:
    config:
      scrape_configs:
        - job_name: 'otel-metrics-collector'
          scrape_interval: 15s
          static_configs:
            - targets: ['tfe-agent-otel-monitoring:8888']
  # https://github.com/open-telemetry/opentelemetry-collector/tree/v0.73.0/receiver/otlpreceiver
  otlp/tfe_agent:
    protocols:
      grpc:
        tls:

processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_percentage: 20

exporters:
  logging:
  prometheusremotewrite:
    endpoint: ${env:GRAFANA_URL}
    auth:
      authenticator: basicauth/prw
    resource_to_telemetry_conversion:
      enabled: true
    external_labels:
      stack: ${env:STACK}
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: datadoghq.eu
  file/no_rotation:
    path: ./foo
service:
  extensions: [health_check,basicauth/prw]
  pipelines:
    metrics/tfe_agent:
      receivers:
        - otlp/tfe_agent
      processors:
        - batch
      exporters:
        - logging
        - file/no_rotation
        - prometheusremotewrite
        - datadog
    metrics/otel:
      receivers:
        - prometheus/otel_monitoring_telemetry
      processors:
        - batch
      exporters:
        - logging
  telemetry:
    metrics:
      address: :8888
      level: detailed

Log output

No response

Additional context

No response

karvounis-form3 added the bug and needs triage labels on Jan 11, 2024
@karvounis-form3
Author

Below, you can find the fileexporter metrics output:

tfe-agent-fileexporter.txt

As you can see, all TFC agent metrics appear there.

karvounis-form3 changed the title on Jan 11, 2024 to add the [prometheusremotewrite] prefix
@codeboten
Contributor

Is this a Grafana Cloud-specific issue? Maybe @jpkrohling can help here.


Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jmichalek132
Contributor

jmichalek132 commented Jan 11, 2024

As far as I know, "invalid temporality and type combination for metric" is logged only here:

errs = multierr.Append(errs, fmt.Errorf("invalid temporality and type combination for metric %q", metric.Name()))

when

func isValidAggregationTemporality(metric pmetric.Metric) bool {

returns false, because the translation of these metrics is not implemented in the remote write exporter. This is documented in the exporter's README:

:warning: Non-cumulative monotonic, histogram, and summary OTLP metrics are dropped by this exporter.

@karvounis-form3
Author

It appears that the TFC agent emits metrics with Delta temporality, which prometheusremotewrite rejects; it only accepts Cumulative temporality. The datadogexporter, however, is more than happy to work with them.

It looks like we have the following options here:

  1. Send metrics via OTLP HTTP directly to Grafana Cloud using the otlphttpexporter
  2. Use a combination of the prometheusexporter, self-scraping, and the prometheusremotewriteexporter
  3. Wait for the deltatocumulativeprocessor

We are going to start working on the first option for now.
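
For what it's worth, a minimal sketch of what option 1 could look like, reusing the basicauth/prw extension from the config above; the Grafana Cloud OTLP gateway endpoint is a placeholder that depends on the stack's region:

exporters:
  otlphttp/grafana_cloud:
    # Placeholder endpoint; use the OTLP endpoint listed in your Grafana Cloud stack details
    endpoint: https://otlp-gateway-<region>.grafana.net/otlp
    auth:
      authenticator: basicauth/prw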

@0x006EA1E5

the TFC agent emits metrics with Delta temporality, which prometheusremotewrite rejects

You might want to check this, but when considering the prometheusexporter and self-scraping: the delta-temporality datapoints must have a start_time_unix_nano that aligns with the previous datapoint, or the prometheusexporter doesn't work well (it seems to treat each delta as a "reset" and never accumulates).

@jmichalek132
Contributor

It appears that the TFC agent emits metrics with Delta temporality, which prometheusremotewrite rejects; it only accepts Cumulative temporality. The datadogexporter, however, is more than happy to work with them.

It looks like we have the following options here:

  1. Send metrics via OTLP HTTP directly to Grafana Cloud using the otlphttpexporter
  2. Use a combination of the prometheusexporter, self-scraping, and the prometheusremotewriteexporter
  3. Wait for the deltatocumulativeprocessor

We are going to start working on the first option for now.

FYI, option one might not work either: Grafana Cloud is based on Mimir, which does support OTLP ingestion but has the same limitation as the remote write exporter, because both of them rely on the same translation layer. So unless there is something in Grafana Cloud that accumulates those deltas before sending them into Mimir, you will run into the same issue.

@karvounis-form3
Author

You were both right. I tried the otlphttp exporter to send metrics directly to Grafana Cloud, but I could still see no metrics there (and no errors on the OTel collector). I also tried the prometheus exporter to expose the metrics with cumulative temporality, but I cannot see all of the metrics when I scrape the metrics endpoint.

exporters:
  prometheus:
    endpoint: "0.0.0.0:8890"
    metric_expiration: 15m
    resource_to_telemetry_conversion:
      enabled: true

Below is the scraped Prometheus output, which lacks the Terraform component metrics of the TFC agent:

# HELP tfc_agent_core_runtime_go_gc_count
# TYPE tfc_agent_core_runtime_go_gc_count gauge
tfc_agent_core_runtime_go_gc_count{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 14
# HELP tfc_agent_core_runtime_go_gc_pause_total_nanoseconds
# TYPE tfc_agent_core_runtime_go_gc_pause_total_nanoseconds gauge
tfc_agent_core_runtime_go_gc_pause_total_nanoseconds{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 1.078229e+06
# HELP tfc_agent_core_runtime_go_goroutines_count
# TYPE tfc_agent_core_runtime_go_goroutines_count gauge
tfc_agent_core_runtime_go_goroutines_count{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 34
# HELP tfc_agent_core_runtime_go_mem_free_count
# TYPE tfc_agent_core_runtime_go_mem_free_count gauge
tfc_agent_core_runtime_go_mem_free_count{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 185713
# HELP tfc_agent_core_runtime_go_mem_heap_alloc_bytes
# TYPE tfc_agent_core_runtime_go_mem_heap_alloc_bytes gauge
tfc_agent_core_runtime_go_mem_heap_alloc_bytes{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 3.666016e+06
# HELP tfc_agent_core_runtime_go_mem_heap_idle_bytes
# TYPE tfc_agent_core_runtime_go_mem_heap_idle_bytes gauge
tfc_agent_core_runtime_go_mem_heap_idle_bytes{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 6.537216e+06
# HELP tfc_agent_core_runtime_go_mem_heap_inuse_bytes
# TYPE tfc_agent_core_runtime_go_mem_heap_inuse_bytes gauge
tfc_agent_core_runtime_go_mem_heap_inuse_bytes{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 5.423104e+06
# HELP tfc_agent_core_runtime_go_mem_heap_objects_count
# TYPE tfc_agent_core_runtime_go_mem_heap_objects_count gauge
tfc_agent_core_runtime_go_mem_heap_objects_count{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 18981
# HELP tfc_agent_core_runtime_go_mem_heap_released_bytes
# TYPE tfc_agent_core_runtime_go_mem_heap_released_bytes gauge
tfc_agent_core_runtime_go_mem_heap_released_bytes{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 4.718592e+06
# HELP tfc_agent_core_runtime_go_mem_heap_sys_bytes
# TYPE tfc_agent_core_runtime_go_mem_heap_sys_bytes gauge
tfc_agent_core_runtime_go_mem_heap_sys_bytes{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 1.196032e+07
# HELP tfc_agent_core_runtime_go_mem_lookups_count
# TYPE tfc_agent_core_runtime_go_mem_lookups_count gauge
tfc_agent_core_runtime_go_mem_lookups_count{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 0
# HELP tfc_agent_core_runtime_go_mem_malloc_count
# TYPE tfc_agent_core_runtime_go_mem_malloc_count gauge
tfc_agent_core_runtime_go_mem_malloc_count{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 204694
# HELP tfc_agent_core_runtime_uptime_milliseconds
# TYPE tfc_agent_core_runtime_uptime_milliseconds gauge
tfc_agent_core_runtime_uptime_milliseconds{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 110000
# HELP tfc_agent_core_status_busy
# TYPE tfc_agent_core_status_busy gauge
tfc_agent_core_status_busy{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 0
# HELP tfc_agent_core_status_idle
# TYPE tfc_agent_core_status_idle gauge
tfc_agent_core_status_idle{agent_id="agent-H7U1wma7Lqjp4475",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.2",job="tfc-agent",service_name="tfc-agent"} 1
tfc_agent_core_status_idle{agent_id="agent-f9uTnCD31bxTZhe7",agent_name="test-karvworker-12345",agent_pool_id="apool-ZKuo7Ar1oo1EpXM5",agent_version="1.14.1",job="tfc-agent",service_name="tfc-agent"} 1

It looks like I can only wait for the Delta to Cumulative Processor to be delivered.
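
For anyone landing here later: once the deltatocumulativeprocessor is available in your collector build, option 3 should reduce to inserting it before the batch processor in the metrics pipeline, roughly as sketched below against the original config above (the component name and its defaults are assumptions about the not-yet-released processor):

processors:
  # Assumed component name for the upcoming delta-to-cumulative processor
  deltatocumulative:
  batch:

service:
  pipelines:
    metrics/tfe_agent:
      receivers: [otlp/tfe_agent]
      processors: [deltatocumulative, batch]
      exporters: [prometheusremotewrite]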

@karvounis-form3
Author

karvounis-form3 commented Jan 18, 2024

I managed to make it work! I used v0.92.0 of the collector, because support for accumulating histograms with delta temporality was added to the prometheusexporter in that release.

The following diagram shows the logic:

sequenceDiagram
    participant T as TFC Agent
    participant GRPC as Otel gRPC receiver
    T->>GRPC: Emits TFC agent metrics
    GRPC-->>Prometheus Exporter (delta to cumulative): Metrics are transformed to cumulative temporality by the exporter
    Prometheus Receiver->>Prometheus Exporter (delta to cumulative): Scrapes Prometheus exporter for metrics
    Prometheus Receiver-->>Prometheusremotewrite Exporter: Metrics piped to the exporter
    Prometheusremotewrite Exporter->>Grafana Cloud: Ships metrics in Cumulative temporality
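
For anyone wanting to replicate this, a minimal collector configuration for the pipeline above might look like the sketch below (stitched together from the snippets earlier in this thread; it assumes collector v0.92.0+, and the port 8890, scrape interval, and job name are illustrative choices):

extensions:
  health_check:
  basicauth/prw:
    client_auth:
      username: ${env:GRAFANA_USERNAME}
      password: ${env:GRAFANA_API_KEY}

receivers:
  otlp/tfe_agent:
    protocols:
      grpc:
  # Scrapes the collector's own prometheus exporter below, picking up the
  # metrics after they have been accumulated into cumulative temporality
  prometheus/self_scrape:
    config:
      scrape_configs:
        - job_name: 'tfc-agent-cumulative'
          scrape_interval: 15s
          static_configs:
            - targets: ['localhost:8890']

processors:
  batch:

exporters:
  # Accumulates the delta-temporality TFC agent metrics and exposes them as cumulative
  prometheus:
    endpoint: "0.0.0.0:8890"
    metric_expiration: 15m
    resource_to_telemetry_conversion:
      enabled: true
  prometheusremotewrite:
    endpoint: ${env:GRAFANA_URL}
    auth:
      authenticator: basicauth/prw

service:
  extensions: [health_check, basicauth/prw]
  pipelines:
    # TFC agent OTLP metrics (delta) -> prometheus exporter (cumulative)
    metrics/tfe_agent:
      receivers: [otlp/tfe_agent]
      processors: [batch]
      exporters: [prometheus]
    # Scrape the cumulative metrics and remote-write them to Grafana Cloud
    metrics/to_grafana:
      receivers: [prometheus/self_scrape]
      processors: [batch]
      exporters: [prometheusremotewrite]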

Thank you for your suggestions and your help! I hope the above solution helps other people with the same problem in the future!
