Selectively rename metrics #22735

Closed
PW999 opened this issue May 24, 2023 · 6 comments
Labels
enhancement New feature or request processor/transform Transform processor

Comments

@PW999

PW999 commented May 24, 2023

Component(s)

processor/transform

Is your feature request related to a problem? Please describe.

Due to open-telemetry/opentelemetry-collector#3410 it is currently impossible to use the metric_relabel_configs to rename metrics. A use case we have from https://github.com/datastax/metric-collector-for-apache-cassandra requires us to do a lot of specific renamings (see https://github.com/datastax/metric-collector-for-apache-cassandra/releases/download/v0.3.4/datastax-mcac-dashboards-0.3.4.zip).

A simple example is

 - source_labels: ["mcac"]
   regex: org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)
   target_label: __name__
   replacement: mcac_keyspace_${1}

The regex applies to several metrics (collectd_mcac_micros_count_total, collectd_mcac_micros_sum, collectd_mcac_micros_bucket_654949, etc.). However, these metric names are shared by many other "metrics", which are distinguished only by the "mcac" label.

E.g.

collectd_mcac_micros_count_total{mcac="org.apache.cassandra.metrics.dropped_message.internal_dropped_latency.read_repair",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 0 1684912308142
collectd_mcac_micros_count_total{mcac="org.apache.cassandra.metrics.dropped_message.internal_dropped_latency.request_response",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 0 1684912308341
collectd_mcac_micros_count_total{mcac="org.apache.cassandra.metrics.keyspace.cas_commit_latency.dev",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 1123147 1684912308145
collectd_mcac_micros_count_total{mcac="org.apache.cassandra.metrics.keyspace.cas_prepare_latency.dev",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 1123191 1684912308349
collectd_mcac_micros_count_total{mcac="org.apache.cassandra.metrics.keyspace.cas_propose_latency.dev",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 1123174 1684912308145
collectd_mcac_micros_count_total{mcac="org.apache.cassandra.metrics.keyspace.range_latency.dev",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 115970287 1684912308370
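The intent of the relabel rule can be sketched in Python. This is only an illustration of the regex and the substitution, not Prometheus code. It assumes that Prometheus anchors relabel regexes to the whole label value (mimicked here with `fullmatch`), and the helper name `relabel_name` is hypothetical.

```python
import re

# Regex taken from the relabel rule above.
PATTERN = re.compile(r"org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)")


def relabel_name(current_name: str, mcac: str) -> str:
    """Return the new metric name if the mcac label matches, else the old name."""
    match = PATTERN.fullmatch(mcac)
    if match is None:
        # Series whose mcac label does not match keep their name.
        return current_name
    # Counterpart of `replacement: mcac_keyspace_${1}` (first capture group).
    return match.expand(r"mcac_keyspace_\g<1>")


if __name__ == "__main__":
    for mcac in (
        "org.apache.cassandra.metrics.keyspace.range_latency.dev",
        "org.apache.cassandra.metrics.dropped_message.internal_dropped_latency.read_repair",
    ):
        print(relabel_name("collectd_mcac_micros_count_total", mcac))
```

The point of the sketch is that the decision is made per series: only series whose "mcac" label matches get a new name, and all others keep collectd_mcac_micros_count_total.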

I've tried a lot of different ways to rename specific metrics while limiting the number of impacted metrics, resulting in this ugly piece of (semi-generated) code:

      transform:
        error_mode: propagate
        metric_statements:
          - context: datapoint
            statements:
              - set(attributes["is_rename"], "true") where IsMatch(attributes["mcac"], "org\\.apache\\.cassandra\\.metrics\\.keyspace\\.(\\w+)\\.(\\w+)")
              - set(attributes["new_name"], attributes["mcac"]) where IsMatch(attributes["mcac"], "org\\.apache\\.cassandra\\.metrics\\.keyspace\\.(\\w+)\\.(\\w+)")
              - set(attributes["rararandom"], "86434") where IsMatch(attributes["mcac"], "org\\.apache\\.cassandra\\.metrics\\.keyspace\\.(\\w+)\\.(\\w+)")
              - replace_pattern(attributes["new_name"], "org\\.apache\\.cassandra\\.metrics\\.keyspace\\.(\\w+)\\.(\\w+)", "mcac_keyspace_$$1")
              - set(metric.name, attributes["new_name"]) where attributes["is_rename"] == "true" and  attributes["rararandom"] == "86434"

The new_name attribute is set correctly (even when running all ~30 similar blocks of code), however, the set(metric.name, attributes["new_name"]) renames all metric values.

So, taking the above example, the result (without the extra labels) would be

mcac_keyspace_dev{mcac="org.apache.cassandra.metrics.dropped_message.internal_dropped_latency.read_repair",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 0 1684912308142
mcac_keyspace_dev{mcac="org.apache.cassandra.metrics.dropped_message.internal_dropped_latency.request_response",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 0 1684912308341
mcac_keyspace_dev{mcac="org.apache.cassandra.metrics.keyspace.cas_commit_latency.dev",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 1123147 1684912308145
mcac_keyspace_dev{mcac="org.apache.cassandra.metrics.keyspace.cas_prepare_latency.dev",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 1123191 1684912308349
mcac_keyspace_dev{mcac="org.apache.cassandra.metrics.keyspace.cas_propose_latency.dev",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 1123174 1684912308145
mcac_keyspace_dev{mcac="org.apache.cassandra.metrics.keyspace.range_latency.dev",instance="10.102.107.214",mcac_filtered="true",cluster="dev-backend-cassandra3-cluster",dc="eu-west-dev-backend",rack="1c"} 115970287 1684912308370

Describe the solution you'd like

I'm not 100% sure what exactly is possible, but ideally the where clause on a set(metric.name, attributes["new_name"]) statement would be honored.

If implementing that directly in the processor would cause undesirable side effects, it would be great to be able to do those steps by hand with something like copy_metric(from_name, to_name). That way we could copy the metric to a new one whose name is attributes["new_name"].
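To make the requested outcome concrete, here is a minimal Python sketch of what a selective rename would have to produce: datapoints grouped into separate metrics according to their "new_name" attribute. The function `split_by_new_name` is hypothetical and only models the idea; it is not an existing collector or OTTL function.

```python
from __future__ import annotations

from collections import defaultdict


def split_by_new_name(metric_name: str, datapoints: list[dict]) -> dict[str, list[dict]]:
    """Group datapoint attribute sets by "new_name", falling back to the original name."""
    metrics: dict[str, list[dict]] = defaultdict(list)
    for attributes in datapoints:
        # Datapoints without a new_name stay under the original metric.
        metrics[attributes.get("new_name", metric_name)].append(attributes)
    return dict(metrics)


if __name__ == "__main__":
    result = split_by_new_name(
        "collectd_mcac_micros_count_total",
        [
            {"mcac": "org.apache.cassandra.metrics.dropped_message.internal_dropped_latency.read_repair"},
            {
                "mcac": "org.apache.cassandra.metrics.keyspace.range_latency.dev",
                "new_name": "mcac_keyspace_range_latency",
            },
        ],
    )
    for name, points in result.items():
        print(name, len(points))
```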

A note in the documentation about these limitations would also be very nice.

Describe alternatives you've considered

The metricstransformprocessor can also rename metrics, but substitution only works with groups captured from the metric name, not from experimental_match_labels.

I've also thought about using the metricsgenerationprocessor to scale the metrics by 1.0, but with its development stability level it's a bit too fresh for our tastes.

Additional context

No response

@PW999 PW999 added enhancement New feature or request needs triage New item requiring triage labels May 24, 2023
@github-actions github-actions bot added the processor/transform Transform processor label May 24, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Member

TylerHelmuth commented May 24, 2023

I am going to share a nuance of the transformprocessor that I think you're running into but I'm not entirely sure.

Since you need access to the attributes, you're running in the datapoint context, which is correct. The last statement updates the metric name of the datapoint, which is also good. But the transformprocessor is basically just a big loop, so when you say "update the name of the metric associated with this datapoint" it will do that for every single datapoint in the metric that passes the where clause. As a result, the metric name ends up being the result of the statement as run on the very last matching datapoint. This isn't an issue when datapoint attributes are similar, but in your case they aren't, so I think it is an issue.

If new_name ends up the same for all datapoints in a metric then this nuance doesn't matter and the issue is something else.
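The loop behaviour described above can be modelled with a small Python sketch. It is a simplified illustration, not the transformprocessor's actual implementation: the `Metric` and `DataPoint` classes and `run_datapoint_statements` are hypothetical stand-ins, and `re.search` is assumed to be close enough to `IsMatch` for these inputs.

```python
import re
from dataclasses import dataclass, field

PATTERN = re.compile(r"org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)")


@dataclass
class DataPoint:
    attributes: dict


@dataclass
class Metric:
    name: str
    datapoints: list = field(default_factory=list)


def run_datapoint_statements(metric: Metric) -> None:
    """Apply the rename statements to each datapoint of one metric in turn."""
    for dp in metric.datapoints:
        mcac = dp.attributes.get("mcac", "")
        if PATTERN.search(mcac) is None:
            continue  # the where clause rejects this datapoint
        dp.attributes["new_name"] = PATTERN.sub(r"mcac_keyspace_\g<1>", mcac)
        # The name lives on the metric, so all datapoints share this write.
        metric.name = dp.attributes["new_name"]


if __name__ == "__main__":
    prefix = "org.apache.cassandra.metrics."
    metric = Metric(
        "collectd_mcac_micros_count_total",
        [
            DataPoint({"mcac": prefix + "dropped_message.internal_dropped_latency.read_repair"}),
            DataPoint({"mcac": prefix + "keyspace.cas_commit_latency.dev"}),
            DataPoint({"mcac": prefix + "keyspace.range_latency.dev"}),
        ],
    )
    run_datapoint_statements(metric)
    print(metric.name)
```

In this model each datapoint gets its own correct "new_name" attribute, but the single shared metric name is overwritten on every matching iteration, so the non-matching datapoint is renamed along with the rest and the last matching datapoint decides the final name.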

When debugging transformprocessor statements I find the best solution is to add a loggingexporter with verbosity: detailed and start incrementally adding statements and check if the output is what I expect. In this situation it appears that the final where clause is not being honored, so we need to check that attributes["is_rename"] and attributes["rararandom"] are really what we think they are.

@PW999
Author

PW999 commented Jun 4, 2023

We indeed noticed that the final name to which the metrics are renamed is the last one output by the Cassandra exporter. When we skip the rename step, we can verify that the values of "is_rename", "new_name" and "rararandom" (after a couple of days of trying I got a bit tired 😅) are correct. However, skipping the rename, or any other step provided by DataStax, generates a ton of errors in the collector because it's not happy with the intermediate state and conflicting data.

The biggest problem we're facing with debugging this is the vast amount of metrics which are created by the exporter and the fact that we're not always 100% sure how the exported metrics are supposed to work (we're extremely happy that datastax provided the transformations and an out of the box grafana dashboard that uses these transformed metrics).

For now we've settled on having an intermediate Prometheus which then forwards the metrics to our managed AWS Prometheus instance, though this is far from ideal.

@TylerHelmuth
Member

@PW999 if you can I would recommend debugging locally with a payload that you can trigger manually. This will make it easier to identify how the metric is changing.

@github-actions
Contributor

github-actions bot commented Aug 7, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Member

I feel confident that the transformprocessor can handle this situation. I am going to close this issue, but ping me if you feel it should remain open.


3 participants