Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support prometheus client 1.x #4406

Closed
pjfanning opened this issue Nov 27, 2023 · 9 comments
Closed

Support prometheus client 1.x #4406

pjfanning opened this issue Nov 27, 2023 · 9 comments
Labels
superseded An issue that has been superseded by another

Comments

@pjfanning
Copy link
Contributor

pjfanning commented Nov 27, 2023

Please describe the feature request.

Prometheus Client 1.x releases differ from the 0.16 release.

Rationale

Use the v1 release instead of the v0 releases.

Additional context

@jonatan-ivanov jonatan-ivanov changed the title support prometheus client 1.1 Support prometheus client 1.x Nov 28, 2023
@jonatan-ivanov jonatan-ivanov added enhancement A general enhancement registry: prometheus A Prometheus Registry related issue and removed waiting-for-triage labels Nov 28, 2023
@jonatan-ivanov
Copy link
Member

This is related: #3987
Unfortunately this is not very easy since Prometheus Client 1.0 was released with a lot of breaking changes.

@jonatan-ivanov jonatan-ivanov linked a pull request Nov 28, 2023 that will close this issue
@cachescrubber
Copy link
Contributor

Any chance the v1 client might improve the situation with #877 ?

@shakuzen
Copy link
Member

shakuzen commented Mar 6, 2024

Any chance the v1 client might improve the situation with #877 ?

It's possible. We'll revisit that when we've figured out the details of the upgrade.

jonatan-ivanov added a commit to jonatan-ivanov/micrometer that referenced this issue Mar 11, 2024
Dependencies were upgraded to the 1.x versions of the Prometheus client
artifacts in micrometer-registry-prometheus.
This is a breaking change compared to the code that was there before.
This is unavoidable due to the breaking changes
in the Prometheus client from 0.x to 1.x.
The previous code from micrometer-registry-prometheus was moved to
a new module micrometer-registry-prometheus-simpleclient that
is deprecated but available to give users a backward compatible option
in case they need to make changes to use the new Prometheus client.

The base package used in micrometer-registry-prometheus was changed to
io.micrometer.prometheusmetrics to differentiate it from the
prior package now used in micrometer-registry-prometheus-simpleclient.
This avoids split packages and allows both modules to be used
in the same application even.

Tests were adapted to the new API and differences in the scrape output.
Tests involving exemplars are commented out until
exemplar support is added.

PrometheusHistogram's cumulative buckets were converted to delta in the
registry. The new Prometheus client wants delta buckets which is weird
since it converts them back to cumulative internally.

PrometheusNamingConvention behavior was changed, it does not append
"_total" to counters anymore. The new Prometheus client validates
Counter names and check if they end with "_total".
If they don't, it appends "_total". If they do, it throws an exception:
java.lang.IllegalArgumentException: 'cnt_total': Illegal metric name...
Because of this behavior, if we want to use PrometheusNamingConvention,
we need to modify it so that it does not append "_total".

Known issues in the micrometer-registry-prometheus module after this PR:
- Exemplars and Native Histograms do not work.
- VictoriaMetrics histograms are not supported.
- The OSGi test was failing when the new Prometheus client was used, so
it was updated to use the -simpleclient module for now.

See micrometer-metricsgh-3987
See micrometer-metricsgh-4406

Co-authored-by: Tommy Ludwig <8924140+shakuzen@users.noreply.github.com>
jonatan-ivanov added a commit that referenced this issue Mar 11, 2024
Dependencies were upgraded to the 1.x versions of the Prometheus client
artifacts in micrometer-registry-prometheus.
This is a breaking change compared to the code that was there before.
This is unavoidable due to the breaking changes
in the Prometheus client from 0.x to 1.x.
The previous code from micrometer-registry-prometheus was moved to
a new module micrometer-registry-prometheus-simpleclient that
is deprecated but available to give users a backward compatible option
in case they need to make changes to use the new Prometheus client.

The base package used in micrometer-registry-prometheus was changed to
io.micrometer.prometheusmetrics to differentiate it from the
prior package now used in micrometer-registry-prometheus-simpleclient.
This avoids split packages and allows both modules to be used
in the same application even.

Tests were adapted to the new API and differences in the scrape output.
Tests involving exemplars are commented out until
exemplar support is added.

PrometheusHistogram's cumulative buckets were converted to delta in the
registry. The new Prometheus client wants delta buckets which is weird
since it converts them back to cumulative internally.

PrometheusNamingConvention behavior was changed, it does not append
"_total" to counters anymore. The new Prometheus client validates
Counter names and check if they end with "_total".
If they don't, it appends "_total". If they do, it throws an exception:
java.lang.IllegalArgumentException: 'cnt_total': Illegal metric name...
Because of this behavior, if we want to use PrometheusNamingConvention,
we need to modify it so that it does not append "_total".

Known issues in the micrometer-registry-prometheus module after this PR:
- Exemplars and Native Histograms do not work.
- VictoriaMetrics histograms are not supported.
- The OSGi test was failing when the new Prometheus client was used, so
it was updated to use the -simpleclient module for now.

See gh-3987
See gh-4406

Co-authored-by: Tommy Ludwig <8924140+shakuzen@users.noreply.github.com>
@jonatan-ivanov
Copy link
Member

Fyi: we added initial support for this (see the PR description, TL;DR: exemplars and native histograms do not work yet): #4846.
It was released yesterday in 1.13.0-M2 please feel free to try it out and give us feedback.

@cachescrubber
Copy link
Contributor

Here is my initial feedback. Since the current spring-boot auto-configuration is based on the old simpleclient, it's not straight forward to test the new implementation. I used the following configuration which registers the PrometheusMetricsServlet instead of the actuator endpoint.

@Configuration
public class MicrometerPrometheusConfiguration {

  @Bean
  public PrometheusMeterRegistry PrometheusMeterRegistry() {
    return new PrometheusMeterRegistry(PrometheusConfig.DEFAULT, new PrometheusRegistry(), Clock.SYSTEM);
  }
  @Bean
  public ServletRegistrationBean<PrometheusMetricsServlet> createPrometheusMetricsEndpoint(PrometheusMeterRegistry prometheusMeterRegistry) {
    return new ServletRegistrationBean<>(new PrometheusMetricsServlet(prometheusMeterRegistry.getPrometheusRegistry()), "/metrics/*");
  }
}

I spotted another illegal metric name:

2024-03-20T09:50:46.394+01:00 ERROR 89680 --- [vf-av-check-app] [omcat-handler-0] .[.[.[.[createPrometheusMetricsEndpoint] : Servlet.service() for servlet [createPrometheusMetricsEndpoint] in context with path [] threw exception

java.lang.IllegalArgumentException: 'jvm_info': Illegal metric name. The metric name must not include the '_info' suffix. Call PrometheusNaming.sanitizeMetricName(name) to avoid this error.
	at io.prometheus.metrics.model.snapshots.MetricMetadata.validate(MetricMetadata.java:106)
	at io.prometheus.metrics.model.snapshots.MetricMetadata.<init>(MetricMetadata.java:63)
	at io.micrometer.prometheusmetrics.PrometheusMeterRegistry.getMetadata(PrometheusMeterRegistry.java:529)
	at io.micrometer.prometheusmetrics.PrometheusMeterRegistry.getMetadata(PrometheusMeterRegistry.java:522)
	at io.micrometer.prometheusmetrics.PrometheusMeterRegistry.lambda$newGauge$10(PrometheusMeterRegistry.java:315)
	at io.micrometer.prometheusmetrics.MicrometerCollector.collect(MicrometerCollector.java:77)
	at io.prometheus.metrics.model.registry.MultiCollector.collect(MultiCollector.java:26)
	at io.prometheus.metrics.model.registry.PrometheusRegistry.scrape(PrometheusRegistry.java:72)
	at io.prometheus.metrics.exporter.common.PrometheusScrapeHandler.scrape(PrometheusScrapeHandler.java:112)
	at io.prometheus.metrics.exporter.common.PrometheusScrapeHandler.handleRequest(PrometheusScrapeHandler.java:53)
	at io.prometheus.metrics.exporter.servlet.jakarta.PrometheusMetricsServlet.doGet(PrometheusMetricsServlet.java:39)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:564)

After I disabled the JVM metrics (using management.metrics.enable.jvm=false) I could successfully scrape the application.

I haven't currently analyzed the scrape output for other incompatibilities or feed it into prometheus.

@cachescrubber
Copy link
Contributor

cachescrubber commented Mar 21, 2024

I haven't currently analyzed the scrape output for other incompatibilities or feed it into prometheus.

Most notably difference in the scrape output is a change in the names of TYPE summary metrics. Is this expected with the new client? It seems in line with https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md

Old (Simpleclient)

# HELP spring_security_authorizations_active_seconds
# TYPE spring_security_authorizations_active_seconds summary
spring_security_authorizations_active_seconds_active_count{application="my-app",spring_security_authentication_type="n/a",spring_security_authorization_decision="unknown",spring_security_object="request",} 0.0
spring_security_authorizations_active_seconds_duration_sum{application="my-app",spring_security_authentication_type="n/a",spring_security_authorization_decision="unknown",spring_security_object="request",} 0.0

New Client

# HELP spring_security_authorizations_active_seconds
# TYPE spring_security_authorizations_active_seconds summary
spring_security_authorizations_active_seconds_count{application="my-app",spring_security_authentication_type="n/a",spring_security_authorization_decision="unknown",spring_security_object="request"} 0
spring_security_authorizations_active_seconds_sum{application="my-app",spring_security_authentication_type="n/a",spring_security_authorization_decision="unknown",spring_security_object="request"} 0.0

@shakuzen
Copy link
Member

@cachescrubber thank you for trying things out and giving feedback. It's very appreciated. We are aware of the issues you pointed out, and we should capture them in an upgrade guide. Thanks for writing them out; that will make it easier to write the guide for other users to follow. We were hoping to get auto-config support into Boot in the milestone that was just released, but unfortunately we weren't able to get it in. The current plan is for exemplar support and auto-configuration to be worked out in time for RC1. The jvm_info issue should be fixed in snapshots when #4866 is merged. The change to the LongTaskTimer timeseries' name in Prometheus output is a breaking change that aligns it with what Prometheus convention expects (but we were never doing). We may take another look at what we could/should be doing there. Conceptually LTTs are GaugeHistograms, but Prometheus format doesn't have this concept - only OpenMetrics does.

@jonatan-ivanov
Copy link
Member

jonatan-ivanov commented Mar 22, 2024

@cachescrubber Thank you very much!

Since the current spring-boot auto-configuration is based on the old simpleclient, it's not straight forward to test the new implementation.

We already have a draft PR for Boot: spring-projects/spring-boot#40023 as Tommy mentioned above, we hope we can have this in Boot 3.3.0-RC1.

I used the following configuration which registers the PrometheusMetricsServlet instead of the actuator endpoint.

I think that might be the simplest way to try this out with Boot.

Fyi: I merged #4866 in so if you use the latest -SNAPSHOT jvm.info should be ok.

On top of what Tommy said about the breaking changes around LongTaskTimer: in the 0.x version of the Prometheus client we had more control over over the output. With 1.x, we lost that control and things that did not follow Prometheus conventions now need to.

The next thing is adding Exemplars support and updating the Boot PR with the new bits: #4867

@jonatan-ivanov jonatan-ivanov removed this from the 1.13.x milestone Apr 5, 2024
@jonatan-ivanov jonatan-ivanov added superseded An issue that has been superseded by another and removed enhancement A general enhancement registry: prometheus A Prometheus Registry related issue labels Apr 5, 2024
@jonatan-ivanov jonatan-ivanov closed this as not planned Won't fix, can't repro, duplicate, stale Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
superseded An issue that has been superseded by another
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants