Compatible dropwizard metrics #9416
45d4721
to
5eb9ab5
Compare
Update on the issue: Metrics created using Dropwizard and Yammer are very close, but not exactly the same. Using the new CompoundPinotMetricsFactory as the factory, we can produce both sets of metrics. The first difference is the id of the beans created. JMX metrics have several coordinates. Some of them are standard (like domain and name) and others are not. Specifically, our usage of Yammer and Dropwizard uses these three coordinates:
As the last part, we configure the Prometheus exporter to map JMX metrics to Prometheus. In files like
- pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.([^\\.]*?)\\.totalServerResponseSize\"><>(\\w+)"
  name: "pinot_broker_totalServerResponseSize_$2"
  cache: true
  labels:
    table: "$1"
which follows the path:
In order to be able to read Prometheus metrics, we should change the pattern to something like:
- pattern: "\"?org.apache.pinot.common.metrics\"?<type=\\w, name=\"pinot.broker.([^\\.]*?)\\.totalServerResponseSize\"><>(\\w+)"
Probably we could even skip the type and generate something like:
- pattern: "\"?org.apache.pinot.common.metrics\"?<name=\"pinot.broker.([^\\.]*?)\\.totalServerResponseSize\"><>(\\w+)"
These changes should not be done when using |
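To make the rule change easier to reason about, here is a minimal Java sketch that applies a relaxed pattern to both naming styles. It is not Pinot or jmx_exporter code. Assumptions: the exporter hands the rule a string shaped like domain<type=..., name=...><>attribute, the match is unanchored, the Dropwizard bean uses meters as its type, and the class ExporterRuleSketch, the table myTable and the attribute Count are made up for illustration. The relaxed regex (optional quotes everywhere, any word as type) is a variant of the patterns proposed above, not the exact text of either.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pinot or jmx_exporter.
class ExporterRuleSketch {
  // Assumed relaxed rule: quotes around domain, type and name are optional,
  // and the type can be any word (BrokerMetrics for Yammer, meters for Dropwizard).
  static final Pattern RULE = Pattern.compile(
      "\"?org\\.apache\\.pinot\\.common\\.metrics\"?"
          + "<type=\"?\\w+\"?, name=\"?pinot\\.broker\\.([^\\.]*?)\\.totalServerResponseSize\"?>"
          + "<>(\\w+)");

  // Returns the Prometheus-style name with its table label, or null if the rule does not match.
  static String apply(String beanAttribute) {
    Matcher m = RULE.matcher(beanAttribute);
    if (!m.find()) {
      return null;
    }
    // Group 1 is the table name, group 2 is the JMX attribute.
    return "pinot_broker_totalServerResponseSize_" + m.group(2)
        + "{table=\"" + m.group(1) + "\"}";
  }

  public static void main(String[] args) {
    // Yammer style: domain, type and name are double quoted.
    String yammer = "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", "
        + "name=\"pinot.broker.myTable.totalServerResponseSize\"><>Count";
    // Dropwizard style (assumed): nothing is quoted and the type is the metric kind.
    String dropwizard = "org.apache.pinot.common.metrics<type=meters, "
        + "name=pinot.broker.myTable.totalServerResponseSize><>Count";
    System.out.println(apply(yammer));
    System.out.println(apply(dropwizard));
  }
}
```

Both inputs resolve to the same Prometheus name and table label, which is the property we need in order to keep alerts and dashboards unchanged.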
26cb41a
to
be36697
Compare
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #9416 +/- ##
============================================
+ Coverage 61.75% 63.64% +1.89%
- Complexity 207 1555 +1348
============================================
Files 2436 2652 +216
Lines 133233 145485 +12252
Branches 20636 22218 +1582
============================================
+ Hits 82274 92595 +10321
- Misses 44911 46059 +1148
- Partials 6048 6831 +783
Very promising work! @jackjlli Can you help review this?
If we use the Compound metrics registry, would the Prometheus storage double?
That depends on our Prometheus config. We could configure Prometheus to store one set of the metrics or both. In the latter case the storage would be doubled.
I think we should not use Compound in actual deployments. It may be useful in an integration test in order to verify that both metric registries are actually returning compatible information and that we can create the same Prometheus metrics from one or the other. In case we want to actually use Compound and we want to register both sets of metrics in Prometheus, we would need to have two rules per metric. One can be the same we have right now (which is going to read from Yammer) and then we can add another like:
The latter will only match with Dropwizard because the domain does not start with |
import org.slf4j.LoggerFactory;


@MetricsFactory
public class CompoundPinotMetricsFactory implements PinotMetricsFactory {
Could you add some javadoc (e.g. how to use it and what to note) for this class in case ppl would like to try it out?
I've added some lines. Is it easier to read now?
Great, thanks for that!
Overall LGTM. I saw some tests failed. Could you fix them before merging it? Thanks!
…ained in PinotConfiguration
…ch less libraries into the classpath.
092b090
to
358672d
Compare
@gortiz Is this PR still being worked on/active? The compound metrics plugin concept will also help with a future OpenTelemetry plugin (when/if it's added).
No, it is not. But if it is useful for someone we can add it. Anyway, it would need a +1 from some committer.
77d0a82
to
0879e08
Compare
Please resolve the conflict
Resolved.
@jackjlli Can you please take another look at this PR?
Overall LGTM. Thanks for making the changes in this PR! 👍
We currently use Yammer as the metric library, but Yammer is not maintained and it lacks features that more modern metric libraries (like Dropwizard or Micrometer) have. For example, Yammer histograms are misleading in some situations (I strongly recommend reading this article to learn more about the problem). This issue also affects Dropwizard (in fact the referenced article is focused on Dropwizard), but Dropwizard has evolved in a way that the problem can be fixed by changing the reservoir.
We already have a Dropwizard implementation, but we cannot currently use it because it produces metrics with other JMX names, which could break alerts and dashboards our users have right now.
At this moment this PR is a draft for discussion, not a ready-to-merge PR. This PR changes some default values and therefore we would need to test it properly in environments we can break.
This PR creates a new metric plugin called compound. This metric plugin contains a list of other metric plugins. Each time a metric is registered or unregistered in the compound registry, it is registered or unregistered in all other metric plugins. The metric plugins that are notified by the compound registry can be configured, but by default it notifies all other metric plugins in the classpath.
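The fan-out behavior described above can be pictured with a small Java sketch. It is only an illustration of the idea, not the code in this PR: SimpleRegistry, RecordingRegistry and CompoundRegistrySketch are hypothetical names, and the real Pinot registry interface has more methods than the two shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a metrics registry; not Pinot's actual interface.
interface SimpleRegistry {
  void register(String metricName);

  void unregister(String metricName);
}

// Keeps registered names in memory, standing in for a Yammer or Dropwizard registry.
class RecordingRegistry implements SimpleRegistry {
  final List<String> names = new ArrayList<>();

  @Override
  public void register(String metricName) {
    names.add(metricName);
  }

  @Override
  public void unregister(String metricName) {
    names.remove(metricName);
  }
}

// Forwards every registration and unregistration to all delegate registries.
class CompoundRegistrySketch implements SimpleRegistry {
  private final List<SimpleRegistry> delegates;

  CompoundRegistrySketch(List<SimpleRegistry> delegates) {
    this.delegates = new ArrayList<>(delegates);
  }

  @Override
  public void register(String metricName) {
    for (SimpleRegistry delegate : delegates) {
      delegate.register(metricName);
    }
  }

  @Override
  public void unregister(String metricName) {
    for (SimpleRegistry delegate : delegates) {
      delegate.unregister(metricName);
    }
  }

  public static void main(String[] args) {
    RecordingRegistry yammerLike = new RecordingRegistry();
    RecordingRegistry dropwizardLike = new RecordingRegistry();
    SimpleRegistry compound =
        new CompoundRegistrySketch(Arrays.<SimpleRegistry>asList(yammerLike, dropwizardLike));
    compound.register("myTimer");
    // Each delegate now holds its own copy of the registration.
    System.out.println(yammerLike.names + " " + dropwizardLike.names);
  }
}
```

The sketch takes the delegate list as a constructor argument, which loosely mirrors the configurable list of plugins; discovering all plugins in the classpath is left out.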
By using this compound metric plugin and including both Dropwizard and Yammer plugins in the classpath, Pinot will register each metric twice: once in Yammer and once in Dropwizard. As said above, each registry produces its own JMX names, so alerts and dashboards created based on the Yammer metrics will continue to work as expected, but given that Dropwizard JMX names will also be published, dashboards and alerts can be migrated to use the new ones and therefore can use the new features.
This PR also adds the ability to change the domain on which Dropwizard metrics are published by changing the property pinot.metrics.dropwizard.domain, whose default value is metrics, following the Dropwizard default domain. This domain is the prefix used by Dropwizard to create its MBean names, which are something like "<domain>";type="<type>";name="<metric_name>".
By default a Pinot metric called myTimer of type timer in the Pinot server is exposed:
In Yammer as "org.apache.pinot.common.metrics";type="ServerMetrics";name="myTimer"
In Dropwizard as "metrics";type="timers";name="myTimer"
When the new pinot.metrics.dropwizard.domain property is changed to org.apache.pinot.common.metrics, Dropwizard will export the metric as org.apache.pinot.common.metrics;type=timer;name=myTimer. Note that this name does not double quote its parts. For example, the domain will be org.apache.pinot.common.metrics, not "org.apache.pinot.common.metrics" as it is in Yammer. As far as I know, it is not possible to tell Dropwizard to use another value as type. It always uses the type of the metric (gauge, timer, histogram, counter, etc).
Therefore the idea would be to do the following:
pinot.X.metrics.factory.className should have the value org.apache.pinot.plugin.metrics.compound.CompoundPinotMetricsFactory.
pinot.X.metrics.dropwizard.domain should be set to org.apache.pinot.common.metrics.
Yammer metrics would then be exposed as "org.apache.pinot.common.metrics";type="XMetrics";name="whateverMetricName"
Dropwizard metrics would then be exposed as org.apache.pinot.common.metrics;type=T;name=whateverMetricName
If we do that, Prometheus should keep exporting the Yammer values, as it used to. But we can either use another Prometheus exporter process to check the differences, or change the usual Prometheus agent config to either export both sets of metrics or start exporting the values from Dropwizard metrics instead of the ones generated by Yammer. The latter can be done without changing the Prometheus names and therefore without requiring changes to the alerts or Grafana dashboards.
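Since the quoted versus unquoted difference is easy to miss, here is a minimal Java sketch that builds both styles of name with the standard javax.management.ObjectName API and shows that JMX treats them as different beans. MBeanNameSketch, yammerStyle and dropwizardStyle are hypothetical names, the quoting is done by hand to imitate what each library is described as doing above, and neither Yammer nor Dropwizard is actually called.

```java
import java.util.Hashtable;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper that imitates the two naming styles; it does not call Yammer or Dropwizard.
class MBeanNameSketch {
  static final String DOMAIN = "org.apache.pinot.common.metrics";

  // Yammer style as described above: domain, type and name are all double quoted.
  static ObjectName yammerStyle(String type, String name) {
    return build(ObjectName.quote(DOMAIN), ObjectName.quote(type), ObjectName.quote(name));
  }

  // Dropwizard style as described above: nothing is quoted.
  static ObjectName dropwizardStyle(String type, String name) {
    return build(DOMAIN, type, name);
  }

  private static ObjectName build(String domain, String type, String name) {
    Hashtable<String, String> properties = new Hashtable<>();
    properties.put("type", type);
    properties.put("name", name);
    try {
      return new ObjectName(domain, properties);
    } catch (MalformedObjectNameException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    ObjectName yammer = yammerStyle("ServerMetrics", "myTimer");
    ObjectName dropwizard = dropwizardStyle("timers", "myTimer");
    System.out.println(yammer.getDomain());
    System.out.println(dropwizard.getDomain());
    // The quotes are part of the domain, so the two names never collide.
    System.out.println(yammer.equals(dropwizard));
  }
}
```

Because the quotes are part of the domain and of the property values, any exporter rule that should read both styles has to treat the quotes as optional.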