Skip to content
This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Add metrics view API #89

Closed
wants to merge 11 commits into from
Closed

Add metrics view API #89

wants to merge 11 commits into from

Conversation

c24t
Copy link
Member

@c24t c24t commented Mar 5, 2020

This OTEP addresses the Future Work: Configurable Aggregations / View API section of api-metrics.md. It proposes a specification for a view API that allows users to configure aggregations for individual metric instruments.

It covers some of the same ground as open-telemetry/opentelemetry-specification#347, and summarizes a few weeks of conversation in metrics SIG meetings.

See also the google doc draft.

@tsloughter
Copy link
Member

I will do my best to make the metrics sig meeting on the 11th where this might be easier to discuss, but my first issue with this proposal is simply allowing Counter to be used for anything but Sum.

Having an explicit Counter instrument type allows for optimizations that are lost if the Counter has to be recorded the same as a Measure, where within a collection interval all values are recorded and then aggregated. If the user wants this I'd expect them to use a Measure and if they want a Sum they'd use a Counter, something that can rely on high performance counters on some runtimes.

@c24t
Copy link
Member Author

c24t commented Mar 6, 2020

recorded the same as a Measure, where within a collection interval all values are recorded and then aggregated

The values should actually be continuously aggregated, so we update the aggregate sum with each new measurement. Other aggregations would work this way too, e.g. a histogram aggregation would update a bucket count for each new recorded value, and wouldn't keep a reference to the measurement. Only the "exact" aggregation would store measurements.

It is odd that the view API lets you register a non-sum aggregation for a counter instrument though. What do you recommend here?

@tsloughter
Copy link
Member

I would just remove the non-sum aggregation from the view API for counter instrument.


The potentially contentious changes this OTEP proposes include:

- Require views to be registered: don't record measurements from API for metric instruments that don't appear in any view.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If views are registered by the operator/application developer, does this mean that that person will get no metric data if they don't add additional configuration? How can they know a priori what metrics they they need to register views for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I've done it is each library exports a function with predefined views. In the simplest case the application developer either enables them or not, while still having the ability to instead define their own, or take a subset of the default views a library exports.

So the operator/application developer has to allow metrics from each library, instead of them being automatic and having to have a way to disable what you don't want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now thinking of the auto-instrumentation projects, where there may be dozens of libraries being instrumented that the operator may or may not be familiar with. This is going to be a very surprising thing for "normal" auto-instrumentation agent users to have to do, as they are used to getting things OOTB without extra configuration. Would the auto-instrumentation agent then be expected to just register all the defaults provided with the instrumented libraries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean that that person will get no metric data if they don't add additional configuration?

It does!

Would the auto-instrumentation agent then be expected to just register all the defaults provided with the instrumented libraries?

That's what I would expect, and IIRC it's what we did in OC. For example, the predefined default views for gRPC are here: https://github.com/census-instrumentation/opencensus-java/blob/8b1fd5bbf98b27d0ad27394891e0c64c1171cb2b/contrib/grpc_metrics/src/main/java/io/opencensus/contrib/grpc/metrics/RpcViewConstants.java.

Auto-instrumentation could enable all these views when it loads the gRPC integration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trask @prydin @tylerbenson thoughts on this? It definitely seems on the surface like it would make the auto-instrumentation job significantly more difficult. Especially since auto-instrumentation still has to support multiple backends with potentially different aggregations/metric types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c24t This only seems viable to me if there's a way for an exporter author to say (for example) "for everything that's a Measure, create a view with a MinMaxSumCount aggregation.". Otherwise, I feel like this is a recipe for users thinking that nothing is working, causing a support nightmare.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good solution to me, and since measures already have unique names I don't see any reason in principle that we couldn't do this.

@bogdandrutu might be able to shed some light on the design of OC views.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note about this in 46948ac.

@jkwatson
Copy link
Contributor

jkwatson commented Mar 6, 2020

Does this proposed Views API live at the same level as the other APIs, or is it a default-SDK-only thing? I asked, because this would be the first "API" that didn't target instrumentors, and instead targets operators/app developers. It feels like that will lead to confusion (it confuses me!) from the user base, especially the instrumentation-building user.

@c24t
Copy link
Member Author

c24t commented Mar 6, 2020

Does this proposed Views API live at the same level as the other APIs, or is it a default-SDK-only thing?

Views would live in the API package. Developers (as I understand the persona) would use metric instruments in application code, operators (same caveat) would specify views in either static config or code. Aggregators are SDK objects, and a view might rely on some third party package to define a particular aggregator class.

@c24t c24t added the metrics Relates to the Metrics API/SDK label Mar 26, 2020
@lzchen
Copy link
Contributor

lzchen commented Apr 30, 2020

Does this proposed Views API live at the same level as the other APIs, or is it a default-SDK-only thing?

Views would live in the API package. Developers (as I understand the persona) would use metric instruments in application code, operators (same caveat) would specify views in either static config or code. Aggregators are SDK objects, and a view might rely on some third party package to define a particular aggregator class.

I vague remember discussion about this in SIG meeting - Views API would be part of the SDK, not in API. It makes sense to me to keep Views API in the SDK only, as the intended users are application writers/operations, and not Library authors.

Yes I believe the comment was made before the discussion in the SIG meeting. You are correct.


Each view describes a relationship between a metric instrument and an aggregation, and specifies the set of label keys to preserve in the aggregated data.

Views are tracked in a registry, which exporters may use to get the current aggregate values for all registered aggregations at export time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Views a global thing, and hence must be registered with MeterProvider, so that all meters created the MeterProvider will get access to View.?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional thought around this: would it be valuable to enable specific aggregations to be tied to specific exporters?

I feel like this is another reason why aggregations should be tied to a MeterProvider. Then you can choose different meterprovider sets for different metrics, as necessary.

Something like:

MeterProvider
    - exporters
    - aggregators
    - metrics (this may be unnecessary)

Because we don't require the user to specify the set of label keys up front, and because we don't prevent users from recording measurements with missing labels in the API, some label values may be undefined.
Aggregators should preserve undefined label values, and exporters may convert them as required by the backend.

For example, consider a Sum-aggregated Counter instrument that captures four consecutive measurements:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a note about boundinstruments and views? Boundinstruments are supposed to be the fastest way to record a value, as it avoids a lookup for the time series. With Views, bound instruments should still be the fastest, and should not be doing lookups.

This maybe implementation detail, but calling it out in spec as well.

Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @c24t for getting this started!
Have left some comments - mostly after looking at Python view prototype, and .NET prototype I am working on.

Most important things I'd like to see addressed:

  1. Where is Views registered? MeterProvider? And Should we allow views to added/removed after MeterProvider is initialized?

  2. Recommendation about View being configured though static files/json etc.

  3. Default behavior if no Views registered. I believe the consensus is to do default aggregation and retain all labels, unless a user configured View overrides this.

  4. Should Views be part of SDK only?


The aggregation may be configured with options specific to its type.
For example, a _Histogram_ aggregation may be configured with bucket boundaries, e.g. `{0, 1, 10, 100, 200, 1000, inf}`.
A _Sketch_ aggregation that estimates order statistics (i.e. quantiles), may be configured with a set of predetermined quantiles, e.g. `{.5, .95, .99, 1.00}`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be called "quantile" aggregation? A sketch doesn't immediately make me think of quantiles.

Implementations should refuse to register two views with the same name.

Note that a view does not describe the type (e.g. `int`, `float`) or unit of measurement (e.g. "bytes", "milliseconds") of the metric to be exported.
The unit is determined by the metric instrument, and the aggregation and exporter may preserve or change the unit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there's only two operations which are appropriate for a unit modification on a metric instrument:

  1. preserve the unit
  2. change the unit entirely (e.g. to count)

I feel like it may be valuable to call out that there won't be a modification such as "seconds" to "milliseconds". But maybe that's just obvious.

- A **float**-valued _mean_ metric with "ms" units
- An **int** valued, unitless (i.e. unit "1") _count_ metric

This OTEP doesn't propose a particular API for Aggregators, just that the API is sufficient for exporters to get all this information, including:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the otep recommend an API? reading the examples, it seems like something that may work would be (name, value, unit). I believe it seems it satisfies the examples below.

min([1, 2, 3, 4, 5, 6]) = min([min([1]), min([2, 3]), min([4, 5, 6])])
```

but quantile aggregations, e.g. _p95_ and _p99_ are not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that quantiles are not supported as an aggregation? There's a strong need for quantiles, so I assume there's some way to ensure that it is measured and exported, but not clear how.

but quantile aggregations, e.g. _p95_ and _p99_ are not.
Applications that export quantile metrics should use a mergeable aggregations such as [DDSketch](https://arxiv.org/abs/1908.10693), which estimates quantile values with bounded errors, or export raw measurements without aggregation and compute exact quantiles on the backend.

We require aggregations to be mergeable so that they produce the same results regardless of the collection interval, or the number of collection events per export interval.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess again looking at quantiles as the edge case: what's the need for ensuring that the produced result are the same, regardless of collection interval? Is there some caveat where the collection interval cannot be guaranteed? I guess in current API design it's in the pushController side of things.


Every measurement is associated with a _LabelSet_, a set of key-value pairs that describes the environment in which the measurement was captured.
Label keys and values may be extracted from the [correlation context](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-correlationcontext.md) at the time of measurement.
They may also be set by the user [at the time of measurement](https://github.com/open-telemetry/opentelemetry-specification/blob/f4aeb318a5b77c9c39132a8cbc5d995e222d124f/specification/api-metrics-user.md#direct-instrument-calling-convention) or at the time at which [the metric instrument is bound](https://github.com/open-telemetry/opentelemetry-specification/blob/f4aeb318a5b77c9c39132a8cbc5d995e222d124f/specification/api-metrics-user.md#bound-instrument-calling-convention).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How specifically are values from the correlation context piped in to the labelset? I figured that the user would still be responsible for that, but as this is calling out label extraction as a separate line, it sounds like that's not the case.

@jkwatson
Copy link
Contributor

jkwatson commented Jul 13, 2020

My opinions on this:

  1. I think we absolutely should have default aggregations, and I think the metric SIG did agree to that.
  2. I don't think the exporters should have to interact with the Views API unless they want to tell the SDK that they want only delta aggregations for a given instrument type (for example). Exporters shouldn't get their data via views, but the SDK uses the view definition to figure out what aggregation/batching strategy should be used.
  3. Things I think we should shoot for for GA: Being able to specify label reductions, delta/cumulative temporality, and the type of aggregation to use, based on the instrument descriptor.

This OTEP seems to range far and wide outside of the scope for a SDK-only basic views API that I think we should shoot for for GA.

@jmacd
Copy link
Contributor

jmacd commented May 25, 2021

Closing this as obsolete.
❤️ @c24t

@jmacd jmacd closed this May 25, 2021
@morigs
Copy link

morigs commented Jun 30, 2021

Excuse me, it's unclear what's the current vision on metrics aggregation. Are there other OTEPs with the same target?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
metrics Relates to the Metrics API/SDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants