Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify compliant metric SDK specification implementation: MetricReader #3660

Closed
2 tasks done
Tracked by #3674
MrAlias opened this issue Feb 3, 2023 · 13 comments
Closed
2 tasks done
Tracked by #3674
Assignees
Labels
area:metrics Part of OpenTelemetry Metrics pkg:SDK Related to an SDK package

Comments

@MrAlias
Copy link
Contributor

MrAlias commented Feb 3, 2023

  • Identify all the normative requirements, recommendations, and options the specification defines as comments to this issue
  • Ensure the current metric SDK implementation is compliant with these normative requirements, recommendations, and options in those comments.
@MrAlias MrAlias converted this from a draft issue Feb 3, 2023
@MrAlias MrAlias added pkg:SDK Related to an SDK package area:metrics Part of OpenTelemetry Metrics labels Feb 3, 2023
@MrAlias MrAlias self-assigned this Jun 6, 2023
@MrAlias MrAlias moved this from Todo to In Progress in Go: Metric SDK (GA) Jun 6, 2023
@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 19, 2023

To construct a MetricReader when setting up an SDK, the caller SHOULD provide at least the following:

  • The exporter to use, which is a MetricExporter instance.
  • The default output aggregation (optional), a function of instrument kind. If not configured, the default aggregation SHOULD be used.
  • The default output temporality (optional), a function of instrument kind. If not configured, the Cumulative temporality SHOULD be used.
  • The default aggregation cardinality limit to use, a function of instrument kind. If not configured, a default value of 2000 SHOULD be used.

This recommendation is confusing. It applies to all readers but not all readers accept these parameters.

Additionally, the default aggregation cardinality limit should be marked experimental.

This is going to require a specification issue/PR.

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 19, 2023

A common sub-class of MetricReader, the periodic exporting MetricReader SHOULD be provided to be used typically with push-based metrics collection.

We are compliant with this. We provide a PeriodicReader.

TODO: PR the specification to remove the "sub-class" term as our language does not have classes.

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 19, 2023

The MetricReader MUST ensure that data points from OpenTelemetry instruments are output in the configured aggregation temporality for each instrument kind.

This requirement is not clear to me. The reader is not the Aggregator, it does not ensure the temporality of an aggregation. I'm not sure how this requirement can be fulfilled. It seems like a specification issue/PR is needed to clarify this.

For synchronous instruments being output with Cumulative temporality, this means converting Delta to Cumulative aggregation temporality.
For asynchronous instruments being output with Delta temporality, this means converting Cumulative to Delta aggregation temporality.

This assumes all synchronous instruments record delta values and asynchronous instruments cumulative. This is not a valid assumption. A specification issue/PR is needed.

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 19, 2023

The MetricReader is not required to ensure data points from a non-SDK MetricProducer are output in the configured aggregation temporality, as these data points are not collected using OpenTelemetry instruments.

This is non-normative, but I'm guessing the intent was to have it be normative.

Again, given the temproality is not handled by the MetricReader but by the Aggregator I'm not sure this is entirely clear. A specification issue/PR is needed.

cc @dashpole

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 19, 2023

The SDK MUST support multiple MetricReader instances to be registered on the same MeterProvider [...]

A variable number of WithReader options can be passed to NewMeterProvider. Each option will register the Reader with returned MeterProvider:

mp := &MeterProvider{
pipes: newPipelines(conf.res, conf.readers, conf.views),
forceFlush: flush,
shutdown: sdown,
}

func newPipelines(res *resource.Resource, readers []Reader, views []View) pipelines {
pipes := make([]*pipeline, 0, len(readers))
for _, r := range readers {
p := newPipeline(res, r, views)
r.register(p)
pipes = append(pipes, p)
}
return pipes
}

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 19, 2023

[...] the MetricReader.Collect invocation on one MetricReader instance SHOULD NOT introduce side-effects to other MetricReader instances.

Each Reader registered with a MeterProvider has its own pipeline:

func newPipelines(res *resource.Resource, readers []Reader, views []View) pipelines {
pipes := make([]*pipeline, 0, len(readers))
for _, r := range readers {
p := newPipeline(res, r, views)
r.register(p)
pipes = append(pipes, p)
}
return pipes
}

That pipeline contains independent aggregators that are not shared with other pipelines:

aggregations map[instrumentation.Scope][]instrumentSync

Recording a value for an instrument will make a measurement for each of these aggregations inpendently:

func (i *float64Inst) aggregate(ctx context.Context, val float64, s attribute.Set) {
if err := ctx.Err(); err != nil {
return
}
for _, in := range i.measures {
in(ctx, val, s)
}
}

func (i *int64Inst) aggregate(ctx context.Context, val int64, s attribute.Set) { // nolint:revive // okay to shadow pkg with method.
if err := ctx.Err(); err != nil {
return
}
for _, in := range i.measures {
in(ctx, val, s)
}
}

Therefore, due to the separation of aggregators, the invocation of Collect for one Reader cannot affect the data of another reader.

I think adding a test for this would be helpful to ensure compatibility. However, it is probably unlikely the test will be able to check all edge cases for this and isn't strictly required for the resolution of this issue.

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 19, 2023

The SDK MUST NOT allow a MetricReader instance to be registered on more than one MeterProvider instance.

Both implementation of the Reader interface only allow themselves to be registered at most once:

func (r *PeriodicReader) register(p sdkProducer) {
// Only register once. If producer is already set, do nothing.
if !r.sdkProducer.CompareAndSwap(nil, produceHolder{produce: p.produce}) {
msg := "did not register periodic reader"
global.Error(errDuplicateRegister, msg)
}
}

// register stores the sdkProducer which enables the caller
// to read metrics from the SDK on demand.
func (mr *ManualReader) register(p sdkProducer) {
// Only register once. If producer is already set, do nothing.
if !mr.sdkProducer.CompareAndSwap(nil, produceHolder{produce: p.produce}) {
msg := "did not register manual reader"
global.Error(errDuplicateRegister, msg)
}
}

Given the method to register with the MeterProvider is unexported, 3rd-party users do not have the ability to override this method. Meaning the SDK does not allow any implementation of a Reader to be registered more than once. Thus the implementation is compliant.

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 19, 2023

The SDK SHOULD provide a way to allow MetricReader to respond to MeterProvider.ForceFlush and MeterProvider.Shutdown. OpenTelemetry SDK authors MAY decide the language idiomatic approach, for example, as OnForceFlush and OnShutdown callback functions.

The ForceFlush and Shutdown functions are methods on the Reader interface:

// ForceFlush flushes all metric measurements held in an export pipeline.
//
// This deadline or cancellation of the passed context are honored. An appropriate
// error will be returned in these situations. There is no guaranteed that all
// telemetry be flushed or all resources have been released in these
// situations.
//
// This method needs to be concurrent safe.
ForceFlush(context.Context) error
// Shutdown flushes all metric measurements held in an export pipeline and releases any
// held computational resources.
//
// This deadline or cancellation of the passed context are honored. An appropriate
// error will be returned in these situations. There is no guaranteed that all
// telemetry be flushed or all resources have been released in these
// situations.
//
// After Shutdown is called, calls to Collect will perform no operation and instead will return
// an error indicating the shutdown state.
//
// This method needs to be concurrent safe.
Shutdown(context.Context) error

Each of our implementations use the Go idiomatic way to define how each will respond by having methods of their own that implement the interface.

@dashpole
Copy link
Contributor

The MetricReader is not required to ensure data points from a non-SDK MetricProducer are output in the configured aggregation temporality, as these data points are not collected using OpenTelemetry instruments.

This is non-normative, but I'm guessing the intent was to have it be normative.

Again, given the temporality is not handled by the MetricReader but by the Aggregator I'm not sure this is entirely clear. A specification issue/PR is needed.

cc @dashpole

This was just meant to carve out an exception to the rule above for MetricProducers.

@dashpole
Copy link
Contributor

dashpole commented Jul 19, 2023

My reading of "must ensure that datapoints ... are output in [X] aggregation temporality" was that it could be fulfilled by passing aggregation temporality config to something that correctly handles aggregation temporality (e.g. an Aggregator, if such a thing exists).

From the spec: Note: the term aggregation is used instead of aggregator. It is RECOMMENDED that implementors reserve the "aggregator" term for the future when the SDK allows custom aggregation implementations.

I suspect at the time it was written, there was only the MetricReader, and no notion of an Aggregator. The "reader" essentially meant something like "reader and all measurement -> aggregation details".

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 28, 2023

A common sub-class of MetricReader, the periodic exporting MetricReader SHOULD be provided to be used typically with push-based metrics collection.

We are compliant with this. We provide a PeriodicReader.

TODO: PR the specification to remove the "sub-class" term as our language does not have classes.

open-telemetry/opentelemetry-specification#3628

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 28, 2023

To construct a MetricReader when setting up an SDK, the caller SHOULD provide at least the following:

  • The exporter to use, which is a MetricExporter instance.
  • The default output aggregation (optional), a function of instrument kind. If not configured, the default aggregation SHOULD be used.
  • The default output temporality (optional), a function of instrument kind. If not configured, the Cumulative temporality SHOULD be used.
  • The default aggregation cardinality limit to use, a function of instrument kind. If not configured, a default value of 2000 SHOULD be used.

This recommendation is confusing. It applies to all readers but not all readers accept these parameters.

Additionally, the default aggregation cardinality limit should be marked experimental.

This is going to require a specification issue/PR.

Given the parameters are recommendations, and are defined in RFC 2119 to mean ...

that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

There exist a valid reason to ignore these parameters (e.g. does not export, has a dependency on cumulative temporality).

I think the specification could have been better phrased, but I don't want to block this issue on fixing the phrasing in the specification. Moving on.

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 28, 2023

Everything has been identified and verified to be compliant. Closing.

@MrAlias MrAlias closed this as completed Jul 28, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Go: Metric SDK (GA) Jul 28, 2023
@MrAlias MrAlias added this to the v1.17.0/v0.40.0 milestone Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:metrics Part of OpenTelemetry Metrics pkg:SDK Related to an SDK package
Projects
No open projects
Development

No branches or pull requests

2 participants