Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have OTel TC review metric SDK and sign-off on stable release #3674

Closed
33 tasks done
MrAlias opened this issue Feb 3, 2023 · 2 comments
Closed
33 tasks done

Have OTel TC review metric SDK and sign-off on stable release #3674

MrAlias opened this issue Feb 3, 2023 · 2 comments
Assignees
Milestone

Comments

@MrAlias
Copy link
Contributor

MrAlias commented Feb 3, 2023

Blocked by:

Create an issue in the community repository to request a TC member review the metric SDK (similar to open-telemetry/community#1355) once internal review is done.

@MrAlias MrAlias converted this from a draft issue Feb 3, 2023
@dashpole dashpole moved this from Todo to Blocked in Go: Metric SDK (GA) Jul 24, 2023
@MrAlias MrAlias moved this from Blocked to Todo in Go: Metric SDK (GA) Aug 28, 2023
@MrAlias
Copy link
Contributor Author

MrAlias commented Aug 31, 2023

Community issue opened: open-telemetry/community#1663

@MrAlias MrAlias moved this from Todo to In Progress in Go: Metric SDK (GA) Aug 31, 2023
@MrAlias
Copy link
Contributor Author

MrAlias commented Sep 11, 2023

open-telemetry/community#1663 (comment)

Speaking for the TC, I have reviewed the v1.17.0/v0.40.0 OTel-Go metrics SDK.

Thank you @MrAlias, for conducting a thorough internal review. Having witnessed the output of that review, I am not surprised to find the OTel-Go metrics SDK in a healthy state of conformity. Here are some areas of feedback for consideration as the user-base grows.

Backwards/Forwards compatibility protection

I really admire the work that has been put in to protecting the interfaces from breaking changes. Users are explicitly prevented from providing implementations of interfaces that they are not supposed to, to protect the future ability of maintainers to improve the code. This sort of protection goes beyond what OTel requires, but ultimately this rigor will help OTel adoption and prevent the need for future breaking changes.

Straight-forward code is easy to follow

I have constructed at least two Go metrics SDK prototypes by now, so I am familiar with the territory. What I admire most about this code is that it remains simple and straightforward, despite there being plenty of opportunity to do complicated things to appease demanding users. It is a good thing for a community-maintained SDK to stay simple and avoid premature optimization, and if there are specific problems with performance, users will bring those to the maintainers and (because of the protections mentioned above) there will be a path forward for future optimizations.

As an example of this statement, I noticed that for each temporality, each aggregator has a method named cumulative() and delta(). These methods are nearly identical, and it might be tempting to factor the code in such a way that temporality controls were less transparent -- for example, the logic to aggregate could be the same if there were another piece of code responsible for resetting the aggregator state when temporality==delta. Of course, this is a very high-level claim, and the details in the code really matter. It could be that there are more compact or more performance ways to implement this SDK, but I'm glad to have a simple starting point.

Memory limits

As discussed in open-telemetry/opentelemetry-go#3006, the SDK contains some TODOs about what was eventually specified in open-telemetry/opentelemetry-specification#2960. I believe it is OK to call the SDK stable without addressing this issue first (as long as the maintainers agree). On the other hand, I'm familiar with cardinality explosions impacting the OTel Collector already, so this would be a nice safety feature to prioritize.

Performance concerns

I expect there will be users who are not entirely satisfied by the performance of the SDK, but I don't see it as a problem with the SDK itself. A number of memory-allocation optimizations have been applied already (e.g., support for re-use across collections, use of sync.Pool in various places). My one reservation is about the cost of constructing attribute.Set objects which actually happens in the API, not in the SDK. One of the downsides of the compatibility work mentioned above is the appearance of functional options in the instrument methods (e.g., AddOption, RecordOption, ObserveOption). While this pattern is ultimately very flexible, these are performance-sensitive calls and I would prefer to see a more direct path into the SDK.

I am hopeful that OTel will continue to evolve in ways that encourage more efficient APIs to exist, which is not to say what's here has to change. With what we have in the Go API today, the user may construct a []AddOption to precompute the input to an Add operation--however there's no way to mutate and reuse this object for a different attribute set. For users upgrading from a statsd-like API surface (where attributes are provided as a list to every operation), this forces a fairly expensive sequence of allocations into the call path. If the API didn't force this, the SDK could be optimized to avoid allocating an attribute.Set when it already has the same set indexed. This is more-or-less borrowing an idea from the Prometheus client library, which is to say that computing a hash function over attributes can allow searching for an existing attribute set without allocations (i.e., faster, less overhead). All of the options that I see for improving this situation, which includes an old idea inside OTel known as "Bound instruments", involve changing the API. When the users speak up, I expect there will be pressure to extend the API with new code paths--none of this will go against the specification, however.

One last area of concern (also on performance), is the use of sync.Mutex in places where writes are infrequent and updates are frequent. I've noticed users (of this SDK) with serious lock contention problems in such situations, and have replaced most sync.Mutex objects with other approaches.

Summary

This SDK passes review. Well done @MrAlias, @MadVikingGod, @pellared and the @open-telemetry/go-approvers! 🎉

@MrAlias MrAlias closed this as completed Sep 11, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Go: Metric SDK (GA) Sep 11, 2023
@XSAM XSAM added this to the untracked milestone Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

No branches or pull requests

3 participants