Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transition plan for upcoming breaking changes to the unstable HTTP semantic conventions #3404

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,9 @@ release.
([#3388](https://github.com/open-telemetry/opentelemetry-specification/pull/3388))
- Change http.server.duration and http.client.duration units to seconds
([#3390](https://github.com/open-telemetry/opentelemetry-specification/pull/3390))
- Add transition plan for upcoming breaking changes to the unstable HTTP semantic
conventions.
([#3404](https://github.com/open-telemetry/opentelemetry-specification/pull/3404))

### Compatibility

Expand Down
17 changes: 17 additions & 0 deletions specification/metrics/semantic_conventions/http-metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,23 @@ operations. By adding HTTP attributes to metric events it allows for finely tune

<!-- tocstop -->

> **Warning**
> v1.21.0 of this document will introduce significant breaking changes to the (not yet stable)
lmolkova marked this conversation as resolved.
Show resolved Hide resolved
> HTTP semantic conventions.
>
> HTTP instrumentations updating from
> [v1.20.0 of this document](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.20.0/specification/metrics/semantic_conventions/http-metrics.md)
> (or prior) to this new version SHOULD bump their major version
> even though the instrumentation has not been declared stable.
trask marked this conversation as resolved.
Show resolved Hide resolved
> They SHOULD NOT release a stable version of that new major version until
> at least August 1, 2023 (this is to give backends some time to support the
> new changes).
> And the previous major version SHOULD be supported until February 1, 2024
> (e.g. at least security patches).
>
> This major version bump will signal that users should take care when upgrading the
> instrumentation, including verifying whether their backend supports the new major version.

## HTTP Server

### Metric: `http.server.duration`
Expand Down
17 changes: 17 additions & 0 deletions specification/trace/semantic_conventions/http.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,23 @@ and various HTTP versions like 1.1, 2 and SPDY.

<!-- tocstop -->

> **Warning**
> v1.21.0 of this document will introduce significant breaking changes to the (not yet stable)
> HTTP semantic conventions.
Comment on lines +31 to +32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the breaking changes were already released in 1.20.0 and 1.19.0:

  • Rename net.app.protocol.(name|version) to net.protocol.(name|version) (#3272)
  • Replace http.flavor with net.protocol.(name|version) (#3272)
  • Remove http.status_code attribute from the http.server.active_requests metric. (#3366)
  • Rename http.user_agent to user_agent.original. (#3190)

We could extend the guidance here to cover this, at least for those libraries that did not already update in the meantime.

>
> HTTP instrumentations updating from
> [v1.20.0 of this document](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.20.0/specification/trace/semantic_conventions/http.md)
> (or prior) to this new version SHOULD bump their major version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As brought up by @Aneurysm9, @dyladan and others:
This will not work for instrumentation libraries that are at 0.x today since bumping them to 1.x would also indicate that the library itself would be considered stable now as per semver. This might not be accurate if the only change applied to the library is switching it over to a new semantic convention version. The instrumentation and its configuration might still be unstable at this point.
For 0.x libraries I think it should be expected by users that breaking changes are happening, also when there's no major version bump.

Copy link
Member

@arminru arminru Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for instrumentation libraries to separately indicate which semantic convention version they use and comply with? E.g.

my.http.instrumentation version semantic convention schema
0.17 1.20.0
0.18 1.21.0
... ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for instrumentation libraries to separately indicate which semantic convention version they use and comply with?

Just to clarify: the libraries are supposed to indicate this in the telemetry by using schema_url parameter when obtaining the Tracer/Meter/Logger. I think if they in addition indicate this in the documentation that would be good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to everything @arminru has said, to which I'd add that using 1.x-alpha or 1.x-RCy for an extended duration to indicate that generated telemetry had changed while the instrumentation's API remained less-than-solidified will be problematic for many dependency management systems that treat such pre-release versions as non-existent or not-latest. For instance, https://pkg.go.dev/go.opentelemetry.io/otel indicates that v1.14.0 is the latest release of the Go SDK even though v1.15.0-rc.2 has been available for over a month as we work through the metrics API stabilization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work for instrumentation libraries that are at 0.x today since bumping them to 1.x would also indicate that the library itself would be considered stable now as per semver.

This is a good point. We don't want to force instrumentations to declare themselves stable just because they want to adopt the new conventions. @trask @reyang thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @tedsuo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. If we are NOT going to force a major version update then we are back to my original request: we must prevent instrumentations from adopting the new conventions until a specific date so that all parties have time to prepare. Otherwise it is too easy for users to update to the next minor version and break stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the transition plan to include two options:

  • either instrumentation can bump the major version (which will allow it to update to v1.21.0 right away, with no stable releases on the new major version until Oct 1)
  • or instrumentation should wait to update HTTP instrumentation to v1.21.0 until Oct 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to force instrumentations to declare themselves stable just because they want to adopt the new conventions. @trask @reyang thoughts?

+1. If the intention is to communicate breaking changes, I think any of the following version change is reasonable:

  • 0.1-alpha1 to 0.2-alpha1
  • 1.0-beta2 to 1.0-rc1
  • 1.0-rc1 to 1.0-rc2

These are okay, although I don't see it as a must-have:

  • 1.0-beta1 to 1.1-beta2
  • 1.0-rc1 to 2.0-alpha1

What about 0.38.0? Should it just go to 0.39.0? Would be nice if there was some way to differentiate this from any other minor version changes because even though users should expect changes to 0.x software, they are still very likely to be surprised by such an impactful change. Especially true since there is no compile-time indication of this type of break.

> even though the instrumentation has not been declared stable.
> They SHOULD NOT release a stable version of that new major version until
> at least August 1, 2023 (this is to give backends some time to support the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to push this to Nov 1. Otherwise looks good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a reason for this? One of the things we discussed was adoption cycles for client-side-software of major version bumps. My expectation is that the major-version bump will cause an inherent delay in the "core" of otel usage moving to this new version. As such, we're actually giving backends a decent chunk of time here. Yes some early adopters may start using the latest version, but the bulk of instrumentation won't be moving for some time after this date.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking for a reason for having a specific date at all or are you questioning the need for 6 months? Let me try to address both.

Firstly, why we need a date at all.

I don't know how quickly some instrumentations will react. If they are agile enough they may cause problems in mere weeks after we merge the new conventions. IMO, it is imperative that we give well-defined time period for vendors to prepare. We can't just say it is a major version increase so it will take (an unspecified) time before users adopt it.

Secondly, why I think 3 months is not enough.

Quarterly planning/execution periods are quite typical. 3 months is just not enough notice for anything that happens on a quarterly cycle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Secondly, why I think 3 months is not enough.

Quarterly planning/execution periods are quite typical. 3 months is just not enough notice for anything that happens on a quarterly cycle.

I think this is a valid point. Personally, I feel a combination of explicit grace period + major version bump + advance notice is already very generous. While I do have personal opinions, I suggest that we take a step back and have some framework for such kind of discussion.

For example, in Microsoft Azure folks follow 6 months semester planning cycle, @tigrannajaryan if we follow the thinking here it could leave us to "let's give 12 months". And if NASA is using 5 years cycle, would we say "let's give 10 years"? (https://www.zdnet.com/article/how-microsofts-azure-organization-makes-the-windows-sausage/ Feature priorities are decided sometimes in a six-month or a year-long boundary, he said. The biggest take-away: "It's not a tyranny of organizations anymore" when it comes to deciding on timing and feature sets.)

I would suggest the following framework to facilitate the discussion/decision here:

  • The community is not trying to optimize or wait for a single company/organization.
  • The goal is not to have a bar that is lower than every company's own bar, instead, there is a balance.
  • If certain vendor/community has difficulties, we can look at the individual case and make conscious adjustments/decision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reyang I explained where I get 6 months from. So far 3 months is a number that looks arbitrarily chosen to me. Show me the calculation :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's all arbitrary to the extent that some people would prefer a shorter time and some people would prefer a longer time 😅

@tigrannajaryan I'm going to bump it to Oct 1, which is 6 months after the underlying issue (#3362) for this PR was created. This will still give (a little) time for stable releases and adoption before year end.

> new changes).
> And the previous major version SHOULD be supported until February 1, 2024
> (e.g. at least security patches).
>
> This major version bump will signal that users should take care when upgrading the
> instrumentation, including verifying whether their backend supports the new major version.

## Name

HTTP spans MUST follow the overall [guidelines for span names](../api.md#span).
Expand Down