Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stable Envoy API versions #6271

Closed
htuch opened this issue Mar 12, 2019 · 18 comments
Closed

Stable Envoy API versions #6271

htuch opened this issue Mar 12, 2019 · 18 comments
Assignees
Labels
area/xds design proposal Needs design doc/proposal before implementation no stalebot Disables stalebot from closing an issue

Comments

@htuch
Copy link
Member

htuch commented Mar 12, 2019

We've had feedback from folks (in particular gRPC-LB) that the current Breaking Change Policy is insufficient for handling long-term compatibility at the wire and generated proto level.

Specifically:

  1. Coupling API field validity and deprecation to Envoy release cycles is an implicit API state that mostly makes sense for Envoy but not for independent clients. Even Structured version and feature reporting #5405 introduces too much pain for non-Envoy clients.
  2. Upgrading to oneof and other wire compatible changes breaks Go code generation for protobuf. There are no safe mutations to existing proto fields.

The proposal here is that we switch to semantic versioning by making use of package namespaces. Any API that is not marked as alpha, e.g. v2alpha, becomes immutable modulo field addition. Deprecation would happen by moving to v3, v4, etc.

What do folks think? CC @envoyproxy/maintainers @alyssawilk

@htuch htuch self-assigned this Mar 12, 2019
@htuch htuch added the design proposal Needs design doc/proposal before implementation label Mar 12, 2019
@snowp
Copy link
Contributor

snowp commented Mar 12, 2019

One concern that springs to mind is the issues we've had with moving service between packages, since that breaks gRPC wire compatibility due to the :path header. Would this make changes to any protobufs that's referenced by a service very painful?

@htuch
Copy link
Member Author

htuch commented Mar 12, 2019

@snowp package namespace changes would imply service endpoint changes. The management server would need to support both during a transition period.

@moderation
Copy link
Contributor

Not sure if this is relevant for this particular issue but CockroachDB recently switched away from SemVer - Why we're switching to calendar versioning. Along the same lines the SemVer specification is getting an upgrade - What’s next for SemVer.

@mattklein123
Copy link
Member

What do folks think?

I'm not thrilled about this since I do think it will increase deprecation overhead/velocity quite a bit. I think I'm not fully understanding the problem that gRPC is facing. Can you potentially describe in more detail?

@htuch
Copy link
Member Author

htuch commented Mar 12, 2019

@markdroth can you provide a summary for your team?

@dfawley
Copy link
Member

dfawley commented Mar 13, 2019

API stability is an extremely important factor for anyone outside Envoy looking to adopt the xDS protocol. gRPC needs the ability to easily express to its users what version(s) of the protocol it works with. At this point, we cannot say "gRPC works with xDS v2", because features present in the initial release of v2 may have been removed and replaced -- or may be in the future -- in accordance with the breaking change policy. This means if gRPC uses the old fields, it won't work with new servers and vice-versa.

In addition to those concerns, gRPC-Go has twice in the past two months (#6118, #5432) been broken by wire-compatible changes to the proto format that break the go generated API. This is not only disruptive to our development, but it also affects our users: they will need to vendor these protos at the right version. The problem with this is, if they are developing a project that has other dependencies on the xDS API, they may not be able to find a single version that satisfies all of their dependencies (some may require newer versions and some may require older versions). Multiple copies of the same proto definitions are not permitted, and will result in init-time panics.

@mattklein123
Copy link
Member

re: the Go issues, I think that is more straightforward. We can just not do the things that break Go such as add a oneof after the fact (and anything else).

The other concern is much trickier. I understand it, but adhering to it will greatly reduce our velocity. Can we discuss this on the next community call and brainstorm possible outcomes?

@dfawley
Copy link
Member

dfawley commented Mar 13, 2019

@mattklein123 In order to maintain backward compatibility for the Go generated API, the other condition also needs to be met. Removing a field will break the generated code API (in addition to breaking wire compatibility).

I have no desire to reduce your velocity. New features can be added with a clear "experimental" tag to be exempted from these requirements. gRPC and other users requiring a stable API would not use those features until they are finalized.

Discussing this further at the next community meeting sounds fine.

@htuch
Copy link
Member Author

htuch commented Mar 13, 2019 via email

@dfawley
Copy link
Member

dfawley commented Mar 13, 2019

@htuch, yes, I plan to attend, and I have invited a few others. Thanks for sharing the link.

@alyssawilk
Copy link
Contributor

alyssawilk commented Mar 26, 2019

Resharing Doug's doc which was presented at the community call via my chromium account (so folks can read and comment)
https://docs.google.com/document/d/1nAomWSjvSSFCNZ1wQJ7D01f5VGDGLn5zcNxcDozYqYI/edit?ts=5c9a516b#heading=h.hv6zo3sm8vz7

@stale
Copy link

stale bot commented Apr 25, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 25, 2019
@htuch htuch removed the stale stalebot believes this issue/PR has not been touched recently label Apr 25, 2019
@htuch
Copy link
Member Author

htuch commented Apr 29, 2019

We now have a design document to share:

@stale
Copy link

stale bot commented May 29, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label May 29, 2019
@htuch htuch added no stalebot Disables stalebot from closing an issue and removed stale stalebot believes this issue/PR has not been touched recently labels May 30, 2019
@htuch htuch changed the title Semantic versioning and stricter API versioning constraints Stable Envoy API versions Jun 7, 2019
@htuch
Copy link
Member Author

htuch commented Jun 7, 2019

It looks like we have general buy-in on the proposal, so the following calendar is what I think I can commit to:

Before end of Q2:

  • TDS (runTime Discovery Service (TDS) #6708)
  • Unknown field support for dynamic config, stats tracking, etc. (Log and count unknown proto fields #6818 and bootstrap and xDS should have separate support for allow unknown fields #6651).
  • Add presubmit check forcing sign-off from @envoyproxy/api-shepherds for any changes to api/. @envoyproxy/udpa-wg will be tagged on all such PRs for optional review.
  • Manual inspection of all API changes to avoid any breaking changes at either wire or language stub level.
  • Minor/patch version proto annotations added to v2 API.
  • Semantic version reporting in xDS for core APIs.
  • Introduce v3alpha for some relatively isolated example package. Write a translation module in Envoy implementation, providing the pattern for the "translate old versions of API to new on API ingest".
  • Node enhancements for user agent and client feature reporting.

Before end of Q3:

  • Node enhancements for extension version reporting, corresponding filter interface changes to support this.
  • TBD tool enforcement of minor/patch version updates when material proto changes occur.
  • TBD tool to enforce ODR and ensure transitive upgrades work as expected.
  • TBD tool to enforce package dependency constraints, e.g. verify the property that stable APIs do not depend on experimental APIs, that the latest stable version of any API only depends on the latest stable versions (including transitively) of other APIs. Also, to ensure we have well formed package namespace.
  • TBD tooling to support reducing the manual stub/code toil when moving from vN to v(N+1) where most fields are the same.
  • Finalize incremental xDS APIs (config: partial rejections in incremental xDS #5739).
  • protolock presubmit (build: evaluate using protolock in API build #3368, build: evaluate using protolock in API build #3368)
  • Extend protolock to handle PGV annotations.

At the end of Q3, we will start the annual release calendar for new API major versions, and the APIs that were in v3alpha will be frozen as v3, v4alpha will begin, etc. Support for v2 APIs that have newer major versions will be removed at the end of 2020 Q3.

Field will continue to be deprecated via annotation according to https://github.com/envoyproxy/envoy/blob/master//CONTRIBUTING.md#breaking-change-policy. However, we will not remove any further fields from the v2 fields from now on. Deprecated fields will become disabled by default following the existing schedule, this may be overriden by TDS. The goal of deprecation across major versions will be to allow advanced notion of significant structural changes to the API when moving from vN to v(N+1), but no fields will be removed in vN.

@mattklein123 @alyssawilk @dfawley @envoyproxy/api-shepherds @envoyproxy/maintainers (and anyone else who cares about API stability!) does this sound like a reasonable plan-of-record?

@mattklein123
Copy link
Member

@htuch +1 looks great to me. Thanks for driving.

alyssawilk added a commit that referenced this issue Jul 11, 2019
Functionally this

removes the script which filed bugs to remove old config (per plan of record in #6271)
adds a script which files bugs to remove old code (cleanup of runtime guarded features)
updates our release process
In practice, this just hot swaps what our deprecation script does and updates the docs.

Risk Level: n/a (tooling + comment)
Testing: manual testing. Will do one more manual test with actual bug filing just before submitting.
Docs Changes: updated
Release Notes: n/a
Fixes #6472

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@htuch
Copy link
Member Author

htuch commented Aug 29, 2019

See #8085 for the v3alpha and v3 release plans. The burndown is at https://github.com/envoyproxy/envoy/issues?q=is%3Aopen+is%3Aissue+label%3A%22v3+API%22+milestone%3A1.12.0

@htuch htuch mentioned this issue Aug 29, 2019
5 tasks
htuch added a commit to htuch/envoy that referenced this issue Oct 2, 2019
* [#not-implemented-warn:] was barely used and its purposes are better
  served by [#not-implemented-hide:].

* [#proto-status:] was there for an earlier style of versioning, where
  APIs were "frozen" or "draft", etc. Now we have semantic versioning
  and a regular API clock as per envoyproxy#6271.

Part of envoyproxy#8371.

Risk level: Low (docs only).
Testing: Docs rebuild.

Signed-off-by: Harvey Tuch <htuch@google.com>
htuch added a commit that referenced this issue Oct 3, 2019
* [#not-implemented-warn:] was barely used and its purposes are better
  served by [#not-implemented-hide:].

* [#proto-status:] was there for an earlier style of versioning, where
  APIs were "frozen" or "draft", etc. Now we have semantic versioning
  and a regular API clock as per #6271.

Part of #8371.

Risk level: Low (docs only).
Testing: Docs rebuild.

Signed-off-by: Harvey Tuch <htuch@google.com>
danzh2010 pushed a commit to danzh2010/envoy that referenced this issue Oct 4, 2019
* [#not-implemented-warn:] was barely used and its purposes are better
  served by [#not-implemented-hide:].

* [#proto-status:] was there for an earlier style of versioning, where
  APIs were "frozen" or "draft", etc. Now we have semantic versioning
  and a regular API clock as per envoyproxy#6271.

Part of envoyproxy#8371.

Risk level: Low (docs only).
Testing: Docs rebuild.

Signed-off-by: Harvey Tuch <htuch@google.com>
nandu-vinodan pushed a commit to nandu-vinodan/envoy that referenced this issue Oct 17, 2019
* [#not-implemented-warn:] was barely used and its purposes are better
  served by [#not-implemented-hide:].

* [#proto-status:] was there for an earlier style of versioning, where
  APIs were "frozen" or "draft", etc. Now we have semantic versioning
  and a regular API clock as per envoyproxy#6271.

Part of envoyproxy#8371.

Risk level: Low (docs only).
Testing: Docs rebuild.

Signed-off-by: Harvey Tuch <htuch@google.com>
@htuch
Copy link
Member Author

htuch commented Mar 24, 2020

I'm going to consider this issue resolved via the introduction of v3, additional improvements, e.g. minor versioning, can be tracked via independent issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/xds design proposal Needs design doc/proposal before implementation no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

No branches or pull requests

6 participants