Antrea v2.0 #4832

Closed
9 tasks done
tnqn opened this issue Apr 10, 2023 · 12 comments
Labels
proposal A concrete proposal for adding a feature

Comments

@tnqn
Member

tnqn commented Apr 10, 2023

Describe what you are trying to solve

During the development of Antrea v1.x, a few features have graduated to Beta, yet most of their API versions remain Alpha, which may be misleading to users. Secondly, none of the feature gates has ever graduated to GA, although some of them are already commonly used in production. Lastly, not all configuration options are organized in the same way, and some configurations have been deprecated for quite some time but have not been removed. There have been discussions around releasing Antrea v2.0; we could leverage this milestone to revisit the APIs, feature gates, and configurations.

API

API Promotion

The following list summarizes the proposal from @antoninbas and some feedback:

  • AntreaAgentInfo / AntreaControllerInfo: v1beta1 -> v1
  • Egress: v1alpha2 -> v1beta1
  • ExternalIPPool: v1alpha2 -> v1beta1
  • Traceflow: v1alpha1 -> v1beta1
  • NetworkPolicy / ClusterNetworkPolicy: v1alpha1 -> v1beta1
  • Tier: v1alpha1 -> v1beta1
  • ClusterGroup/Group: v1alpha3 -> v1beta1

There have been some issues identified in the current APIs, which should be fixed when creating the new versions.

  • The SrcIP field in IPHeader and IPv6Header in the Traceflow API is never used and is redundant with the field of the same name in the parent struct.
  • The IPHeader field should be a pointer, like IPv6Header, which makes more sense for the IPv6 case.
  • Make the Flags field a pointer to support explicitly setting it to 0, see Set default flag to 2 for TCP traceflow #4948 (comment)
  • The schemas of the AntreaAgentInfo / AntreaControllerInfo CRDs are still unstructured, using x-kubernetes-preserve-unknown-fields: true for the whole struct
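To make the pointer-related items above concrete, here is a minimal sketch in Go. The struct layouts and JSON tags are trimmed-down assumptions for illustration, not the actual Traceflow types; `effectiveFlags` and `encode` are hypothetical helpers. The default of 2 follows the title of #4948.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IPHeader is a hypothetical, trimmed-down IPv4 header (no SrcIP field).
type IPHeader struct {
	Protocol int32 `json:"protocol,omitempty"`
	TTL      int32 `json:"ttl,omitempty"`
}

// IPv6Header is a hypothetical, trimmed-down IPv6 header (no SrcIP field).
type IPv6Header struct {
	HopLimit int32 `json:"hopLimit,omitempty"`
}

// TCPHeader uses a pointer for Flags so that "unset" (nil) and
// "explicitly 0" can be told apart.
type TCPHeader struct {
	DstPort int32  `json:"dstPort,omitempty"`
	Flags   *int32 `json:"flags,omitempty"`
}

// Packet keeps SrcIP only in the parent struct. Both IP headers are
// pointers, so an IPv6 packet does not carry an empty IPv4 header.
type Packet struct {
	SrcIP      string      `json:"srcIP,omitempty"`
	IPHeader   *IPHeader   `json:"ipHeader,omitempty"`
	IPv6Header *IPv6Header `json:"ipv6Header,omitempty"`
	TCP        *TCPHeader  `json:"tcp,omitempty"`
}

// effectiveFlags returns the default of 2 when Flags is unset,
// and the explicit value (including 0) otherwise.
func effectiveFlags(h TCPHeader) int32 {
	if h.Flags == nil {
		return 2
	}
	return *h.Flags
}

// encode serializes a Packet to JSON; it returns "" on error.
func encode(p Packet) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	zero := int32(0)
	p := Packet{
		IPv6Header: &IPv6Header{HopLimit: 64},
		TCP:        &TCPHeader{DstPort: 80, Flags: &zero},
	}
	fmt.Println(encode(p))
	fmt.Println(effectiveFlags(*p.TCP), effectiveFlags(TCPHeader{}))
}
```

With a plain `int32` and `omitempty`, an explicit 0 would be dropped during serialization and become indistinguishable from "not set", which is the problem the pointer avoids.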

There may be other APIs we could also consider for promotion:

  • MulticastGroup
  • AntreaClusterNetworkPolicyStats / AntreaNetworkPolicyStats / NetworkPolicyStats

API Deprecation/Removal

It's not hard to add a new version of API and deprecate an old version, following https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#specify-multiple-versions. However, we need to remove the deprecated versions eventually, which is not as simple as adding new versions. The removal may need to be taken into consideration earlier as there could be two options around what Antrea v2.0 means to APIs: end of support for older versions, or start of support for new versions.

  1. If the former, we need to add new API versions and deprecate the stale versions at least two minor releases earlier than v2.0 (e.g. v1.12) to allow graceful migration to new API versions, and remove the deprecated API versions in v2.0.
  2. If the latter, we could just add new API versions in v2.0, and remove the deprecated API versions at least two minor releases later, e.g. v2.2.

Regardless of the option we choose, we need to consider version removal carefully, especially for users who have persisted objects of APIs declared Beta (e.g. ClusterNetworkPolicy, Egress) in etcd in the versions planned for removal. Changing the storage version of an API only affects newly created and updated objects; unmodified objects remain stored in the version in which they were last written. To be able to remove a version, two preconditions must be met:

  1. All existing stored data has been migrated to the newer API version, otherwise the API would stop working once the old version is removed.
  2. The old version has been removed from the status.storedVersions of the CRD.
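The two preconditions can be sketched with a small in-memory model in Go. Everything here is a hypothetical simplification: `storedObject`, `crd`, `migrate` and `canRemove` are illustrative names, and a real migration would go through the Kubernetes API server rather than mutate a slice.

```go
package main

import "fmt"

// storedObject models a custom resource as persisted in etcd.
type storedObject struct {
	Name    string
	Version string // API version the object was last written in
}

// crd models the parts of a CRD that matter for version removal.
type crd struct {
	StorageVersion string
	StoredVersions []string // mirrors status.storedVersions
}

// migrate rewrites every object in the current storage version, which is
// what a no-op update through the API server achieves, and then prunes
// the stored-version list down to the storage version.
func migrate(c *crd, objs []storedObject) {
	for i := range objs {
		objs[i].Version = c.StorageVersion
	}
	c.StoredVersions = []string{c.StorageVersion}
}

// canRemove reports whether both preconditions hold for a version:
// no object is still stored in it, and it is absent from StoredVersions.
func canRemove(c crd, objs []storedObject, version string) bool {
	for _, o := range objs {
		if o.Version == version {
			return false
		}
	}
	for _, v := range c.StoredVersions {
		if v == version {
			return false
		}
	}
	return true
}

func main() {
	c := crd{StorageVersion: "v1beta1", StoredVersions: []string{"v1alpha2", "v1beta1"}}
	objs := []storedObject{
		{Name: "egress-a", Version: "v1alpha2"}, // never updated since the switch
		{Name: "egress-b", Version: "v1beta1"},
	}
	fmt.Println("before migration:", canRemove(c, objs, "v1alpha2"))
	migrate(&c, objs)
	fmt.Println("after migration:", canRemove(c, objs, "v1alpha2"))
}
```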

Kubernetes documents two options for this in https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#upgrade-existing-objects-to-a-new-stored-version. However, neither option sounds very convenient: the former requires users to clone a repo, build images, build a manifest, then kubectl apply -f to create migration jobs, while the latter is purely manual and requires one operation for each API. When Antrea changed its API group from *.antrea.tanzu.vmware.com to *.antrea.io in v1.0, a mirroring controller was added to mirror objects from one group to the other, though that scenario was much more complex than this one. Some projects provide a tool that performs the 2nd option documented in the Kubernetes doc, typically via their CLIs. In summary, there could be 3 options:

| Solution | Pros | Cons |
|---|---|---|
| Guide users to follow Kubernetes's two options | Simple for Antrea | A little hard for users |
| A controller running in antrea-controller | No user intervention required | More resource consuming, hard to control |
| antctl upgrade api-storage (refer to cmctl) | No impact on long-running processes, runs on demand, can be a long-term tool for API removal | User intervention required |

Personally I think the 3rd one is good, as more and more projects offer installation, upgrade, and management functionality via their CLIs. A CLI is also more flexible for customizing project-specific logic, especially for upgrades.

Feature

There are some features which could be promoted to the next stage before or in v2.0:

  • UI: Octant -> Antrea UI
  • AntreaProxy: Beta -> GA
  • EndpointSlice: Beta -> GA
  • TopologyAwareHints: Alpha -> Beta (upstream is still Beta in 1.27, keep them consistent)
  • NodeIPAM: Alpha -> Beta (Done) -> GA
  • Multicast: Alpha -> Beta (Done)
  • AntreaPolicy: Beta -> GA? (could it be confusing if the API stage doesn't match feature gate stage?)
  • Traceflow: Beta -> GA? (as we are considering graduating its API to v1)
  • ServiceExternalIP: Alpha -> Beta?
  • IPsecCertAuth: Alpha -> Beta?
  • ExternalNode: Alpha -> Beta?
  • SupportBundleCollection: Alpha -> Beta?
  • FlowExporter: Alpha -> Beta?
  • L7NetworkPolicy: Alpha -> Beta?

Note that some features may need new configuration options, because until now their feature gates have served as the enablement toggles, yet running them is undesirable in some cases or for some users. For example, running AntreaProxy is undesirable in the ExternalNode case, and for users who want to use kube-proxy IPVS mode for its backend selection algorithms. This may apply to AntreaProxy and FlowExporter.

Configuration Option

There are some configuration options that have been deprecated for quite some time. We could remove them in v2.0:

  • enableIPSecTunnel
  • nplPortRange
  • multicastInterfaces
  • multicluster.enable
  • legacyCRDMirroring

Some configuration options are specific to a single feature and should be grouped together. We could add a new option group and deprecate the ungrouped options before v2.0, then remove the deprecated ones in v2.0.

  • flowCollectorAddr
  • flowPollInterval
  • activeFlowExportTimeout
  • idleFlowExportTimeout
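One way the grouping and the deprecation window could work is sketched below in Go. The `FlowExporterConfig` group, the `AgentConfig` layout and the `resolveFlowExporter` helper are assumptions made for illustration, not the actual Antrea configuration types; the example address is made up.

```go
package main

import "fmt"

// FlowExporterConfig is a hypothetical option group for the four
// flow export options listed above.
type FlowExporterConfig struct {
	FlowCollectorAddr       string
	FlowPollInterval        string
	ActiveFlowExportTimeout string
	IdleFlowExportTimeout   string
}

// AgentConfig keeps the deprecated top-level options next to the new
// group for the duration of the deprecation window.
type AgentConfig struct {
	FlowCollectorAddr       string // deprecated: use FlowExporter.FlowCollectorAddr
	FlowPollInterval        string // deprecated: use FlowExporter.FlowPollInterval
	ActiveFlowExportTimeout string // deprecated: use FlowExporter.ActiveFlowExportTimeout
	IdleFlowExportTimeout   string // deprecated: use FlowExporter.IdleFlowExportTimeout

	FlowExporter FlowExporterConfig
}

// firstNonEmpty returns a if it is set, otherwise b.
func firstNonEmpty(a, b string) string {
	if a != "" {
		return a
	}
	return b
}

// resolveFlowExporter returns the effective settings: a value in the new
// group wins, and the deprecated top-level value is only a fallback.
func resolveFlowExporter(c AgentConfig) FlowExporterConfig {
	return FlowExporterConfig{
		FlowCollectorAddr:       firstNonEmpty(c.FlowExporter.FlowCollectorAddr, c.FlowCollectorAddr),
		FlowPollInterval:        firstNonEmpty(c.FlowExporter.FlowPollInterval, c.FlowPollInterval),
		ActiveFlowExportTimeout: firstNonEmpty(c.FlowExporter.ActiveFlowExportTimeout, c.ActiveFlowExportTimeout),
		IdleFlowExportTimeout:   firstNonEmpty(c.FlowExporter.IdleFlowExportTimeout, c.IdleFlowExportTimeout),
	}
}

func main() {
	cfg := AgentConfig{
		FlowCollectorAddr: "collector.example:4739", // old, deprecated location
		FlowPollInterval:  "5s",                     // old, deprecated location
		FlowExporter:      FlowExporterConfig{FlowPollInterval: "10s"},
	}
	fmt.Printf("%+v\n", resolveFlowExporter(cfg))
}
```

Once the deprecated fields are removed in v2.0, the fallback helper would no longer be needed and the group could be read directly.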

K8s Compatibility

Graduate Helm support

Describe the solution you have in mind

In summary, for Antrea v2.0, we could consider the following preparations:

Describe how your solution impacts user flows

Test plan

@tnqn tnqn added the proposal A concrete proposal for adding a feature label Apr 10, 2023
@antoninbas
Contributor

For the version timeline: I prefer to introduce the new APIs in a v1.x release, and remove all deprecated APIs together when we release Antrea v2.0. However, that means that we won't release Antrea v2.0 for about a year (you wrote "at least two minor releases"; for any beta API, we are supposed to wait 9 months: https://github.com/antrea-io/antrea/blob/main/docs/versioning.md#apis-deprecation-policy). The rationale is that the new major version (v2) is the one that should break backward compatibility (with API removal)

For upgrade tooling, I also prefer the 3rd option (one-time antctl command run by the user to upgrade affected CRs).

@antoninbas
Contributor

@tnqn one more idea since this is an "umbrella" issue for v2 release, and not just for API version upgrades: this could be a good opportunity to drop support for older K8s versions. At the moment, we support K8s v1.16 and that hasn't changed for a few years (even though we no longer test with such old K8s versions). We could increase the K8s version requirement to v1.19 or more. For example, at the moment we cannot use the deprecated field for CRDs (https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#version-deprecation), as it was introduced in K8s v1.19.

@tnqn
Member Author

tnqn commented Apr 11, 2023

For the version timeline: I prefer to introduce the new APIs in a v1.x release, and remove all deprecated APIs together when we release Antrea v2.0. However, that means that we won't release Antrea v2.0 for about a year (you wrote "at least two minor releases"; for any beta API, we are supposed to wait 9 months: https://github.com/antrea-io/antrea/blob/main/docs/versioning.md#apis-deprecation-policy). The rationale is that the new major version (v2) is the one that should break backward compatibility (with API removal)

Most of the deprecated APIs (except AntreaControllerInfo and AntreaAgentInfo) are Alpha, or do you mean their feature gate stage? For example, if we add Egress v1beta1 and deprecate v1alpha2 in 1.12 (May), does it break our versioning policy if we remove v1alpha2 in 2.0, which follows 1.13 (August) and is released in October or December?

For upgrade tooling, I also prefer the 3rd option (one-time antctl command run by the user to upgrade affected CRs).

Thanks for your input.

@tnqn one more idea since this is an "umbrella" issue for v2 release, and not just for API version upgrades: this could be a good opportunity to drop support for older K8s versions. At the moment, we support K8s v1.16 and that hasn't changed for a few years (even though we no longer test with such old K8s versions). We could increase the K8s version requirement to v1.19 or more. For example, at the moment we cannot use the deprecated field for CRDs (https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#version-deprecation), as it was introduced in K8s v1.19.

Great point. I will add it to the task list.

@antoninbas
Contributor

Most of the deprecated APIs (except AntreaControllerInfo and AntreaAgentInfo) are Alpha, or do you mean their feature gate stage? For example, if we add Egress v1beta1 and deprecate v1alpha2 in 1.12 (May), does it break our versioning policy if we remove v1alpha2 in 2.0, which follows 1.13 (August) and is released in October or December?

I was referring to the AntreaControllerInfo / AntreaAgentInfo CRDs, which are currently in Beta. We should theoretically wait 9 months between API deprecation and removal.

@vicky-liu

Antonin, based on the API list summarized by Quan, only AntreaControllerInfo / AntreaAgentInfo need to be promoted from v1beta1 -> v1. If that is the only reason to wait 9 months, could we keep this API as v1beta1 and not deprecate it? Compared with the other APIs, I don't think AntreaControllerInfo / AntreaAgentInfo is a top priority.

@antoninbas
Contributor

I am fine with taking the alternate approach: introducing the new APIs in v2.0, and removing the deprecated APIs later on, when our removal policy allows it.

@antoninbas
Contributor

Let's "graduate" Helm support in v2.0 and remove the following disclaimer:

Helm installation is currently considered Alpha.

@tnqn
Member Author

tnqn commented May 31, 2023

Let's "graduate" Helm support in v2.0 and remove the following disclaimer:

Helm installation is currently considered Alpha.

Sure, added to the proposal.

@antoninbas
Contributor

I would like to add this proposal for consideration for Antrea v2: #5630

@luolanzone luolanzone added this to the Antrea v1.16 release milestone Jan 17, 2024
antoninbas added a commit to antoninbas/antrea that referenced this issue Feb 13, 2024
There was a disclaimer in the documentation that the Helm installation
method was still considered "Alpha".

We now consider this installation method stable, and we remove the
disclaimer for the Antrea v2.0 release.

For antrea-io#4832

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
antoninbas added a commit that referenced this issue Feb 20, 2024
There was a disclaimer in the documentation that the Helm installation
method was still considered "Alpha".

We now consider this installation method stable, and we remove the
disclaimer for the Antrea v2.0 release.

For #4832

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas
Contributor

I am wondering if v2 would be a good opportunity to stop publishing the unified image (antrea/antrea-ubuntu) to the registry (we can also delete the corresponding Dockerfiles), even though we only introduced split images in the previous release (v1.15). What do you think @tnqn @luolanzone ?

@tnqn
Member Author

tnqn commented Apr 2, 2024

I am wondering if v2 would be a good opportunity to stop publishing the unified image (antrea/antrea-ubuntu) to the registry (we can also delete the corresponding Dockerfiles), even though we only introduced split images in the previous release (v1.15). What do you think @tnqn @luolanzone ?

Sounds good to me.

antoninbas added a commit to antoninbas/antrea that referenced this issue Apr 2, 2024
From now on, we will only publish the new "split" images
(e.g., antrea/antrea-agent-ubuntu and antrea/antrea-controller-ubuntu).

For antrea-io#4832

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
antoninbas added a commit that referenced this issue Apr 10, 2024
From now on, we will only publish the new "split" images
(e.g., antrea/antrea-agent-ubuntu and antrea/antrea-controller-ubuntu).

For #4832

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas
Contributor

I think we have addressed all items for this issue, so closing.


4 participants