-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add command 'antctl upgrade api-storage' in antctl #5198
Conversation
} | ||
|
||
// patchCRDStoredVersions is used to patch field status.storedVersion of a CRD. | ||
func patchCRDStoredVersions(k8sClient client.Client, crd *apiextv1.CustomResourceDefinition) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose user have to set one and only one version's storage
as true when the CRD has multiple versions, so why do we need to patch the status.storedVersion of a CRD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may a CRD which has more than one value in status.storedVersion
in this case:
- create v1alpha1 of the CRD, and field of
storage
in v1alpha is set totrue
. - create a CR of version v1alpha1,
- add v1beta1 of the CRD, and field of
storage
in v1beta1 is set totrue
andstorage
in v1beta1 is set tofalse
. - create a CR of version v1beta1
We will get a CRD whose status.storedVersion
has two values.
User may have the objects of the CRD in multiple versions. After upgrading all objects, the status.storedVersion
should only have the latest version value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will status.storedVersion be managed and updated by K8s automatically? Have you tried to update all CRs to latest version and check after upgrade later?
cc @tnqn may have more ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember K8s doesn't remove versions from it automatically, this is also the doc says.
e71e08b
to
a5644a5
Compare
a5644a5
to
9d29e39
Compare
docs/antctl.md
Outdated
antctl upgrade api-storage | ||
``` | ||
|
||
This command is used to upgrade persisted resources for Antrea CRD `antreaagentinfos.crd.antrea.io` to the latest API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
pkg/antctl/command_definition.go
Outdated
Short: "Execute a user-provided upgrade", | ||
Long: "Execute a user-provided upgrade", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not clear to understand, maybe:
Short: "Execute a user-provided upgrade", | |
Long: "Execute a user-provided upgrade", | |
Short: "Upgrade version of Kubernetes resources", | |
Long: "Upgrade version of Kubernetes resources", |
@antoninbas @tnqn any suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
"Upgrade Antrea CRD resources"
"Upgrade Antrea CRD resources to the latest API version"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of that upgrade
will be a group of commands. We could upgrade not only CRD resources, maybe we could upgrade something else used by Antrea. Do we need to limit the command upgrade
to K8s resources or CRD right now?
abd6ca8
to
16fbaf1
Compare
a7cd65a
to
aa88234
Compare
Duration: time.Second, | ||
Steps: 5, | ||
}, func(err error) bool { | ||
return validateErr(err) != nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validateErr
also checks if err is nil, which can never happen here.
It's wrong to retry on NotFound error.
It should just check apierrors.IsConflict(err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not retry on NotFound, but we should retry on other errors, not only IsConfict. Is it right?
}, func(err error) bool { | ||
return validateErr(err) != nil | ||
}, func() error { | ||
return k8sClient.Update(context.TODO(), ©Obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it get the object once when retrying? otherwise this can never succeed since it has a stale resource version. Even if it succeeds by removing the resource version in the request, it would override the updates made after you list the objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated here, every time when to update the object, get the object first by the namespace+name from the list.
expectedStorageVersion := getCRDStorageVersion(crd) | ||
// This ensures that the storage version is not changed during the upgrade. | ||
if getCRDStorageVersion(freshCRD) != expectedStorageVersion { | ||
return newUnexpectedChangeError(crd) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is perhaps unnecessary because:
- Even if you check it, it's still possible that the CRD is updated after you get the "fresh" version.
- We should leverage the resourceVersion to let apiserver fail the update request when the CRD is updated after we get it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Even if you check it, it's still possible that the CRD is updated after you get the "fresh" version.
I got this and removed related code.
- We should leverage the resourceVersion to let apiserver fail the update request when the CRD is updated after we get it.
Could you give more details about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code seems not removed. What I mean is, we could assume the CRD is not updated during the upgrade. It could just update the StoredVersions
to the storage version in the CRD's spec. If the CRD is updated during the upgrade, the update will fail because the resourceVersion doesn't match. ResourceVersion is used for this situation, preventing one client overriding another's write accidently.
If the update request fails with Conflict error, it means the CRD might have been updated in some way by users. We could just return error and abort the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation part looks good to me.
aa88234
to
b7644d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, one comment about logs.
failed++ | ||
} | ||
} | ||
fmt.Fprintf(writer, "Successfully upgraded %d objects of CRD %q.\n", len(objList.Items)-failed, crd.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment was not really about logging. The command should NOT succeed when there is anything wrong. Think about when the tool is integrated into a cluster management system: how can its caller, a program, know if it succeeds or fails. The exit code must indicate that.
I think it should just return error and abort the process when encountering the first error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got that, I have updated the PR as you suggested.
expectedStorageVersion := getCRDStorageVersion(crd) | ||
// This ensures that the storage version is not changed during the upgrade. | ||
if getCRDStorageVersion(freshCRD) != expectedStorageVersion { | ||
return newUnexpectedChangeError(crd) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code seems not removed. What I mean is, we could assume the CRD is not updated during the upgrade. It could just update the StoredVersions
to the storage version in the CRD's spec. If the CRD is updated during the upgrade, the update will fail because the resourceVersion doesn't match. ResourceVersion is used for this situation, preventing one client overriding another's write accidently.
If the update request fails with Conflict error, it means the CRD might have been updated in some way by users. We could just return error and abort the process.
b7644d8
to
84bded8
Compare
84bded8
to
4a66a72
Compare
19a029c
to
4beeeba
Compare
docs/antctl.md
Outdated
### Upgrade existing objects of CRDs | ||
|
||
antctl supports upgrading existing objects of Antrea CRDs to the latest version. | ||
The related sub-commands should be run out-of-cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should mention the required permissions. It includes the get, list, and update of each CRD and its objects.
And make sure proper error is printed when some permissions are missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like multicluster sub-commands, currently, antctl upgrade
can be only used outside the cluster with kubeconfig file to access to K8s-apiserver. If so, do we still need to mention the required permissions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running outside the cluster doesn't mean the kubeconfig file has admin priviledge. Normally we should add the required permissions of this command to ClusterRole antctl as the existence of the ClusterRole implies it contains all permissions its commands require. However, the required permissions of upgrade api-storage
could make antctl a super user, while its privilege is currently quite limited, the only create permission it get is creating supportbundles.
So a conservative approach is to document clearly what permissions are required for the upgrade command, then users know what they need to do when they encounter the issue.
I don't know how antctl mc
guides users to get required permissions. But for other commands, we have added all permissions to ClusterRole antctl. For upgrage api-storage
, we need at least document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got that, I'll add the required permissions in doc and give some error prints for lacking permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no document or permissions added to ClusterRole when we add the antctl mc
, I will create an issue to track this first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
4beeeba
to
d6ea9d2
Compare
d6ea9d2
to
83628ba
Compare
Kind: crd.Spec.Names.Kind, | ||
}) | ||
if getErr = k8sClient.Get(context.TODO(), client.ObjectKeyFromObject(&obj), objToUpdate); getErr != nil { | ||
return getErr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can ignore the not found error here if the object is deleted during retry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we got not found error, it will not retry and return the error directly.
83628ba
to
68b6570
Compare
existing objects of Antrea CRDs to the storage version. Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
68b6570
to
6d69400
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
Add command 'antctl upgrade api-storage' in antctl to upgrade
existing objects of Antrea CRDs to the latest API version.
For #4832
TODO: