Investigate how service-account-issuer works #1766
Comments
Attempts to replicate the upgrade issue using a Workload Cluster have failed. Using values as similar as possible, I've tried the same upgrade from
This makes me think the issue may be related to some TTL expiring or similar, so I have created several test WCs to leave for a few days before attempting the same upgrade. EDIT: The worker nodes rolling is because I mistakenly upgraded to
Some implementation details discovered while debugging:
Based on the above, I believe the issue lies in the kubelet. Unfortunately we don't have logs persisted from the kubelet, so we can't investigate further until we can replicate the issue.
The current kubelet logs available on both the
Test case
Test case
Test case
We hit this issue again today during another upgrade. The root cause is still unknown, but I managed to rule out some possible reasons:
Some differences seen compared to tests performed on workload clusters:
This is shortly followed by:
It looks like the kubelet is set up before the apiserver has had the placeholder replaced.
This is possibly caused by the ELB including non-ready instances (such as the currently initialising node), which can't respond to API requests.
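If that race is the culprit, one possible mitigation is to gate kubelet start-up on the local apiserver rather than on the ELB. The sketch below is purely illustrative (the local URL, port, and retry budget are assumptions, not values from this cluster):

```python
# Illustrative sketch of a readiness gate for the race described above: poll the
# local apiserver's /readyz endpoint until it answers 200 before starting the
# kubelet, instead of relying on the ELB, which may route to non-ready instances.
# The local URL, port, and timeouts are assumptions for illustration.
import ssl
import time
import urllib.request

APISERVER_READYZ = "https://127.0.0.1:6443/readyz"  # local instance, not the ELB


def wait_for_local_apiserver(timeout_s: int = 300, interval_s: int = 5) -> bool:
    # /readyz is reachable anonymously under default RBAC; skip cert verification
    # because the serving certificate may not list 127.0.0.1 in its SANs.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(APISERVER_READYZ, context=ctx, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except Exception:
            pass  # not up yet: connection refused, 5xx, placeholder config, ...
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    print("local apiserver ready:", wait_for_local_apiserver())
```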
Test case
Test case
I have been unable to replicate the issue using workload clusters. Whatever is causing the issue seems to only be present on management clusters. It is also entirely possible that whatever caused the problem is already fixed in the latest versions of cluster-aws and default-apps-aws. We won't know for sure until we next upgrade an MC. Plan for testing MC upgrades:
Need to investigate exactly how the service-account-issuer flag works in the api-server when using a custom URL. Upgrades to CAPA clusters are very unstable due to pods becoming unauthorized during the rolling of control plane nodes. We need to understand why the existing service account tokens become invalid during this process even though the URL used for the issuer doesn't change.
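As a quick way to see what a pod's token actually carries, the sketch below decodes the claims of a projected service account token so the iss value can be compared with the URL passed to --service-account-issuer. It is illustrative only; the token path is the standard in-pod projection:

```python
# Illustrative sketch: decode the claims of a pod's projected service account
# token to see which issuer ("iss") it was minted with. The payload is decoded
# without signature verification, which is enough for debugging an issuer mismatch.
import base64
import json

# Standard projection path inside a pod; adjust if inspecting a copied token.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def jwt_claims(token: str) -> dict:
    """Return the (unverified) payload claims of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    with open(TOKEN_PATH) as f:
        claims = jwt_claims(f.read().strip())
    # "iss" must match a value configured via --service-account-issuer on the
    # apiserver, otherwise TokenReview rejects the token as unauthorized.
    print("issuer:  ", claims.get("iss"))
    print("audience:", claims.get("aud"))
    print("expires: ", claims.get("exp"))
```

For what it's worth, newer Kubernetes releases accept --service-account-issuer more than once: the first value is used when minting new tokens and all values are accepted during validation, which is the supported way to change an issuer without invalidating existing tokens.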