Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: fix the invalid cross-device link bug during upgrade #123406

Merged
merged 1 commit into from
Feb 22, 2024

Conversation

SataQiu
Copy link
Member

@SataQiu SataQiu commented Feb 21, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

kubeadm: fix the invalid cross-device link bug during upgrade

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#3023

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kubeadm: fix a bug during kubeadm upgrade, where it is not possible to mount a new device and create a symbolic link for /etc/kubernetes (or a sub-directory) so that kubeadm stores its information on the mounted device.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 21, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SataQiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 21, 2024
@SataQiu
Copy link
Member Author

SataQiu commented Feb 21, 2024

/cc @shubham-yewalekar

@k8s-ci-robot
Copy link
Contributor

@SataQiu: GitHub didn't allow me to request PR reviews from the following users: shubham-yewalekar.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @shubham-yewalekar

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

err := os.Rename(oldPath, newPath)
if err != nil {
klog.V(4).Infof("failed to rename %v to %v: %v, fallback to copy file", oldPath, newPath, err)
if err := kubeadmutil.CopyFile(oldPath, newPath); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it safe to use it directly instead of a fallback? seems better to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.Rename has better performance, this is useful for copying large files

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

klog.V(4).Infof("failed to rename %v to %v: %v, fallback to copy file", oldPath, newPath, err)

users might find this message confusing.

suggesting:

klog.V(4).Infof("failed to rename %v to %v: %v, attempting an alternative method", oldPath, newPath, err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do better error handling here?

I mean we should only call this alternative function for copying the file across the devices if we see the cross device link error

also, can we put some comments to describe the cross device link error we saw and mention how this function solves that problem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error checking can be done like so:
https://gist.github.com/var23rav/23ae5d0d4d830aff886c3c970b8f6c6b?permalink_comment_id=4431960#gistcomment-4431960

$ grep "os.Rename" * -rnI
app/phases/upgrade/postupgrade_test.go:215:		if err := os.Rename(from, to); err != nil {
app/phases/upgrade/postupgrade_test.go:227:		if err := os.Rename(from, to); err != nil {
app/phases/upgrade/staticpods_test.go:327:			moveFileFunc:         os.Rename,
app/phases/upgrade/staticpods_test.go:338:			moveFileFunc:         os.Rename,
app/phases/upgrade/staticpods_test.go:349:			moveFileFunc:         os.Rename,
app/phases/upgrade/staticpods_test.go:360:			moveFileFunc:         os.Rename,
app/phases/upgrade/staticpods_test.go:376:				return os.Rename(oldPath, newPath)
app/phases/upgrade/staticpods_test.go:393:				return os.Rename(oldPath, newPath)
app/phases/upgrade/staticpods_test.go:410:				return os.Rename(oldPath, newPath)
app/phases/upgrade/staticpods_test.go:422:			moveFileFunc:         os.Rename,
app/phases/upgrade/staticpods_test.go:434:			moveFileFunc:         os.Rename,
app/phases/upgrade/staticpods_test.go:446:			moveFileFunc:         os.Rename,
app/phases/upgrade/staticpods_test.go:457:			moveFileFunc:         os.Rename,
app/phases/upgrade/staticpods.go:126:	return os.Rename(oldPath, newPath)

so we seem to not have any other locations with os.Rename outside of tests, but i think we should:

  • create a new function kubeadmutil.MoveFile()
  • use this function in staticpods.go
  • it tries to do the os.Rename if it fails with the cross link error use the alternative way CopyFile / Remove
  • add some comments why this can happen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -17,19 +17,30 @@ limitations under the License.
package util

import (
"io"
"os"
)

// CopyFile copies a file from src to dest.
func CopyFile(src, dest string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #123254

recently we reverted a copy file unti test, would these changes break it again?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need to make sure it works on windows.
i think all of these stdlib functions are portable to Windows, but hopefully there are no gotchas.

@shubham-yewalekar
Copy link

we should do a chmod on the destination file to give it appropriate permissions

@SataQiu
Copy link
Member Author

SataQiu commented Feb 21, 2024

we should do a chmod on the destination file to give it appropriate permissions

New file will have the same permissions as the old file:

destFile, err := os.OpenFile(dest, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, sourceFileInfo.Mode())

@neolit123
Copy link
Member

/release-note-edit

kubeadm: fix a bug during kubeadm upgrade, where it is not possible to mount a new device and create a symbolic link for /etc/kubernetes (or a sub-directory) so that kubeadm stores its information on the mounted device.

@neolit123
Copy link
Member

/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 21, 2024
…o mount a new device and create a symbolic link for /etc/kubernetes (or a sub-directory) so that kubeadm stores its information on the mounted device
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 22, 2024
@SataQiu
Copy link
Member Author

SataQiu commented Feb 22, 2024

/retest

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
thanks

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: c57df17ba0bdd7e1bcbaa36f9b3d95268bbb260d

@k8s-ci-robot k8s-ci-robot merged commit 7865109 into kubernetes:master Feb 22, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make the temporary directory used for backing up static pod manifests during upgrade configurable
4 participants