workaround for "runc list" returning "no such file or directory" #17977

Closed
wants to merge 9 commits into from


prezha
Contributor

@prezha prezha commented Jan 17, 2024

fixes #17976

details and examples are in the issue #17976

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 17, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: prezha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 17, 2024
@prezha
Contributor Author

prezha commented Jan 17, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jan 17, 2024
@minikube-pr-bot

This comment has been minimized.

@prezha
Contributor Author

prezha commented Jan 17, 2024

/retest-this-please

@prezha prezha marked this pull request as draft January 17, 2024 12:40
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 17, 2024
@prezha prezha force-pushed the fixRuncList branch 2 times, most recently from 74e9239 to 37eb5c8 Compare January 17, 2024 23:23
@prezha prezha removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 17, 2024
@prezha prezha marked this pull request as ready for review January 18, 2024 07:16
@prezha
Contributor Author

prezha commented Mar 7, 2024

/ok-to-test

@medyagh
Member

medyagh commented Mar 20, 2024

once this is merged opencontainers/runc#3349 we can remove this workaround...

@prezha is there a timeline that we know when they are gonna include that in the release?

rr, err = cr.RunCmd(exec.Command("sudo", args...))
if err != nil {

// avoid "no such file or directory" runc list error by retrying
Member

I am generally not a big fan of retrying till it works... I'd rather wait for a condition to be met.
If runc is going to include opencontainers/runc#3349 in a release soon, I'd rather wait for that.

Contributor Author

@prezha prezha Mar 26, 2024

in the less likely, but still occasional, scenario where the race condition happens, runc currently exits with an error and we don't have a specific condition to wait for, so we retry, and the odds of hitting the race condition again are low (but we could repeat a couple of times, if that's needed)

i agree it would be better if this were handled internally by runc, which that pr aims to address, so i asked the author & approver if there are plans to release it (given that it was merged to main 2 years ago)

@prezha
Contributor Author

prezha commented Mar 26, 2024

> once this is merged opencontainers/runc#3349 we can remove this workaround...
>
> @prezha is there a timeline that we know when they are gonna include that in the release?

hey @medyagh i've asked runc folks, let's see what they say

@medyagh
Member

medyagh commented Apr 3, 2024

> > once this is merged opencontainers/runc#3349 we can remove this workaround...
> > @prezha is there a timeline that we know when they are gonna include that in the release?
>
> hey @medyagh i've asked runc folks, let's see what they say

sounds good! let's check back in a week to see if there is a new runc version; if not, we could merge this workaround

@prezha
Contributor Author

prezha commented Apr 28, 2024

> > > once this is merged opencontainers/runc#3349 we can remove this workaround...
> > > @prezha is there a timeline that we know when they are gonna include that in the release?
> >
> > hey @medyagh i've asked runc folks, let's see what they say
>
> sounds good! let's check back in a week to see if there is a new runc version; if not, we could merge this workaround

@medyagh based on the reply we got from upstream runc maintainers, it looks like the fix might be back-ported at some point in the future - do we want to merge this workaround to reduce our integration test flakes in the meantime?

@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17977) |
+----------------+----------+---------------------+
| minikube start | 51.0s    | 50.8s               |
| enable ingress | 26.7s    | 26.0s               |
+----------------+----------+---------------------+

Times for minikube start: 50.7s 52.3s 51.0s 51.5s 49.6s
Times for minikube (PR 17977) start: 52.1s 48.4s 49.4s 51.0s 53.0s

Times for minikube ingress: 27.5s 28.0s 23.9s 28.0s 26.1s
Times for minikube (PR 17977) ingress: 25.0s 24.5s 23.5s 29.0s 28.0s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17977) |
+----------------+----------+---------------------+
| minikube start | 22.1s    | 21.8s               |
| enable ingress | 22.5s    | 22.2s               |
+----------------+----------+---------------------+

Times for minikube start: 24.6s 20.9s 23.6s 20.5s 21.0s
Times for minikube (PR 17977) start: 21.0s 23.6s 21.4s 22.2s 20.7s

Times for minikube ingress: 21.7s 21.7s 25.8s 21.8s 21.3s
Times for minikube (PR 17977) ingress: 21.8s 20.8s 22.8s 23.3s 22.3s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17977) |
+----------------+----------+---------------------+
| minikube start | 20.9s    | 21.3s               |
| enable ingress | 35.6s    | 32.4s               |
+----------------+----------+---------------------+

Times for minikube start: 19.8s 20.0s 19.8s 22.6s 22.5s
Times for minikube (PR 17977) start: 22.8s 20.5s 20.1s 20.9s 22.2s

Times for minikube ingress: 33.8s 48.3s 32.2s 31.3s 32.3s
Times for minikube (PR 17977) ingress: 31.3s 19.8s 48.2s 30.8s 31.8s

@minikube-pr-bot

These are the flake rates of all failed tests.

| Environment | Failed Tests | Flake Rate (%) |
|---|---|---|
| KVM_Linux_crio | TestStartStop/group/default-k8s-diff-port/serial/SecondStart (gopogh) | 1.64 (chart) |
| Docker_Linux_containerd_arm64 | TestAddons/StoppedEnableDisable (gopogh) | 2.03 (chart) |
| Docker_Linux_containerd_arm64 | TestAddons/parallel/InspektorGadget (gopogh) | 3.38 (chart) |
| Hyper-V_Windows | TestForceSystemdFlag (gopogh) | 6.12 (chart) |
| Hyper-V_Windows | TestRunningBinaryUpgrade (gopogh) | 7.53 (chart) |
| Hyper-V_Windows | TestMultiControlPlane/serial/HAppyAfterClusterStart (gopogh) | 9.26 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/TunnelCmd/serial/RunSecondTunnel (gopogh) | 9.90 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/TunnelCmd/serial/WaitService/Setup (gopogh) | 9.90 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/DockerEnv/powershell (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageBuild (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageListJson (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageListShort (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageListTable (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageListYaml (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/MySQL (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/NodeLabels (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/PersistentVolumeClaim (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ServiceCmdConnect (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ServiceCmd/DeployApp (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ServiceCmd/JSONOutput (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ServiceCmd/List (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/StatusCmd (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/serial/InvalidService (gopogh) | 10.78 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageLoadDaemon (gopogh) | 13.73 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageLoadFromFile (gopogh) | 13.73 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageReloadDaemon (gopogh) | 13.73 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageSaveToFile (gopogh) | 13.73 (chart) |
| Hyper-V_Windows | TestFunctional/parallel/ImageCommands/ImageTagAndLoadDaemon (gopogh) | 13.73 (chart) |
| Hyper-V_Windows | TestMultiControlPlane/serial/StartCluster (gopogh) | 13.76 (chart) |
| KVM_Linux_crio | TestAddons/Setup (gopogh) | 14.29 (chart) |
| More tests... | Continued... | |

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

@prezha
Contributor Author

prezha commented Jun 27, 2024

good news: runc v1.1.13 went out two weeks ago and it included the proper fix for this problem
we've also upgraded our kicbase & iso to update runc from v1.1.12 to v1.1.13 (this will be included in the next release), so we will not need this workaround and i'm going to close this pr

@prezha prezha closed this Jun 27, 2024
@spowelljr
Member

Thanks for the investigation @prezha!

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

runc sometimes fails with "no such file or directory"
5 participants