-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
podman network create: race: error reading */conflist: ENOENT #7807
Comments
(I'm not going to bother trying to reproduce this. I'm gambling that it's trivial to find and fix the race. If it isn't, please LMK and I'll dive into it further). |
It should be possible to get the tests to use separate CNI config
directories - have a per-test directory and a global one with the main
configs. Might be worth investigating to prevent these kinds of issues
entirely
…On Mon, Sep 28, 2020, 15:24 Ed Santiago ***@***.***> wrote:
(I'm not going to bother trying to reproduce this. I'm gambling that it's
trivial to find and fix the race. If it isn't, please LMK and I'll dive
into it further).
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#7807 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB3AOCAOCQG3TN4HVLR5DNDSIDPG7ANCNFSM4R44NPOA>
.
|
@mheon by "should be possible" do you mean "we currently have existing functionality, e.g. a --confdir option or envariable, that the tests can use"? Or "it should be possible to add such functionality"? |
The functionality exists but isn't wired into the CLI last I checked
(config file only). Should not be difficult to add but would require code
changes.
…On Mon, Sep 28, 2020, 15:44 Ed Santiago ***@***.***> wrote:
@mheon <https://github.com/mheon> by "should be possible" do you mean "we
currently have existing functionality, e.g. a --confdir option or
envariable, that the tests can use"? Or "it should be possible to add such
functionality"?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7807 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB3AOCGLIIS4RQPKT6DN3FDSIDRSLANCNFSM4R44NPOA>
.
|
Actually we support this already: |
@edsantiago Is that what you want? |
@rhatdan I haven't looked into it as closely at it deserves. My impression is that it might help with CI flakes but still leave us exposed to real bugs: I do suspect there's a real race condition in which (perhaps):
One possibility is for |
Just try this:
As you can see both calls created the same config. This happens about 1 out of 4 tries on my pc, both rootfull and rootless. Looking at the code there are definitely some race conditions. The root cause is properly this function: Lines 29 to 46 in f48b163
For some reason it is called three times when you run podman network create . This alone is very redundant.
I believe this is also the cause of this flake: #7583 Unfortunately I don't think I have enough knowledge and time to properly fix this. |
Random idea? |
Well this is also the network interface name. I personally prefer numbered interface names in my |
We definitely need a global lock for major CNI operations. I'm hesitant to add one for container network setup due to performance concerns, but creating and deleting CNI networks needs to be locked. |
I'll add this one to my queue ... i have one in front of it and somewhat heavy (compared to other days) meeting day. |
due to a lack of "locking" on cni operations, we could get ourselves in trouble when doing rapid creation or removal of networks. added a simple file lock to deal with the collision and because it is not considered a performent path, use of the file lock should be ok. if proven otherwise in the future, some generic shared memory lock should be implemented for libpod and also used here. moved pkog/network to libpod/network because libpod is now being pulled into the package and it has therefore lost its generic nature. this will make it easier to absorb into libpod as we try to make the network closer to core operations. Fixes: containers#7807 Signed-off-by: baude <bbaude@redhat.com>
due to a lack of "locking" on cni operations, we could get ourselves in trouble when doing rapid creation or removal of networks. added a simple file lock to deal with the collision and because it is not considered a performent path, use of the file lock should be ok. if proven otherwise in the future, some generic shared memory lock should be implemented for libpod and also used here. moved pkog/network to libpod/network because libpod is now being pulled into the package and it has therefore lost its generic nature. this will make it easier to absorb into libpod as we try to make the network closer to core operations. Fixes: containers#7807 Signed-off-by: baude <bbaude@redhat.com> <MH: Fixed cherry pick conflicts> Signed-off-by: Matthew Heon <mheon@redhat.com>
There seems to be a race condition causing flakes when multiple tests run 'podman network create/rm' at once:
Probable cause:
podman inspect container two CNI networks
test running at just the right time, and running itsnetwork rm
at just the wrong moment.Log: from PR 7803
The text was updated successfully, but these errors were encountered: