-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure the seccomp pipe is being read while exporting bpf (regression in rc93) #2871
Ensure the seccomp pipe is being read while exporting bpf (regression in rc93) #2871
Conversation
readerBuffer := bytes.NewBuffer([]byte{}) | ||
errChan := make(chan error) | ||
go func() { | ||
defer close(errChan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to use defer here, you can just close the channel at the end
nit: typo in commit message: ExportBRF |
glitch in CI; restarted |
c115d5d
to
2bd0fb6
Compare
Thanks for the comments. I've pushed a change addressing them. |
16f9b68
to
ca7b78a
Compare
Test failure is a flake (#2756) which just got fixed (so I hope it's the last time I see it). CI restarted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to fix reading goroutine stuck (see above).
ee1fc29
to
700febc
Compare
Hi @kolyshkin . Thanks for spotting that. Suggested changed pushed. |
Last commit should be merged into the first one. Also, can you please explain changing the test case in more details (in its commit message)? |
There is a potential deadlock where the ExportBPF method call writes to a pipe but the pipe is not read until after the method call returns. ExportBPF might fill the pipe buffer, in which case it will block waiting for a read on the other side which can't happen until the method returns. Here we concurrently read from the pipe into a buffer to ensure ExportBPF will always return. Co-authored-by: Kieron Browne <kbrowne@vmware.com> Co-authored-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Kieron Browne <kbrowne@vmware.com> Signed-off-by: Danail Branekov <danailster@gmail.com>
TestPatchHugeSeccompFilterDoesNotBlock is only testing the disassembleFilter function. There is no need to invoke PatchAndLoad which has the side effect of loading a seccomp profile. Co-authored-by: Danail Branekov <danailster@gmail.com> Co-authored-by: Kieron Browne <kbrowne@vmware.com> Signed-off-by: Kieron Browne <kbrowne@vmware.com> Signed-off-by: Danail Branekov <danailster@gmail.com>
700febc
to
08b5279
Compare
Yep - done and pushed. |
This is interesting, so I suppose sometimes the pipe is filling up and sometimes not? Does that seem right? I guess another question here is why are we generating a profile at all for cri? |
And why would it only fill up sometimes if the profile is constant? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM and seems to fix the issue seen by runc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@AkihiroSuda @mrunalp @cyphar PTAL |
I think this is fairly high priority to get into a release. We are seeing reports across several repos that seem to point to the issue being fixed here. AKS is reverting to use rc92 for the timing being because this is really easy to trigger during scaling. Thankfully our kernel doesn't have CAP_PERFMON yet 😄 |
After several days of debugging I found the root cause of some of my apps not starting, which happens to be addressed in this pull request. Thanks! :) Here's a stack trace of the issue on runc 1.0.0-rc93:
|
In my case the blocked writer only attempted to write just under 8KB of data. When the soft limit on the number of pages used for pipe buffers is reached (16K pages) and the process doesn't have either |
^ This probably explains why we were only seeing the issue when the system is under load (apparently IO load) |
This change adds patches from opencontainers/runc#2871 to 1.0.8.
This change adds patches from [0] to mitigate an issue customers were seeing where pods could not be scheduled because allocated IP addresses were not being released. [0]- opencontainers/runc#2871
@deadok22 Could you tell how you obtained this stack trace for runc? |
@rishikeshdevsot I don't have this mentioned in my investigation notes, unfortunately. I must have got it from either strace output or from something reading runc's stdout/stderr. That was one deep rabbit hole :) |
Not reading from the seccomp pipe might get the ExportBRF function stuck
on writing to the seccomp pipe if its buffer is full (i.e. the writer
wrote more than 65K of data). Reading from the read end of the pipe
asynchronously prevents this from happening.
We have seen the issue in real environments when the system is under heavy load. No idea why it does not always happen, but as demonstrated by the unit test, the issue is real.
(Potentially) related issue:
#2530 (comment)
#2865
cc @cloudfoundry/cf-garden