Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure the seccomp pipe is being read while exporting bpf (regression in rc93) #2871

Merged
merged 2 commits into from
Apr 1, 2021

Conversation

danail-branekov
Copy link
Contributor

Not reading from the seccomp pipe might get the ExportBRF function stuck
on writing to the seccomp pipe if its buffer is full (i.e. the writer
wrote more than 65K of data). Reading from the read end of the pipe
asynchronously prevents this from happening.

We have seen the issue in real environments when the system is under heavy load. No idea why it does not always happen, but as demonstrated by the unit test, the issue is real.

(Potentially) related issue:
#2530 (comment)
#2865

cc @cloudfoundry/cf-garden

readerBuffer := bytes.NewBuffer([]byte{})
errChan := make(chan error)
go func() {
defer close(errChan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to use defer here, you can just close the channel at the end

@kolyshkin
Copy link
Contributor

nit: typo in commit message: ExportBRF

@kolyshkin
Copy link
Contributor

glitch in CI; restarted

@kolyshkin kolyshkin requested a review from cyphar March 24, 2021 21:16
libcontainer/seccomp/patchbpf/enosys_linux.go Outdated Show resolved Hide resolved
@kolyshkin kolyshkin added this to the 1.0.0-rc94 milestone Mar 24, 2021
@kieron-dev kieron-dev force-pushed the pr-seccomp-pipe-hang branch from c115d5d to 2bd0fb6 Compare March 25, 2021 10:53
@kieron-dev
Copy link
Contributor

Thanks for the comments. I've pushed a change addressing them.

@cyphar cyphar requested a review from kolyshkin March 25, 2021 12:23
@kieron-dev kieron-dev force-pushed the pr-seccomp-pipe-hang branch from 16f9b68 to ca7b78a Compare March 25, 2021 14:08
@kolyshkin
Copy link
Contributor

CI failure in Fedora is a known flake (#2805)
CI failure in CentOS is a relatively new flake (#2861)

Both unrelated; CI restarted.

@kolyshkin
Copy link
Contributor

Test failure is a flake (#2756) which just got fixed (so I hope it's the last time I see it). CI restarted.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix reading goroutine stuck (see above).

@kieron-dev kieron-dev force-pushed the pr-seccomp-pipe-hang branch 3 times, most recently from ee1fc29 to 700febc Compare March 27, 2021 11:21
@kieron-dev
Copy link
Contributor

Need to fix reading goroutine stuck (see above).

Hi @kolyshkin . Thanks for spotting that. Suggested changed pushed.

@kolyshkin
Copy link
Contributor

Last commit should be merged into the first one.

Also, can you please explain changing the test case in more details (in its commit message)?

danail-branekov and others added 2 commits March 30, 2021 12:29
There is a potential deadlock where the ExportBPF method call writes to
a pipe but the pipe is not read until after the method call returns.
ExportBPF might fill the pipe buffer, in which case it will block
waiting for a read on the other side which can't happen until the method
returns.

Here we concurrently read from the pipe into a buffer to ensure
ExportBPF will always return.

Co-authored-by: Kieron Browne <kbrowne@vmware.com>
Co-authored-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Kieron Browne <kbrowne@vmware.com>
Signed-off-by: Danail Branekov <danailster@gmail.com>
TestPatchHugeSeccompFilterDoesNotBlock is only testing the
disassembleFilter function. There is no need to invoke PatchAndLoad
which has the side effect of loading a seccomp profile.

Co-authored-by: Danail Branekov <danailster@gmail.com>
Co-authored-by: Kieron Browne <kbrowne@vmware.com>
Signed-off-by: Kieron Browne <kbrowne@vmware.com>
Signed-off-by: Danail Branekov <danailster@gmail.com>
@kieron-dev
Copy link
Contributor

Last commit should be merged into the first one.

Also, can you please explain changing the test case in more details (in its commit message)?

Yep - done and pushed.

@cpuguy83
Copy link
Contributor

This is interesting, so I suppose sometimes the pipe is filling up and sometimes not? Does that seem right? I guess another question here is why are we generating a profile at all for cri?

@cpuguy83
Copy link
Contributor

And why would it only fill up sometimes if the profile is constant?

Copy link
Contributor

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM and seems to fix the issue seen by runc.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin
Copy link
Contributor

@AkihiroSuda @mrunalp @cyphar PTAL

@cpuguy83
Copy link
Contributor

cpuguy83 commented Apr 1, 2021

I think this is fairly high priority to get into a release. We are seeing reports across several repos that seem to point to the issue being fixed here.

AKS is reverting to use rc92 for the timing being because this is really easy to trigger during scaling. Thankfully our kernel doesn't have CAP_PERFMON yet 😄

@deadok22
Copy link

deadok22 commented Apr 9, 2021

After several days of debugging I found the root cause of some of my apps not starting, which happens to be addressed in this pull request. Thanks! :)

Here's a stack trace of the issue on runc 1.0.0-rc93:

SIGQUIT: quitsyscall, 39 minutes, locked to thread]:
runtime.cgocall(, 0xc000148d00, 0x0)
        runtime/cgocall.go +0x5d0xc000148c980x560223dd5a2d, 0x4f sp= pc=(0x6, 0x0gh.neting.cc/opencontainers/runc/vendor/github.com/seccomp/libseccomp-golang/seccomp.go930 +0x69 fp=0xc000148d40 sp= pc=0x56022421a4b9
github.com/opencontainers/runc/vendor/github.com/seccomp/libseccomp-golang.(*ScmpFilter).ExportBPF(, 0xc0000102000x0)
        github.com/opencontainers/runc/vendor/github.com/seccomp/libseccomp-golang/seccomp.go930 +0xb5 fp=0xc000148da8 sp= pc=0x560224218565
(0x0, 0x00x00x0)
        : +0xc000148e90 pc=github.com/opencontainers/runc/libcontainer/seccomp/patchbpf.enosysPatchFilter(0xc000065480, 0xc00000e1600x00xc0001b0c400x0)
        :531 +0x42 fp=0xc000148f600xc000148e90 pc=0x560224224002gh.neting.cc/opencontainers/runc/libcontainer/seccomp/patchbpf.PatchAndLoad0xc000065480, 0xc00000e160, 0x0, 0x0)
        :603 + fp=0xc000148fc8 sp=0xc000148f60 pc=0x5602242247ab
github.com/opencontainers/runc/libcontainer/seccomp.InitSeccomp(0xc000065480, , 0x0)
        :77 + fp=0xc000149048 sp= pc=0x5602242251e6
github.com/opencontainers/runc/libcontainer.(*linuxStandardInit).Init(0xc00011ae70, 0x00x0)
         + fp=0xc000149300 pc=
github.com/opencontainers/runc/libcontainer.(*LinuxFactory).StartInitialization0x0, 0x0)
        github.com/opencontainers/runc/libcontainer/factory_linux.go:396 +0x392 fp=0xc0001494e8 sp=0xc000149300 pc=0x56022423bb42, 0x0 +0x4d fp=0xc000149540 sp=0xc0001494e8 pc=0x5602242be31d(0xc0000acb00, 0xc0000acb00, 0x0)
        github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:5230xc0 fp= pc=0x5602242796c00x5602242c7e270x40x00x00x0, 0x00x5602242e7b52, , 0x0, ...       github.com/opencontainers/runc/vendor/github.com/urfave/cli/command.go: +0x51e fp=0xc000149890 sp=0xc000149568 pc=0x56022427a42e
github.com/opencontainers/runc/vendor/github.com/urfave/cli.(*App).Run(0xc00009e8c0, 0xc00000e080, 0x2, 0x2, 0x00x0)
        github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:276 +0x71a fp=0xc000149890 pc=0x56022427776a
main.main()
        github.com/opencontainers/runc/main.go160 +0xbba sp=0xc000149bd80x5602242b3b3aruntime.mainruntime/proc.go0x20e fp=0xc000149fe0 sp=0xc000149f60 pc=0x560223e026ae
()
         +0x1 fp=0xc000149fe8 sp= pc=0x560223e2e791

goroutine 7 [syscall, 39 minutes]:
(0x0)
runtime/sigqueue.go
os/signal.loop(: +
os/signal/signal_unix.go:0x43
0x1000rbx    0x56022516bab00x7f39a247e6fd
0x6
0x560225215de0
rbp    0x0
rsp    0x7fffe3ae2350
r9     
r10    0x7fffe3ae1760
r11    0x293
r12    
r13    
r14    
r15    0x0
0x7f39a247e6fd

fs     0x00x0

@deadok22
Copy link

deadok22 commented Apr 14, 2021

if its buffer is full (i.e. the writer wrote more than 65K of data)

In my case the blocked writer only attempted to write just under 8KB of data. When the soft limit on the number of pages used for pipe buffers is reached (16K pages) and the process doesn't have either CAP_SYS_RESOURCE or CAP_SYS_ADMIN, the kernel uses the default pipe buffer size of just 1 page - see this code

@danail-branekov
Copy link
Contributor Author

^ This probably explains why we were only seeing the issue when the system is under load (apparently IO load)

zmrow added a commit to zmrow/bottlerocket that referenced this pull request Apr 23, 2021
This change adds patches from
opencontainers/runc#2871 to 1.0.8.
zmrow added a commit to zmrow/bottlerocket that referenced this pull request Apr 30, 2021
This change adds patches from [0] to mitigate an issue customers were
seeing where pods could not be scheduled because allocated IP addresses
were not being released.
[0]- opencontainers/runc#2871
@kolyshkin kolyshkin changed the title Ensure the seccomp pipe is being read while exporting bpf Ensure the seccomp pipe is being read while exporting bpf (regression in rc93) May 3, 2021
@cyphar cyphar mentioned this pull request May 30, 2021
@rishikeshdevsot
Copy link

After several days of debugging I found the root cause of some of my apps not starting, which happens to be addressed in this pull request. Thanks! :)

Here's a stack trace of the issue on runc 1.0.0-rc93:

SIGQUIT: quitsyscall, 39 minutes, locked to thread]:
runtime.cgocall(, 0xc000148d00, 0x0)
        runtime/cgocall.go +0x5d0xc000148c980x560223dd5a2d, 0x4f sp= pc=(0x6, 0x0gh.neting.cc/opencontainers/runc/vendor/github.com/seccomp/libseccomp-golang/seccomp.go930 +0x69 fp=0xc000148d40 sp= pc=0x56022421a4b9
github.com/opencontainers/runc/vendor/github.com/seccomp/libseccomp-golang.(*ScmpFilter).ExportBPF(, 0xc0000102000x0)
        github.com/opencontainers/runc/vendor/github.com/seccomp/libseccomp-golang/seccomp.go930 +0xb5 fp=0xc000148da8 sp= pc=0x560224218565
(0x0, 0x00x00x0)
        : +0xc000148e90 pc=github.com/opencontainers/runc/libcontainer/seccomp/patchbpf.enosysPatchFilter(0xc000065480, 0xc00000e1600x00xc0001b0c400x0)
        :531 +0x42 fp=0xc000148f600xc000148e90 pc=0x560224224002gh.neting.cc/opencontainers/runc/libcontainer/seccomp/patchbpf.PatchAndLoad0xc000065480, 0xc00000e160, 0x0, 0x0)
        :603 + fp=0xc000148fc8 sp=0xc000148f60 pc=0x5602242247ab
github.com/opencontainers/runc/libcontainer/seccomp.InitSeccomp(0xc000065480, , 0x0)
        :77 + fp=0xc000149048 sp= pc=0x5602242251e6
github.com/opencontainers/runc/libcontainer.(*linuxStandardInit).Init(0xc00011ae70, 0x00x0)
         + fp=0xc000149300 pc=
github.com/opencontainers/runc/libcontainer.(*LinuxFactory).StartInitialization0x0, 0x0)
        github.com/opencontainers/runc/libcontainer/factory_linux.go:396 +0x392 fp=0xc0001494e8 sp=0xc000149300 pc=0x56022423bb42, 0x0 +0x4d fp=0xc000149540 sp=0xc0001494e8 pc=0x5602242be31d(0xc0000acb00, 0xc0000acb00, 0x0)
        github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:5230xc0 fp= pc=0x5602242796c00x5602242c7e270x40x00x00x0, 0x00x5602242e7b52, , 0x0, ...       github.com/opencontainers/runc/vendor/github.com/urfave/cli/command.go: +0x51e fp=0xc000149890 sp=0xc000149568 pc=0x56022427a42e
github.com/opencontainers/runc/vendor/github.com/urfave/cli.(*App).Run(0xc00009e8c0, 0xc00000e080, 0x2, 0x2, 0x00x0)
        github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:276 +0x71a fp=0xc000149890 pc=0x56022427776a
main.main()
        github.com/opencontainers/runc/main.go160 +0xbba sp=0xc000149bd80x5602242b3b3aruntime.mainruntime/proc.go0x20e fp=0xc000149fe0 sp=0xc000149f60 pc=0x560223e026ae
()
         +0x1 fp=0xc000149fe8 sp= pc=0x560223e2e791

goroutine 7 [syscall, 39 minutes]:
(0x0)
runtime/sigqueue.go
os/signal.loop(: +
os/signal/signal_unix.go:0x43
0x1000rbx    0x56022516bab00x7f39a247e6fd
0x6
0x560225215de0
rbp    0x0
rsp    0x7fffe3ae2350
r9     
r10    0x7fffe3ae1760
r11    0x293
r12    
r13    
r14    
r15    0x0
0x7f39a247e6fd

fs     0x00x0

@deadok22 Could you tell how you obtained this stack trace for runc?

@deadok22
Copy link

@rishikeshdevsot I don't have this mentioned in my investigation notes, unfortunately. I must have got it from either strace output or from something reading runc's stdout/stderr. That was one deep rabbit hole :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants