
Low entropy causes VM agent to hang indefinitely during start #325

Closed · sipsma opened this issue Nov 7, 2019 · 8 comments
Labels: kind/bug (Something isn't working)

sipsma commented Nov 7, 2019

It appears that, for reasons not yet understood, a Firecracker VM may not have sufficient entropy during boot, which can cause our VM agent to experience long pauses very early in process start (possibly while the Go runtime itself is still starting, before any of our code actually executes). This results in timeouts when the runtime shim attempts to connect to the agent over vsock, producing errors like

failed to create VM: failed to dial the VM over vsock: context deadline exceeded

Because the pause happens so early in agent's startup, the process never writes any output to stdout/stderr, so the debug logs show nothing from agent.
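
For anyone unfamiliar with where that error comes from: the shim dials the agent under a context with a deadline, so a hung agent surfaces as context deadline exceeded on the host. A minimal sketch of that pattern (not the actual runtime-shim code; the socket path and timeout here are made up):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// dialAgent keeps retrying until the context deadline expires. This is a
// hypothetical stand-in for the shim's vsock dial; a plain unix socket is
// used for illustration.
func dialAgent(ctx context.Context, network, addr string) (net.Conn, error) {
	for {
		conn, err := (&net.Dialer{}).DialContext(ctx, network, addr)
		if err == nil {
			return conn, nil
		}
		select {
		case <-ctx.Done():
			// The agent never came up in time: report the deadline rather
			// than the last transient dial error.
			return nil, fmt.Errorf("failed to dial the VM over vsock: %w", ctx.Err())
		case <-time.After(100 * time.Millisecond):
			// keep retrying until the deadline
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	_, err := dialAgent(ctx, "unix", "/tmp/nonexistent-agent.sock")
	fmt.Println(err) // failed to dial the VM over vsock: context deadline exceeded
}
```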

Running low on entropy during boot is an open issue with Firecracker. There are some suggested fixes there (which I have not yet tried), but the discussion doesn't appear to have reached a strong conclusion at this time. I just saw the rngd suggestion in the thread; I will give that a try and update this issue with the results.

I don't know why this suddenly started happening to me; it's on the dev machine I normally use (an i3.metal with a setup very similar to the CI machines). It may just be entirely random when it starts occurring.

We need to follow that Firecracker thread for potential solutions and decide how we can address this for firecracker-containerd users.


sipsma commented Nov 7, 2019

For more background on how I encountered and debugged this:
While testing an (unrelated) change to agent, I suddenly started getting those failed to dial the VM over vsock: context deadline exceeded errors. I eventually resorted to changing the systemd configuration to run agent under strace, which showed the usual startup output followed by a sudden pause after these lines:

time="2019-11-07T23:04:02.359831030Z" level=debug msg="[    1.693074] strace[733]: [pid   737] getrandom( <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.362735232Z" level=debug msg="[    1.695840] strace[733]: [pid   738] <... nanosleep resumed> NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.365138192Z" level=debug msg="[    1.698779] strace[733]: [pid   738] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.367418352Z" level=debug msg="[    1.700972] strace[733]: [pid   738] futex(0xc000054848, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.369251535Z" level=debug msg="[    1.703241] strace[733]: [pid   739] <... futex resumed> )       = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.371064943Z" level=debug msg="[    1.705083] strace[733]: [pid   738] <... futex resumed> )       = 1" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.373957783Z" level=debug msg="[    1.706869] strace[733]: [pid   739] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f2f02667000" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.375845018Z" level=debug msg="[    1.709807] strace[733]: [pid   739] munmap(0x7f2f02667000, 26841088) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.377673611Z" level=debug msg="[    1.711678] strace[733]: [pid   739] munmap(0x7f2f08000000, 40267776) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.379871612Z" level=debug msg="[    1.713512] strace[733]: [pid   739] mprotect(0x7f2f04000000, 135168, PROT_READ|PROT_WRITE) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.382076021Z" level=debug msg="[    1.715670] strace[733]: [pid   739] rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1],  <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.384396190Z" level=debug msg="[    1.717906] strace[733]: [pid   738] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.387078423Z" level=debug msg="[    1.720426] strace[733]: [pid   739] <... rt_sigprocmask resumed> [], 8) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.390964875Z" level=debug msg="[    1.723044] strace[733]: [pid   739] mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f2f09e66000" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.393641795Z" level=debug msg="[    1.726906] strace[733]: [pid   739] mprotect(0x7f2f09e66000, 4096, PROT_NONE) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.395548983Z" level=debug msg="[    1.729592] strace[733]: [pid   739] clone( <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.397404064Z" level=debug msg="[    1.731356] strace[733]: [pid   738] <... nanosleep resumed> NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.399550842Z" level=debug msg="[    1.733231] strace[733]: [pid   738] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.405128006Z" level=debug msg="[    1.735347] strace[733]: [pid   739] <... clone resumed> child_stack=0x7f2f0a665ff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f2f0a6669d0, tls=0x7f2f0a666700, child_tidptr=0x7f2f0a6669d0) = 741" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.408521880Z" level=debug msg="[    1.740947] strace[733]: [pid   739] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.412838384Z" level=debug msg="[    1.744728] strace[733]: [pid   739] futex(0x16fa4e0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=59, tv_nsec=978906525}/usr/bin/strace: Process 741 attached" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.414317902Z" level=debug msg="[    1.748716] strace[733]:  <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.416431754Z" level=debug msg="[    1.750259] strace[733]: [pid   741] set_robust_list(0x7f2f0a6669e0, 24) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.418817976Z" level=debug msg="[    1.752349] strace[733]: [pid   738] <... nanosleep resumed> NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.421143341Z" level=debug msg="[    1.754692] strace[733]: [pid   738] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.423556095Z" level=debug msg="[    1.757015] strace[733]: [pid   741] sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.426992987Z" level=debug msg="[    1.759408] strace[733]: [pid   741] sigaltstack({ss_sp=0xc0000f6000, ss_flags=0, ss_size=32768}, NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.430077359Z" level=debug msg="[    1.763078] strace[733]: [pid   741] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.432884826Z" level=debug msg="[    1.766086] strace[733]: [pid   741] gettid()                    = 741" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.435517808Z" level=debug msg="[    1.768877] strace[733]: [pid   738] <... nanosleep resumed> NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.439090497Z" level=debug msg="[    1.771624] strace[733]: [pid   738] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.442880313Z" level=debug msg="[    1.775158] strace[733]: [pid   741] futex(0xc000055d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:04:02.445717462Z" level=debug msg="[    1.778782] strace[733]: [pid   738] <... nanosleep resumed> NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout

I looked into whether that getrandom call could be blocking, which led me to this golang thread on the runtime's behavior when entropy is low. That in turn led me to try increasing the vsock timeout to more than 60 seconds, after which I was finally able to see this message in the strace output once agent had been blocked for 60 seconds:

time="2019-11-07T23:05:01.540010568Z" level=debug msg="[   60.865694] strace[733]: [pid   738] futex(0x16f6610, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.541977436Z" level=debug msg="[   60.872239] strace[733]: [pid   739] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.543568465Z" level=debug msg="[   60.872629] strace[733]: [pid   739] futex(0x16f6610, FUTEX_WAKE_PRIVATE, 1) = 1" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.544962755Z" level=debug msg="[   60.872984] strace[733]: [pid   738] <... futex resumed> )       = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.546334423Z" level=debug msg="[   60.873373] strace[733]: [pid   738] sched_yield()               = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.547955871Z" level=debug msg="[   60.873721] strace[733]: [pid   738] futex(0x16f6510, FUTEX_WAKE_PRIVATE, 1) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.550940141Z" level=debug msg="[   60.886227] strace[733]: [pid   738] nanosleep({tv_sec=0, tv_nsec=20000}, NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.552997750Z" level=debug msg="[   60.888628] strace[733]: [pid   738] nanosleep({tv_sec=0, tv_nsec=20000}, NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.555289073Z" level=debug msg="[   60.890728] strace[733]: [pid   738] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.558394937Z" level=debug msg="[   60.893032] strace[733]: [pid   739] write(2, \"crypto/rand: blocked for 60 seconds waiting to read random data from the kernel\\n\", 80 <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.560128887Z" level=debug msg="[   60.895956] strace[733]: [pid   738] <... nanosleep resumed> NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.563291771Z" level=debug msg="[   60.897798] strace[733]: [pid   738] nanosleep({tv_sec=0, tv_nsec=20000}, crypto/rand: blocked for 60 seconds waiting to read random data from the kernel" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.564623957Z" level=debug msg="[   60.900892] strace[733]:  <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.567031916Z" level=debug msg="[   60.902173] strace[733]: [pid   739] <... write resumed> )       = 80" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.571645711Z" level=debug msg="[   60.905115] strace[733]: [pid   739] futex(0xc000054848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout
time="2019-11-07T23:05:01.573297359Z" level=debug msg="[   60.907662] strace[733]: [pid   738] <... nanosleep resumed> NULL) = 0" jailer=runc ociBundlePath=/run/firecracker-containerd/default/test-drive-mount runcBinaryPath=/usr/local/bin/runc runtime=aws.firecracker vmID=test-drive-mount vmm_stream=stdout

The key line there is crypto/rand: blocked for 60 seconds waiting to read random data from the kernel.
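
For reference, it doesn't take much to hit this: on Linux, Go's crypto/rand is backed by getrandom(2), which blocks until the kernel's entropy pool is initialized, and the runtime prints that warning after waiting 60 seconds. A minimal standalone sketch (not agent code):

```go
package main

import (
	"crypto/rand"
	"fmt"
	"log"
)

func main() {
	// On a freshly booted VM with an empty entropy pool this call blocks in
	// getrandom(2); after 60 seconds the Go runtime prints "crypto/rand:
	// blocked for 60 seconds waiting to read random data from the kernel"
	// while it continues to wait.
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read %d random bytes: %x\n", len(buf), buf)
}
```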

sipsma added the kind/bug label on Nov 7, 2019

sipsma commented Nov 8, 2019

I tried adding rngd to our VM's boot sequence, but found that it failed to start in the VM with the error can't open /dev/hwrng: No such device. Based on the Firecracker thread about this, it seems like virtio-rng may be capable of providing such a device to the VM, but I have no idea whether Firecracker supports it at this time.

I then tried starting haveged during VM boot, which worked as expected, unblocking agent on getrandom. However, haveged's method of operation is very far outside my area of expertise, so I can't personally conclude that the randomness it generates is universally considered "safe" or that it's what we should include in our default rootfs. There may be other options besides rngd and haveged that I'm not aware of.
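
Side note: while testing this, a quick way to check whether a daemon like haveged is actually replenishing the pool inside the guest is to read the kernel's entropy estimate. A throwaway helper along these lines (hypothetical, not part of our rootfs):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// /proc/sys/kernel/random/entropy_avail reports the kernel's current
	// estimate of available entropy bits in the input pool.
	data, err := os.ReadFile("/proc/sys/kernel/random/entropy_avail")
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not read entropy estimate:", err)
		os.Exit(1)
	}
	fmt.Println("entropy_avail:", strings.TrimSpace(string(data)))
}
```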

@nmeyerhans Do you have any thoughts on all of this? Is there any standard way of dealing with this type of issue in Debian?


nmeyerhans commented Nov 8, 2019

Have you tried passing random.trust_cpu=on to the guest kernel, as was suggested in the firecracker issue? The claim in that issue was that it worked, and I wonder if this broke somehow.

haveged is a reasonable approach. It's actually embedded in the Debian installer to ensure that systems have usable entropy during the installation process. See the discussion of this at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923675

There's an in-kernel jitter-entropy collector available in very new kernels (linux 5.4, IIRC), but I don't think we're quite ready for that. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=50ee7529ec45

"Manually" passing a random seed from the host to the guest via the kernel command line or IMDS is also something we can consider, at least as long as Firecracker doesn't support virtio-rng. No, don't do that. There's no way to reliably avoid leaking the seed, so the subsequent output would be predictable.


sipsma commented Nov 8, 2019

> Have you tried passing random.trust_cpu=on to the guest kernel, as was suggested in the firecracker issue? The claim in that issue was that it worked, and I wonder if this broke somehow.

I didn't try that because, based on the thread, we'd need at least a 4.19 kernel (which is plausible of course, but would be nice to avoid as a minimum dependency if possible), and I wasn't sure whether all our users would actually want to trust the CPU RNG (though I'm happy to be told that's not a meaningful concern). It's worth a shot though, as it seems pretty easy to compile a new kernel using their guide; I'll update with the results.
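
For reference, the change itself would just be appending the flag to whatever kernel command line the VM already boots with; the default arguments shown below are placeholders for illustration, not necessarily what we ship:

```go
package main

import "fmt"

func main() {
	// Placeholder defaults for illustration; the real boot args come from
	// the runtime/image-builder configuration.
	defaultArgs := "console=ttyS0 reboot=k panic=1 pci=off"

	// random.trust_cpu=on tells the kernel to credit the CPU's hardware RNG
	// (RDRAND/RDSEED) toward initializing the entropy pool; it requires a
	// >= 4.19 guest kernel.
	kernelArgs := defaultArgs + " random.trust_cpu=on"
	fmt.Println(kernelArgs)
}
```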


sipsma commented Nov 8, 2019

On a separate note, I think I managed to track down why this started suddenly happening to me out of seemingly nowhere (it's a doozy...)

Here's a stack trace from gdb running on agent that caught the getrandom syscall:

Thread 1 "agent" hit Catchpoint 1 (call to syscall 318), syscall.Syscall () at /usr/lib/go/src/syscall/asm_linux_amd64.s:27
27              CMPQ    AX, $0xfffffffffffff001
(gdb) info stack
#0  syscall.Syscall () at /usr/lib/go/src/syscall/asm_linux_amd64.s:27
#1  0x00000000004c35bd in internal/syscall/unix.GetRandom (p=..., flags=0, n=<optimized out>, err=...) at /usr/lib/go/src/internal/syscall/unix/getrandom_linux.go:35
#2  0x00000000005ea4e8 in crypto/rand.getRandomBatch (p=..., ok=<optimized out>) at /usr/lib/go/src/crypto/rand/rand_batched.go:40
#3  0x00000000005eb52d in crypto/rand.batched.func1 (buf=..., ~r1=<optimized out>) at /usr/lib/go/src/crypto/rand/rand_batched.go:28
#4  0x00000000005ea9e5 in crypto/rand.(*devReader).Read (r=0xc00013c5d0, b=..., n=0, err=...) at /usr/lib/go/src/crypto/rand/rand_unix.go:57
#5  0x000000000049b698 in io.ReadAtLeast (r=..., buf=..., min=4, n=<optimized out>, err=...) at /usr/lib/go/src/io/io.go:310
#6  0x0000000000781818 in io.ReadFull (r=..., buf=..., err=..., n=<optimized out>) at /usr/lib/go/src/io/io.go:329
#7  github.com/globalsign/mgo/bson.readRandomUint32 (~r0=<optimized out>) at /home/sipsma/go/pkg/mod/github.com/globalsign/mgo@v0.0.0-20180905125535-1ca0a4f7cbcb/bson/bson.go:234
#8  0x0000000000792a05 in github.com/globalsign/mgo/bson.init.ializers () at /home/sipsma/go/pkg/mod/github.com/globalsign/mgo@v0.0.0-20180905125535-1ca0a4f7cbcb/bson/bson.go:229
#9  0x00000000007933fc in github.com/globalsign/mgo/bson.init () at <autogenerated>:1
#10 0x00000000007b57a2 in github.com/go-openapi/strfmt.init () at <autogenerated>:1
#11 0x00000000007d7121 in github.com/go-openapi/runtime.init () at <autogenerated>:1
#12 0x00000000008e02cd in github.com/firecracker-microvm/firecracker-go-sdk.init () at <autogenerated>:1
#13 0x0000000000a29c9e in github.com/firecracker-microvm/firecracker-containerd/internal.init () at <autogenerated>:1
#14 0x0000000000c451a5 in main.init () at <autogenerated>:1
#15 0x000000000042f608 in runtime.main () at /usr/lib/go/src/runtime/proc.go:188
#16 0x000000000045a701 in runtime.goexit () at /usr/lib/go/src/runtime/asm_amd64.s:1337
#17 0x0000000000000000 in ?? ()

So it looks like there is an init() function several layers deep in our dependencies that is reading randomness... It appears to be a MongoDB library (which we obviously have no actual use for in our code...)

As for why it happened during the changes I was making: although I wasn't touching the agent code at all as part of what I was working on, I happened to migrate this code from the runtime package to the internal package (just because it was helpful for some minor refactoring). Due to the way init() functions work in Go, even though that code is never called by agent, the agent binary still ends up executing the init() because it imports the internal package (which starts a chain of imports leading to that problematic init); a sketch of the mechanism is below.
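
A single-file sketch of that mechanism (loosely mirroring what mgo/bson's init does to seed its ObjectId counter; the names here are illustrative):

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

var objectIdCounter uint32

// init runs in every binary that (transitively) imports the package, before
// main() and regardless of whether anything in the package is ever called.
// On a freshly booted VM with no entropy, the rand.Read blocks here, so the
// whole process appears hung before it prints anything.
func init() {
	var b [4]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(fmt.Errorf("cannot seed ObjectId counter: %w", err))
	}
	objectIdCounter = binary.LittleEndian.Uint32(b[:])
}

func main() {
	fmt.Println("agent started; counter seeded to", objectIdCounter)
}
```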

While it's good there is at least something we can do to avoid this particular situation (I can rearrange imports to avoid that dependency ending up in agent), this is still an extremely delicate spot to be in. It would be much preferable to not be at the mercy of any transitive dependency deciding to do a random read in an init.


sipsma commented Nov 8, 2019

Using a 4.19 kernel that @nmeyerhans sent, plus including random.trust_cpu=on in the kernel boot options, does appear to fix the issue for me.

So in terms of what we can do for the image-builder rootfs, the top options I'm seeing right now are:

  1. random.trust_cpu=on
    • Pros: Provided you trust the CPU, very simple
    • Cons:
      • Requires at least 4.19 guest kernel
      • Requires trusting the CPU (I've seen occasional suspicion expressed about this when reading around, but unsure whether that's just FUD)
      • (?) I'm genuinely not sure whether this is supported on all architectures (e.g., do aarch64 CPUs always have an instruction equivalent to RDRAND?). We don't support aarch64 at the moment, but it would be nice to have an option that works with it in the future.
  2. haveged (or similarly jitterentropy-rngd)
    • Pros: Provided you trust the algorithm, fairly simple (just install the daemon and start it via the init system early in VM boot)
    • Cons: Requires trusting the algorithm (though Debian at least trusts haveged for its installer, based on the thread linked above).
  3. Do nothing (just try to avoid code that eats entropy during boot)
    • Pros: Easiest option at the moment
    • Cons: Puts us at the mercy of any transitive dependency in our agent code or in any binary run during VM boot. Could make version upgrades a huge headache if they happen to introduce code that consumes entropy.

@nmeyerhans any thoughts on the above? I think my preference at the moment would be option 2, documenting in bold that the image-builder rootfs uses haveged for entropy, so users are aware of it and can evaluate whether they trust it if they're going to do anything crypto-related in the microVM.

nmeyerhans commented

RDRAND alone probably shouldn't be trusted; consider issues like this one and these. Mixing data from RDRAND with other sources of entropy would be reasonable, but we don't really have those.

I think I'm inclined to agree that option 2 is reasonable, at least for now. Supporting build-time configuration options, such as customization of the kernel command line, might be worth doing. We should also see about getting Firecracker support for virtio-rng. It may not be appropriate in all environments, and it still has some of the same trust issues as RDRAND, but I think every option involves some level of tradeoff. At least by supporting multiple options we give the user the opportunity to deploy the one that makes sense for their use case.


sipsma commented Nov 14, 2019

Fixed in our image-builder's rootfs by adding haveged as part of #326; resolving this issue.
