dotnet-dump 0x80004005 on .NET 5 in Linux Docker container #2098

Closed
mjrousos opened this issue Mar 18, 2021 · 11 comments · Fixed by dotnet/runtime#50477

@mjrousos
Member

mjrousos commented Mar 18, 2021

Description

Running dotnet-dump in a Linux Docker container now appears to always require the SYS_PTRACE capability to collect dumps, and fails with 0x80004005 without it.

Regression?

Yes. Using dotnet-dump to collect dumps in Linux containers generally worked without any elevated privileges when running on .NET Core 3.1. On .NET 5, however, dump collection fails without SYS_PTRACE.

Other information

Repro steps:

  1. On a Win10 dev machine running Docker Desktop targeting Linux containers.
  2. Create a new .NET 5 ASP.NET Core WebAPI from a new-project template and add a Dockerfile with VS’s right-click “add Dockerfile” option.
  3. Build the Docker image
    1. docker build -t mjrousos/dockertest:latest -f Dockerfile ..
  4. Run the Docker container
    1. docker run -d --rm mjrousos/dockertest:latest
  5. Exec into the container
    1. docker exec -it f273 /bin/bash
  6. Install and run dotnet-dump
    1. apt-get update
    2. apt-get install curl
    3. curl -L https://aka.ms/dotnet-dump/linux-x64 -o dotnet-dump
    4. chmod +x ./dotnet-dump
    5. ./dotnet-dump ps
      1. 1 dotnet /usr/share/dotnet/dotnet
    6. ./dotnet-dump collect -p 1
      1. Writing full to /app/core_20210317_144707
      2. Writing dump failed (HRESULT: 0x80004005)

The --diag output looks good at first, but concludes with this error:

EnumerateElfInfo: phdr 0x5586152a5040 phnum 10
ERROR: ReadMemory(0x5586152a5040, 38) phdr FAILED
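
For context on the two log lines above: "phdr" and "phnum" are the address and count of the ELF program headers, which the kernel also exposes to a process through its auxiliary vector. The sketch below is only an illustration and is not createdump's code: it walks the calling process's own program headers, whereas createdump has to read them from another process. The helper name `count_own_phdrs` is hypothetical. The `38` in the failing ReadMemory call is presumably hexadecimal, which would match the 56-byte size of one 64-bit program header entry.

```c
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Hypothetical helper: count the calling process's own program headers
 * of the given type (for example PT_LOAD). The kernel publishes the
 * header table's address and entry count as AT_PHDR and AT_PHNUM. */
size_t count_own_phdrs(uint32_t type)
{
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)getauxval(AT_PHDR);
    size_t phnum = (size_t)getauxval(AT_PHNUM);
    size_t count = 0;

    if (phdr == NULL)
        return 0;

    for (size_t i = 0; i < phnum; i++) {
        if (phdr[i].p_type == type)
            count++;
    }
    return count;
}
```

Calling `count_own_phdrs(PT_LOAD)` returns the number of loadable segments of the running executable. A dump tool needs the same table from the target process, so a failed read at this point stops the dump.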

If I retarget my app and Dockerfile to netcoreapp3.1, though, the exact same steps work and I get a dump.

Repro code

I've pushed a sample solution to mjrousos/DockerDotnetDumpRepro. In its current state, dotnet-dump fails but after retargeting the csproj and Dockerfile to .NET Core 3.1, dotnet-dump succeeds. Also, a sample Docker image (targeting .NET 5) is available on Dockerhub as mjrousos/dockertest:latest.

@mjrousos mjrousos added the bug Something isn't working label Mar 18, 2021
@mikem8361 mikem8361 self-assigned this Mar 23, 2021
@mikem8361 mikem8361 added this to the 6.0.0 milestone Mar 23, 2021
@mikem8361
Member

I've repro'ed the problem using your project/Dockerfile (thanks for that), but I haven't figured out what is going on yet. If I create a simple container with the 5.0 SDK image, then create and run a webapp in it, dotnet-dump collect works. It seems to have something to do with how the app is launched in the Dockerfile. I'm going to continue to investigate.

@mikem8361
Member

The conclusion I've come to is that the SYS_PTRACE capability is required for a Docker container that launches the app with ENTRYPOINT (I'm sure a pretty common case), which is pretty much the same conclusion reached in the mail thread this issue came from. The read memory failure is EPERM ("Operation not permitted"), and it occurs on the app's module header, which is critical for createdump to read. I'm not sure what createdump can do to fix this. I don't know enough about Docker and Linux in this area to come up with any workarounds.

/cc: @hoyosjs @shirhatti

@mjrousos
Member Author

Any ideas what makes .NET Core 3.1 containers different (since dotnet-dump works there)?

@mikem8361
Member

mikem8361 commented Mar 27, 2021 via email

@saul

saul commented Mar 27, 2021

Docker uses the OS kernel - that's why the kernel is identical across containers.

@hoyosjs
Member

hoyosjs commented Mar 30, 2021

Some further progress here: it looks like this is a regression caused by dotnet/runtime#420, where process_vm_readv returns EPERM. If pread/pread64 is used instead, the dump completes with no issues. This looks expected given how seccomp works under Docker (https://docs.docker.com/engine/security/seccomp/), where process_vm_readv requires CAP_SYS_PTRACE. Most other projects seem to fall back to some other reading mechanism: using ptrace themselves, reading through IPC, or reading from /proc/<pid>/mem.
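
A minimal sketch of that kind of fallback, assuming a Linux target: try process_vm_readv first, and if it is refused with EPERM (or is unavailable, ENOSYS), read the same range with pread on /proc/<pid>/mem. The function names `read_via_proc_mem` and `read_process_memory` are hypothetical and this is not the code from the runtime fix. Whether the /proc read itself is permitted still depends on the container's ptrace access rules.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for the process_vm_readv declaration */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: read len bytes at addr in process pid by
 * opening /proc/<pid>/mem and using pread at that offset. */
ssize_t read_via_proc_mem(pid_t pid, uintptr_t addr, void *buf, size_t len)
{
    char path[32];
    snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, buf, len, (off_t)addr);
    int saved_errno = errno; /* keep pread's errno across close() */
    close(fd);
    errno = saved_errno;
    return n;
}

/* Hypothetical helper: prefer process_vm_readv, and fall back to
 * /proc/<pid>/mem only when the syscall is blocked or missing. */
ssize_t read_process_memory(pid_t pid, uintptr_t addr, void *buf, size_t len)
{
    struct iovec local = { .iov_base = buf, .iov_len = len };
    struct iovec remote = { .iov_base = (void *)addr, .iov_len = len };

    ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
    if (n >= 0 || (errno != EPERM && errno != ENOSYS))
        return n;

    return read_via_proc_mem(pid, addr, buf, len);
}
```

Both helpers return the number of bytes read, or -1 with errno set. Keeping the fast path first means nothing changes for environments where process_vm_readv is allowed.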

@hoyosjs
Member

hoyosjs commented Mar 31, 2021

To make it even more fun (read complicated), this depends on the kernel version and the version of docker being used:

  • If you are on a system with a kernel < 4.8, you will always need CAP_SYS_PTRACE. This is disabled by default on Docker due to serious security issues in the interaction between ptrace and seccomp, where ptrace can be used to break seccomp.
  • If you are on Docker 19.03+ with a kernel that's 4.8 or above, you are allowed to use ptrace on child processes without needing CAP_SYS_PTRACE because of this PR (Docker does a little more than it claims when adding capabilities: it also adds syscalls to an allowlist). This is the scenario where dotnet-dump works nicely up to .NET Core 3.1 and fails in the experiments on .NET 5.0.
  • Eventually, the process_vm_* calls will also work due to this, which is currently only in upstream. I am creating a PR and will take it to ship-room for 5.0, as I am not sure when such a fix will make it to widely available Docker versions.

And looking at the containerd code, seccomp seems to always disable ptrace there. I don't expect even crash dumps to work under containerd.
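
A rough way to check which side of the 4.8 kernel boundary a host falls on is to parse the release string reported by uname. This is only a sketch: the helper names `release_at_least` and `running_kernel_at_least` are hypothetical, and the check says nothing about the Docker version or the seccomp profile in use, which matter just as much.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns 1 if a release string such as
 * "5.4.0-generic" is at least major.minor, 0 if it is older,
 * and -1 if the string cannot be parsed. */
int release_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;

    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return -1;

    return (maj > major) || (maj == major && min >= minor);
}

/* Hypothetical helper: apply the same check to the running kernel. */
int running_kernel_at_least(int major, int minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;

    return release_at_least(u.release, major, minor);
}
```

Because containers share the host kernel, `running_kernel_at_least(4, 8)` gives the same answer inside a container as on the host.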

@kamikyo

kamikyo commented May 25, 2021

Kernel version: 3.10.0-957.el7.x86_64
I am using the .NET Core global CLI tools in a sidecar container.
The target container:
docker run -idt --name test --cap-add=SYS_PTRACE -v /dotnet-debug/dump-tmp/:/tmp net5runtime:latest
The sidecar container:
docker run -it --name dumplearn --cap-add=SYS_PTRACE -v /dotnet-debug/dump-tmp/:/tmp --pid=container:test net5sdk:latest /bin/sh

dotnet-dump version: 5.0.221401+2ee978c099e2af7ce69aad0a8c8aabb719fc952d
I still have this problem: Writing dump failed (HRESULT: 0x80004005)

So, this problem will be resolved in version 6.0.0, right?
Thank you for your great contribution @hoyosjs

@hoyosjs
Member

hoyosjs commented May 26, 2021

It's solved in 5.0.6 as well, and 3.1 had no such issue. The problem is that 3.10 is an old kernel, so there are a few things going on that I am not sure about, and I don't expect them to be related to this issue given that you tried SYS_PTRACE. Is this reproducible with any app? It's likely this has something to do with sidecars and pid-namespace sharing, but this is all conjecture until it gets some more investigation. When you collect the dump, the verbose flag could give us some pretty useful output to show where things go wrong.

@kamikyo

kamikyo commented May 26, 2021

Yes, I am running the simplest "hello world" console app in the target container, just like this:

Console.WriteLine("hello world");
Console.ReadLine();

Both my SDK and runtimes use version .NET 5.0.6

In addition, is there any way I can gather more information to help you locate the problem? @hoyosjs

@lghinet

lghinet commented Jul 12, 2021

We solved this by specifying an output path that is shared between the target container and the sidecar:

dotnet-dump collect -p X -o /tmp/dump1

@ghost ghost locked as resolved and limited conversation to collaborators Jun 27, 2023