Docker **Client** Hang on Windows Subsystem for Linux (WSL) #1123

Closed
MicahZoltu opened this issue Sep 25, 2016 · 36 comments

Comments

@MicahZoltu

MicahZoltu commented Sep 25, 2016

NOTE: This is a problem with running the docker CLIENT inside of WSL. This is NOT a problem with running the docker daemon inside WSL which I recognize should not be expected to work at all.

A brief description

After downloading the docker client in WSL I am able to run most commands (run, ps, images, rm, etc.). However, when I try to do a docker build it hangs while copying the context.

Expected results

I expected it to build my image.

Actual results (with terminal output if applicable)

$ ~/docker/docker -H tcp://0.0.0.0:2375 build .
Sending build context to Docker daemon 12.81 MB

It sits here indefinitely. I once let it sit for a while (not sure how long, maybe 30-60 minutes) and it never moved on. The number of MB it reaches before stalling varies slightly each time, though it is usually within a few MB of 12MB. When running the Docker client natively on Windows, it reports a 53MB context.

Your Windows build number

Windows 10 Enterprise N with latest patch.

Steps / All commands required to reproduce the error from a brand new installation

  1. Install Windows 10.
  2. Install Docker for Windows.
  3. Install Windows Subsystem for Linux.
  4. Open a bash prompt.
  5. cd ~
  6. wget https://get.docker.com/builds/Linux/x86_64/docker-1.12.1.tgz
  7. tar -xzvf docker-1.12.1.tgz
  8. echo FROM alpine:latest > Dockerfile
  9. dd if=/dev/urandom of=file.txt bs=1048576 count=20
  10. ~/docker/docker -H tcp://0.0.0.0:2375 build .
  11. Wait indefinitely.
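
Steps 8 to 10 can be collected into a small script. This is a minimal sketch, not part of the original report: the helper name `make_context` and the `$HOME/repro` directory are hypothetical, while the `dd` and `docker` invocations mirror the steps above.

```shell
#!/bin/sh
# Sketch of repro steps 8-10. make_context is a hypothetical helper name.

# Create a Dockerfile plus a file of random bytes in directory $1.
# $2 is the size of the random file in MiB (the report uses 20).
make_context() {
    dir=$1
    size_mib=$2
    mkdir -p "$dir"
    echo "FROM alpine:latest" > "$dir/Dockerfile"
    dd if=/dev/urandom of="$dir/file.txt" bs=1048576 count="$size_mib" 2>/dev/null
}

# Usage (assumes the client was unpacked to ~/docker as in step 7 and a
# daemon is listening on port 2375):
#   make_context "$HOME/repro" 20
#   ~/docker/docker -H tcp://0.0.0.0:2375 build "$HOME/repro"
```

Keeping the context in its own directory avoids sending unrelated files from the home directory along with the 20MB test file.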

Related docker issue (unclear whether the problem is Docker or WSL): moby/moby#26889

Additional Notes

The problem appears to only occur with large contexts (over 12MB). With a trivially small context everything works. Some details on the exact failure can be found in this comment: moby/moby#26889 (comment)

@aseering
Contributor

Hi @MicahZoltu , thanks for reporting this! It sounds very similar to #575 to me; possibly with #610 or #616 as the underlying cause. (I realize that you're reporting a client problem; those tickets discuss a problem with TCP connections such as the connection between a WSL docker client and a remote docker server.)

Could you take a look at those tickets? If you agree that one of them is a duplicate, could you close this as a dupe and follow up there so that we can have all discussion in the same place? If not, please post some more detail to help understand how these issues differ.

@MicahZoltu
Author

I don't believe #575 is related, as I am able to do most docker operations, just not this one. As for #610 and #616, I don't understand their underlying root cause well enough to say whether this is a duplicate. If someone more knowledgeable than I am about the inner workings of docker and/or those tickets believes this is a dupe, then I support closing it as one.

@aseering
Contributor

Regarding #575 , for what it's worth, I believe the issue with most commands was due to a misconfiguration that was corrected; the problem that's being tracked is a hang in docker build.

@lypanov

lypanov commented Dec 4, 2016

Any news on this? Quite a blocker.

@aseering
Contributor

aseering commented Dec 4, 2016

@sunilmut , is it possible that this is resolved by the wonderful fix that you've just put in for #616 ?

@sunilmut
Member

sunilmut commented Dec 4, 2016

It's possible. I just tried the repro posted by @MicahZoltu, but got stuck at step 8. I also didn't install Docker for Windows. Is that required for step 8? (Is that what starts the server at 2375?)
Note that I am running Ubuntu 16.04 on WSL.

:~# ~/docker/docker -H tcp://0.0.0.0:2375 build
"docker build" requires exactly 1 argument(s).
See 'docker build --help'.
Usage: docker build [OPTIONS] PATH | URL | -
Build an image from a Dockerfile

@MicahZoltu
Author

@sunilmut Yeah, you need Docker for Windows installed for step 8 to work. Step 8 attempts to use the docker client inside of the Linux subsystem to communicate with the docker server running inside a Linux Hyper-V VM that Docker for Windows installs/sets up.
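
Before attempting the build, it can help to confirm that the daemon is reachable from inside WSL at all. The sketch below assumes `curl` is installed and that the daemon is exposed over plain HTTP on the address used in this issue. `ping_url` is a hypothetical helper; `/_ping` is the Docker Engine API health endpoint.

```shell
# Turn a "-H" style address (tcp://host:port) into the URL of the
# Engine API health endpoint. ping_url is a hypothetical helper name.
ping_url() {
    printf 'http://%s/_ping\n' "${1#tcp://}"
}

# Usage (prints "OK" when the daemon answers):
#   curl -s --max-time 5 "$(ping_url tcp://0.0.0.0:2375)"
```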

@MicahZoltu
Author

@sunilmut Oh, also it appears you are missing the trailing . in your docker build command. It is required (sets the context directory).

@aseering
Contributor

aseering commented Dec 4, 2016

I believe you do need Docker for Windows installed. I think you also need a Dockerfile in your current directory; the issue is more pronounced if you have both a Dockerfile and some other arbitrary large files in the same directory -- the docker client's "build" command grabs the whole current directory and sends it to the server over TCP.
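
Since the whole directory is sent, a rough estimate of the context size can be made by archiving it locally. This is a minimal sketch, assuming `tar` is available. `context_bytes` is a hypothetical helper, and the figure is only an approximation because the real client also honors `.dockerignore`.

```shell
# Print the approximate size in bytes of the build context rooted at $1
# by streaming an uncompressed tar of the directory into wc.
context_bytes() {
    tar -C "$1" -cf - . | wc -c
}

# Usage:
#   context_bytes .
```

Comparing this number with the size reported on the "Sending build context" line shows roughly how far the transfer got before stalling.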

@MicahZoltu
Author

MicahZoltu commented Dec 4, 2016

Improved the repro steps some more. This time the steps explicitly create a Dockerfile and a 20MB random contents file in the CWD (should be home directory) rather than cloning a repo (which would require git installed).

"Step 8" discussed above is now Step 10.

@sunilmut
Member

sunilmut commented Dec 5, 2016

Thanks guys for the help. The issue repro'd even with the fix. I substituted "Docker for Windows" with "Docker for Linux", which should also be a viable option. I could see the docker from WSL connected to the server on Docker for Linux (netstat showed established connection), but I didn't see any data transfer happening. I couldn't get into the nitty gritty of the traces, but it seems like it was waiting for epoll. So, maybe these issues are not related. I will try to dig further into this later.

@aseering
Contributor

aseering commented Dec 6, 2016

@sunilmut -- what do you mean by "Docker for Linux"? Is the Docker server expected to work correctly on WSL now? (If so, that's unexpected but really cool :-) )

@MicahZoltu
Author

I'm assuming he meant he had a Linux box with docker on it and connected to it with -H tcp://...:2375. I believe the problem will reproduce no matter where the build host is, since the bug is in the client/Linux subsystem.

@sunilmut
Member

sunilmut commented Dec 6, 2016

@MicahZoltu correct.
@aseering - apologies for using relaxed and misleading (and maybe incorrect) terms :).
But, the difference is that in my repro, there is no byte transfer that seems to be happening between client and server, which looks different from what you have described above. @MicahZoltu - Can you share a strace from step 10?

@MicahZoltu
Author

Sorry for the delay in getting back to you @sunilmut

$ strace -e trace=open,stat,read,write ~/docker/docker -H tcp://0.0.0.0:2375 build .
read(3, "\225\364'-{(\376&b\267\t\23\347o\361x\333\3105y\17\327\222\20i\366\252\230\27\271\373\243"..., 4096) = 4096
stat("/home/micah/.docker/config.json", 0xc820328d38) = -1 ENOENT (No such file or directory)
stat("/home/micah/.dockercfg", 0xc820328e08) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/docker-credential-secretservice", 0xc820328ed8) = -1 ENOENT (No such file or directory)
stat("/usr/bin/docker-credential-secretservice", 0xc820328fa8) = -1 ENOENT (No such file or directory)
stat("/bin/docker-credential-secretservice", 0xc820329078) = -1 ENOENT (No such file or directory)
stat("/usr/local/games/docker-credential-secretservice", 0xc820329148) = -1 ENOENT (No such file or directory)
stat("/usr/games/docker-credential-secretservice", 0xc820329218) = -1 ENOENT (No such file or directory)
stat(".", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/home/micah", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
Sending build context to Docker daemon 557.1 kB

It just sits on that last line forever. I have never used strace before and got the parameters off a random stack overflow answer. Let me know if you want me to run it with a specific set of parameters and I can reproduce easily.

@MicahZoltu
Author

For comparison, here is the same Dockerfile being built with an essentially empty context (no data to transfer into the image):

$ strace -e trace=open,stat,read,write ~/docker/docker -H tcp://0.0.0.0:2375 build .
read(3, "\303\30_\252\356\301Hn\16AT\212v\2427\311t\333\261\1\340\360\246~\7y\275\362X\224e\332"..., 4096) = 4096
stat("/home/micah/.docker/config.json", 0xc820330518) = -1 ENOENT (No such file or directory)
stat("/home/micah/.dockercfg", 0xc8203305e8) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/docker-credential-secretservice", 0xc8203306b8) = -1 ENOENT (No such file or directory)
stat("/usr/bin/docker-credential-secretservice", 0xc820330788) = -1 ENOENT (No such file or directory)
stat("/bin/docker-credential-secretservice", 0xc820330858) = -1 ENOENT (No such file or directory)
stat("/usr/local/games/docker-credential-secretservice", 0xc820330928) = -1 ENOENT (No such file or directory)
stat("/usr/games/docker-credential-secretservice", 0xc8203309f8) = -1 ENOENT (No such file or directory)
stat(".", {st_mode=S_IFDIR|0777, st_size=0, ...}) = 0
stat("/home/micah/foo", {st_mode=S_IFDIR|0777, st_size=0, ...}) = 0
read(6, "FROM alpine:latest\n", 32768)  = 19
read(6, "", 32749)                      = 0
read(4, 0xc820335000, 4096)             = -1 EAGAIN (Resource temporarily unavailable)
write(4, "POST /v1.24/build?buildargs=%7B%"..., 378) = 378
write(4, "7ff\r\nockerfile\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2054) = 2054
) = 481, "Sending build context to Docker "..., 48Sending build context to Docker daemon 2.048 kB
write(1, "\r\n", 2
)                     = 2
Step 1/1 : FROM alpine:latest
latest: Pulling from library/alpine

3690ec4760f9: Pull complete
Digest: sha256:1354db23ff5478120c980eca1611a51c9f2b88b61f24283ee8200bf9a54f2e5c
Status: Downloaded newer image for alpine:latest
 ---> baa5d63471ea
Successfully built baa5d63471ea
+++ exited with 0 +++
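
The two `write(4, ...)` calls in this trace look like an HTTP request followed by a chunked body: `7ff` would be the chunk length in hexadecimal, followed by the start of the tar stream. That reading is an interpretation, not something stated in the thread, but the arithmetic is consistent with the 2054-byte write. `chunk_write_len` is a hypothetical helper.

```shell
# Length of one HTTP chunk on the wire for a hex chunk-size string $1:
# the size line (hex digits + CRLF), the payload, and the trailing CRLF.
chunk_write_len() {
    hex=$1
    payload=$((0x$hex))
    echo $(( ${#hex} + 2 + payload + 2 ))
}

# Usage:
#   chunk_write_len 7ff    # 3 + 2 + 2047 + 2 = 2054
```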

@aseering
Contributor

aseering commented Dec 10, 2016

Hm... @MicahZoltu -- could you try omitting -e trace=open,stat,read,write from strace's argument list? That command filters strace's output to just those four syscalls; those calls are very useful for tracing file IO, but less so for network IO, which is what I would expect to be relevant here.

The full (unfiltered) strace output can be quite verbose. You may wish to use -o <filename> to direct it to a file rather than stderr.
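
A sketch of that suggestion, assuming `strace` is installed: `-f` follows child threads and processes (the Docker client is a multi-threaded Go program) and `-o` sends the trace to a file. `build.trace` and `count_syscall` are hypothetical names; the helper only counts lines in the saved log.

```shell
# Capture an unfiltered trace of the hanging build (run by hand):
#   strace -f -o build.trace ~/docker/docker -H tcp://0.0.0.0:2375 build .

# Count how often syscall $2 appears in the strace log $1.
# With -f, each line may start with a PID, which the pattern allows for.
count_syscall() {
    grep -c -E "^([0-9]+ +)?$2\(" "$1" || true
}

# Usage:
#   count_syscall build.trace connect
#   count_syscall build.trace write
```

A trace with a `connect` but few or no `write` calls on the socket would point at the client never sending data, rather than at a stalled transfer.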

@MicahZoltu
Author

@aseering
Contributor

Both of those actually say "Successfully built"... Did one of them fail?

@MicahZoltu
Author

They are two files in the same gist, here is the end of the broken file: https://gist.github.com/MicahZoltu/4f80319472d3872931d467134f9434c1#file-broken-with-context-txt-L607

@bitstrings

I believe I have a related issue.

If you try to docker cp a file >~ 1.2MB the hang also happens.

It transfers some but then hangs.

You can easily reproduce it by simply using cat file | docker exec -i {CONTAINER} sh -c 'cat > out_file'

I use a 15MB file so that it happens 100% of the time.
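
To tell a hang from a slow transfer without watching the terminal, the command can be run under a deadline. This is a minimal sketch, assuming GNU coreutils `timeout` (which exits with status 124 when the deadline passes). `hangs` is a hypothetical helper and `CONTAINER` stands for the name of a running container.

```shell
# Succeed (return 0) if the shell command in $2 is still running after
# $1 seconds, i.e. it had to be killed by timeout.
hangs() {
    status=0
    timeout "$1" sh -c "$2" >/dev/null 2>&1 || status=$?
    [ "$status" -eq 124 ]
}

# Usage, with CONTAINER set to a running container:
#   if hangs 60 "cat file | docker exec -i $CONTAINER sh -c 'cat > out_file'"; then
#       echo "transfer did not finish within 60 seconds"
#   fi
```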

@AaronFriel

AaronFriel commented Dec 23, 2016

I have also experienced hangs with large Git repos. I ran a "git add" on a very large folder in WSL and the process hung.

@sunilmut
Member

@bitstrings @AaronFriel - Hang/incomplete transfer on large files is a known issue, documented in #610, #616. Fix for that is inbound. This issue seems to be a bit different because I don't see anywhere in the trace uploaded by @MicahZoltu, where socket send/recv is happening.

@laurensV

Can confirm that I have the same problem. I run the docker engine on windows and the docker client on bash on windows. I can run and see docker containers with no problems, but it hangs when trying to build an image. Hopefully this issue gets fixed soon!

@DvdGiessen

I ran into this issue yesterday and was just about to comment here today with some details on my setup and findings from digging through the logs, but I noticed I could no longer reproduce this issue since my PC updated to Windows 10 build 15002 overnight.

Microsoft did release a bunch of updates and fixes for WSL with this build[1], a few related to TCP connection bugs as well. Perhaps that resolved the underlying issue here.

[1]: https://blogs.msdn.microsoft.com/commandline/2017/01/09/bash-in-windows-insider-build-15002-many-fixes-but-a-couple-of-bugs/

@sunilmut
Member

@DvdGiessen - Thanks for validating and the comment. It's helpful to know.

@MicahZoltu, @AaronFriel, @laurensV - if you can also validate your scenario with 15002 and see if it works, then I will happily close this issue out.

@derekbelrose

I am running 15007 17017-1846 with WSL along with "Docker For Windows" with a LinuxVM in HyperV.

I installed docker through WSL with: sudo apt-get update && sudo apt-get install -y docker.io
I then did the following in Bash on Ubuntu on Windows:
docker -H tcp://0.0.0.0:2375 run -it --rm alpine:latest /bin/sh

Success!

@MicahZoltu
Author

@sunilmut: I just updated to official Windows 10 release version 1607 build 14393.693 and the problem persists. I believe version 15002 is an insider build (I don't think it ever landed on my system)? Perhaps this means a future official update will resolve the issue?

@derekbelrose: the problem isn't with running docker images, it is with building them when they have large contexts. I recommend following the instructions in the original post and seeing whether you experience the hang described there. I have updated the instructions a few times during the life of this issue and they should be pretty easy to follow at this point.

@aseering
Contributor

@MicahZoltu -- WSL updates are (for the most part) only available in Insider builds right now. You're correct, as I understand it, that this means that they will be available in a future stable release.

@derekbelrose

@MicahZoltu Well, besides using apt-get install docker.io instead of downloading docker in a tarball, I just followed your instructions to a T. It worked.

I even went a bit further and copied the 21MB file into the image being built and I was able to build it exactly the same way in about 3 seconds.

I cannot reproduce this on 15007. WSL is actively being worked on in the insider builds and does work as expected in this scenario.

@dashesy

dashesy commented Mar 15, 2017

docker-compose build works but docker build . does not!

@sunilmut
Member

I am closing this issue out since it seems to be fixed. If anyone experiences any problem here, please speak out and we will gladly reopen the issue.

@MicahZoltu
Author

Is there a mechanism for getting a notification when whatever insider code has this fix is merged to the mainline builds? I was using this issue to monitor for a fix so I could verify once it was released officially (non-insider).

Originally, I assumed that insider builds were betas for mainline releases, but that isn't the case as there have been mainline releases of Windows 10 since this was fixed in insider, yet the problem persists on mainline Windows 10.

@benhillis
Member

@MicahZoltu - The current plan of record is to update mainline WSL with the standard Windows ship cadence (which we're targeting twice a year). There are different Insider levels that mean different things. "Release Preview" is what you describe, where you're essentially trying out things that will be later released as Windows Updates to mainline Windows. Those will primarily be security and reliability fixes. For new feature work "Insider slow" or "Insider fast" are the levels you should use. This will have features that will not reach general availability until the next Windows release.

@MicahZoltu
Author

@benhillis Ah, OK. So if a bugfix like this is part of insider slow/fast, I should not expect to see it in the regular windows updates (the ones my computer is constantly doing in the background on what seems like a weekly basis) but instead expect to see it in the next full release, like the Anniversary update or the upcoming Creators Update.

@sunilmut
Member

@MicahZoltu - In general, yes, unless the bugfix is for a critical security issue or a major blocker, in which case we try to backport the fix to the last full Windows release.
