Docker Slaves die randomly during build. #628
Comments
Broken communication with docker daemon (HTTP hijacked connection) |
Hi. Our Jenkins box is on one part of our corporate network (in Chicago), and the build node is local to our building (UK). The build node is behind a firewall (red zone) that stops outgoing connections from being made, but allows all incoming connections. The hijacked connection is interesting, though. I'd never have been able to figure that out from the stack trace. I'm not familiar with the internal connections between our red zone and the green zone, so I could imagine that the connection may be hijacked. I'll look at spinning up a Jenkins docker on the docker host tomorrow. Sadly, the Jenkins box we already have on this network is v1.6, so way too old for docker! For now, I've spun up 4 slaves with a modified config to allow them to behave as normal Jenkins slaves, and they seem to work as expected. I'll update the ticket tomorrow, as soon as I have any findings. Thanks for the assistance. Pete |
IIUC you can't use a jnlp connection as your build node can't connect back to master. |
Sadly, I can't use the SSH connection in the standard form as it uses an SSH private key. There's a bug in one of the plugins which means the SSH key used initially is then used on any other SSH connection, including the one to gitlab. I'll also try using the modified docker image which has username/password based login to see if that improves the reliability. Thanks again for the assistance. Pete |
Any reference? |
Hi. The ones I found are a Stack Overflow and a Jenkins bug. https://stackoverflow.com/questions/16721629/jenkins-returned-status-code-128-with-github All I know is that if I connect via SSH Key injection, I can't then connect to our Gitlab server, as it rejects our SSH key. If I switch back to I'm rebuilding our base image to allow connection via SSH using username & password, and I'll see if that helps the connection hijacking. In parallel, I'm setting up a docker based Jenkins instance on the same box to see if that makes any difference (just in case the SSH connection doesn't fix it). Thanks. Pete. |
Have you tried using SSH key injection with the public Jenkins slave docker image? Also, when using SSH, make sure you set the key verification strategy to non-verifying or it'll reject most things. |
Hi. I've already done this. On a simple test (freestyle) project, it works correctly. When I extend this to one of my maven projects, it connects, but fails to pull the code from the repository. It appears that when I log in via SSH, even though the GitLab integration has a username and password, this username and password is completely ignored, e.g. if we log in via an SSH key injection as "root", and the GitLab pull is configured to use hudson/password, the root/no password is passed into gitlab, and it fails. I think at this stage I've used up enough of your time. I think my connection problems are probably because I'm using the experimental docker connection. If I can figure out how to configure Jenkins to honour the username and password specified in the configuration when pulling, rather than the one used to log in, I'll see where I get. Please feel free to close the issue. Thanks for your help. Pete. |
Hopefully the last update. I've built 4 static docker based slaves.
If I build a dynamic docker slave (using the docker plugin), none of the jobs work. They all fail with the Git plugin returning a status code 128. This sort of behaviour is mentioned in the "Gotchas" section of https://plugins.jenkins.io/git-client. I can't explain why the Jenkins user (elevated to root status) would pass, when the root user would fail. I also can't explain why the Jenkins user would pass when using a static docker container, when it fails when using a dynamic container, based on exactly the same image. At this stage, I don't know if it's a problem with the Git plugin, or something else. As I said before, please feel free to close the issue, and thanks for your help. Pete. |
I ran across this post while troubleshooting the same issue. My Jenkins Docker builds would die with the following line always appearing in the build log: "Expected n bytes application/vnd.docker.raw-stream header...". Figured the following might be helpful... I'm not sure if this was the ultimate cause, but I was building on a laptop with a super-flaky wifi connection. Once I realized this might be the cause, I switched over to an ethernet connection, turned off the wifi on the laptop, then reconfigured the Jenkins docker cloud to use the ethernet if address, and the issue stopped happening. Again, this could be just a coincidence, but since I saw very few posts regarding the "Expected n bytes" error, I thought this might be helpful to anyone else who comes across this. Cheers. |
@ReyChavez Flaky network connections are a pain in most environments, but in Jenkins a momentary drop in network connectivity that causes TCP connections to close will be fatal to just about all operations, not just docker. So, if you're running anything on Windows, make sure that the operating system only ever sees a perfect network environment, because Windows turns any minor outage into a fatal error, which Jenkins will report as a build error etc. |
Hi, |
This started happening to me out of the blue. Are there any known workarounds other than getting a better network connection? BTW my network connection seems to be pretty OK:
|
This issue is specific to the "attach" method. If you're suffering from this, you could try downloading the .hpi file that the jenkinsci build server auto-builds from #693 and see if that fixes it for you (that build is just the latest code + the bugfix). |
@pjdarton thanks for the clarification, I will definitely try out the PR build ( I'm wondering if this failure is related. Most of the time I get the
|
Hi, thanks for the tip... I was getting this error with plugin 1.1.5 and the "attach" method, running Jenkins in Rancher (i.e. Docker in Docker). Thanks a lot!! |
* Fixes bug #628.
* e.g. "java.io.IOException: Expected 8 bytes application/vnd.docker.raw-stream header".
* Improve debug logging.
* Add unit tests for DockerMultiplexedInputStream.
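For readers wondering where the "Expected 8 bytes" message comes from: when a container is attached without a TTY, the Docker Engine API multiplexes stdout and stderr into frames, each preceded by an 8-byte header (one byte for the stream type, three padding bytes, then a 4-byte big-endian payload length). A single `read()` on a network stream may legitimately return fewer bytes than requested, so a reader has to loop until the header is complete. The following is a minimal sketch of such a reader, not the plugin's actual `DockerMultiplexedInputStream` and not necessarily the change made in the fix; the class and method names (`RawStreamFrameReader`, `readFrame`, `readFully`) are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/** Hypothetical sketch of a reader for Docker's multiplexed raw-stream frames. */
final class RawStreamFrameReader {

    /** One decoded frame: stream type (0 = stdin, 1 = stdout, 2 = stderr) plus its payload. */
    static final class Frame {
        public final int streamType;
        public final byte[] payload;

        Frame(int streamType, byte[] payload) {
            this.streamType = streamType;
            this.payload = payload;
        }
    }

    /** Returns the next frame, or null if the stream ended cleanly on a frame boundary. */
    static Frame readFrame(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int got = readFully(in, header);
        if (got == 0) {
            return null; // clean end of stream, no partial header
        }
        if (got < header.length) {
            throw new EOFException("Expected 8 bytes of header but stream ended after " + got);
        }
        int streamType = header[0] & 0xFF;
        // Bytes 4..7 hold the payload size as an unsigned big-endian 32-bit integer.
        long size = ByteBuffer.wrap(header, 4, 4).getInt() & 0xFFFFFFFFL;
        if (size > Integer.MAX_VALUE) {
            throw new IOException("Frame too large for this sketch: " + size);
        }
        byte[] payload = new byte[(int) size];
        int read = readFully(in, payload);
        if (read < payload.length) {
            throw new EOFException("Expected " + size + " payload bytes but got " + read);
        }
        return new Frame(streamType, payload);
    }

    /** Keeps reading until the buffer is full or the stream ends; returns the bytes read. */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                break; // end of stream
            }
            off += n; // a short read is not an error, so just keep going
        }
        return off;
    }
}
```

The point of the loop in `readFully` is that a short read is treated as "more bytes are still on the way" rather than as a protocol error; only a genuine end of stream in the middle of a header or payload is reported as a failure.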
Docker Plugin versions:
1.1.3 and 1.2 SNAPSHOT
Jenkins Version:
2.89.4
Docker Version:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 02:21:36 2017
Possibly related issue:
#621
My actual problem
I created a basic freestyle test project. This just prints out a directory listing and touches a file (to make sure mounting volumes works correctly), and it runs for less than 60 seconds.
This always completes correctly.
With all versions of the plugin, every job other than the trivial "Test" job fails before completion.
All failing jobs are Maven based.
When they die, they are always cleaned up correctly, and the next job always starts correctly.
I upgraded to 1.2 SNAPSHOT after I saw the possibly related issue above, but it made no difference.
Infrastructure is a single VM (dual core, 8 GB RAM, 100 GB HDD) running the docker server + JFrog Artifactory.
There is no old data present currently. I had an issue with old maven data being present, but that's been removed.
Relevant Jenkins Config
Template is configured to communicate via Attaching to Docker Container. This is because if I connect via SSH, something, somewhere has an issue, and the SSH key is passed from the Docker Plugin>Docker Container connection to the Container>Git repository connection, which causes an authentication failure. I can't build a JNLP image on our server here, due to missing files, so I'm kind of stuck with the SSH image for now.
Configuration for the Jenkins Template is below. idleMinutes is set to 10, but it seems to make no difference if I set it to 60.
Clouds configuration is attached as clouds.txt.
Installation History
Version 1.1.2 was already installed on the box when I took over managing it. I then upgraded to 1.1.3. As far as I know, version 0.16.x or earlier has never been installed on this Jenkins.
I've installed 1.2 SNAPSHOT to see if it helped.
Stack Traces:
All stack traces are along the lines of the blocks below. I've not supplied the full build logs, as most of these are larger than 1 MB and don't actually give much information.
This stack trace shows it happening 5 minutes after the start of the build...
The underlying Jenkins log error for the stack trace above is
clouds.txt
logs.zip
This set of log files covers the docker inspect output for the slave I'm building on, the docker logs (effectively empty), the Jenkins log showing the underlying problem, and a truncated version of the build console for the failed build.
If I can provide any information you think I've missed, please just respond and I'll try to provide it.
Many thanks.
Pete.