JNLP launcher broken since 1.1.3 #635

Closed
fxnn opened this issue Mar 13, 2018 · 19 comments

@fxnn
Contributor

fxnn commented Mar 13, 2018

Since updating to 1.1.3, docker-plugin fails to launch containers on our Jenkins instance.

We use containers with java preinstalled, and the Connect with JNLP option. We did not need to configure any entrypoint or command, neither in Dockerfile, nor in Jenkins. Therefore, we could keep minimal configuration (convention over configuration).

This worked, but since 1.1.3, we get the following log messages:

Mar 12 06:25:08 bvsl05linux06 dockerd[27174]: time="2018-03-12T06:25:08.845787483+01:00" level=debug msg="Calling GET /images/l5-docker.mydomain.com/build-docker:latest/json"
Mar 12 06:25:08 bvsl05linux06 dockerd[27174]: time="2018-03-12T06:25:08.855256229+01:00" level=debug msg="Calling POST /containers/create?name=cc1448a549e4"
Mar 12 06:25:08 bvsl05linux06 dockerd[27174]: time="2018-03-12T06:25:08.856180879+01:00" level=debug msg="form data: {\"Cmd\":[\"-url\",\"http://jenkins.mydomain.com/jenkins/\",\"60651987a0fdca5e8ea707b3d509256cd68fe1b44483ddeb2a91da1658ef8e6b\",\"docker-cc1448a549e4\"],\"Env\":[\"COLLECTD_DOCKER_APP=jenkins-docker-agent\"],\"ExposedPorts\":{},\"HostConfig\":{\"Memory\":524288000,\"MemorySwap\":-1,\"NetworkMode\":\"host\",\"PortBindings\":{},\"Privileged\":false,\"PublishAllPorts\":false},\"Image\":\"l5-docker.mydomain.com/build-docker:latest\",\"Labels\":{\"JenkinsContainerImage\":\"l5-docker.mydomain.com/build-docker:latest\",\"JenkinsId\":\"295c19b7077fc8004251d87c5f81ef7d\",\"JenkinsServerUrl\":\"http://jenkins.mydomain.com/jenkins/\"},\"NetworkDisabled\":false,\"Tty\":false,\"Volumes\":{},\"name\":\"cc1448a549e4\"}"
...
Mar 12 06:25:08 bvsl05linux06 dockerd[27174]: time="2018-03-12T06:25:08.944657766+01:00" level=debug msg="Calling POST /containers/de8ca31fa47d95a6c4ef4ba4fb29a392f78aa0c5277c79497df3e31b33837ef1/start"
...
Mar 12 06:25:09 bvsl05linux06 dockerd[27174]: time="2018-03-12T06:25:09.189358823+01:00" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:265: starting container process caused \"exec: \\"-url\\": executable file not found in $PATH\"

It seems that docker-plugin no longer invokes java but simply passes -url as the command, which obviously doesn't work without manually configuring an entrypoint: exec: "-url": executable file not found in $PATH

This seems to have been introduced by commit 4ae1f17df29, where you dropped the java etc. arguments. The commit message mentions a "default behaviour", but I don't see how to switch back to the old behaviour.

Thank you for your work and for looking into this!

docker-plugin version: 1.1.3
jenkins version: 2.110
docker engine: 17.09.1-ce
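
For readers following along, the failure in the log above matches how Docker builds the process command line: the container runs the image's entry point followed by the `Cmd` list, and if there is no entry point, the first `Cmd` element is treated as the executable. A minimal Python sketch of that rule follows; the function name and the `jenkins-slave` entry point value are illustrative assumptions, not taken from the plugin's code.

```python
from typing import List, Optional


def effective_argv(entrypoint: Optional[List[str]], cmd: List[str]) -> List[str]:
    """Return the argv Docker would exec: entry point first, then Cmd."""
    return list(entrypoint or []) + list(cmd)


# Cmd as sent by docker-plugin 1.1.3 in the log above (secret shortened here).
cmd = ["-url", "http://jenkins.mydomain.com/jenkins/", "<secret>", "docker-cc1448a549e4"]

# Image without an entry point: "-url" becomes the program to execute.
print(effective_argv(None, cmd)[0])

# Image with an entry point script (assumed name): the script receives the arguments.
print(effective_argv(["jenkins-slave"], cmd)[0])
```

With no entry point in the image, the first element is `-url`, which is exactly what the OCI runtime reports it cannot find in `$PATH`.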

@fxnn
Contributor Author

fxnn commented Mar 16, 2018

...my expectation is as described in the docs:

Launch via JNLP

  • a JDK installed. You can just use jenkins/jnlp-slave as a basis for a custom image.
  • Jenkins master URL has to be reachable from container.
  • container will be configured automatically with agent's name and secret, so you don't need any special configuration of the container

No mention of a special script or the remoting JAR, which, from my point of view, doesn't make any sense to duplicate into every custom image when it can be provisioned by Jenkins (as has been the case).

@magn2

magn2 commented Apr 13, 2018

ran into the same problem.... can this please be fixed?
still having this issue #581 and i do want to switch to JNLP

@pjdarton
Member

pjdarton commented May 9, 2018

I've recently had to do some investigation in this area (docker + JNLP) and I've found out some information that may be useful.

FYI the basic expectation of the plugin code seems to be that any container that's connecting via JNLP will have the same kind of entry point as https://hub.docker.com/r/jenkins/jnlp-slave/
That seems to be the "official" reference JNLP docker image; that image doesn't take a "java" argument, and so that's what the docker-plugin is (now) set up to use.
While the documentation doesn't make this explicit, it does at least hint at this by suggesting that you "can just use jenkins/jnlp-slave as a basis for a custom image" - I guess what it should do is make it clear that this isn't merely a "can", but a "...and if you don't, you need to make sure that you behave the same way as that one does."
i.e. This may be more of a documentation issue than a code functionality issue.

It does, however, highlight a discrepancy between https://hub.docker.com/r/jenkins/jnlp-slave/ and what other users' (such as your own) "reasonable expectations" might be. Personally, I don't think you're being unreasonable in expecting it to be possible to use a custom image for JNLP without having to copy/paste the script that the "official" JNLP image uses.
Making it possible to use either the "official style entry point" or a custom entry point may require a non-trivial amount of additional (optional) configuration options in the DockerComputerJNLPConnector class though, and I'm not entirely sure I know what a "good solution" would even look like in the GUI.

@pjdarton
Member

I've had a go at implementing a solution for this. Can you both (@fxnn and @magn2) please try it out and see if it resolves the situation?

To test it, download the .hpi file from https://ci.jenkins.io/job/Plugins/job/docker-plugin/view/change-requests/job/PR-654/lastSuccessfulBuild/artifact/target/, install it into a test Jenkins server, then drill down to the JNLP configuration for your container (Manage Jenkins -> Configure System -> Cloud -> your docker cloud -> Templates -> your JNLP-using template -> "Connect method") and then take a look at the new field "EntryPoint Arguments".
Hopefully the (newly written) help text (which shows if you press the round ? icon) will tell you everything you need to know.

If you can try this out for me, and also let me know any comments/feedback you have (e.g. suggestions on different names, different wording on the help text), then there's a good chance we can get this enhancement included in the next official version of the docker-plugin.

@fxnn
Contributor Author

fxnn commented May 25, 2018

Nice job, thank you for looking into this!!

I'll comment as soon as I have the test plugin installed.

@magn2

magn2 commented May 29, 2018

Thanks for looking into this @pjdarton.
I took the build for a test spin today, looks promising so far.
I will leave it running for one day now, and see if something goes wrong.

Only thing that got me confused in the first place is the multiline mode (turn it on before pasting the example!)

@pjdarton
Member

Confused? Could you explain in more detail?

If you could describe exactly what it did that was confusing, and what you think it should do instead, it's possible that I might be able to make it less confusing.

@magn2

magn2 commented May 29, 2018

Confused meaning: The example (sh -c ....) only worked when I pasted it in multiline mode, not as one line.
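
One possible explanation, sketched in Python under the assumption that the "EntryPoint Arguments" field is read as one argument per line (the thread does not spell out the parsing rule, and `split_arguments` is a hypothetical helper, not plugin code): pasted on a single line, the whole example would become one argument, so the container would be asked to run a program whose name contains spaces.

```python
from typing import List


def split_arguments(field_text: str) -> List[str]:
    """Assumed rule: every non-empty line of the field is exactly one argument."""
    return [line for line in field_text.splitlines() if line.strip()]


# Placeholder script text; the real help-text example is not reproduced here.
multi_line = "sh\n-c\necho placeholder"
single_line = "sh -c echo placeholder"

print(split_arguments(multi_line))   # three separate arguments
print(split_arguments(single_line))  # one argument containing spaces
```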

@pjdarton
Member

Ah, ok. Understood.

OK, I'll edit the help-text and see if I can make it clearer.
(we don't want lots of people copy/pasting the example and then raising a bug report claiming that it doesn't work!)

@pjdarton
Member

OK, I've enhanced the help text for...

  • The prerequisites
  • The JNLP entry point arguments

See ad68a22 for details of the changes.

There's a new hpi at https://ci.jenkins.io/job/Plugins/job/docker-plugin/view/change-requests/job/PR-654/lastSuccessfulBuild/artifact/target/ with the new help text.
Note: Functionality is unchanged - this only alters the help text.

@magn2

magn2 commented May 29, 2018

Thanks, yeah that should make it clear 👍
So the entrypoint seems to work fine for all of my containers.

The following is unrelated:

After having it running for a bit, I still seem to have connectivity issues: closed connections and random timeouts after the containers have been running for some time. This is similar to what I see with the "attach" method.
I still have no idea what causes this.
After some time i saw a
Cannot contact docker-0012dzhvzvp11: java.lang.InterruptedException
followed by a

java.nio.channels.ClosedChannelException
Caused: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from [redacted]:37076 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:948)
	at hudson.FilePath.act(FilePath.java:1036)
Caused: java.io.IOException: remote file operation failed: [redacted] at hudson.remoting.Channel@62d630f5:JNLP4-connect connection from [redacted]:37076

I mean, this all can be caused by our company network/proxy stuff.
But I really have no idea how to debug it, and i see similar reports in the issues.

@pjdarton
Member

Re: unrelated

I've experienced disconnection issues where I work too, and debugging them is very difficult :-(
What we did was to provide a Java properties file configuring the slave logging, and send lots of logging information to a set of 9 rotating logs (25MB each) so that, when a disconnection happened, we could see both the slave logs and master logs and figure out what went wrong. We also set up some logs within the Jenkins master configuration.
What I've found is

  • Jenkins is perfectly happy to run "Flyweight" tasks (e.g. the coordinating job for a multi-configuration build) on a docker container without "using" an executor, and hence when the container has finished doing its main job (whatever work did use an executor) the multi-configuration job dies. Or vice-versa.
  • Severe performance issues can cause the Jenkins master-slave pings to time out (by default a response is needed within 4 minutes)
  • Brief network outages on a Windows machine's NIC will be cascaded into catastrophic fatal outages that kill all current TCP connections. Microsoft claim this is WAD (working as designed); everyone else (including those who wrote the book on networking) disagrees. So don't use Windows for anything important.
  • I think the issue became more pronounced since the switch to JNLP4 and/or us upgrading from an old Jenkins to 2.89.2.
  • ...and if you do get a real company network outage then, of course, you'll also get some/a lot of disconnects.

w.r.t. configuring logging: On a docker container, for it to be useful, it'd have to log to a file that was on a non-temporary filesystem, e.g. something mounted from the host (and, as the logfiles would be shared, you'd need to include something like hostname/container-names/slave-names in the log entries in order to avoid ambiguity).
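
As a rough illustration of the logging setup described above, the following Python sketch renders a `java.util.logging` properties file with 9 rotating files of 25MB each and puts the agent name in the file name to avoid ambiguity on a shared volume. The helper name, the directory, the agent name and the choice of `FINE` for the `hudson.remoting` logger are assumptions made for this example; the actual file used is not shown in this thread.

```python
def agent_logging_properties(log_dir: str, agent_name: str,
                             files: int = 9, megabytes_per_file: int = 25) -> str:
    """Render a java.util.logging config with size-limited rotating files."""
    limit_bytes = megabytes_per_file * 1024 * 1024
    lines = [
        "handlers=java.util.logging.FileHandler",
        ".level=INFO",
        # %g is the rotation generation number (0 is the newest file).
        f"java.util.logging.FileHandler.pattern={log_dir}/{agent_name}.%g.log",
        f"java.util.logging.FileHandler.limit={limit_bytes}",
        f"java.util.logging.FileHandler.count={files}",
        "java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter",
        # More detail for the remoting layer only (level chosen for illustration).
        "hudson.remoting.level=FINE",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical host-mounted directory and agent name.
print(agent_logging_properties("/var/log/jenkins-agents", "docker-agent-1"))
```

A JVM picks up such a file when started with `-Djava.util.logging.config.file=<path>`; for the logs to survive the container, `log_dir` would need to point at something mounted from the host, as noted above.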

@pjdarton
Member

@fxnn Any further comment? Does this fix it for you?

@fxnn
Contributor Author

fxnn commented May 29, 2018

@pjdarton Tested with Jenkins LTS 2.107.3 and it works like a charm for me!

I wonder whether the prerequisite text of the connect method could be misleading. It says:

Docker image must either have slave.jar pre-installed or have its entry point code download it.

To me, this sounds a bit like "okay, there's a slave.jar somewhere, Jenkins will somehow find and launch it". My suggestion would be

Docker image must launch the slave.jar program by itself or using the "EntryPoint Arguments" option below.

Anyways, thanks a lot for looking into this!

@pjdarton
Member

Text updated. See afefa53 .
HPI build available in the usual place.

This will be "fixed in the next release".

@leodutra-aurea-zz

When will it be released? Any idea?

@pjdarton
Member

pjdarton commented Jun 7, 2018

There's some work in progress at present. Once that's done (1-2 weeks?) and soak-tested (1-2 weeks) then, if there's nothing else urgent in progress...

In the meantime, download the hpi (see above for URL) of the dev build.

@leodutra-aurea-zz

leodutra-aurea-zz commented Jun 7, 2018

@pjdarton I cannot download them directly (404).
Is there a way to use the link directly on Jenkins or something like that?

Thank you

@pjdarton
Member

pjdarton commented Jun 7, 2018

Ah, yes, oops - my mistake. As the changes have already been merged, the PR is closed, so the Jenkins CI system tidies it all up, hence 404. All these changes are now in the main code, albeit not released yet.
You'll find the bleeding-edge (unreleased) builds here - follow the links to the last successful artifacts, grab the .hpi file from there.
You might need to rename it (to remove the version number stuff) so it's just docker-plugin.hpi prior to uploading it to your Jenkins server, I'm not 100% sure.
