Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add missing descriptor #726

Merged
merged 2 commits into from
Apr 1, 2019
Merged

add missing descriptor #726

merged 2 commits into from
Apr 1, 2019

Conversation

jetersen
Copy link
Member

@jetersen jetersen commented Mar 29, 2019

Related to
jenkinsci/google-compute-engine-plugin#53 (comment)
jenkinsci/configuration-as-code-plugin#804

we are trying to fix JCasC to avoid disconnecting ephemeral cloud agents

@jetersen
Copy link
Member Author

jetersen commented Apr 1, 2019

@pjdarton PTAL 😄 See this excellent summary jenkinsci/google-compute-engine-plugin#53 (comment)

@pjdarton
Copy link
Member

pjdarton commented Apr 1, 2019

I've looked. I'm not much wiser. I've never played with JCasC (config as code is on my TODO list, but keeps being pushed downwards by other issues).

I'm wary of JCasC-related changes as, last time someone wrote a PR to improve JCasC support for a plugin I'm involved with, fiddling with descriptors had unwanted side effects (and that PR was then effectively abandoned, presumably because config-as-code got pushed off the top spot on their TODO list too).
I'm also wary because I don't know why there isn't a descriptor already and why, if having one is so important, the code "works fine for everyone else" without one - I know that most things have descriptors, so why doesn't this have one?
Lastly, I'm also wary because it's far too easy to make changes to Descriptors and Extension Points that seem to be working fine but actually have unwanted side effects that only show up later (e.g. #697 and this comment in particular).

Personally, I'd like to see a nice unit test that checks that everything has a descriptor that works the way such descriptors were supposed to work. i.e. not a unit-test that merely checks that the code here is the way the coder wrote it, but a not-docker-plugin-specific unit test that one can apply to any code to verify that it's correct.
If there was something which said "this is the way that all cloud plugins should be coded" and I could see that all of them did that except the docker-plugin then it'd give more confidence that making such changes was correct.

i.e. This is not my field of expertise, so changes like this scare me as I can't tell if they're correct or not by inspection, there's not enough unit-testing to prove they're correct, and changes like this can have undesired side effects.

If you know better than I do then please explain.

@jetersen
Copy link
Member Author

jetersen commented Apr 1, 2019

@pjdarton looking at VSphere, Azure VM, Kubernetes and EC2 clouds they all have SlaveDescriptors so 🤷‍♂️

I can see if I can get a test working but what do you want me to test?
Would not know how to configure the test against Jenkins CI and make it generic enough to have it work locally.
@robincsmith has already confirmed it is working.

@pjdarton
Copy link
Member

pjdarton commented Apr 1, 2019

Well, I wouldn't trust the vSphere plugin to be correct because I'm the "last man standing" on that plugin (all other maintainers ran off, leaving me with sole commit access) ;-)
Historically, when I've done work on cloud plugins, I've generally looked at what the vSphere, docker and OpenStack plugins each do and try to use that to guide "best practice" as those were the plugins I was working on - I don't know if there's a plugin which is a "known good" one (they all seem to be messy in different ways).

Testing: Check that you can't manually create these nodes. Make sure they don't show up anywhere they shouldn't. There's a lot of Jenkins code which looks up "all the descriptors" of various things (including slave node classes) and we need to ensure that adding this descriptor, especially without a filter, doesn't result in it being offered as an option where it shouldn't be.
I don't know of any exhaustive lists of places where it might be "offered as an option" though.

As for "confirmed it is working" - there's a huge difference between "change X has been confirmed to fix problem Y" and "change X has been confirmed to fix problem Y and not cause any other problems". I don't doubt that you are confident that your changes fix the problem that you encountered in your use-case - I'm more concerned that it might break other things for other people in other use cases.

Out of interest: Why does this have to be done as an Extension point? Doing that means that the descriptor is opened up to being used all over, and then requires additional code to prevent such usage, whereas if it isn't made so public to start with you don't need to restrict its usage.

PS. I would also point out that, if you're reconfiguring Jenkins by code, that process shouldn't remove existing docker transient nodes as a by-product. If this happens then there'll be slaves running that Jenkins doesn't know about (it'll self-repair, but there will be borked builds and wasted container resources until the garbage-collector tidies up the mess).

@robinbanbury
Copy link

@pjdarton I have these changes deployed locally in a Jenkins instance and can confirm they are fixing the issue of ephemeral nodes provisioned by the Docker plugin being disconnected after JCasC config is reloaded. I am also continuing to use the plugin in a regular way (e.g. launching builds, using labels, mounting volumes, observing container cleanup after a build), and my impression is that the plugin is continuing to work as it was previously. Although I respect that I'm not by any means testing all use cases :-)

I'm not too familiar with the inner workings of the Jenkins plugins ecosystem, but if there's anything particular you'd like me to test in my Jenkins instance, I'm more than happy to do that.

@robinbanbury
Copy link

Ah sorry, your response came in just before mine

@jetersen
Copy link
Member Author

jetersen commented Apr 1, 2019

When isInstantiable is false, that is the way to tell Jenkins that the node is not configurable through UI.
I also started down the path of using the AbstractCloudSlave but that required even more code changes, this was the fastest way to solve JCasC being able to manage docker clouds without destroying them.

Please see the JavaDocs https://javadoc.jenkins.io/hudson/slaves/NodeDescriptor.html#isInstantiable--

@pjdarton
Copy link
Member

pjdarton commented Apr 1, 2019

OK. I've looked at things. I'm now (a little) better informed.

If you could just remove the unnecessary blank lines at line 331 and 329 then I'll merge it.

Copy link
Member

@pjdarton pjdarton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that'll do.

@pjdarton pjdarton merged commit 9f62994 into jenkinsci:master Apr 1, 2019
@jetersen jetersen deleted the fixMissingDescriptor branch April 1, 2019 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants