Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A novel approach to trimming Jenkins nodes by Node.js version #1153

Closed
rvagg opened this issue Feb 27, 2018 · 8 comments
Closed

A novel approach to trimming Jenkins nodes by Node.js version #1153

rvagg opened this issue Feb 27, 2018 · 8 comments

Comments

@rvagg
Copy link
Member

rvagg commented Feb 27, 2018

For discussion, although I have a working implementing of the approach I'm detailing below on ci-release atm.

We have 2 different strategies for doing different things in Jenkins according to Node.js version:

  1. Detecting NODE_MAJOR_VERSION via some Bash scriptery inside a parent job and bubbling it down via environment variables (via env.properties). You can see this in node-test-commit and then it's used in the 3 jobs that are used to compile and test for Windows (node-test-binary-windows, node-compile-windows, node-test-commit-windows-fanned) to select amongst Windows builders.
  2. Hacky scripting in Bash and BAT. iojs+release on ci-release is actually pretty entertaining if you look at the many different ways that all of us with access have gone in and added rules that restrict jobs on particular hosts. This is getting more and more complicated.

Our rules are most strict on ci-release where it's critical we get the right builders for a release:

  • CentOS 5 for Node 4, CentOS 6 above that for Linux
  • No PPC BE after Node 8.
  • s390x and AIX only for Node 6 and later
  • SmartOS 14 before Node 8, Smart OS 15 beyond that (we also have SmartOS 13 still in there for <1 ...)
  • MSVC2013 before Node 6. MSVC2015 from 6 up to 10. MSVC2017 from 10 onward

Node 10 is going to make it more complicated. I think for instance we're going to have to introduce new rules for ARMv6 & ARMv7 (and possibly new machines) because we're still using Wheezy, plus I think CentOS 7 might be our new target builder for Node 10+ and probably a new SmartOS too (this discussion is for a new issue).

There are drawbacks to both approaches we use:

  1. To get a correct NODE_MAJOR_VERSION you need to run a parent job to inspect the source, or you have to manually enter it in the job (it's currently set as default 10 and that needs to be incremented each time we make a major release but, for example, if you want to test Node 8 and you don't select it then you'd get the wrong compiler and you may not even know. We've also kept ci-release pretty simple and only have a single job, with no hierarchy, so we can't use this approach unless we want to start splitting it up (my preference is to KISS but it's getting hard).
  2. With our scripting approach in iojs+release, these scripts are run on all machines in the release pool, so they have to check out Node and check the source and exit prematurely if they aren't supposed to run. This can be annoying if you have a flaky machine or a machine offline that you don't even need (CentOS 5 is hardly ever used there but it's also one of the most problematic machines to keep happy).

So, I did some experimenting this weekend with a new approach on ci-release that might scale across ci too, although I admit that it's still complicated but perhaps we have to put up with that.

Jenkins plugin to detect version

Yeah .. I went down that route. A very simple plugin that hooks into the Git plugin and performs some work upon checkout. You opt-in to the functionality with a tickbox on the Git setup for a job. If you opt-in, then it will look for src/node_version.h in the current workspace, if there it'll parse it for the 3 semver elements and then use them to insert two parameters into the current build: NODEJS_VERSION—the full version (minus any tags) and NODEJS_MAJOR_VERSION—just the major number, which is what we'll make most use of.

Once we have parameters we can start to use them in our matrix job filtering. I have this installed on ci-release to try it out and I've also installed the Matrix Groovy Execution Strategy Plugin which lets us enter more sophisticated Groovy scripts into the matrix combination filtering. I think the same thing could be achieved with the basic "Combination filter" option but I haven't had as much success and the box is just an input field rather than a textbox so it gets out of hand really quickly.

By filtering execution at this level, the builders themselves don't even get invoked if we don't need them to be (similar to approach 1 above, but self-contained within a job). Plus, our rules for what gets to run is all contained in a single script in a single language that's pretty easy to grok and edit (unlike approach 2 now which spans Bash and BAT and even in Bash we have maybe 3 different approaches to achieving the same thing).

Here's the Groovy script I have for iojs+release at the moment that does what the existing per-builder scripting does (I haven't removed that second layer of scripting but if this is correct then they'll never short-cut a build because it shouldn't be running).

def canBuild(nodeMajorVersion, builderLabel) {

  // Linux
  if (builderLabel.indexOf('centos5') == 0 && nodeMajorVersion >= 8)
    return false
  if (builderLabel.indexOf('centos6') == 0 && nodeMajorVersion < 8)
    return false

  // Windows
  if (builderLabel.indexOf('vs2013-') == 0 && nodeMajorVersion >= 6)
    return false
  if (builderLabel.indexOf('vs2015-') == 0 && (nodeMajorVersion < 6 || nodeMajorVersion >= 10))
    return false
  if (builderLabel.indexOf('vs2017-') == 0 && nodeMajorVersion < 10)
    return false

  // SmartOS
  if (builderLabel.indexOf('smartos13') == 0) // Node.js 0.x
    return false
  if (builderLabel.indexOf('smartos14') == 0 && nodeMajorVersion >= 8)
    return false
  if (builderLabel.indexOf('smartos15') == 0 && nodeMajorVersion < 8)
    return false


  // PPC BE
  if (builderLabel.indexOf('ppcbe-ubuntu') == 0 && nodeMajorVersion >= 8)
    return false

  // s390x
  if (builderLabel.indexOf('s390x') > -1 && nodeMajorVersion < 6)
    return false

  // aix61
  if (builderLabel.indexOf('aix61') > -1 && nodeMajorVersion < 6)
  	return false

  return true
}

// setup for execution of the above rules

int nodeMajorVersion = -1
if (parameters['NODEJS_MAJOR_VERSION'])
  nodeMajorVersion = parameters['NODEJS_MAJOR_VERSION'].toString().toInteger()
println 'Node.js major version: ' + nodeMajorVersion
println 'Node.js version: ' + parameters['NODEJS_VERSION']

result['nodes'] = []

combinations.each{
  def builderLabel = it.nodes
  if (nodeMajorVersion >= 4) {
    if (!canBuild(nodeMajorVersion, builderLabel)) {
      println 'Skipping ' + builderLabel + ' for Node.js ' + nodeMajorVersion
      return
    }
  }
  result['nodes'] << it
}

[result, true]

When you build off Node master (10) you end up with this kind of thing:

screenshot 2018-02-27 13 17 43

(the lighter shaded icons indicate that they weren't invoked)

Now, one drawback of this approach is that it's done per-job. If we include this logic in ci.nodejs.org then a script would have to be introduced wherever we need to do filtering. That means the script only needs to focus on the builders involved in that job (linux only, windows only, etc.) but it also means the logic is spread out and will end up being duplicated because we have so many copied jobs. It may be possible to have the above script done in a single location but it would take another plugin, or modification of that Groovy Matrix plugin (and some knowledge about Jenkins plugins that I don't have atm) to make it happen and we'd also have to make sure that it properly accounts for all of the builder labels we have and that they are used consistently across the jobs that make use of this.

/cc @joaocgreis @mhdawson because you have both been involved in hacking together our current approaches.

@richardlau
Copy link
Member

Also cc @gibfahn, this is a similar problem to the one we're attempting to solve internally in IBM with pipelines.

@mhdawson
Copy link
Member

Having more logic hidden inside of jenkins (and duplicacated) is a minus for me, but I do like that it would avoid having to run the job at and it will be more obvious when we get it wrong (as we'll see a shaded entry as opposed to a green due to pre-mature exist). Is there any way to pull the script that does the filtering from the repo as opposed to having to configure in jenkins itself ?

@joaocgreis
Copy link
Member

This is great work @rvagg, this is a massive improvement over what was there before.

First, we actually have 3 strategies for doing different things in Jenkins. The 2 mentioned above, but there is also a NODES_SUBSET variable lingering in a lot of places that was used before node 4 to differenciate 0.10, 0.12, io.js and doc-only.

One way to think of this is that (ignoring pipelines for now) we have 2 ways of structuring jobs in Jenkins:

  • The multi-job approach: A job (Jenkins multi-job type) spawns child jobs (that can be any type, including multi-job). This creates a tree-like structure. Examples are node-test-pull-request and node-test-commit. Pros: organization (better for complex jobs), easy to run only a sub-set. Cons: configuration spread in many jobs, so it's harder for us to maintain.
  • The multi-configuration approach: A job (multi-configuration type) that runs the same (or similar) scripts on multiple workers. This creates a list of workers where it executed. Examples are iojs+release and node-stress-single-test. Pros: simple (better for simpler jobs), easier to maintain. Cons: hard to start a sub-set o workers (node-stress-single-test has a mechanism for only one, but nothing for groups).

The approach mentioned here with the plugin allows to have the same mechanism for both types, which is great. Until now we had NODES_SUBSET and NODE_MAJOR_VERSION for multi-job jobs (with some scripts occasionally in the leaves), but could only use scripts for multi-configuration jobs. I've been pushing for NODE_MAJOR_VERSION but have to admit it's a problem if set wrong, and this plugin takes care of it.

About having the logic hidden in Jenkins is something that we can and should improve, but it's no worse than it is today (have a look at the combination filter for node-test-binary-windows for some fun, I didn't know Matrix Groovy Execution Strategy existed - I'll get it ported). So, big +1 for getting this into the test CI and start porting the logic we have on the other mechanisms.

@gibfahn
Copy link
Member

gibfahn commented Feb 28, 2018

Do you have a link to your plugin? Would be interesting to see how much extra you have to add to make it a Jenkins plugin.

Long-term I think using a Jenkins pipeline is a better way to go if it's not more difficult, simply because it's less clicking around in the UI. I have a pipeline script somewhere that takes an array of labels to run on and runs a job on each, it would be simple enough to hook this groovy function up to something like that.

I had a play with implementing node-test-pull-request (as a good first step), see #1156.

Either way it's not an issue, they're both groovy scripting languages, so there's no reason we shouldn't start with a Jenkins plugin and then move the script into a pipeline if it makes sense.

@rvagg
Copy link
Member Author

rvagg commented Feb 28, 2018

re pipelines .. I remain unconvinced, although I feel like a laggard with everyone saying that they are the way of the future with Jenkins. I still find them opaque and not user-friendly compared to what we have. I was hoping that the folks working on pipelines within our CI would be building a case for them as a solid replacement that will improve our lives as administrators while also hopefully improving the insight for users. Perhaps that case is coming?

Anyway, @mhdawson re your point about hiding even more stuff in Jenkins, this is a real problem. So with that in mind, since we're dealing with a full Groovy sandbox I've replaced the entire script with this:

def scriptUrl = 'https://raw.githubusercontent.com/nodejs/build/rvagg/VersionSelectorScript.groovy/jenkins/scripts/VersionSelectorScript.groovy'

def code = new URL(scriptUrl).getText()
println "Got matrix selection script from ${scriptUrl}:"
println "-------------------------------------------------------------------"
println code
println "-------------------------------------------------------------------"
evaluate(code)

return [ result, true ]

and have put the script in our repo in the jenkins/script directory (on a branch for now): https://github.com/nodejs/build/blob/rvagg/VersionSelectorScript.groovy/jenkins/scripts/VersionSelectorScript.groovy

That works nicely!

Regarding the plugin, I was holding off putting code up for it in case this idea got shot down quickly. If I set up a new repo under my own GitHub profile then we have to jump through hoops to get it transferred into the org so I wanted to jump straight to setting up a project in the nodejs org. I'll do that now so you can review it.

@rvagg
Copy link
Member Author

rvagg commented Feb 28, 2018

for your consideration: https://github.com/nodejs/node-version-jenkins-plugin

@mhdawson
Copy link
Member

mhdawson commented Mar 1, 2018

Having it in a script in the build repo and pulling it/running from their is a great improvement ! Sounds though that you already have a plug in that will do the same thing as well. Either option seems good as we won't have to duplicate the logic across each job.

@rvagg
Copy link
Member Author

rvagg commented Apr 6, 2018

I pushed an update to node-version-jenkins-plugin that fixes it so it works on ci.nodejs.org and can run remotely on the jenkins-workspace (and other) machines, it's simpler in ci-release so I didn't catch that bug.

It's now installed in ci.nodejs.org and I'm moving this discussion to #1217 because I've started implementing it (doesn't mean that we have to embrace it fully, we could roll it back, but I quite like it!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants