fix(Jenkins Pipelines) do not allocate agent for "parent" pipelines and retry the packaging process a 2nd time #283
Conversation
Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
If we are able to review this PR, and if it makes sense and is approved, then we should be able to test it for the next weekly release (the 13th of September 2022).
Ping @timja @MarkEWaite @lemeurherve @NotMyFault I would want to have multiple reviews on this one, for the sake of knowledge sharing, and feeling safer :) For info, I've tested with manual pipeline jobs on release.ci, but only with a
Incorrect retry syntax.
Jenkinsfile.d/core/package (Outdated)
@@ -66,6 +66,7 @@ pipeline {
   options {
     disableConcurrentBuilds()
     buildDiscarder logRotator(numToKeepStr: '15') // Retain only last 15 builds to reduce space requirements
+    retry(conditions: [agent(), kubernetesAgent(handleNonKubernetes: true), nonresumable()], count: 2)
- retry(conditions: [agent(), kubernetesAgent(handleNonKubernetes: true), nonresumable()], count: 2)
+ retry(conditions: [kubernetesAgent(handleNonKubernetes: true), nonresumable()], count: 2)
(The whole point of handleNonKubernetes: true is that you do not also specify agent.)
Anyway I am not sure if this syntax even works—I have not tried it—and you are working too hard. The tested syntax for Declarative is
agent {
kubernetes {
// …as before
retries 2
}
}
applied above to release/Jenkinsfile.d/core/package, lines 2 to 7 in e2368e9:
agent {
  kubernetes {
    yamlFile 'PodTemplates.d/package-linux.yaml'
    workingDir '/home/jenkins/agent'
  }
}
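For reference, combining the quoted block with the tested Declarative retries syntax would look like the following. This is a sketch only, not verified against this repository's controller; the count of 2 mirrors the retry count proposed in this PR:

agent {
  kubernetes {
    // same pod template and working directory as the quoted snippet
    yamlFile 'PodTemplates.d/package-linux.yaml'
    workingDir '/home/jenkins/agent'
    // retry agent allocation and the node block up to 2 times
    retries 2
  }
}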
Jenkinsfile.d/core/release (Outdated)
@@ -45,6 +45,7 @@ pipeline {

   options {
     disableConcurrentBuilds()
+    retry(conditions: [agent(), kubernetesAgent(handleNonKubernetes: true), nonresumable()], count: 2)
as above
Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
Actually, no, Pipeline CPS code does not consume a JVM thread permanently. (It temporarily uses a JVM thread while running Groovy, then exits it when switching to a step like
This is sort of mangled. The build should survive a controller restart even if you had a superfluous agent. That is a core aspect of Pipeline from its initial design, and applies in particular to K8s agents.
Again this is confused. So while adding
Is there a specific example you have in mind? The
Fine except for the PR title.
Thanks @jglick for the review and the pointers. I've updated: does it fit with what you suggested? Side note: the
Honestly, it's rarely the case. I have no idea why, what, and how, and it feels really complicated (hence my confusion). What we see on this instance (and also on infra.ci), which mainly use Kubernetes pods, is that when the controller restarts, the builds are stuck and never resume (hence the initial issue). Most of the time, the pod agent is gone (no idea which system removes it: is it Jenkins, through the kubernetes plugin? Is it a timeout of the PID 1? Something else? I have no idea how to track this). That's why I felt adding the "retry" here could help to restart the build. But this PR might be dangerous, as you mentioned that the Maven deployment should not be retried. Maybe we should accept build failures and restart them manually. What do you think? What direction should we head in?
I think it simply offers any block-scoped step as an option of the same name. Potentially that could be useful in the case of
At any rate, I checked and
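To illustrate that mechanism (a hypothetical sketch, not taken from this repository: the stage and step names are invented), Declarative exposes a block-scoped step such as timeout under options, wrapping the entire run:

pipeline {
    agent none
    options {
        // timeout is normally a block-scoped step: timeout(...) { ... }
        // here it is used as an option of the same name, applied to the whole build
        timeout(time: 1, unit: 'HOURS')
    }
    stages {
        stage('Example') {
            steps {
                echo 'hello'  // placeholder step for illustration
            }
        }
    }
}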
As in, stay in progress indefinitely? Or abort after the 5m timeout applied to a missing agent after controller restart?
No, it should not. There is of course test coverage in
Again, pending jenkinsci/workflow-durable-task-step-plugin#180, only if a controller restart is not involved; and as mentioned, restarting the build (more specifically the
Well the removal of the superfluous
Interesting. Thanks for your patience and the explanation (and the work involved). Given what you described, I'll update the PR (and its title) to only remove the agent of the parent pipeline and add a "retry" to the packaging pipeline (which is idempotent), but no retry on the release pipeline (as it is NOT idempotent and it does not feel safe to automate resuming it). I'll update the associated helpdesk issue to ensure that we try to willingly restart the controller (e.g. deleting the pod, which is expected to send a STOP signal to the Jenkins process) during the next weekly release to see what happens exactly (and extract logs).
Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
WDYT?
Well,
Closes jenkins-infra/helpdesk#2925
This PR introduces 2 main changes:
sh step was started, then it continues the pipeline when resuming.
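As a sketch of the first change (hypothetical child job name, for illustration only; the actual Jenkinsfiles live under release/Jenkinsfile.d/), a "parent" pipeline that only triggers other jobs can declare agent none so it holds no executor or pod while orchestrating:

pipeline {
    agent none  // orchestration only: no agent is allocated for the parent build
    stages {
        stage('Package') {
            steps {
                // the build step runs on a flyweight executor, so no node is needed;
                // 'core/package' is a hypothetical job name
                build job: 'core/package'
            }
        }
    }
}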