Runner update end fails #33
Comments
This might be related to today's new release of the action? |
Yes, it is. The question is whether the container should be able to self-update or not. |
Thank you for using and reporting! |
When I looked into the code, I couldn't find a way to stop self-updating. I've created the following issue. I'll also look into how to make self-updating succeed. |
@jorge07 Btw, does the runner pod really stop forever on this error? I thought it would just exit with |
It stops consuming the queue, no jobs are taken. |
@summerwind, what do you think of running the runner as a service in the container? There are a couple of options for doing it. I'm using https://github.com/gdraheim/docker-systemctl-replacement in my Docker images |
@aweris We discussed and experimented with that in #40. I don't understand every detail and implication of the work, but I had a few questions:
WDYT? |
After reading actions/runner#246, I came to think that it isn't always a good idea to disable the self-update mechanism. Maybe it's true that we need some kind of real init system (systemd et al.) that hopefully addresses my points above without too much hackery. |
I checked the PR; unfortunately, it's not possible to use the original systemd in Docker without hacks. That's why I used https://github.com/gdraheim/docker-systemctl-replacement as a replacement. It's a simple Python script that replicates the basic functionality of systemd. When you configure the runner as a service, it creates a unit file, and I used that unit file as a template.
It's a standard unit file. You can use
This is one of the main issues I couldn't address properly. I'm using a custom bash script. It starts the service and watches its status.
I'm tracking log files under the
What I do is not an ideal solution, but it has been working in production for the last 5 months without any problem. This is my hack for the problem. |
@mumoshu As I said, mimicking systemctl in the container solves the update problem, but it introduces additional complexity. I am currently looking for other solutions like |
I think it might be a good idea to launch a Runner as a service. I actually tried using systemd to run The problem is that since systemd manages the state of the Runner, it's hard to know when the Controller should recreate the Pod. Currently, the Controller has to recreate the Pod every time Runner runs a job, so the Controller needs to know the status of Runner. I might be able to use something like Readiness Probe to figure out when Controller should recreate the Pod, but I haven't tried that method yet. |
@mumoshu, @summerwind I just debugged the update process, and it seems the problem is the generated update script.
Then I used a small Go app instead of a bash script to run the runner, and it worked perfectly, since it waits for the exit code in the background.

```go
package main

import (
	"context"
	"log"
	"os"
	"os/exec"
)

func main() {
	ctx := context.Background()

	// Run ./run.sh --once and forward its output to this process's stdout/stderr.
	cmd := exec.CommandContext(ctx, "./run.sh", "--once")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	// Run blocks until run.sh exits; a non-zero exit status is returned as an error.
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```

I hope it helps 😄 |
@aweris Thanks for your support! I'm still trying to understand what you said. Does it mean your primary runner process should firstly run In other words, we already delegate things to |
Let me explain it this way: the main difference between
This post also used sleep to prevent the container from exiting during self-update.
I think you can try starting
Yes, technically, when the runner process finishes without any error, it's returning exit code |
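For reference, here is a minimal sketch of how a Go wrapper like the one earlier in this thread could pass the runner's exit status on to the container, instead of always exiting through `log.Fatal` (which exits with status 1). The helper names `exitCodeOf` and `runAndGetCode` are made up for illustration, and mapping "could not start" or "killed by signal" to exit status 1 is an assumption, not documented runner behavior:

```go
package main

import (
	"errors"
	"log"
	"os"
	"os/exec"
)

// exitCodeOf maps the error returned by (*exec.Cmd).Run to an exit code.
// nil means the child exited with 0, an *exec.ExitError carries the child's
// own code, and any other error (e.g. the binary was not found) becomes -1.
func exitCodeOf(err error) int {
	if err == nil {
		return 0
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode() // -1 if the child was killed by a signal
	}
	return -1
}

// runAndGetCode runs a command in the foreground, forwards its output,
// and reports how it exited. (Hypothetical helper, for illustration only.)
func runAndGetCode(name string, args ...string) int {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return exitCodeOf(cmd.Run())
}

func main() {
	code := runAndGetCode("./run.sh", "--once")
	log.Printf("run.sh exited with code %d", code)
	if code < 0 {
		// Assumption: treat "could not start" and "killed by signal" as failure.
		os.Exit(1)
	}
	os.Exit(code)
}
```

With this, whatever watches the container sees the same exit status the runner script produced, so it can tell a clean exit from a failure.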
@aweris Thanks for the info! It did help me understand the problem. Today I managed to take some time to reproduce this on my machine, and realized that this might have been resolved upstream. Our
Probably we can safely close this as resolved now? Could anyone confirm? |
Well, run.sh seems to be unchanged for months. So is it that run.sh isn't working as intended? I'm pretty confused. https://github.com/actions/runner/blob/master/src/Misc/layoutroot/run.sh |
Back to the original error reported by @jorge07 and its preceding log message, "Runner will exit shortly for update, should back online within 10 seconds.": can it be that in some env |
I have the same issue. I'm using a custom image built on top of the base one, and every time a new version of the base image is released, runners try to auto-update and then exit with an error.
There are some possible solutions proposed above:
I'll try to integrate one of them and open a PR. |
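One way to picture the "keep the container's main process alive while the runner restarts" idea from this thread is a small supervisor loop. This is only a sketch under assumptions: the names `supervise` and `runOnce`, the retry count of 3, and the 10-second delay (borrowed from the "should back online within 10 seconds" log message) are illustrative choices, and treating any non-zero exit as "wait and start again" is a guess, not what the runner or the linked entrypoint actually does:

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"time"
)

// runOnce runs a command in the foreground and returns the error from Run
// (nil on exit status 0). Hypothetical helper, for illustration only.
func runOnce(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

// supervise calls run until it succeeds. After each failure it sleeps for
// delay and tries again, allowing at most maxRestarts restarts. It returns
// the number of attempts made and the error from the last attempt.
func supervise(run func() error, maxRestarts int, delay time.Duration) (int, error) {
	for attempt := 1; ; attempt++ {
		err := run()
		if err == nil || attempt > maxRestarts {
			return attempt, err
		}
		log.Printf("runner exited (%v); retrying in %s", err, delay)
		time.Sleep(delay)
	}
}

func main() {
	// Assumption: a failed run may mean the runner stopped to update itself,
	// so give it some time and start it again instead of exiting right away.
	attempts, err := supervise(func() error {
		return runOnce("./run.sh", "--once")
	}, 3, 10*time.Second)
	if err != nil {
		log.Fatalf("runner still failing after %d attempts: %v", attempts, err)
	}
}
```

Because the supervisor stays in the foreground for the whole time, the container is not torn down in the gap between the old runner exiting and the next start.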
I've been experiencing similar issues: I'm also using a custom image and also having to update manually every time there's a new release. In my case, however, I've been making other breaking changes, so it took me a while to pin it down to something wrong with the runner. It would be fantastic if this was fixed, as it defeats the purpose of self-hosted runners (having to manually tend to them after updates). This one, actions/runner#484 (comment), seems to be the best way to me. |
@igorbrigadir thanks a ton for the entrypoint hint; implemented here: #99. Worked like a charm. |
This is what was happening to me too. I found what @igorbrigadir posted and it's never failed since. |
It seems to have worked fine since then. Closing as resolved. Thanks everyone for your help! |
Problem
Runner update fails and the runner stops working
Logs