This line (referenced below) seems problematic, because if health_cmd returns a non-zero exit status but writes nothing to stderr, the process is killed with no explanation. In my mind, that's a significant thing for healthdog to do, and we'd want to know about it.
Unfortunately, in balenaOS it seems we tend to use balena-healthcheck as the health_cmd, which suppresses stderr with > /dev/null 2>&1. I checked, and via a long convoluted path, balena-engine-containerd-ctr itself does manage to write something useful to stderr, but we choose to suppress it.
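To illustrate the effect of that redirection, here is a minimal sketch, assuming a POSIX sh is available. The check script and its "engine not responding" message are made-up stand-ins, not what balena-healthcheck actually runs. It runs the same failing command with and without the > /dev/null 2>&1 suffix and shows that the exit code survives while stderr comes back empty.

```rust
use std::process::Command;

/// Run a shell snippet and return its exit code and whatever it wrote to stderr.
fn run(script: &str) -> (Option<i32>, String) {
    let out = Command::new("sh")
        .arg("-c")
        .arg(script)
        .output()
        .expect("failed to spawn sh");
    (
        out.status.code(),
        String::from_utf8_lossy(&out.stderr).into_owned(),
    )
}

fn main() {
    // Hypothetical stand-in for a failing health check (not the real script).
    let check = "{ echo 'engine not responding' >&2; exit 1; }";

    // Without redirection, the reason for the failure reaches the caller.
    let (code, err) = run(check);
    println!("plain:      code={:?} stderr={:?}", code, err.trim());

    // With the redirection, only the exit code is left to go on.
    let suppressed = format!("{} > /dev/null 2>&1", check);
    let (code, err) = run(&suppressed);
    println!("suppressed: code={:?} stderr={:?}", code, err.trim());
}
```

In the suppressed case a caller that only logs stderr has nothing to print, which matches the silent kill described above.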
I'll attach a support ticket where this led to a long investigation looking for a cause.
I think the resolution is either:
- If the expectation is that health_cmd reports failure only through a non-zero exit code, as seems to be the case, then healthdog should print an error saying that health_cmd failed, and include the error output if available. It is then relatively easy to investigate why health_cmd failed.
- If the expectation is that health_cmd also reports the reason for failure, then this should be made explicit, and the stderr suppression in balena-healthcheck should be removed.
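As a rough sketch of the first option, the snippet below always reports a failed health_cmd, with its exit status and any stderr it produced, before exiting. This is not the actual healthdog code: describe_failure and run_health_cmd are hypothetical names, and running the command through sh -c is an assumption made only to keep the example self-contained.

```rust
use std::process::{Command, Output};

/// Build a human-readable explanation for a failed health command.
/// Returns None when the command succeeded.
fn describe_failure(cmd: &str, output: &Output) -> Option<String> {
    if output.status.success() {
        return None;
    }
    // code() is None when the process was terminated by a signal.
    let status = match output.status.code() {
        Some(code) => format!("exit code {}", code),
        None => "termination by signal".to_string(),
    };
    let stderr = String::from_utf8_lossy(&output.stderr);
    let stderr = stderr.trim();
    Some(if stderr.is_empty() {
        format!("health_cmd `{}` failed with {} (no stderr output)", cmd, status)
    } else {
        format!("health_cmd `{}` failed with {}: {}", cmd, status, stderr)
    })
}

/// Hypothetical helper: run the health command through `sh -c`.
fn run_health_cmd(cmd: &str) -> std::io::Result<Output> {
    Command::new("sh").arg("-c").arg(cmd).output()
}

fn main() -> std::io::Result<()> {
    // Stand-in for a health command that fails silently.
    let cmd = "exit 3";
    let output = run_health_cmd(cmd)?;
    if let Some(msg) = describe_failure(cmd, &output) {
        // Log the reason before exiting, so the exit is never unexplained.
        eprintln!("{}", msg);
        std::process::exit(1);
    }
    Ok(())
}
```

Even when stderr is empty, the log line names the command and its exit status, which gives an investigation somewhere to start.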
I see that resolving #16 might end up resolving this issue in another way. But I don't know how current that issue is because it says "we don't do anything actively" and I think the line of code I referenced and the one after (process::exit(1);) would qualify for "something".
Referenced line: healthdog-rs/src/main.rs, line 87 in 04f6180