-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let exit_code be better aligned with C/POSIX #240
Comments
EdSchouten
added a commit
to EdSchouten/remote-apis
that referenced
this issue
Mar 8, 2023
POSIX issue 7, 2018 edition, section 3.303 "Process Termination" states the following: > There are two kinds of process termination: > > 1. Normal termination occurs by a return from main(), when requested > with the exit(), _exit(), or _Exit() functions; or when the last > thread in the process terminates by returning from its start > function, by calling the pthread_exit() function, or through > cancellation. > > 2. Abnormal termination occurs when requested by the abort() function > or when some signals are received. Both these cases can be detected by parent processes separately. Unfortunately, REv2 currently doesn't allow this, as we only provide an exit code. Implementations such as Buildbarn tend to set the exit code for abnormal termination to 128+signum, meaning that if a segmentation fault occurs, an action terminates with exit code 139. This tends to be fairly confusing for non-expert users. The goal of this change is to make all of this less confusing. It adds a new message type named ProcessTerminationResult, which contains the termination result in a more structured way. It can now either be an exit code or a signal number. If it turns out that other operating systems provide even more ways, we can add additional `oneof` cases. For the exit code in case of normal termination I decided to use an int64 instead of int32. The reason being that POSIX allows you to get the full exit code back, using APIs like waitid(). On ILP64 systems, the exit code may thus be 64 bits. For the signal in case of abnormal termination I decided to use a string. The reason being that signal numbers are not covered by POSIX. Everybody always assumes that 9 is SIGKILL, but POSIX only documents this as a convention for the command line flags of the `kill` utility. Not the the actual `SIG*` constants. Fixes: bazelbuild#240
EdSchouten
added a commit
to EdSchouten/remote-apis
that referenced
this issue
Mar 10, 2023
POSIX issue 7, 2018 edition, section 3.303 "Process Termination" states the following: > There are two kinds of process termination: > > 1. Normal termination occurs by a return from main(), when requested > with the exit(), _exit(), or _Exit() functions; or when the last > thread in the process terminates by returning from its start > function, by calling the pthread_exit() function, or through > cancellation. > > 2. Abnormal termination occurs when requested by the abort() function > or when some signals are received. Both these cases can be detected by parent processes separately. Unfortunately, REv2 currently doesn't allow this, as we only provide an exit code. Implementations such as Buildbarn tend to set the exit code for abnormal termination to 128+signum, meaning that if a segmentation fault occurs, an action terminates with exit code 139. This tends to be fairly confusing for non-expert users. The goal of this change is to make all of this less confusing. It adds a new message type named ProcessTerminationResult, which contains the termination result in a more structured way. It can now either be an exit code or a signal number. If it turns out that other operating systems provide even more ways, we can add additional `oneof` cases. For the exit code in case of normal termination I decided to use an int64 instead of int32. The reason being that POSIX allows you to get the full exit code back, using APIs like waitid(). On ILP64 systems, the exit code may thus be 64 bits. For the signal in case of abnormal termination I decided to use a string. The reason being that signal numbers are not covered by POSIX. Everybody always assumes that 9 is SIGKILL, but POSIX only documents this as a convention for the command line flags of the `kill` utility. Not the the actual `SIG*` constants. Adding an enumeration would also be unwise, as operating systems are free to add their own signals that are not covered by POSIX (e.g., SIGINFO on BSD). Fixes: bazelbuild#240
EdSchouten
changed the title
REv3: Let exit_code be better aligned with C/POSIX
Let exit_code be better aligned with C/POSIX
Mar 14, 2023
EdSchouten
added a commit
to EdSchouten/remote-apis
that referenced
this issue
Mar 14, 2023
POSIX issue 7, 2018 edition, section 3.303 "Process Termination" states the following: > There are two kinds of process termination: > > 1. Normal termination occurs by a return from main(), when requested > with the exit(), _exit(), or _Exit() functions; or when the last > thread in the process terminates by returning from its start > function, by calling the pthread_exit() function, or through > cancellation. > > 2. Abnormal termination occurs when requested by the abort() function > or when some signals are received. Both these cases can be detected by parent processes separately. Unfortunately, REv2 currently doesn't allow this, as we only provide an exit code. Implementations such as Buildbarn tend to set the exit code for abnormal termination to 128+signum, meaning that if a segmentation fault occurs, an action terminates with exit code 139. This tends to be fairly confusing for non-expert users. The goal of this change is to make all of this less confusing. It adds a new message type named ProcessTerminationResult, which contains the termination result in a more structured way. It can now either be an exit code or a signal number. If it turns out that other operating systems provide even more ways, we can add additional `oneof` cases. For the exit code in case of normal termination I decided to use an int64 instead of int32. The reason being that POSIX allows you to get the full exit code back, using APIs like waitid(). On ILP64 systems, the exit code may thus be 64 bits. For the signal in case of abnormal termination I decided to use a string. The reason being that signal numbers are not covered by POSIX. Everybody always assumes that 9 is SIGKILL, but POSIX only documents this as a convention for the command line flags of the `kill` utility. Not the the actual `SIG*` constants. Adding an enumeration would also be unwise, as operating systems are free to add their own signals that are not covered by POSIX (e.g., SIGINFO on BSD). Fixes: bazelbuild#240
EdSchouten
added a commit
to EdSchouten/remote-apis
that referenced
this issue
Mar 15, 2023
POSIX issue 7, 2018 edition, section 3.303 "Process Termination" states the following: > There are two kinds of process termination: > > 1. Normal termination occurs by a return from main(), when requested > with the exit(), _exit(), or _Exit() functions; or when the last > thread in the process terminates by returning from its start > function, by calling the pthread_exit() function, or through > cancellation. > > 2. Abnormal termination occurs when requested by the abort() function > or when some signals are received. Both these cases can be detected by parent processes separately. Unfortunately, REv2 currently doesn't allow this, as we only provide an exit code. Implementations such as Buildbarn tend to set the exit code for abnormal termination to 128+signum, meaning that if a segmentation fault occurs, an action terminates with exit code 139. This tends to be fairly confusing for non-expert users. The goal of this change is to make all of this less confusing. It adds a new message type named ProcessTerminationResult, which contains the termination result in a more structured way. It can now either be an exit code or a signal number. If it turns out that other operating systems provide even more ways, we can add additional `oneof` cases. For the exit code in case of normal termination I decided to use an int64 instead of int32. The reason being that POSIX allows you to get the full exit code back, using APIs like waitid(). On ILP64 systems, the exit code may thus be 64 bits. For the signal in case of abnormal termination I decided to use a string. The reason being that signal numbers are not covered by POSIX. Everybody always assumes that 9 is SIGKILL, but POSIX only documents this as a convention for the command line flags of the `kill` utility. Not the the actual `SIG*` constants. Adding an enumeration would also be unwise, as operating systems are free to add their own signals that are not covered by POSIX (e.g., SIGINFO on BSD). Fixes: bazelbuild#240
EdSchouten
added a commit
to EdSchouten/remote-apis
that referenced
this issue
Mar 15, 2023
POSIX issue 7, 2018 edition, section 3.303 "Process Termination" states the following: > There are two kinds of process termination: > > 1. Normal termination occurs by a return from main(), when requested > with the exit(), _exit(), or _Exit() functions; or when the last > thread in the process terminates by returning from its start > function, by calling the pthread_exit() function, or through > cancellation. > > 2. Abnormal termination occurs when requested by the abort() function > or when some signals are received. Both these cases can be detected by parent processes separately. Unfortunately, REv2 currently doesn't allow this, as we only provide an exit code. Implementations such as Buildbarn tend to set the exit code for abnormal termination to 128+signum, meaning that if a segmentation fault occurs, an action terminates with exit code 139. This tends to be fairly confusing for non-expert users. The goal of this change is to make all of this less confusing. It adds a new message type named ProcessTerminationResult, which contains the termination result in a more structured way. It can now either be an exit code or a signal number. If it turns out that other operating systems provide even more ways, we can add additional `oneof` cases. For the exit code in case of normal termination I decided to use an int64 instead of int32. The reason being that POSIX allows you to get the full exit code back, using APIs like waitid(). On ILP64 systems, the exit code may thus be 64 bits. For the signal in case of abnormal termination I decided to use a string. The reason being that signal numbers are not covered by POSIX. Everybody always assumes that 9 is SIGKILL, but POSIX only documents this as a convention for the command line flags of the `kill` utility. Not the the actual `SIG*` constants. Adding an enumeration would also be unwise, as operating systems are free to add their own signals that are not covered by POSIX (e.g., SIGINFO on BSD). Fixes: bazelbuild#240
EdSchouten
added a commit
to EdSchouten/bazel
that referenced
this issue
Jul 1, 2023
On POSIX-like systems, processes may either terminate normally with an integer exit code. The exit code may span the full range of int, even though all but the waitid() function retur the bottom eight bits. In addition to that, processes may terminate abnormally due to a signal (SIGABRT, SIGSEGV, etc.) POSIX shells (sh, bash, etc.) are more restrictive, in that they can only return exit codes between 0 and 126. 127 is used to denote that the executable cannot be found. Exit codes above 128 indicate that the process terminated due to a signal. Right now we let test-setup.sh terminate using the exit code obtained using $?. This means that if a program terminates due to SIGABRT, test-setup.sh terminates with exit code 128+6=134. This causes us to lose some information, as the (remote) execution environment now only sees plain exit codes. This change extends test-setup.sh to check for exit codes above 128. In that case it will send a signal to itself, so that the original signal condition is raised once again. See also: bazelbuild/remote-apis#240
copybara-service bot
pushed a commit
to bazelbuild/bazel
that referenced
this issue
Jul 13, 2023
On POSIX-like systems, processes may either terminate normally with an integer exit code. The exit code may span the full range of int, even though all but the waitid() function retur the bottom eight bits. In addition to that, processes may terminate abnormally due to a signal (SIGABRT, SIGSEGV, etc.) POSIX shells (sh, bash, etc.) are more restrictive, in that they can only return exit codes between 0 and 126. 127 is used to denote that the executable cannot be found. Exit codes above 128 indicate that the process terminated due to a signal. Right now we let test-setup.sh terminate using the exit code obtained using $?. This means that if a program terminates due to SIGABRT, test-setup.sh terminates with exit code 128+6=134. This causes us to lose some information, as the (remote) execution environment now only sees plain exit codes. This change extends test-setup.sh to check for exit codes above 128. In that case it will send a signal to itself, so that the original signal condition is raised once again. See also: bazelbuild/remote-apis#240 Closes #18827. PiperOrigin-RevId: 547773406 Change-Id: Ia29a6ea1eefdb8caa5624a799755b420a61478a0
iancha1992
pushed a commit
to iancha1992/bazel
that referenced
this issue
Jul 13, 2023
On POSIX-like systems, processes may either terminate normally with an integer exit code. The exit code may span the full range of int, even though all but the waitid() function retur the bottom eight bits. In addition to that, processes may terminate abnormally due to a signal (SIGABRT, SIGSEGV, etc.) POSIX shells (sh, bash, etc.) are more restrictive, in that they can only return exit codes between 0 and 126. 127 is used to denote that the executable cannot be found. Exit codes above 128 indicate that the process terminated due to a signal. Right now we let test-setup.sh terminate using the exit code obtained using $?. This means that if a program terminates due to SIGABRT, test-setup.sh terminates with exit code 128+6=134. This causes us to lose some information, as the (remote) execution environment now only sees plain exit codes. This change extends test-setup.sh to check for exit codes above 128. In that case it will send a signal to itself, so that the original signal condition is raised once again. See also: bazelbuild/remote-apis#240 Closes bazelbuild#18827. PiperOrigin-RevId: 547773406 Change-Id: Ia29a6ea1eefdb8caa5624a799755b420a61478a0
iancha1992
added a commit
to bazelbuild/bazel
that referenced
this issue
Jul 13, 2023
On POSIX-like systems, processes may either terminate normally with an integer exit code. The exit code may span the full range of int, even though all but the waitid() function retur the bottom eight bits. In addition to that, processes may terminate abnormally due to a signal (SIGABRT, SIGSEGV, etc.) POSIX shells (sh, bash, etc.) are more restrictive, in that they can only return exit codes between 0 and 126. 127 is used to denote that the executable cannot be found. Exit codes above 128 indicate that the process terminated due to a signal. Right now we let test-setup.sh terminate using the exit code obtained using $?. This means that if a program terminates due to SIGABRT, test-setup.sh terminates with exit code 128+6=134. This causes us to lose some information, as the (remote) execution environment now only sees plain exit codes. This change extends test-setup.sh to check for exit codes above 128. In that case it will send a signal to itself, so that the original signal condition is raised once again. See also: bazelbuild/remote-apis#240 Closes #18827. PiperOrigin-RevId: 547773406 Change-Id: Ia29a6ea1eefdb8caa5624a799755b420a61478a0 Co-authored-by: Ed Schouten <eschouten@apple.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I am glad that REv2 already uses
int32
as its data type forexit_code
. Even though most UNIX systems historically only had 8-bit exit codes, modern versions of POSIX provide interfaces for obtaining the full integer value. Using functions likewaitid()
, you can obtain asiginfo_t
whosesi_status
field contains the full value.Where we do deviate from C/POSIX is that there are no facilities for indicating that a process did not terminate by calling
_Exit()
, but through a signal. This leads to confusing situations where people wonder why their build actions terminate with exit code 139, while in reality it terminated throughSIGSEGV
(signal 11, giving exit code 128+11=139). Even worse are modern versions of Go'sos/exec
package, which will always return an exit code of -1 in case of signal delivery. This means that users can't even figure out why their actions terminated.One solution to this would be to change the existing field:
// The exit code of the command. int32 exit_code = 4;
To something like this:
Even though virtually all operating systems use 9 for
SIGKILL
and 15 forSIGTERM
, C/POSIX make no such requirement. The only place where that does happen, is in the definition of thekill
utility, where as part of the X/Open System Interfaces (XSI) they provide a hardcoded list of numerical command line options.My recommendation would thus be to use this instead:
Unfortunately it's not safe for us to add a
oneof
to the protocol retroactively, as it would causeexit_code
to be zero whensignal_name
is set. Maybe we can already add this to REv2 in the form of a separate field, with the remark thatexit_code
is set to a non-zero value ifsignal_name
is set...The text was updated successfully, but these errors were encountered: