-
Notifications
You must be signed in to change notification settings - Fork 506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add '-p' flag to get signalled when parent dies #114
Conversation
Does this need to be included in Tini to work? In particular, would the following not work:
(where |
Ah, it looks like that might actually be the approach you're following there: util-linux/util-linux@23f54ce (as an aside: sorry about the delay in getting back to you here, I was out on vacation this week) |
No worries, it's fine. The thing is, it's not really possible to implement this as part of another program here in all situations. While it is possible to use the approach I implement in setpriv(1) from util-linux in some situations, it gets unusable as soon as AppArmor or SELinux LSMs get involved. Switching profiles, in this case from setpriv to the executed program, will cause the parent death signal to be reset again when those LSMs are in use. So it's not a full solution to the problem I'm trying to solve (at least not as far as I know). That being said, take this PR with a grain of salt. I could totally relate to you not wanting to accept that PR -- after all, it's debatable whether killing the init process should be part of an init manager itself. I'd obviously be happy to have it accepted as it solves a real issue I have, but yeah. |
So, just to be clear: what I'm suggesting above is that you can set the parent death signal after you drop privileges using setpriv, using a separate program. That program can then spawn Tini. In other words, rather than have Tini set the parent death signal, add another process in your "call chain" that will do it, then E.g. use this: #include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/prctl.h>
int main(int argc, char *argv[]) {
if (prctl(PR_SET_PDEATHSIG, SIGTERM) != 0) {
perror("prctl");
return 1;
}
execvp(argv[1], &argv[1]);
perror("execvp");
return 1;
} Then, if I run:
Then, as expected, if I send a
As long as the death signal is not discarded across the exec boundary from
My rule of thumb when reviewing a PR is generally to consider whether the thing being added can be done without Tini doing it, to ensure we keep Tini lean. For example, I wouldn't add switching users to Tini, because you can fairly easily chain Tini with su-exec, gosu, setpriv, but I did merge process-group-killing, because you can't really do this without doing signal forwarding and orphan reaping. In this particular case... I don't really know the answer yet 😮, as I'm not super familiar with LSMs and when they might reset the parent death signal. If you have any reading to share on this, I'd be interested. |
Ok, so I did test this with AppArmor, and it works as long as I'm a little on the fence as to what the right approach here... My intuition is that this flag is only going to be relevant if you have a LSM (because if you don't have a LSM, you might as well use the program I pasted above), but I'm not entirely certain how complex (or feasible) implementing the AppArmor profile I described above is (or doing the same with SELinux). |
Unfortunately, I can't really share the observation you've made with "inherit execute" retaining the parent death signal. Simplest case with setpriv built from master, which includes my recently accepted "--pdeathsig" option:
I've had the same results with all kinds of different profiles -- inherit execute is trivial to use here, as setpriv always uses the same profile. But still, the pdeathsig got reset. The profile I've basically used (trimmed down a bit to remove all kinds of capabilities, also no abstractions included):
I've also tried to use setpriv's "--apparmor-profile /usr/bin/setpriv" with explicit "change_profile" transitions in different variations, but the pdeathsig always got reset. I can't share any readings, as I didn't find any conclusive documentation on this. Some observations:
|
I think I've found the real issue here why code behaves differently for us. I've been doing these tests as root, which will additionally clear the parent death signal on
So setting up the parent death signal cannot work in case you want to inherit it across an exec as soon as you're running with elevated privileges. |
Thanks for digging into this @pks-t, that's helpful. On the one hand, I feel like the Kernel cleaning out this flag systematically might not be 100% desirable (if your parent has permission to send you a signal, then I'd expect that their death has permission to send you that signal as well), since it essentially means that every program in existence has to have a "set the parent death signal" flag... which means I don't know if this really is something that should be fixed in Tini. On the other hand, I understand there might be reasons why it's that way in the Kernel that I don't understand. Besides, if we add the "set the parent death signal" flag in Tini, then that essentially creates a new use case for Tini: you can use it to set the parent death signal when the Kernel won't let you, even if you're not going to use e.g. the zombie reaping "core" functionality (i.e. what you really want from Tini in this case is setting the parent death signal + forwarding said signal). I'd be curious to better understand your use case here, but I think I'd be fine adding this flag in Tini. I'd like it to be configurable (at a minimum, with a signal number), though, since SIGKILL is probably not super helpful for cleanup (and it can't be forwarded!). I'm happy to actually work on that piece though, but let me know what you think! |
Yeah, I totally get and share your concerns here. The parent death signal is broken for a lot of use cases, with the reason being security. Accepting signals across process hierarchies can be a security risk in some scenarios, e.g. by causing your childs to dump core, having additional side channels across security boundaries, etc. My use case here is rather simple: I use runit to manage system services and want to use PID namespaces to "containerize" all my services. My initial attempt was to simply use "setpriv --fork --pid", but this doesn't really work out in case I want to restart the service. Stopping a service in runit will simply cause runit to send SIGKILL to the immediate child, and as setpriv does not forward that signal to the process group the actual service stays running. So tini does have two purposes here in my use case: first, correctly handle child reaping for services in the PID namespace, and secondly forwarding the signal to the process group when runit sends SIGKILL. So in case you want to accept this PR I'd be happy to further extend it by making the signal configurable, no problem at all. I'd have done so right from the beginning, but I was missing the mapping from signal name to its value (e.g. SIGKILL -> 9) to correctly parse the command line. Sure enough we could ask users to just use the actual value instead of name, but that's an awkward interface. I could use |
Sorry, I obviously mean "nsenter --fork --pid", not "setpriv" |
Oh I absolutely agree, but my comment here is mainly that if the parent can already signal you anyway (which is the reason why Tini would be able to forward a signal to you), then I'm not sure there is anything security gain by preventing the parent's death from signalling you. In case there is actually elevation of privilege happening at
Thanks for this valuable detail!
I don't think that'd work. This returns a description of the signal, not the actual "symbolic" name. For example, I can think we can stick with a basic list of signals here:
|
src/tini.c
Outdated
"SIGWINCH", | ||
"SIGIO", | ||
"SIGPWR", | ||
"SIGSYS", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little more verbose... but I'd very much prefer having an array that doesn't actually depend on ordering to be safer. In other words, I'd rather have this be an array of [signame, signum] structs, rather than have the signum be implicitly provided by the string's position in the array.
I've pushed a fixup commit (which should be squashed first, but this makes review easier). We should also understand signal names from POSIX.1-2001, I'd say, where the list already comes nearly complete. As there are only three non-POSIX calls here between 1 and 31, I just opted to have the complete list here as it makes the implementation simpler. I can remove those additional signals though, if you want. |
Thanks @pks-t - I left a comment on there. I understand the simplicity of including all signals... but as commented there, I'm not a huge fan of implicitly providing the signal number through the position in the array — I'd be a little more comfortable with the name / number pairs being structs. Notably, that approach also lets us add aliases, such that we could have both |
Fair enough. Amended to use an array. For "KILL" vs "SIGKILL" -- I actually wouldn't handle that by adding more entries to the array, but we could just skip the "SIG" prefix and compare then. Otherwise, we'd have a minimum of two entries for each signal. |
Fair enough, and thanks! I think that, as is, this will do! We probably want to add tests for this. The test suite is a little messy — would you like me to handle that part? |
Yeah, just wanted to query you about the test. Writing a test is easy enough, but writing the right test here is what I'm not sure about yet. We'd need to spawn tini as child of another process which by itself wouldn't kill tini if it got killed and then test that tini still gets the configured signal. Any ideas? |
The tests right now spawn a bunch of Python processes that do things, I think we could essentially have the test script spawn a few of those like so:
Thoughts? |
(this would happen in https://github.com/krallin/tini/blob/master/test/run_inner_tests.py) |
Ah, I only saw the smoke tests -- the actual tests do in fact look a bit messy. So feel free to pick writing tests up in case you can get something done quickly, otherwise I'll do so. Just let me know. |
Ah, another small request: can you add this flag to the help: Lines 190 to 226 in 5b117de
|
Nevermind, I'm an idiot, you already did! |
We should probably add an example in the help to clarify that we expect e.g. |
What about a simple wording like that:
|
@pks-t I pushed tests to your branch. |
@pks-t yes that's perfect |
Add a new flag '-p', which sets up the parent death signal to `SIGKILL`. This will cause the kernel to send us a `SIGKILL` as soon as the direct parent process dies. This is useful e.g. in combination with unshare(1) from util-linux when using PID namespaces. When unshare forks the child, which is about to become PID 1, killing the unshare parent will not cause the child to exit. When executing the command $ unshare --pid --fork tini -- <prog> then killing unshare will not cause tini to be killed. Since util-linux v2.32, unshare has an option "--kill-child=<SIGNAL>" that will set up the parent death signal for the forked process. This does not help though in case either SELinux or AppArmor are in use and credentials of the forked process change (e.g. by changing its UID), as these LSMs will clear the parent death signal again. The following example would trigger that situation: $ unshare --pid --fork setpriv --reuid user tini -s -- <prog> The parent death signal will get reset by the LSMs as soon as `setpriv` switchets its user ID to that of "user", and killing unshare will again not result in tini being killed. The new '-p' flag helps that exact scenario: $ unshare --pid --fork setpriv --reuid user tini -s -p SIGKILL -- <prog> As soon as unshare is getting killed, tini will get signalled SIGKILL and exit as well, tearing down <prog> with it.
Thanks a lot for these tests! I've added the example and squashed in the fixup commits |
Thanks! Finally, we should update the README. I'll push another commit to your branch if that works? |
Sure, feel free to do so. Just remember to fetch first from my updated branch :) |
@pks-t updated the README. Let me know if you see anything off there (notably, I added you to the contributors list, but let me know if you don't want that). Thanks! |
I'm fine with being listed in the contributors list. The updated README looks fine to me, as well! |
@pks-t I released this today |
Thanks a lot for this discussion and collaboration! |
- Configure pdeathsignal (krallin#114) - Environment variable for -g (krallin#101)
ENV TINI_VERSION v0.17.0 |
Add a new flag '-p', which sets up the parent death signal to
SIGKILL
.This will cause the kernel to send us a
SIGKILL
as soon as the directparent process dies. This is useful e.g. in combination with unshare(1)
from util-linux when using PID namespaces. When unshare forks the child,
which is about to become PID 1, killing the unshare parent will not
cause the child to exit. When executing the command
then killing unshare will not cause tini to be killed. Since util-linux
v2.32, unshare has an option "--kill-child=" that will set up
the parent death signal for the forked process. This does not help
though in case either SELinux or AppArmor are in use and credentials of
the forked process change (e.g. by changing its UID), as these LSMs will
clear the parent death signal again. The following example would trigger
that situation:
The parent death signal will get reset by the LSMs as soon as
setpriv
switchets its user ID to that of "user", and killing unshare will again
not result in tini being killed. The new '-p' flag helps that exact
scenario:
As soon as unshare is getting killed, tini will get signalled SIGKILL
and exit as well, tearing down with it.
For now, this is only implemented as a flag. Intent is to first get to know whether the project is interested in that feature at all. If so and if you think the signal that is being sent by the kernel should be configurable, I can extend it further to accept an optional signal