Node.js process hangs on SIGTERM if a DNS resolution is in progress #25349
Comments
Everything should be interruptible by a signal. Some system calls are automatically restarted after a signal handler returns, but poll() is not one of them; see http://man7.org/linux/man-pages/man7/signal.7.html (scroll down to "Interruption of system calls and library functions by signal handlers"). The node SIGTERM handler just calls the default handler, so I'm tempted to say this is a Linux issue, but I have to dig a bit deeper to be sure. |
No one else is. You can write a simple 10-line example like I have shown above and node.js will hang. Having said that, I can try this on Windows and Mac and see what the behaviour is on those operating systems. |
Whoops, my bad. I didn't see that you had installed the signal handlers; that changes things. |
As a workaround, replacing |
Ok, so here is what is happening. |
To add some more specific detail, as I understand this:
so we expect that if any of these are running when process.exit(1) is called, libuv will wait until the threads finish their work.
|
Discussion on the libuv side is being continued in libuv/libuv#203 |
Is there any further update on this? The libuv guys are thinking of adding a thread pool to solve this problem. That will probably be done in the future. Is there a short-term workaround? |
The only simple workaround I can think of is calling process.abort() rather than process.exit(). |
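A minimal sketch of that workaround, assuming the same SIGTERM/SIGINT handlers as in the original report. `installAbortHandlers` is a hypothetical helper name, and the `proc` parameter exists only so the handlers can be exercised against a stand-in object instead of the real process. Keep in mind that process.abort() ends the process immediately and generates a core file where core dumps are enabled, so it is a blunt instrument rather than a clean shutdown.

```javascript
'use strict';

// Hypothetical helper: register handlers that abort instead of exiting.
// `proc` defaults to the real process; passing a stand-in keeps it testable.
function installAbortHandlers(proc) {
  proc = proc || process;
  ['SIGTERM', 'SIGINT'].forEach(function (signal) {
    proc.on(signal, function () {
      // process.abort() terminates right away rather than waiting
      // for worker threads (such as a pending getaddrinfo call).
      proc.abort();
    });
  });
  return proc;
}

// Usage in a real program:
// installAbortHandlers();
```

Because the process dies abnormally, a supervisor will see termination by SIGABRT rather than exit code 1, which may or may not be acceptable in a given deployment.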
I'm not sure this is the right place for this comment, as the issue is in the archive. There's a corresponding issue in libuv, but I think it could at least be mitigated on the node side. Adding a clause (footnote?) about the possibility of this issue to the process.exit docs would help a lot. I've run into this issue on the node side (4.x), and it's extremely alarming that we can somewhat consistently reproduce the process NOT exiting after calling There's obviously little one can reliably do when a thread is not willing to terminate after being told to do so. However, from the whole-system perspective, I'd prefer an explicit kill timeout (say, 5 sec). Node would set a 5 second timer that would
This way it wouldn't go unnoticed, but at the same time it wouldn't create a zombie node process. This could certainly be done in a package as a native addon. |
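One way to approximate such a kill timeout without a native addon is sketched below. It swaps in a different technique from the one proposed above: instead of an in-process timer, it starts a detached shell process that sends SIGKILL after a delay, on the assumption that a JavaScript timer cannot fire once process.exit() is blocking the main thread. `watchdogCommand` and `exitWithKillTimeout` are hypothetical names, and the approach assumes a POSIX system with `sh`, `sleep` and `kill` available.

```javascript
'use strict';
var spawn = require('child_process').spawn;

// Hypothetical helper: build the shell command run by the watchdog.
function watchdogCommand(pid, seconds) {
  return 'sleep ' + Number(seconds) + '; kill -9 ' + Number(pid);
}

// Hypothetical helper: start a detached watchdog, then exit normally.
function exitWithKillTimeout(code, seconds) {
  var child = spawn('sh', ['-c', watchdogCommand(process.pid, seconds)], {
    detached: true,  // run in its own process group, independent of us
    stdio: 'ignore'  // do not share our stdio with the watchdog
  });
  child.unref();     // do not keep our event loop alive for the child
  process.exit(code);
}

// Usage inside a signal handler:
// process.on('SIGTERM', function () { exitWithKillTimeout(1, 5); });
```

If the process exits on its own before the delay elapses, the watchdog's `kill` targets a PID that is usually gone by then; PID reuse within a few seconds is unlikely but not impossible, so treat this as a stopgap.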
I am using node 0.10.38 on Linux (Ubuntu 4.10, FC20, etc.).
I have some code in startup which looks like:
process.on('SIGTERM', function() {
  process.exit(1);
});
process.on('SIGINT', function() {
  process.exit(1);
});
Somewhere else in the process, I have code like this:
var dns = require('dns');
dns.lookup("somehostname", function(err, addresses, family) {
  // do something
});
Often, if you send SIGTERM to the process, node will not quit. It hangs for as long as the DNS resolution takes; if the DNS server does not respond, it can take up to 5 minutes to quit. If you attach gdb at this point, the stack trace (below) shows the process stuck resolving the hostname.
I would have thought that gethostbyname can be interrupted by signals. Can someone shed some light on this?
Thread 3 (process 18074):
#0 0x00007fabac3bed26 in poll () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fababcdce90 in __libc_res_nsend () from /lib64/libresolv.so.2
No symbol table info available.
#2 0x00007fababcdbcb6 in __libc_res_nquery () from /lib64/libresolv.so.2
No symbol table info available.
#3 0x00007fababcdbf27 in __libc_res_nquerydomain () from /lib64/libresolv.so.2
No symbol table info available.
#4 0x00007fababcdc14b in __libc_res_nsearch () from /lib64/libresolv.so.2
No symbol table info available.
#5 0x00007fababeeb8ef in _nss_dns_gethostbyname3_r () from /lib64/libnss_dns.so.2
No symbol table info available.
#6 0x00007fababeebb64 in _nss_dns_gethostbyname2_r () from /lib64/libnss_dns.so.2
No symbol table info available.
#7 0x00007fabac3b02bf in gaih_inet () from /lib64/libc.so.6
No symbol table info available.
#8 0x00007fabac3b178e in getaddrinfo () from /lib64/libc.so.6
No symbol table info available.
#9 0x0000000000a0cbb2 in uv_getaddrinfo ()
No symbol table info available.
#10 0x0000000000a127c4 in uv_queue_work ()
No symbol table info available.
#11 0x0000000000a08462 in uv_thread_create ()
No symbol table info available.
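The trace shows getaddrinfo() blocked in poll() on a libuv worker thread, which is how dns.lookup is implemented. Node's dns.resolve4 instead queries DNS servers directly over the network and does not go through getaddrinfo(), so it may sidestep this particular hang. That it does so in a given setup is an assumption, not something confirmed in this thread. It also behaves differently from dns.lookup (for example, it does not consult /etc/hosts). `firstIPv4` is a hypothetical helper, and its `resolver` parameter exists only so it can be tested without network access.

```javascript
'use strict';
var dns = require('dns');

// Hypothetical helper: return the first IPv4 address via dns.resolve4,
// which performs a network DNS query instead of calling getaddrinfo()
// on a worker thread. `resolver` defaults to the built-in dns module.
function firstIPv4(hostname, callback, resolver) {
  resolver = resolver || dns;
  resolver.resolve4(hostname, function (err, addresses) {
    if (err) return callback(err);
    if (!addresses || addresses.length === 0) {
      return callback(new Error('no A records for ' + hostname));
    }
    callback(null, addresses[0]);
  });
}

// Usage:
// firstIPv4('somehostname', function (err, address) { /* do something */ });
```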