fetch can make worker_threads stuck and process hang #2026
Comments
Looks like the same minimal reproduction gets stuck even with Node's |
Yep. I can confirm this happens. |
Can you repro with just undici.request? Is this a fetch specific issue? |
https://www.npmjs.com/package/why-is-node-running
|
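The package linked above reports which handles keep a process alive. A rough, dependency-free approximation is Node's built-in `process.getActiveResourcesInfo()`, an experimental API that returns the type names of resources currently keeping the event loop alive. The sketch below is only an illustration; the helper name `summarizeActiveResources` is hypothetical, and whether a leaked resource from this issue would appear in the list is an assumption.

```javascript
// Hypothetical helper: count the resources currently keeping the event loop alive.
// process.getActiveResourcesInfo() is an experimental Node.js API that returns
// an array of resource type names (for example "Timeout").
function summarizeActiveResources() {
  const counts = {};
  for (const type of process.getActiveResourcesInfo()) {
    counts[type] = (counts[type] ?? 0) + 1;
  }
  return counts;
}

// Example: a pending timer is reported as a "Timeout" resource.
const pendingTimer = setTimeout(() => {}, 1_000);
const summary = summarizeActiveResources();
console.log(summary);
clearTimeout(pendingTimer);
```

Calling the helper just before a worker is expected to finish may hint at what is still open there.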
@jasnell How is |
@KhafraDev any ideas? |
|
And now I can't reproduce it. 😢 |
I was hoping this wouldn't come up 😐. I've had this issue in the past, no idea how to debug it or even cause it. It might be an issue with web readable streams (ie. something like nodejs/node#44985?), but I'm unsure. If it is an issue with zlib, that'd make sense at least. request doesn't decompress the body so you'd only experience it in fetch (such as this issue). |
This reproduction case is a bit flaky. You may need to add more parallel requests or increase the Are there any work-arounds available for now? How could we make the main thread force the workers to terminate when |
It should be noted that the worker.terminate docs explicitly say that it will "[s]top all JavaScript execution in the worker thread as soon as possible". There's no guarantee that it'll actually close the worker if there is a resource (seemingly a stream from the debugging above) still ref'd. Do you notice this issue if you run tests sequentially? We run hundreds of fetch tests in workers w/o issue, but one at a time. We noticed some issues with sockets staying open (not because of fetch) when running in parallel, but this could be a similar issue? |
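Since `worker.terminate()` gives no hard guarantee, one possible work-around on the main thread is to race it against a deadline and let the caller decide what to do when the deadline wins. This is a minimal sketch under that assumption, not a fix for the underlying hang; `terminateWithDeadline` is a hypothetical name.

```javascript
import { Worker } from "node:worker_threads";

// Hypothetical helper: resolves to "terminated" if worker.terminate() settles
// within `ms`, otherwise to "timeout" so the caller can log, retry, or fall
// back to process.exit().
async function terminateWithDeadline(worker, ms) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  try {
    return await Promise.race([
      worker.terminate().then(() => "terminated"),
      deadline,
    ]);
  } finally {
    // Always clear the timer so it does not keep the main thread alive.
    clearTimeout(timer);
  }
}

// Usage with an inline worker that would otherwise run forever.
const worker = new Worker("setInterval(() => {}, 1_000);", { eval: true });
const outcome = await terminateWithDeadline(worker, 5_000);
console.log(outcome);
```

If the outcome is "timeout", the stuck worker may still keep the process alive, so a final `process.exit()` on the main thread remains the only forceful option in this sketch.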
Yes, Also when I replace
The issue comes up even when only a single worker is used. But if worker fires
Here is failing test case: #2032 |
That test case isn't failing on my machine (both windows and ubuntu, using node v19.8.1). |
Sure, the point was that it fails the CI: https://github.com/nodejs/undici/actions/runs/4551655590/jobs/8026203674#step:14:12832. CI should be used as reference instead of developer machines. It's much easier to reproduce the issue constantly that way. |
Failing, yes, but it's hard to fix an issue that can't be debugged locally. Decompressing the body has a... pretty large number of issues that I can see. I wouldn't be surprised if If we take a look at https://github.com/node-fetch/node-fetch/blob/e87b093fd678a9ea39c5b17b2a1bdfc4691eedc7/src/index.js#L277-L366 (which is how node-fetch decompresses the body), it's more or less identical to what undici is doing in this instance. I'm unsure as to what exactly is causing it to hang here, and not node-fetch, unless of course it's not an issue with zlib, but something else? |
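For readers unfamiliar with what "decompressing the body" involves: in essence, the raw response is piped through a zlib stream chosen from the Content-Encoding header. The sketch below is not undici's or node-fetch's actual code; `decoderFor` and `decodeBody` are hypothetical names, and it only handles a single encoding and the most common values.

```javascript
import zlib from "node:zlib";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: map a Content-Encoding value to a zlib decoder stream.
function decoderFor(encoding) {
  switch (encoding.trim().toLowerCase()) {
    case "gzip":
    case "x-gzip":
      return zlib.createGunzip();
    case "deflate":
      return zlib.createInflate();
    case "br":
      return zlib.createBrotliDecompress();
    default:
      return null; // identity or unsupported encoding
  }
}

// Hypothetical helper: decode a raw body (Buffer) and return the decoded bytes.
async function decodeBody(rawBody, encoding) {
  const decoder = decoderFor(encoding);
  if (decoder === null) return rawBody;
  const chunks = [];
  // The last pipeline stage is an async function that drains the decoder.
  await pipeline(Readable.from([rawBody]), decoder, async (source) => {
    for await (const chunk of source) chunks.push(chunk);
  });
  return Buffer.concat(chunks);
}

const compressed = zlib.brotliCompressSync(Buffer.from('{"ok":true}'));
const decoded = await decodeBody(compressed, "br");
console.log(decoded.toString());
```

Using `pipeline` means an error in any stage destroys the other streams, which matters when looking for resources that stay referenced after a failure.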
I've been experiencing this as well: random hangs that only reproduce on Linux and never on macOS. The one time I observed it interactively in a Vagrant VM, I saw 200% CPU usage, and the process was completely stuck somewhere and did not even accept SIGINT via keyboard. There were no brotli-encoded responses involved; all responses were uncompressed and came from a local Node.js server. |
I've been trying to reproduce this issue without
/*
curl --raw -H 'Accept-Encoding: br' https://api.spacexdata.com/v4/starlink > raw-response
*/
import { createBrotliDecompress } from "node:zlib";
import { pipeline } from "node:stream";
import { createReadStream } from "node:fs";
import { isMainThread, parentPort, Worker } from "node:worker_threads";
import { fileURLToPath } from "node:url";
import { cpus } from "node:os";

const THREADS = cpus().length - 1;
const ROUNDS = 100;

if (isMainThread) {
  // Safety net: force an exit if the run is still stuck after 60 seconds.
  const timeout = setTimeout(() => {
    console.log("Forcing process.exit on main thread");
    process.exit();
  }, 60_000);

  // One round: start a worker, wait for its "DONE" message, then terminate it.
  const task = async () => {
    const worker = new Worker(fileURLToPath(import.meta.url), {});
    const [promise, resolve] = getPromise();
    worker.on("message", (m) => m === "DONE" && resolve());
    await promise;

    // Report when worker.terminate() takes longer than 10 seconds.
    const timer = setTimeout(() => {
      console.log("Unable to terminate");
    }, 10_000);
    await worker.terminate();
    clearTimeout(timer);
  };

  // THREADS runners pull tasks from the shared pool until it is empty.
  const pool = Array(ROUNDS).fill(task);

  async function execute() {
    const task = pool.shift();
    if (task) {
      await task();
      return execute();
    }
  }

  await Promise.all(
    Array(THREADS)
      .fill(execute)
      .map((task) => task())
  );
  console.log("All done!");
  clearTimeout(timeout);
} else {
  // Worker: decompress the same file 10 times in parallel, then report back.
  const pool = Array(10).fill(decompress);
  await Promise.all(pool.map((task) => task()));
  parentPort.postMessage("DONE");
}

async function decompress() {
  const [promise, resolve, reject] = getPromise();
  const output = pipeline(
    createReadStream("./raw-response"),
    createBrotliDecompress(),
    (err) => err && reject(err)
  );

  const chunks = [];
  output.on("readable", () => {
    let chunk;
    while (null !== (chunk = output.read())) {
      chunks.push(chunk);
    }
  });
  output.on("end", resolve);
  output.on("error", reject);
  await promise;

  // Concatenate the Buffers before decoding so that multi-byte characters
  // spanning a chunk boundary are not corrupted.
  const data = Buffer.concat(chunks).toString();
  const json = JSON.parse(data);
  console.assert(json != null, `JSON unexpected: ${JSON.stringify(json)}`);
}

// Returns [promise, resolve, reject] so the promise can be settled from outside.
function getPromise() {
  let resolve, reject;
  const promise = new Promise((...args) => {
    [resolve, reject] = args;
  });
  return [promise, resolve, reject];
}
I'm happy to help by providing debugging logs or similar if there is anything you'd like to see. I'm able to reproduce the original issue constantly on my local machine. I've been trying to look for any hanging HTTP sockets with |
vitest-dev/vitest#2008 (comment) may be relevant as well where I managed to attach a chrome debugger via Would definitely help if we can get similar traces via |
I wonder if this is the same as nodejs/node#47228 |
I'm not sure. I did observe this with node 18. Also, that hang is with idle cpu, this one here was with pegged cpu. |
I don't think so. That issue mentions that it does not reproduce on Node 18 while this one does. Also by passing |
Can confirm that this issue occurs on both Node v17.9.0 and v19.8.1 using undici v5.21.2, and not just with
// Main script: spawns ./worker.js
import { Worker } from "worker_threads";
import path from "path";
import { fileURLToPath } from "url";

new Worker(path.join(path.dirname(fileURLToPath(import.meta.url)), "./worker.js"));

// worker.js
import { request } from "undici";

request("https://media.tenor.com/qy_WcGdRzfgAAAAC/xluna-high-five.gif")
  .then((res) => res.body.arrayBuffer())
  .then((buf) => Buffer.from(buf))
  .then(() => ({
    buffer: Buffer.alloc(10),
    fileExtension: "bin"
  }))
  .then(() => {
    process.exit();
  });
|
Now removing
import { request } from "undici";

request("https://media.tenor.com/qy_WcGdRzfgAAAAC/xluna-high-five.gif")
  .then((res) => res.body.arrayBuffer())
  .then((buf) => Buffer.from(buf))
  .then(() => ({
    buffer: Buffer.alloc(10),
    fileExtension: "bin"
  }))
  .then(() => {
    //// comment out process.exit and the worker would not exit anymore
    // process.exit();
  });
I think this might be a bug in Node.js after all. @addaleax, have you got an idea why Even worse, adding |
What's the simplest way to override all global reference to Presumably, if we can do that, then we should be able avoid the bug. It seems we are hit pretty hard by this with every couple of test runs flaking as a result (vitest-dev/vitest#4415). |
That issue seems unrelated, undici does not allocate any |
We ended up removing all use of |
Yep, running into this now with
import fetch from "node-fetch"
globalThis.fetch = fetch
Issue is now resolved. |
Hi @AriPerkkio, |
Great work @tsctx and @jasnell (from nodejs/node#51255, 👋)! This issue seems to be fixed. I ran these tests using
Looks like the fix was originally included in This also seems to fix two other
Let's close this issue, as the fix on Node's side has landed and the root cause was not in Undici. |
Bug Description
When undici is run inside worker_threads, it may prevent the worker from terminating and make the whole process hang.
Reproducible By
This Node script runs undici in worker threads and does 4x requests there before exiting. If this does not reproduce on your fast environment, try increasing the requests and const ROUNDS.
Expected Behavior
Package should be safe to run inside worker_threads. If it's not safe, it should detect such environment and throw an error.
Logs & Screenshots
Environment
MacOS 13.2.1
Node 18.14.0
Additional context
vitest-dev/vitest#3077
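The "Expected Behavior" section suggests detecting a worker environment and throwing. A guard of that kind could be as small as the sketch below. It is purely illustrative: `assertMainThread` is a hypothetical name, undici has no such check according to this thread, and the issue was ultimately resolved on Node's side rather than with a guard.

```javascript
import { isMainThread, threadId } from "node:worker_threads";

// Hypothetical guard: refuse to run a feature outside the main thread.
function assertMainThread(feature) {
  if (!isMainThread) {
    throw new Error(
      `${feature} is not supported inside worker_threads (threadId ${threadId})`
    );
  }
}

// Usage: on the main thread the guard passes and nothing is thrown.
let guardError = null;
try {
  assertMainThread("example-feature");
} catch (err) {
  guardError = err;
}
console.log(guardError === null ? "running on the main thread" : guardError.message);
```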