Error: socket hang up #294
---

I think this is the reason: https://medium.com/dkatalis/eventloop-in-nodejs-macrotasks-and-microtasks-164417e619b9
---

Yes, considering your logs and your scenario, it indeed looks like when the event loop is blocked for too long, the timer that handles idle socket removal does not work as expected (see `packages/client-node/src/connection/node_base_connection.ts`, lines 521 to 529 at commit `e78b6f0`).

`await sleep(50)` helps (probably `await sleep(0)` would have worked, too).
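For reference, a minimal sketch of a `sleep` helper like the one referenced above, assuming it is just a promisified `setTimeout` (the actual helper in the codebase may differ):

```ts
// Hypothetical sleep helper: resolves after `ms` milliseconds.
// setTimeout schedules a macrotask, so awaiting this yields control
// back to the event loop, letting pending timers (such as the
// idle-socket cleanup timer) fire.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms))
```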
---

@slvrtrn What helped me was adding this code before the request:

```js
// TODO: make macrotask
await new Promise((resolve) => setTimeout(resolve, 0));
```
---

Side note: you could also try https://clickhouse.com/docs/en/optimize/asynchronous-inserts instead, even without waiting for an ack, instead of batching on the client side. There are examples in the repo as well: https://github.com/ClickHouse/clickhouse-js/blob/main/examples/async_insert_without_waiting.ts
This checks out. Probably makes sense to add it at the start of the `request` implementation.
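A minimal sketch of the async-insert route mentioned in the side note above, following the linked example (URL, table name, and row shape are placeholders):

```ts
import { createClient } from '@clickhouse/client'

const client = createClient({ url: 'http://localhost:8123' })

// Let the server accumulate and flush rows (async_insert), and do not
// wait for the flush acknowledgement (wait_for_async_insert: 0).
await client.insert({
  table: 'events', // placeholder table name
  values: [{ id: 1, message: 'hello' }],
  format: 'JSONEachRow',
  clickhouse_settings: {
    async_insert: 1,
    wait_for_async_insert: 0,
  },
})
```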
---

I tried this way, but I think this approach is wrong for my case: it results in a request to the server on every call to `addToInsertQueue`, and the insertion speed on one worker dropped from 800 rows per second to 300. The ClickHouse server was also overloaded (7% -> 30% CPU, 8 CPUs). Maybe I did something wrong. I don't have the ability to accumulate records, so with this approach I have to call `insert` for each event.
---

IIRC, Kafka consumers poll multiple messages at a time (and kafkajs has this), depending on the batch size in bytes, of course. So maybe with async_insert the interface could be

```ts
public async addToInsertQueue(rows: Array<T>) {
  //
}
```

instead. Anyways, would you like to open a PR with a zero timeout in `request`, if it is confirmed to resolve the issue? Otherwise, I can add it later.
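A hedged sketch of how that batched variant might look (class name, field names, and client wiring are assumptions for illustration):

```ts
import type { ClickHouseClient } from '@clickhouse/client'

class BatchInserter<T> {
  constructor(
    private readonly client: ClickHouseClient, // shared client instance
    private readonly table: string,
  ) {}

  // One HTTP request per polled batch instead of one per row;
  // server-side async_insert takes care of the actual batching.
  public async addToInsertQueue(rows: Array<T>) {
    await this.client.insert({
      table: this.table,
      values: rows,
      format: 'JSONEachRow',
      clickhouse_settings: { async_insert: 1, wait_for_async_insert: 0 },
    })
  }
}
```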
---

It took me a few days to figure out the cause of this error. For now it will be enough for me to apply my variant locally, but I hope it will be fixed later. By the way, I found out that calling `sleep(0)` before `insert` does not help; or rather, it does, but not for long: it crashes about once every 30 minutes instead of once every 5-10 minutes. Only `sleep(0)` before each `addToInsertQueue` call saves the situation; otherwise the timeout, judging by the logs, is still not respected. The difference is not 2000 ms, as in the logs above, but within 100 ms.
---

The fix is included in 1.4.1.
---

Thank you! Checked, no more errors like that.
---

@slvrtrn As it turns out, the error still occurs, but now not a few minutes after startup; rather, about 1-2 times a day. I dug into the source code of the http.Request and http.Agent implementations, and I think it is related to the callbacks for the socket (http.Request) and destroy (http.Agent) events being invoked through `process.nextTick` after the `emit` call. I think you should consider getting rid of `setTimeout` in the `request` method completely or, as a simpler solution, add a mutex implementation:

```js
class Mutex {
  constructor() {
    this._lastPromise = Promise.resolve();
  }

  async lock() {
    let resolveUnlock;
    const unlockPromise = new Promise((resolve) => {
      resolveUnlock = resolve;
    });
    // Queue behind the previous holder's release, then hand the caller
    // the function that releases this lock for the next waiter.
    const previousPromise = this._lastPromise;
    this._lastPromise = previousPromise.then(() => unlockPromise);
    await previousPromise;
    return resolveUnlock;
  }
}
```

```js
this.mutex = new Mutex();

async function request() {
  const unlock = await this.mutex.lock();
  await sleep(0);
  return new Promise(() => {
    // ...
    request.on('socket', (socket) => {
      // ...
      unlock();
    });
    // ...
  });
}
```
---

@uginroot, aside from that, could you try adding backpressure handling to the stream that is provided into the `insert` call? I checked the code from the OP again; it could be that

```js
this.batchContext.insertStream.push(row);
this.batchContext.rowsCount++;
```

is called too often, and the event loop is overloaded with various events emitted by that. In that case, backpressure handling could help.

Also, is there a good reason why this promise is dangling?

```js
void this._commit();
```
---

I added this code; I'll see if it works or not:

```js
import { once } from "events";
// ...
while (!this.batchContext.insertStream.push(row)) {
  await once(this.batchContext.insertStream, "drain");
}
```

Regarding […]: I've already checked and made sure that it's not that. The problem is that the […]. I tried to come up with a nice and simple solution without using […].

By the way, if you add your own http.Agent, the problem doesn't appear anymore:

```js
http_agent: new http.Agent({
  keepAlive: true,
  keepAliveMsecs: 1000,
  timeout: 1000,
  maxSockets: 20,
}),
```
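For context, a sketch of how that agent could be wired into the client configuration, assuming the `http_agent` option available in recent client versions (the URL is a placeholder):

```ts
import http from 'http'
import { createClient } from '@clickhouse/client'

const client = createClient({
  url: 'http://localhost:8123',
  // With a custom agent, socket keep-alive behavior is governed by the
  // agent options rather than the client's built-in keep-alive settings.
  http_agent: new http.Agent({
    keepAlive: true,
    keepAliveMsecs: 1000,
    timeout: 1000,
    maxSockets: 20,
  }),
})
```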
---

```js
while (!this.batchContext.insertStream.push(row)) {
  await once(this.batchContext.insertStream, "drain");
}
```

Shouldn't it be

```js
if (!this.batchContext.insertStream.push(row)) {
  await once(this.batchContext.insertStream, "drain");
}
```

instead? Cause in the docs: […]

In any case, as I mentioned earlier, you could also try writing an entire batch received from your message broker (see EachBatch) without essentially wrapping every single row in a promise (with your current per-row `addToInsertQueue` approach).
---

I guess we have to use `eachBatch` to ensure that all events are recorded.
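A sketch of the `eachBatch` route with kafkajs (broker, topic, group id, and the `inserter` from the earlier sketch are placeholders):

```ts
import { Kafka } from 'kafkajs'

// Assumed to exist: a batched inserter like the sketch above.
declare const inserter: { addToInsertQueue(rows: unknown[]): Promise<void> }

const kafka = new Kafka({ clientId: 'ch-writer', brokers: ['localhost:9092'] })
const consumer = kafka.consumer({ groupId: 'ch-writer-group' })

await consumer.connect()
await consumer.subscribe({ topics: ['events'] })
await consumer.run({
  eachBatch: async ({ batch, resolveOffset, heartbeat }) => {
    // Insert the whole polled batch in a single request
    // instead of wrapping every single row in a promise.
    const rows = batch.messages
      .filter((m) => m.value !== null)
      .map((m) => JSON.parse(m.value!.toString()))
    await inserter.addToInsertQueue(rows)
    // Mark offsets as processed only after the insert succeeded.
    for (const m of batch.messages) {
      resolveOffset(m.offset)
    }
    await heartbeat()
  },
})
```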
---

You mentioned that the issue now happens 1-2 times a day. Maybe it just hasn't been triggered yet while using a "custom" agent? Also, this is almost identical to how the internal HTTP agent is instantiated: […]

- […] this is the default value.
- IIRC, this essentially calls […]
---

Describe the bug

Similar problem to one that was sort of solved in version 0.3.0: #150

Error: socket hang up

This problem only appears at very high insertion rates into dozens of tables. I am reading data from Kafka in 10 threads, and the `addToInsertQueue` call goes to different tables depending on the message received from Kafka.

If I add `await sleep(50)` before calling `addToInsertQueue`, the problem appears much less often, about once every 2-3 hours. If there is only one table, the problem appears about once a week.

I deliberately use a machine with 2 CPUs and 4 GB of memory. The CPU is at 100% on one of the processors; memory usage is at 1.7 GB.
Steps to reproduce
Expected behaviour
No uncaughtException messages
Code example
Error log
Configuration
Environment
ClickHouse server