Socket hang up after 2nd query #150
What is the environment where the code is executed and the Node.js version?
Sure. ClickHouse server 21.7.4.18; in createClient we use only host, username, password, and database from .env.
We are also trying to understand the cause of the ECONNRESET errors we see randomly.
Nope. Just a remote ClickHouse server; every 10 minutes crontab starts a Node script, which calls createClient to connect to CH and works with the data.
Updated to the new version 0.0.15. It doesn't help.
@GGo3, have you tried increasing Additionally, CH 21.7 is quite old. Have you tried it with more recent CH versions?
@GGo3 could you provide a code snippet to help us reproduce the problem? I tried to reproduce it locally, but it worked fine. My env: Fedora 37, Node.js v16.17.0, ClickHouse 23.3, @clickhouse/client 0.0.15

import {createClient} from '@clickhouse/client'
void (async () => {
const client = createClient({
connect_timeout: 5_000,
request_timeout: 60_000,
})
let rows
for (let i = 1; i <= 10; i++) {
rows = await longRunningQuery(client)
console.info(`#${i}`, await rows.json())
if (i !== 10) {
await sleep(30_000)
}
}
await client.close()
})()
function longRunningQuery(client) {
return client.query({
query: `SELECT sleep(3), rand64();`,
format: 'JSONEachRow',
})
}
function sleep(ms) {
return new Promise((resolve) => {
setTimeout(resolve, ms);
});
}

I suppose it is similar to the use case described in the OP. It prints:
and there are no exceptions. Additionally, I tested it with our Cloud, similar code, only added
Hmm. Tested your script and it works fine. I will dig into it and inform you later.
Hi, my fix for the timeout case was related to Node.js 19; I was able to fix it only because it was reproducible. Maybe we should use undici for the ClickHouse client...
I see the same problem in my project. Ubuntu 22.04, Node.js 18.16 + ClickHouse 22. I do not think it depends on the ClickHouse or OS version; more probably it depends on the Node.js version. I switched between Node.js versions 16 and 18 and caught socket hang up very often. My case: I make a long HTTP request to an API (I need to get a big batch of data in one request), which takes 1-2 minutes, and after that I try to delete old data (query below) and insert the new batch. With Node.js 20 I see the same problem only rarely.
And it does not depend on which query I run. For example, I send
For the HTTP request I use the node-fetch package, version 3.3.0
@olexiyb Did you find a fix to the hang ups / did you try undici?
We reworked how request/socket timeouts work internally in 0.1.0. Can someone please check if the issue is still reproducible after the upgrade?
The problem is still present. Updated from version 0.0.14 to 0.1.0. Also, tried to set
Maybe we need to test on the latest CH version, because we test it on 21.7.4.18
@GGo3, could you provide a minimal repro for the issue?
The script runs on a server with Ubuntu 18.04.4 LTS and ClickHouse server 21.7.4.18. The script starts by creating an instance: new Clickhouse(config). CH:
after that
@GGo3, a few questions regarding the example:
@slvrtrn Today I updated to the latest version, and it seems to work well. I'll try to get this into production next week and send my feedback later. Thanks.
@slvrtrn Today our team tested it, and it seems the same problems still exist.
@GGo3, @nlapshin, Could you try tuning the server keep alive setting? For example, in
The number here (in seconds) should be larger than the expected idle time of the application. In the meantime, we are figuring out the proper solution on the client side.
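For illustration, the server-side tweak being suggested could look like the following (a sketch assuming the standard ClickHouse `config.xml` layout; 120 is an example value that should exceed the application's expected idle time, not a recommendation):

```xml
<!-- config.xml, or an override file under config.d/ -->
<clickhouse>
    <!-- How long (seconds) the server keeps an idle HTTP connection open -->
    <keep_alive_timeout>120</keep_alive_timeout>
</clickhouse>
```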
It looks like with keep_alive off and client reconnection upon an error thrown, everything is pretty stable so far. I guess the docs need to mention the keep_alive nuances, and the examples need to include proper error handling strategies. I wish I could help with that, but currently I'm en-route with very little spare time till August. If this still requires attention, I may PR then. Thanks heaps for your help!
@movy, how do you reconnect the client? By closing it (effectively destroying the underlying HTTP agent) and creating a new instance?
Alright, I was going nuts with the same issue. Thankfully I found this thread. Is it recommended to keep keep_alive off? I'd think it would be optimal to keep the connection open to reduce latency. In my APIs I'm getting a lot of socket hang up errors when users make frequent requests. Won't it add latency from the TCP connection overhead? Another question: I'm not calling
@amitava82, you could try the workarounds described in the docs.
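For reference, a hedged sketch of the client-side knobs mentioned elsewhere in this thread (option names as in the 0.2.x-era client; the values are illustrative assumptions, not recommendations):

```javascript
// Illustrative keep_alive options for createClient from @clickhouse/client
// (0.2.x era, as used in snippets elsewhere in this thread).
// socket_ttl should stay below the server's keep_alive_timeout so the client
// retires sockets before the server severs them.
const clientOptions = {
  keep_alive: {
    enabled: true,
    socket_ttl: 2500, // ms; must be shorter than the server-side timeout
    retry_on_expired_socket: true,
  },
}
```

The alternative workaround discussed above is simply `keep_alive: { enabled: false }` at the cost of a new TCP connection per request.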
Thanks! I'm trying out the suggestions. Not seeing any errors, but I'll report if any.
I just learned that my reconnection routine does not work, as after most recent My code:

class ClickHouse {
#createClient() {
return createClient({
host: this.#local ? process.env.CLICKHOUSE_HOST : process.env.CLOUDHOUSE_HOST,
username: 'default',
password: this.#local ? process.env.CLICKHOUSE_SECRET : process.env.CLOUDHOUSE_SECRET,
keep_alive: {
enabled: false,
},
})
}

async closeStreams() {
for (const stream of Object.values(this.#streams)) {
stream.push(null)
}
await Promise.all(Object.values(this.#insertPromises))
await this.#client.close()
}
this.#insertPromises[table] = this.#client
.insert({
table,
values: this.#streams[table],
format: 'JSONEachRow',
})
.then(() => console.info(`${table} data ingestion is finished`))
.catch((error) => {
this.closeStreams().then(() => this.#client = this.#createClient())
})
}

@slvrtrn, what's the correct way to recover from this error? Also note, this error happens only with the ClickHouse Cloud setup. With my local CH instance I haven't had any socket hang ups so far.
@movy, you just need to recreate the failed ingestion promise for a particular table in case of an error. In your snippet, I see that you recreate the client (without closing the previous instance), but I don't think you need to do it at all.
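A minimal sketch of that suggestion, assuming a client whose insert method is shaped like @clickhouse/client's (the function name and the table/stream registries here are my own, for illustration):

```javascript
import { Readable } from 'node:stream'

// Recreate only the failed ingestion pipeline (stream + insert promise) for
// the affected table; the client instance itself is left untouched.
function startIngestion(client, table, streams, promises) {
  streams[table] = new Readable({ objectMode: true, read() {} })
  promises[table] = client
    .insert({ table, values: streams[table], format: 'JSONEachRow' })
    .catch((error) => {
      console.error(`insert into ${table} failed, restarting:`, error.message)
      // Rebuild the pipeline for this table only; the client stays as-is.
      startIngestion(client, table, streams, promises)
    })
}
```

Note that any data still buffered in the old stream is lost on failure; a real implementation would need to re-queue it.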
@slvrtrn I have a slightly different issue (with a similar error):
As opposed to this, when I use query in order to ping (I tried a few different approaches), I don't get the same error (only a timeout error, without the socket hang up that kills my process). Thanks!
I was running it with 0.2.0 (was there any change in 0.2.1?). We have debugged it, and the destroy there is not doing the job.
@kobi-co, yes, that's what I was thinking as well - the request stream is not destroyed properly. Thanks for the report!
Getting the same error so far with version 2.3.0. Also using streams like the participants above. Tried with and without the keepalive option. Env: node 18 alpine, CH 23.5.3.24. Everything running in separate Docker containers.
@igolka97, can you please provide more details about your use case?
I am thinking about releasing this recent branch main...idle-sockets-timeouts as
From my tests, with an endless loop with a random number of concurrent requests and random waits between them, it is stable, and from the logs, I could tell that the sockets were expiring while idling, as expected. @GGo3 @amitava82 @gabibora @nlapshin @olexiyb
return createClient({ host, username, password, database });
export class StreamRequestLoggerService<T extends PossibleLogRequest>
extends RequestLoggerInterface<T>
implements OnApplicationShutdown
{
constructor(
private readonly tableName: string,
private readonly clickhouseClient: ClickHouseClient,
) {
super();
this.insertStream = this.clickhouseClient.insert({
table: this.tableName,
format: 'JSONEachRow',
values: this.stream,
});
}
stream = new Readable({
objectMode: true,
read: () => {
/**/
},
});
insertStream: Promise<ConnBaseResult>;
onApplicationShutdown() {
this.stream.push(null);
return this.insertStream;
}
async logRequest(request: Array<T>) {
request.forEach((r) => {
this.stream.push(r);
});
}
}
@igolka97 how do you use
@mshustov UPD BTW
@igolka97, the code you provided looks similar to what we have in our examples for endless streams, so I see no obvious issues. I adjusted it slightly for my tests to specifically check the disabled keep alive setting, as you mentioned it:

import { ClickHouseClient, createClient } from "@clickhouse/client";
import { randomInt } from "crypto";
import { Readable } from "stream";
export class StreamRequestLoggerService {
stream = new Readable({
objectMode: true,
read: () => {
/**/
},
});
insertStream: Promise<unknown>;
constructor(
private readonly tableName: string,
private readonly clickhouseClient: ClickHouseClient,
) {
this.insertStream = this.clickhouseClient
.insert({
table: this.tableName,
format: "JSONEachRow",
values: this.stream,
})
.then(() => console.info("\nData ingestion is finished"));
}
onApplicationShutdown() {
this.stream.push(null);
return this.insertStream;
}
logRequest<T>(request: Array<T>) {
request.forEach((r) => {
this.stream.push(r);
});
}
}
async function main() {
const client = createClient({
keep_alive: {
enabled: false,
},
});
const tableName = "test_logs";
await client.command({
query: `
CREATE OR REPLACE TABLE ${tableName}
(id UInt64, name String)
ENGINE MergeTree()
ORDER BY (id)
`,
});
const log = new StreamRequestLoggerService("test_logs", client);
for (let i = 0; i < 10_000; i++) {
console.info(`[${i + 1}] Pushing several records into the stream...`);
const data = [...Array(randomInt(100, 10_000))].map(() => ({
id: randomInt(1, 100_000_000),
name: Math.random().toString(36).slice(2),
}));
log.logRequest(data);
await new Promise((resolve) => setTimeout(resolve, randomInt(1, 1000)));
}
// When Ctrl+C is pressed...
async function cleanup() {
await log.onApplicationShutdown();
await client.close();
process.exit(0);
}
process.on("SIGINT", cleanup);
process.on("SIGTERM", cleanup);
}
main().catch((err) => {
console.error(err);
process.exit(1);
});

No issues with thousands of iterations here on my local setup... The only way to trigger How often do you experience
I think I managed to create a reproducible test case, see below. In my new project I use a universal ClickHouse class that is used for streaming inserts or for simple inserts / updates, depending on who's using the class. This led me to the discovery that if I create a I realized such client 'mixed-use' might be an anti-pattern in the realm of this library, so I tried creating separate clients for each function call (via Eventually, I split my code into separate branches, so After reading node-fetch/node-fetch#1735 and vlucas/frisby#594 I tried different versions of Node (17-21, I believe), same result everywhere. Please note that the error is not catchable, i.e. it inevitably crashes the whole app each time it's encountered. I tried a myriad of The code includes 4 cases, two of them will fail:

import { createClient } from '@clickhouse/client'
import * as utils from './utils.js'
import Stream from 'node:stream'
export class ClickHouse {
#insertPromise
#client
constructor() {
this.#createStream()
this.#client = createClient({
host: process.env.CLICKHOUSE_HOST,
username: 'default',
password: process.env.CLICKHOUSE_SECRET,
keep_alive: {
enabled: false,
// socket_ttl: 2500,
// retry_on_expired_socket: true,
}
})
}
#createStream() {
this.stream = new Stream.Readable({
objectMode: true,
read() {},
})
}
async init() {
await this.#client.command({
query: `
CREATE TABLE IF NOT EXISTS test_table(
id UInt32,
name String,
time DateTime64
) ENGINE = Memory
`,
}).catch(error => console.error('⚠️ clickhouse init error:', error))
}
async createPromise(table) {
this.#insertPromise = this.#client
.insert({
table,
values: this.stream,
format: 'JSONEachRow',
})
.catch(async (error) => {
console.error(error)
process.exit(255)
})
}
async streamTestData() {
console.log('streaming in test data')
for (let index = 0; index < 1000; index++) {
this.stream.push({
id: index,
name: 'test',
time: Date.now(),
})
}
}
// insert data using INSERT (not stream) with 5 sec sleep in between
async insertTestData() {
for (let index = 0; index < 10; index++) {
console.log('inserting test data', index)
await this.#client.insert({
table: 'test_table',
values: [
{
id: index,
time: Date.now(),
name: 'test',
}
],
format: 'JSONEachRow',
}).catch(console.error)
await utils.sleep(5000)
}
}
async closeStreams() {
// count rows in test_table
const { data } = await (await this.#client.query({ query: 'SELECT count(*) FROM test_table' })).json()
console.log('count rows in test_table', data)
// close stream
console.log('clickhouse cleanup')
this.stream.push(null)
// stream.destroy()
// when the stream is closed, the insert stream can be awaited
await this.#insertPromise
await this.#client.close()
console.log('clickhouse cleanup done')
}
}
// this passes
const test1 = async () => {
// await clickhouse.createPromise('test_table')
// await clickhouse.streamTestData()
await clickhouse.insertTestData()
}
// this passes
const test2 = async () => {
await clickhouse.createPromise('test_table')
await clickhouse.streamTestData()
// await clickhouse.insertTestData()
}
// this fails with SOCKET_TIMEOUT
const test3 = async () => {
await clickhouse.createPromise('test_table')
await clickhouse.streamTestData()
await clickhouse.insertTestData()
}
// this fails with Error: socket hang up ECONNRESET
const test4 = async () => {
await clickhouse.createPromise('test_table')
// await clickhouse.streamTestData()
await clickhouse.insertTestData()
}
const clickhouse = new ClickHouse()
await clickhouse.init()
// await test1()
// await test2()
// await test3()
await test4()
await clickhouse.closeStreams()
@movy, I checked your example, but in my case (Node.js 18.x, Fedora), all four pass. I see one socket hang up error at the very end when the application quits, though. First take with the output (exact same code, just slightly adjusted to TS): https://gist.github.com/slvrtrn/392c6dd5371d84651a3c2d13ed32946a

Looks like socket hang up happens because we have some dangling promises produced here:

async createPromise(table) {
this.#insertPromise = this.#client
.insert({
table,
values: this.stream,
format: 'JSONEachRow',
})
.catch(async (error) => {
console.error(error)
process.exit(255)
})
}

The stream is reused for all of these promises (which causes issues because we have some conflicting listeners now, i.e., UB). Here is a fixed snippet (see some adjustments in NB: it also flushes the data correctly, as there should be 2030 (and not 30) records there. Enabling all KeepAlive settings with the fixed example like this:
Causes no issues. EDIT: I also checked your example with my unreleased idle sockets timeout branch, and, most likely, the "socket hang up" in the 4th example happens because we create an empty stream without any initial data coming in, and the server just severs the connection by itself after exactly 3 seconds of waiting (this is the default setting of
See: https://gist.github.com/slvrtrn/7fb24918661e9b5066a131f32b194ca1

If you are interested in trying out the reworked idle sockets and giving your feedback on how it works in your scenario, please send me a DM in the community Slack (the same name as in the GitHub profile). I will share the build (as I am not in favor of releasing breaking changes if it does not fix the issue or introduces other ones).
@slvrtrn, I apologise for the confusion, but tests 1-4 are to be run separately (hence I commented out all but one in my code). Thanks for your explanation regarding test4. Maybe it's worth documenting: while it may be obvious in hindsight, it might not be so for new library users. Basically, it means we'd have to push at least one record instantly upon promise creation, as in many practical use cases a promise is created on an app's launch, but data can come at any point later in time, not necessarily immediately after start. A 3-second timeout is definitely insufficient here. One big problem with ECONNRESET is that it's uncatchable and often crashes the whole app without a proper stack trace, i.e. if we use many connections / streams and one of them hangs up like this, it is very tricky to find the piece of code that caused it. As for test3, if run separately, the promise is utilised and the stream is destroyed correctly, yet it still fails with a SOCKET_TIMEOUT error. I have never seen this in production though, only while creating this test case.
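The "push at least one record instantly" workaround could be sketched like this (my naming throughout; the assumption, per the test4 discussion above, is that the server severs a connection that stays idle right after the insert starts):

```javascript
import { Readable } from 'node:stream'

// Start a streaming insert and immediately push one record, so the brand-new
// connection is never idle long enough for the server to sever it.
// startInsert is expected to wire the stream into something like
// client.insert({ values: stream, format: 'JSONEachRow', ... }).
function startInsertWithInitialRow(startInsert, initialRow) {
  const stream = new Readable({ objectMode: true, read() {} })
  const insertPromise = startInsert(stream)
  stream.push(initialRow) // first row goes out right away
  return { stream, insertPromise }
}
```

The trade-off is that the table then contains one synthetic row per stream, which the application has to tolerate or filter out later.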
@breslavsky, why is your socket_ttl set to 60_000? What is your ClickHouse server
We set 120 seconds.
@breslavsky, could you DM me in the community Slack?
Closing this, as it should be fixed as of 0.3.0. If there are still issues with sockets after upgrading, please open a new one.
I get this error after the 2nd query to ClickHouse
Steps to reproduce
Also, tried to increase connect_timeout to 20 sec, but it didn't help.