feat: add "timeoutPerCommand" option to detect dead connection #658

luin · 2018-07-06T18:18:54Z

WIP. Needs discussion, tests, and documentation.

Usage:

const Redis = require('ioredis')

const redis = new Redis({
  timeoutPerCommand: 10000 // milliseconds; option proposed in this PR
})

Known issues:

Doesn't work with blocking commands (e.g., BLPOP).

jcstanaway · 2018-07-06T19:14:45Z

lib/redis/event_handler.js

+          debug('Command timed out. Pending commands: %s', commandQueueLength);
+          var err = new Error('Command timed out');
+          self.flushQueue(err, { offlineQueue: false });
+          self.silentEmit('error', err);


Is emitting an error event sufficient? In my usage, I'm always supplying a callback in the form of:

redis.get(key, function (err, reply) {
  if (err) {
    // handle error
  } else {
    // handle data
  }
});

While I do also handle the error event, that handling is somewhat generic. It would be better if the callback were called with the error so that specific error handling could be invoked.

Yes, the callbacks will be called in addition to the error event being emitted. self.flushQueue() takes care of that.
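This is a minimal sketch of what "flushing the queue" means for callers. It assumes a simplified queue in which each pending command carries its own callback. The class and method names (`PendingQueue`, `push`, `flush`) are hypothetical and are not ioredis internals. The point is that every queued callback receives the same error, so per-command handling runs alongside any generic `error` event handler.

```javascript
// Hypothetical, simplified model of a command queue: each entry keeps
// the callback supplied by the caller.
class PendingQueue {
  constructor() {
    this.items = [];
  }

  push(name, callback) {
    this.items.push({ name, callback });
  }

  // Fail every pending command with the same error, oldest first, and
  // empty the queue first so no callback can be invoked twice.
  flush(err) {
    const items = this.items;
    this.items = [];
    for (const item of items) {
      item.callback(err);
    }
    return items.length;
  }
}
```

In this model, a caller's `function (err, reply)` callback sees `err.message === 'Command timed out'`, so the specific error branch runs.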

jcstanaway · 2018-07-06T19:18:46Z

lib/redis/event_handler.js

+          var err = new Error('Command timed out');
+          self.flushQueue(err, { offlineQueue: false });
+          self.silentEmit('error', err);
+          self.disconnect(true);


I assume that this will trigger an automatic reconnect to that server, correct?

Yes, that's correct.

Good, but see my follow-up to #587. If this node is part of a cluster, retryStrategy is overridden and set to null, so there would in fact be no automatic reconnect to that node.

Per #587, there will not be any automatic reconnect in a cluster environment.

vweevers · 2018-07-06T19:49:22Z

Is socket.setKeepAlive(true) not an option? TCP keep-alive probes are very cheap and work well.

jcstanaway · 2018-07-06T19:51:37Z

lib/redis/event_handler.js

+          self.silentEmit('error', err);
+          self.disconnect(true);
+        }
+      }, 500);


It appears that the minimum practical value for timeoutPerCommand is 500ms. Even if I specify timeoutPerCommand = 100, the check for timed-out commands only runs every 500ms, so the timeout error won't fire as quickly as I would expect.
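A small simulation makes the polling arithmetic concrete. It assumes the check runs on ticks at multiples of `periodMs` (500 in the diff above) and that a command is written `writeOffsetMs` after a tick. The timeout is only noticed on the first tick at which the elapsed time exceeds `timeoutMs`. The function and parameter names are illustrative and are not part of ioredis.

```javascript
// Returns how long after the write a polling check (one tick every
// periodMs, starting at t = 0) first observes elapsed > timeoutMs.
// Assumes 0 <= writeOffsetMs < periodMs, i.e. the write lands between
// the tick at t = 0 and the next one.
function detectionDelay(writeOffsetMs, timeoutMs, periodMs) {
  let tick = periodMs;
  while (tick - writeOffsetMs <= timeoutMs) {
    tick += periodMs;
  }
  return tick - writeOffsetMs;
}
```

Take a command written 1 ms after a tick. With `timeoutPerCommand: 100` it is reported after 499 ms. With `timeoutPerCommand: 10000` it is reported after 10499 ms. In general, the error arrives up to one polling period after the configured timeout. Sub-500ms timeouts would therefore need a shorter period or a per-command timer.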

jcstanaway · 2018-07-06T20:00:22Z

lib/redis/event_handler.js

+        var commandQueueLength = self.commandQueue.length;
+        if (
+          commandQueueLength > 0 &&
+          Date.now() - self.lastWriteTime > self.options.timeoutPerCommand


Since self.lastWriteTime is updated every time a command is added to the commandQueue, a client that sends commands often enough (more than one every timeoutPerCommand ms) keeps pushing self.lastWriteTime forward. Date.now() - self.lastWriteTime then stays below timeoutPerCommand, and the original command never times out.

Perhaps Redis.prototype.sendCommand should only update lastWriteTime when commandQueue.length === 1.
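Another way to avoid this starvation is sketched below, assuming each queued command can carry its own write timestamp. Each command records when it was written, and the periodic check looks only at the oldest pending entry, so newer writes cannot move an older command's deadline. `TimedQueue`, `sentAt`, and the injected `now` clock are hypothetical names. This is an alternative to the `commandQueue.length === 1` heuristic, not code from the PR. Its advantage is that after the oldest reply arrives, the next command is measured from its own write time rather than from its predecessor's.

```javascript
// Hypothetical queue that remembers when each command was written, so the
// timeout check measures the age of the OLDEST unanswered command.
class TimedQueue {
  constructor(now = Date.now) {
    this.now = now;    // injectable clock, handy for testing
    this.entries = []; // FIFO: Redis replies arrive in send order
  }

  // Called when a command is written to the socket.
  sent(name) {
    this.entries.push({ name, sentAt: this.now() });
  }

  // Called when a reply arrives; it belongs to the oldest command.
  replied() {
    return this.entries.shift();
  }

  // True if the oldest pending command has waited longer than timeoutMs.
  // Later writes never reset this, unlike a single shared lastWriteTime.
  hasTimedOut(timeoutMs) {
    if (this.entries.length === 0) return false;
    return this.now() - this.entries[0].sentAt > timeoutMs;
  }
}
```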

jcstanaway · 2018-07-06T20:21:49Z

@vweevers While a little old, see nodejs/node-v0.x-archive#6194. The main takeaway is that Node.js doesn't provide sufficient configurability of the TCP keep-alive functionality. Thus we'd have to wait up to 10-11 minutes before detecting that the connection has closed. Searching through the Node 10.x documentation, I didn't find anything that allows configuring TCP keep-alives to detect a broken connection any faster.
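For reference, this is the keep-alive setting Node.js does expose. The sketch below assumes a plain `net` connection rather than an ioredis client, and the helper name `connectWithKeepAlive` is hypothetical. `socket.setKeepAlive(enable, initialDelay)` takes only an on/off flag and the idle time before the first probe. How often probes repeat and how many may fail before the connection is declared dead are not parameters of this call.

```javascript
const net = require('net');

// Open a loopback connection and enable TCP keep-alive on the client side.
// setKeepAlive() accepts only an on/off flag and the idle time (ms) before
// the first probe; probe interval and count are not set here.
function connectWithKeepAlive(port, initialDelayMs, onReady) {
  const socket = net.createConnection({ port, host: '127.0.0.1' }, () => {
    socket.setKeepAlive(true, initialDelayMs);
    onReady(socket);
  });
  return socket;
}
```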

vweevers · 2018-07-06T20:29:41Z

@ccs018 Oh I see, the issue is about detecting dead connections faster. That's cool. I just hope the added functionality will remain opt-in (because for my needs, waiting 10 minutes is perfectly fine) and not add side effects to an already complicated module.

jcstanaway · 2018-07-07T02:55:52Z

It's not so much about detecting dead connections fast. It's that if I issue a Redis command (e.g., get) and the connection dies before the response is received, nothing is reported back to the client: the callback is never invoked.

luin · 2018-07-07T04:53:20Z

@vweevers This option will be disabled by default and it won't add any side effects when disabled.

@ccs018 If you issue a command, the timeoutPerCommand interval checks every 500ms whether the response has been received. If it has not arrived within the specified time (the timeoutPerCommand option), the callback is invoked with a timeout error.

vweevers · 2018-07-07T09:05:47Z

@ccs018 This must mean TCP keep-alive is disabled by default (client side).

@luin Does ioredis expose the raw socket somewhere, so that I can call setKeepAlive on it?

luin · 2018-07-18T15:41:35Z

@vweevers Keep-alive is enabled by default. Refer to https://github.com/luin/ioredis/blob/master/API.md for details on the options for that.

neg3ntropy · 2018-07-20T15:14:45Z

lib/redis/event_handler.js

+    if (typeof self.options.timeoutPerCommand === 'number') {
+      var timeoutPerCommand = self.options.timeoutPerCommand;
+      debug('Per-command timeout set to %s', self.options.timeoutPerCommand);
+      self.timeoutPerCommand = setInterval(() => {


How about using this: https://nodejs.org/api/net.html#net_socket_settimeout_timeout_callback to generate a timeout event from the socket if it becomes inactive while the queue is being processed?

This timeout guards against a socket that has no activity. The right value probably depends greatly on the use case. In a production environment, there's probably enough traffic to keep such a timeout from triggering. But in a dev environment, there might be no traffic for relatively long stretches, making this an idle connection that then times out. Setting a per-command timeout seems better.

Also, if this socket timeout occurs, it doesn't enable specific error handling / recovery for the specific commands which timed out.

Or were you suggesting this as an additional check and not an alternative solution?

I am suggesting setting this to something reasonable while commands are being processed and disabling it while the socket is not in use.
It just seems easier to implement than setting and cancelling timeouts in the lib, and more powerful as well.
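The suggestion can be sketched against Node's `socket.setTimeout(ms)`. Node emits `'timeout'` after `ms` of inactivity and does not close the socket itself, and passing `0` disables the idle timer. The helper `busyIdleTimeout` and its `commandSent` / `replyReceived` hooks are hypothetical; they stand for wherever a client would count outstanding replies. The timer is armed when the first reply becomes outstanding and disarmed when the pipeline drains, so a healthy idle connection never trips it.

```javascript
// Hypothetical helper: arm the socket's idle timeout only while replies are
// outstanding, and disarm it when nothing is pending.
function busyIdleTimeout(socket, timeoutMs, onTimeout) {
  let pending = 0;
  socket.on('timeout', () => {
    // Node only emits 'timeout'; the connection stays open unless we act.
    if (pending > 0) onTimeout();
  });
  return {
    commandSent() {
      pending += 1;
      if (pending === 1) socket.setTimeout(timeoutMs);
    },
    replyReceived() {
      pending = Math.max(0, pending - 1);
      if (pending === 0) socket.setTimeout(0); // 0 disables the idle timer
    },
    get pending() {
      return pending;
    },
  };
}
```

Any socket activity resets Node's idle timer. This approach therefore detects a connection that has gone silent while work is outstanding, rather than enforcing a deadline on each individual command. That is the distinction raised in the reply above.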

stale · 2018-08-24T15:23:39Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

silverwind · 2018-08-27T22:35:38Z

I have a feeling that this should be a per-command option. I could certainly see using different timeouts for different commands.

I'm currently using a wrapper function that combines a Promise constructor and a setTimeout. However, since ioredis does not expose a way to cancel a pending command, there is a risk that pending commands accumulate over time and eventually exhaust the system's resources.
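A wrapper of the kind described might look like the sketch below. The name `withTimeout` and the default message are assumptions. The timer is always cleared, so timers themselves do not pile up. The wrapper only stops the caller from *waiting*: the command stays queued in the client, which is exactly the resource concern raised above.

```javascript
// Race a command's promise against a timer; clear the timer either way.
// NOTE: the underlying command is not cancelled, only abandoned.
function withTimeout(promise, ms, message = 'Command timed out') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

With ioredis, whose commands return promises when no callback is passed, usage would be along the lines of `withTimeout(redis.get('key'), 1000)`.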

silverwind · 2018-08-27T23:02:33Z

I guess we could do without this new option if commands were cancellable. A common way to do this on promise interfaces is to expose a .cancel method on the promise. There's also p-cancellable which adds a bit more.
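A hand-rolled sketch of the `.cancel` pattern follows; it is not the p-cancellable API. `cancellable` and `CancelError` are hypothetical names. `onCancel` is the hook where a client could drop a queued command, if it offered a way to do so. Calling `cancel()` rejects the returned promise and runs `onCancel` once. After the promise has settled, `cancel()` returns `false`.

```javascript
// Error used to reject a cancelled promise.
class CancelError extends Error {
  constructor(message = 'Promise was cancelled') {
    super(message);
    this.name = 'CancelError';
  }
}

// Wrap a promise so callers can cancel it; onCancel runs cleanup once.
function cancellable(promise, onCancel = () => {}) {
  let settled = false;
  let rejectOuter;
  const outer = new Promise((resolve, reject) => {
    rejectOuter = reject;
    promise.then(
      (value) => { settled = true; resolve(value); },
      (err) => { settled = true; reject(err); },
    );
  });
  outer.cancel = () => {
    if (settled) return false; // too late: already resolved or rejected
    settled = true;
    onCancel();
    rejectOuter(new CancelError());
    return true;
  };
  return outer;
}
```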

stale · 2018-09-26T23:19:20Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

edevil · 2018-11-02T17:44:18Z

I still think that a configurable global command timeout is useful when dealing with unresponsive servers because there are some modules that take a Redis proxy object and issue the commands themselves. All these modules would need to be updated to correctly guard for timeouts.

edevil · 2019-07-23T08:36:12Z

ping

andrewaustin · 2021-02-18T15:42:27Z

@luin Any chance of reviving this one?

BryanDonovan · 2022-02-16T17:44:49Z

In case anyone stumbles upon this issue, I think it's closed by #1320

seminarian · 2024-03-21T16:27:25Z

It would be nice to cancel the promise that sends the command to the server once commandTimeout has been hit.

luin mentioned this pull request Jul 6, 2018 in "Client waiting endlessly for info from unresponsive server" #634 (Open)

Commit a8afd00: feat: add "timeoutPerCommand" option to detect dead connection

luin force-pushed the command-timeout branch from 7913477 to a8afd00 July 6, 2018 18:26

jcstanaway reviewed Jul 6, 2018

luin added the Work In Progress label Jul 7, 2018

Commit d9a9193: Merge branch 'master' into command-timeout

neg3ntropy reviewed Jul 20, 2018

stale bot added the wontfix label Aug 24, 2018

stale bot removed the wontfix label Aug 27, 2018

stale bot added the wontfix label Sep 26, 2018

stale bot closed this Oct 3, 2018

edevil mentioned this pull request Nov 2, 2018 in "feat(offlineQueue): add option to limit the offline queue size" #241 (Closed)

ajinkyarajput mentioned this pull request Jun 9, 2020 in "Cluster: Fail to reconnect to node" #587 (Open)

mariusandra mentioned this pull request Feb 19, 2021 in "Replace io-REDIS" PostHog/plugin-server#178 (Closed)

luin deleted the command-timeout branch March 14, 2022 03:27
