-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add "timeoutPerCommand" option to detect dead connection #658
Conversation
debug('Command timed out. Pending commands: %s', commandQueueLength); | ||
var err = new Error('Command timed out'); | ||
self.flushQueue(err, { offlineQueue: false }); | ||
self.silentEmit('error', err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is emitting an error
event sufficient? In my usage, I'm always supplying a callback in the form of:
redis.get(key, function (err, reply) {
if (err) {
// handle error
} else {
// handle data
}
});
While I do also handle the error
event, that handling is somewhat generic. It would be better if the callback was called with the error so that specific error handling could be invoked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the callbacks will be called besides an error
event. self.flushQueue()
does the trick.
var err = new Error('Command timed out'); | ||
self.flushQueue(err, { offlineQueue: false }); | ||
self.silentEmit('error', err); | ||
self.disconnect(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that this will trigger an automatic reconnect to that server, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, but see my follow-up to #587. If this node was part of a cluster, retryStrategy
is overridden and set to null
so there would in fact be no automatic reconnect to that node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per #587, there will not be any automatic reconnect in a cluster environment.
Is |
self.silentEmit('error', err); | ||
self.disconnect(true); | ||
} | ||
}, 500); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that the minimum practical value for timeoutPerCommand
is 500ms. Even if I specify timeoutPerCommand = 100
, the check for a timed out command occurs every 500ms and so the command time out error won't trigger as quickly as I would expect.
var commandQueueLength = self.commandQueue.length; | ||
if ( | ||
commandQueueLength > 0 && | ||
Date.now() - self.lastWriteTime > self.options.timeoutPerCommand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since self.lastWriteTime
is updated every time a command is added to the commandQueue
, if the client attempts commands fast enough (more than one every timeoutPerCommand
ms), then self.lastWriteTime
continues to get updated such that Date.now() - self.lastWriteTime
will continue to be less than timeoutPerCommand
and thus the original command will never time out.
Perhaps in Redis.prototype.sendCommand
lastWriteTime
is only updated if commandQueue.length === 1
.
@vweevers While a little old, see nodejs/node-v0.x-archive#6194. The main take away is that node.js doesn't provide sufficient configurability of the TCP keepalive functionality. Thus we'd have to wait up 10-11 minutes before detecting that the connection has closed. Searching through the node 10.x documentation, I didn't find anything that allows further configuring TCP keep alives to detect a broken connection any faster. |
@ccs018 Oh I see, the issue is about detecting dead connections faster. That's cool. I just hope the added functionality will remain opt-in (because for my needs, waiting 10 minutes is perfectly fine) and not add side effects to an already complicated module. |
It's not sure much about detecting dead connections fast. It's I issue a redis command (e.g., get), but if the connection dies before the response is received nothing is reported back to the client - the callback is never invoked. |
@vweevers This option will be disabled by default and it won't add any side effects when disabled. @ccs018 If you issue a command, The |
@ccs018 This must mean TCP keep-alive is disabled by default (client side). @luin Does ioredis expose the raw socket somewhere, so that I can call |
@vweevers Keep-alive is enabled by default. Refer to https://github.com/luin/ioredis/blob/master/API.md for details on the options for that. |
if (typeof self.options.timeoutPerCommand === 'number') { | ||
var timeoutPerCommand = self.options.timeoutPerCommand; | ||
debug('Per-command timeout set to %s', self.options.timeoutPerCommand); | ||
self.timeoutPerCommand = setInterval(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using this: https://nodejs.org/api/net.html#net_socket_settimeout_timeout_callback to generate a timeout event from the socket if it becomes inactive while the queue is being processed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This timeout guards against a socket which has no activity. The value to use probably greatly depends on the use cases. In a production environment, there's probably enough traffic to keep such a timeout from triggering. But in a dev environment, there might not be any traffic for relatively long stretches of time making this an idle connection which then times out. Setting a per command timeout seems better.
Also, if this socket timeout occurs, it doesn't enable specific error handling / recovery for the specific commands which timed out.
Or were you suggesting this as an additional check and not an alternative solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am suggesting to set this to something reasonable while we are processing commands and to disable it while the socket is not used.
It just seems that it can be easier to implement than setting and cancelling timeouts in the lib. And more powerful as well.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed. |
I have a feeling that this should be a per-command option. I could certainly see using different timeouts for different commands. I'm currently using a wrapper function that includes a |
I guess we could do without this new option if commands were cancellable. A common way to do this on promise interfaces is to expose a |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed. |
I still think that a configurable global command timeout is useful when dealing with unresponsive servers because there are some modules that take a Redis proxy object and issue the commands themselves. All these modules would need to be updated to correctly guard for timeouts. |
ping |
@luin Any chance of reviving this one? |
In case anyone stumbles upon this issue, I think it's closed by #1320 |
Cancelling the promise which is sending the command to the server after |
WIP. Need discussion, tests and documentation.
Usage:
Known issues: