Client waiting endlessly for info from unresponsive server #634

neg3ntropy · 2018-05-30T10:08:06Z

It appears that in some instances the connection to the server is established but the server will not respond (we are running AWS lambdas and elasticache, which is not really a good marriage).

In this scenario the client is stuck waiting for a response and queuing commands but never making progress or giving up and retrying the connection. Things just halt forever, since no new traffic is sent to detect that the connection is lost.

It can be reproduced by opening a port with netcat and just ignoring incoming traffic (nc -l <port>).

Could a socket timeOut be set up to handle this while waiting for responses?

The text was updated successfully, but these errors were encountered:

brettkiefer · 2018-05-30T15:37:23Z

@soulrebel What node.js and ioredis versions?

brettkiefer · 2018-05-31T21:35:16Z

@soulrebel Is it possible you're seeing #633 ?

luin · 2018-06-04T01:38:54Z

The default retryStrategy option waits for the recovering of the connection forever. You may want to pass a customized one in the case that the connection may be lost forever. Refer to https://github.com/luin/ioredis#auto-reconnect for details.

neg3ntropy · 2018-06-04T14:03:54Z

@brettkiefer Node v 6.0.11 and ioredis 3.2.2. Not the same bug.

@luin Retrying forever is ok, but I have a log on the error event and it never fires in this case. It simply waits for the server to respond, but that never happens and there are no retries.

I think something like this could happen anytime a server goes offline while processing a command. The reply will never come, the socket will not timeout since the outgoing data is sent and ioredis will wait indefinitely.

livoras · 2018-06-05T15:32:10Z

I came across the same problem, and I agree that the client can retry to connect the server on its own. But should this kind of retrying be separated from every individual client's request? Every request can has its own timeout and throw a timeout error:

try {
   await redis.keys('user*')
} catch (err) {
   // err: redis request timeout for 3000ms
}

Otherwise the program will be forever pending, and we have no way to kill this kind of pending.

neg3ntropy · 2018-06-06T08:04:25Z

@livoras I don't believe that for 99.999% of the cases you would need different timeouts for different commands.

I think that a possible solution is to use two different timeouts for connections:

a long timeout (or no timeout) when the connection is not being used.
a short timeout after a command is sent.

By timing out on data traffic at the connection level, you don't require the entire command to finish before the timeout, but just the server to start sending data, so it should not affect big resultsets.

neg3ntropy · 2018-06-19T08:52:38Z

@luin I could have the time to work on this. I just would like to know what would be an acceptable design so that it can be merged.

elyobo · 2018-06-20T22:06:12Z

See also #61 which seems related.

…er command #634, #61 BREAKING CHANGE: The maxRetriesPerRequest is set to 20 instead of null (same behavior as ioredis v3) by default. So when a redis server is down, pending commands won't wait forever until the connection become alive, instead, they only wait about 10s (depends on the retryStrategy option)

neg3ntropy · 2018-06-25T13:08:56Z

@elyobo Not really, this scenario does not trigger retries

elyobo · 2018-06-25T21:46:55Z

You're right, I misread the issue.

luin · 2018-07-01T17:34:04Z

maxRetriesPerRequest has added in the beta relase v4.0.0-0. 🍺

neg3ntropy · 2018-07-02T10:17:46Z

maxRetriesPerRequest does not help with this. @luin This should not be closed yet.

The failure happens when the socket is connected but the server is not responding. Probably the OS will timeout at some point, but it appears that it will take 15 minutes by default on Linux to give up on that (https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html)

luin · 2018-07-02T11:44:49Z

@soulrebel What about implementing a heartbeat strategy that sending PING every several seconds and the connection will be considered lost once timed out?

livoras · 2018-07-05T09:57:12Z

I have a naive idea, what about simply using setTimeout to timeout a pending request?

neg3ntropy · 2018-07-06T16:58:37Z

@luin I am not sure that would be enough. You can't send pings while waiting for responses to other commands (AFAIK) and you keep the connection busy and damage performance and you don't fix the issue if it happens during a non-ping command.

@livoras That could work, although it is not that easy to clean up afterwards.

I am intrigued by the possibility to setup a socket timeout during commands: in this case you should get a network error and the socket should be cleaned up automatically. In case of success, you just set a new timeout to 0 to disable. Since it's all handled by Node or the OS, based on TCP progress, this would also work well for cases where the response is coming, but slowly. I just do not know if this is a legit way to use sockets.

jcstanaway · 2018-07-06T17:12:43Z

I think part of the problem is that the RESP (Redis Protocol) is really basic and most of the commands are blocking. I don't believe that there is any kind of sequence/transaction/request ID used. So if the client timed out and issued another command, then the response to the previous command could first be received. I'm not sure how that would be handled - probably not well if there is no transaction ID. It might be safer that if a time out occurs to simply terminate the connection and reconnect.

It would be cleaner if ioredis set a timeout on every command issued and if the timeout expired, return an error (if callback used) or reject (if promise used), terminate the connection and automatically attempt to reconnect.

luin · 2018-07-06T18:21:53Z

Added a pull request trying to address this issue: #658. What's your thoughts?

edevil · 2018-11-02T17:51:13Z

Seems like the referenced PR was abandoned. What's the current status on handling unresponsive servers?

tomwhale · 2018-11-27T11:24:41Z

Would be great to get this in. As it stands, when we terminated an instance running redis, ECS still tries to connect to it. We are using Redis Sentinel though.

Ponjimon · 2019-10-01T11:33:49Z

Still having the same issue. Is there any news? It's almost been a year since the last post here

nicokaiser · 2019-10-07T11:29:18Z

Adding timeout features to ioredis seems to be a difficult (or unwanted) task, although it is needed in some cases. I solved this by using promise-timeout and wrapping each and every ioredis call. This is not very elegant (and may also lead to internal reference leak when queries do not terminate), but at least I can handle timeouts in HTTP requests this way.

rilopez · 2023-03-31T17:57:55Z

isn't this already resolved by #1320

luin added the pinned label Jun 23, 2018

luin added this to the v4 milestone Jun 23, 2018

luin closed this as completed Jul 1, 2018

luin reopened this Jul 2, 2018

mariusandra mentioned this issue Feb 19, 2021

Replace io-REDIS PostHog/plugin-server#178

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Client waiting endlessly for info from unresponsive server #634

Client waiting endlessly for info from unresponsive server #634

neg3ntropy commented May 30, 2018

brettkiefer commented May 30, 2018

brettkiefer commented May 31, 2018

luin commented Jun 4, 2018

neg3ntropy commented Jun 4, 2018

livoras commented Jun 5, 2018

neg3ntropy commented Jun 6, 2018

neg3ntropy commented Jun 19, 2018

elyobo commented Jun 20, 2018

neg3ntropy commented Jun 25, 2018

elyobo commented Jun 25, 2018

luin commented Jul 1, 2018

neg3ntropy commented Jul 2, 2018

luin commented Jul 2, 2018

livoras commented Jul 5, 2018

neg3ntropy commented Jul 6, 2018

jcstanaway commented Jul 6, 2018

luin commented Jul 6, 2018

edevil commented Nov 2, 2018

tomwhale commented Nov 27, 2018

Ponjimon commented Oct 1, 2019

nicokaiser commented Oct 7, 2019

rilopez commented Mar 31, 2023

Client waiting endlessly for info from unresponsive server #634

Client waiting endlessly for info from unresponsive server #634

Comments

neg3ntropy commented May 30, 2018

brettkiefer commented May 30, 2018

brettkiefer commented May 31, 2018

luin commented Jun 4, 2018

neg3ntropy commented Jun 4, 2018

livoras commented Jun 5, 2018

neg3ntropy commented Jun 6, 2018

neg3ntropy commented Jun 19, 2018

elyobo commented Jun 20, 2018

neg3ntropy commented Jun 25, 2018

elyobo commented Jun 25, 2018

luin commented Jul 1, 2018

neg3ntropy commented Jul 2, 2018

luin commented Jul 2, 2018

livoras commented Jul 5, 2018

neg3ntropy commented Jul 6, 2018

jcstanaway commented Jul 6, 2018

luin commented Jul 6, 2018

edevil commented Nov 2, 2018

tomwhale commented Nov 27, 2018

Ponjimon commented Oct 1, 2019

nicokaiser commented Oct 7, 2019

rilopez commented Mar 31, 2023