
Client waiting endlessly for info from unresponsive server #634

Open
neg3ntropy opened this issue May 30, 2018 · 22 comments

@neg3ntropy

It appears that in some instances the connection to the server is established but the server will not respond (we are running AWS Lambda and ElastiCache, which is not really a good marriage).

In this scenario the client is stuck waiting for a response and queuing commands but never making progress or giving up and retrying the connection. Things just halt forever, since no new traffic is sent to detect that the connection is lost.

It can be reproduced by opening a port with netcat and just ignoring incoming traffic (nc -l <port>).

Could a socket timeout be set up to handle this while waiting for responses?

@brettkiefer

@soulrebel What node.js and ioredis versions?

@brettkiefer

@soulrebel Is it possible you're seeing #633 ?

@luin
Collaborator

luin commented Jun 4, 2018

The default retryStrategy option waits forever for the connection to recover. You may want to pass a customized one for cases where the connection may be lost forever. Refer to https://github.com/luin/ioredis#auto-reconnect for details.
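Per the README, a customized retryStrategy could look like the sketch below; the attempt cap and delay values here are illustrative, not ioredis defaults:

```javascript
// Back off a little more on each attempt and give up entirely after
// 10 tries (both numbers are illustrative, not ioredis defaults).
const retryStrategy = (times) => {
  if (times > 10) {
    return null; // returning a non-number tells ioredis to stop retrying
  }
  return Math.min(times * 50, 2000); // delay in ms before the next attempt
};

// Usage:
// const Redis = require('ioredis');
// const redis = new Redis({ retryStrategy });
```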

@neg3ntropy
Author

@brettkiefer Node v 6.0.11 and ioredis 3.2.2. Not the same bug.

@luin Retrying forever is OK, but I have a log on the error event and it never fires in this case. The client simply waits for the server to respond, but that never happens and there are no retries.

I think something like this could happen any time a server goes offline while processing a command. The reply will never come, the socket will not time out since the outgoing data was already sent, and ioredis will wait indefinitely.

@livoras

livoras commented Jun 5, 2018

I came across the same problem, and I agree that the client can retry connecting to the server on its own. But shouldn't this kind of retrying be separated from each individual client request? Every request could have its own timeout and throw a timeout error:

try {
   await redis.keys('user*')
} catch (err) {
   // err: redis request timeout for 3000ms
}

Otherwise the program will be pending forever, and we have no way to break out of that pending state.
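A minimal sketch of that per-request timeout, built with Promise.race rather than anything inside ioredis (withTimeout is a hypothetical helper, not part of the library):

```javascript
// Reject with a timeout error if `promise` doesn't settle within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`redis request timeout for ${ms}ms`)),
      ms
    );
  });
  // Clear the timer either way so it doesn't keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Usage (assuming an ioredis client named `redis`):
// try {
//   await withTimeout(redis.keys('user*'), 3000);
// } catch (err) {
//   // err: redis request timeout for 3000ms
// }
```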

@neg3ntropy
Author

@livoras I don't believe that in 99.999% of cases you would need different timeouts for different commands.

I think that a possible solution is to use two different timeouts for connections:

  • a long timeout (or no timeout) when the connection is not being used.
  • a short timeout after a command is sent.

By timing out on data traffic at the connection level, you don't require the entire command to finish before the timeout, just the server to start sending data, so it should not affect big result sets.

@neg3ntropy
Author

@luin I could have the time to work on this. I just would like to know what would be an acceptable design so that it can be merged.

@elyobo

elyobo commented Jun 20, 2018

See also #61 which seems related.

@luin luin added the pinned label Jun 23, 2018
@luin luin added this to the v4 milestone Jun 23, 2018
luin added a commit that referenced this issue Jun 24, 2018
…er command

#634, #61

BREAKING CHANGE:
The maxRetriesPerRequest option defaults to 20 instead of null (null being the
ioredis v3 behavior). So when a Redis server is down, pending commands won't wait
forever until the connection comes back; instead, they only wait about 10s
(depending on the retryStrategy option)
@neg3ntropy
Author

@elyobo Not really; this scenario does not trigger retries.

@elyobo

elyobo commented Jun 25, 2018

You're right, I misread the issue.

@luin
Collaborator

luin commented Jul 1, 2018

maxRetriesPerRequest has been added in the beta release v4.0.0-0. 🍺

@luin luin closed this as completed Jul 1, 2018
@neg3ntropy
Author

maxRetriesPerRequest does not help with this. @luin This should not be closed yet.

The failure happens when the socket is connected but the server is not responding. The OS will probably time out at some point, but it appears that by default Linux takes about 15 minutes to give up on such a connection (https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html)
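For reference, that give-up threshold is the kernel's tcp_retries2 setting; a sketch of inspecting and (cautiously) lowering it on Linux, where the exact wall-clock time depends on the retransmission backoff:

```shell
# Number of retransmissions of unacknowledged data before the kernel
# gives up on an established connection (Linux; default 15, roughly
# 13-30 minutes of exponential backoff).
cat /proc/sys/net/ipv4/tcp_retries2

# Lowering it shortens the hang for unacked writes. Note it does not
# help when the peer ACKs the bytes but never replies, which is what
# the netcat reproduction above does.
# sysctl -w net.ipv4.tcp_retries2=8
```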

@luin
Collaborator

luin commented Jul 2, 2018

@soulrebel What about implementing a heartbeat strategy that sends a PING every few seconds and considers the connection lost once the PING times out?
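A rough sketch of such a heartbeat, layered on top of a client rather than built into ioredis (startHeartbeat is a hypothetical helper; the interval/timeout values and the use of disconnect(true) to force the reconnect path are assumptions):

```javascript
// Send a PING on an interval; treat a missed reply as a dead connection.
function startHeartbeat(redis, { intervalMs = 5000, timeoutMs = 2000 } = {}) {
  const timer = setInterval(async () => {
    // Race the PING against a deadline.
    const deadline = new Promise((_, reject) =>
      setTimeout(() => reject(new Error('heartbeat timed out')), timeoutMs)
    );
    try {
      await Promise.race([redis.ping(), deadline]);
    } catch (err) {
      clearInterval(timer);
      redis.disconnect(true); // consider the connection lost; reconnect
    }
  }, intervalMs);
  return () => clearInterval(timer); // call to stop the heartbeat
}
```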

@luin luin reopened this Jul 2, 2018
@livoras

livoras commented Jul 5, 2018

I have a naive idea: what about simply using setTimeout to time out a pending request?

@neg3ntropy
Author

@luin I am not sure that would be enough. You can't send PINGs while waiting for responses to other commands (AFAIK), you keep the connection busy and hurt performance, and you don't fix the issue if it happens during a non-PING command.

@livoras That could work, although it is not that easy to clean up afterwards.

I am intrigued by the possibility of setting a socket timeout during commands: in that case you should get a network error and the socket should be cleaned up automatically. On success, you just set the timeout back to 0 to disable it. Since it's all handled by Node or the OS, based on TCP progress, this would also work well for cases where the response is coming, but slowly. I just don't know if this is a legitimate way to use sockets.

@jcstanaway

I think part of the problem is that RESP (the Redis protocol) is really basic and most commands are blocking. I don't believe there is any kind of sequence/transaction/request ID. So if the client timed out and issued another command, the response to the previous command could arrive first. I'm not sure how that would be handled; probably not well, given there is no transaction ID. It might be safer, when a timeout occurs, to simply terminate the connection and reconnect.

It would be cleaner if ioredis set a timeout on every command issued and, if the timeout expired, returned an error (if a callback is used) or a rejection (if a promise is used), terminated the connection, and automatically attempted to reconnect.

@luin
Collaborator

luin commented Jul 6, 2018

Added a pull request trying to address this issue: #658. What are your thoughts?

@edevil

edevil commented Nov 2, 2018

Seems like the referenced PR was abandoned. What's the current status of handling unresponsive servers?

@tomwhale

It would be great to get this in. As it stands, when we terminate an instance running Redis, ECS still tries to connect to it. We are using Redis Sentinel, though.

@Ponjimon

Ponjimon commented Oct 1, 2019

Still having the same issue. Is there any news? It's been almost a year since the last post here.

@nicokaiser

Adding timeout features to ioredis seems to be a difficult (or unwanted) task, although it is needed in some cases. I solved this by using promise-timeout and wrapping each and every ioredis call. This is not very elegant (and may also leak internal references when queries never terminate), but at least I can handle timeouts in HTTP requests this way.

@rilopez

rilopez commented Mar 31, 2023

Isn't this already resolved by #1320?
