JedisCluster uses maxRedirects retries when cluster master is down #1238
Comments
@roblg If you want to fail immediately, you can set the JedisCluster retries to 0, and in that case JedisCluster won't try to contact other nodes upon failure. I'm on mobile now; I'll explain the rationale for this functionality later.
@marcosnils Thanks for the response. Yeah, I originally considered lowering retries to 0, but I think I actually do want retries for "MOVED" responses, so that we can add/remove nodes from the cluster and pick up those changes. Ideally there would be two kinds of retries -- error retries and redirect retries -- but they seem to be the same thing right now. In all honesty, it would probably work well enough to just set retries to 1, so that if we add a new node and rebalance the slots, Jedis would be able to correct for that, but in the case of a socket connect timeout we wouldn't pay a huge penalty.
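To make the "two kinds of retries" idea concrete, here is a minimal sketch of a retry loop that counts redirects and connection errors against separate budgets. It is not how JedisCluster is implemented; SplitRetrySketch, RedirectSignal, ConnectFailure and execute are hypothetical names made up for illustration, and the exceptions only stand in for a MOVED reply and a failed connect.

```java
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are Jedis APIs.
final class SplitRetrySketch {

    /** Stand-in for a MOVED-style redirect reply. */
    static final class RedirectSignal extends RuntimeException {}

    /** Stand-in for a connection failure such as a connect timeout. */
    static final class ConnectFailure extends RuntimeException {}

    /**
     * Runs the command, allowing up to maxRedirects redirect retries and
     * up to maxErrorRetries connection-failure retries, counted separately.
     */
    static <T> T execute(Supplier<T> command, int maxRedirects, int maxErrorRetries) {
        int redirects = 0;
        int errors = 0;
        while (true) {
            try {
                return command.get();
            } catch (RedirectSignal e) {
                // Redirects are cheap, so they get their own budget.
                if (++redirects > maxRedirects) throw e;
            } catch (ConnectFailure e) {
                // Connection failures can each cost a full connect timeout.
                if (++errors > maxErrorRetries) throw e;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated command: redirected twice, then succeeds.
        String value = execute(() -> {
            if (calls[0]++ < 2) throw new RedirectSignal();
            return "value";
        }, 5, 0);
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

With a redirect budget of 5 and an error budget of 0, slot rebalancing is still followed, while the first connection failure surfaces immediately.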
Hmm... it also looks like if there's a ConnectException when trying to connect to a particular node, the request will be retried, but is retried with
@roblg
Hmm... I think this depends on how fast Redis Cluster completes failover. If Redis Cluster cannot complete failover fast enough, the client is likely to just alternate between connecting to a random node and being sent back to the failed node, for about half of the max retry count, and finally throw JedisConnectionException anyway.
@HeartSaVioR I believe this shouldn't be much trouble, because even though Jedis might return MaxRedirects very soon, depending on how long Redis Cluster takes to complete its failover, whenever the Jedis user issues any command using the same JedisCluster instance, Jedis will try to reconnect and dispatch the command to the appropriate node. In this scenario, Jedis users can catch the
@marcosnils
PR for backing off while retrying on connection failures: #2358
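As a rough illustration of what backing off between connection retries can look like, here is a minimal sketch using capped exponential delays. It is not the code from #2358; BackoffRetrySketch, delayMillis and callWithBackoff are hypothetical names, and the doubling schedule is an assumption chosen for the example.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical sketch; these names are illustrative and not part of Jedis.
final class BackoffRetrySketch {

    /** Delay before retry number `attempt` (0-based): base * 2^attempt, capped. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        // Clamp the shift so that small base values cannot overflow a long.
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, capMillis);
    }

    /** Retries only on ConnectException, sleeping between attempts. */
    static <T> T callWithBackoff(Callable<T> command, int maxAttempts,
                                 long baseMillis, long capMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return command.call();
            } catch (ConnectException e) {
                last = e;
                // No point sleeping after the final attempt.
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated command: the node refuses connections twice, then recovers.
        String value = callWithBackoff(() -> {
            if (calls[0]++ < 2) throw new ConnectException("node down");
            return "value";
        }, 4, 10, 1000);
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

The waits give a failover some time to complete instead of spending the whole retry budget against a node that is still down.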
This issue is marked stale. It will be closed in 30 days if it is not updated.
This is sort of related to #1236 that I filed earlier today, and might be related to #1120.
We had a machine that was running one of our Redis Cluster masters go down hard today, and while it was down we saw some interesting behavior from the Jedis pool. We got a lot of errors like this:
It looks like redis.clients.jedis.Connection.connect() is wrapping the java.net.ConnectException from the failed socket in a JedisConnectionException, which gets wrapped again by redis.clients.util.Pool.getResource(), and because it's a JedisConnectionException, it triggers a retry.
I think this issue is maybe a little less obviously a bug than #1236, because I can see some uses where someone might want to retry immediately if a socket threw a ConnectException, but I think one could also make an argument that if a ConnectException is being thrown, something is seriously wrong, and failing immediately seems reasonable. (If users were experiencing the ConnectException for nodes that weren't down, they could always increase the connect timeout.) As it is, when a node goes down, users could potentially end up waiting a total of (maxRedirects * connectTimeout) ms and still get an error, which is kind of unfortunate, especially when a regular cache hit is over in just a few ms. :)
Thoughts?
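To put a number on the (maxRedirects * connectTimeout) estimate, here is a small worked example. WorstCaseWait is a hypothetical helper, not part of Jedis, and the values 5 and 2000 ms are illustrative only; it assumes every attempt runs into the full connect timeout.

```java
// Hypothetical helper; not part of Jedis.
final class WorstCaseWait {

    /** Upper bound on the wait if every attempt hits the full connect timeout. */
    static long worstCaseMillis(int maxRedirects, long connectTimeoutMillis) {
        return (long) maxRedirects * connectTimeoutMillis;
    }

    public static void main(String[] args) {
        // Illustrative values only: 5 attempts, 2000 ms connect timeout.
        System.out.println(worstCaseMillis(5, 2000) + " ms"); // prints "10000 ms"
        // With a single attempt, the bound drops to one timeout.
        System.out.println(worstCaseMillis(1, 2000) + " ms"); // prints "2000 ms"
    }
}
```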
Redis / Jedis Configuration
Jedis version: 2.8.0
Redis version: 3.0.6
Java version: JDK8