Snowball effect with reconnecting to poor performing node #1252

Spikhalskiy · 2016-04-06T00:38:28Z

We have a problem with JedisPool: it aggravates performance issues on Redis nodes just as those issues start to appear.

What we have:

Short timeouts, around 3 ms.
A heavily loaded Redis that sometimes starts to respond slowly, with around 40000 connections per node.

If Redis starts to stall for 4 ms, then instead of a single read, each read turns into:

read
after the timeout marks the Jedis instance as broken, JedisFactory gracefully sends QUIT to Redis in destroyObject
we establish a new connection
PING-PONG

Only after that do we have a new Jedis instance for the next read, but nothing has actually changed: we could have just continued to use the old instance.

So when our Redis Cluster starts to experience performance issues, we finish it off by invalidating Jedis instances.

Any thoughts?
Only one from me: maybe we could add the ability to pass some kind of "InvalidationStrategy" to Jedis? For example, the default strategy would mark the instance as broken and do everything as it does now, while a third party could implement its own strategy, for example sending PING-PONG before QUIT. "read with timeout - PING-PONG, give it a chance - read" looks better than the current mandatory invalidation flow.
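
To make the proposal concrete, here is a minimal sketch of what such a hook could look like. Everything in it is hypothetical: `InvalidationStrategy`, `Connection`, `ALWAYS_INVALIDATE`, `PING_BEFORE_QUIT`, and `onReadTimeout` are illustration names, not taken from the Jedis API, and `Connection` is only a stand-in for a pooled client. The sketch also ignores the case where the late reply to the timed-out command arrives before the PONG, which a real implementation would have to handle.

```java
public class InvalidationSketch {

    /** Stand-in for a pooled client connection; NOT the real Jedis class. */
    public interface Connection {
        /** @return true if a PONG came back within the timeout. */
        boolean ping();
    }

    /** Hypothetical hook: decides what happens to a connection after a read timeout. */
    public interface InvalidationStrategy {
        /** @return true if the connection should be destroyed and replaced. */
        boolean shouldInvalidate(Connection conn);
    }

    /** Models the flow described above: every timeout breaks the connection. */
    public static final InvalidationStrategy ALWAYS_INVALIDATE = conn -> true;

    /** The alternative: probe with PING first and keep the connection if it answers. */
    public static final InvalidationStrategy PING_BEFORE_QUIT = conn -> !conn.ping();

    /** Called after a read timed out; reports which path the strategy picked. */
    public static String onReadTimeout(Connection conn, InvalidationStrategy strategy) {
        return strategy.shouldInvalidate(conn) ? "reconnect" : "reuse";
    }

    public static void main(String[] args) {
        Connection slowButAlive = () -> true;  // answers PING, just slowly
        Connection dead = () -> false;         // never answers

        System.out.println(onReadTimeout(slowButAlive, ALWAYS_INVALIDATE)); // reconnect
        System.out.println(onReadTimeout(slowButAlive, PING_BEFORE_QUIT));  // reuse
        System.out.println(onReadTimeout(dead, PING_BEFORE_QUIT));          // reconnect
    }
}
```

With the PING-first strategy, a node that is slow but alive costs one extra round trip instead of a QUIT, a new connection, and a PING-PONG.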

I could implement and submit a PR for any solution that fixes this or makes it possible to improve the current standard flow.

What do you think?

1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException
(cherry picked from commit c567161)

Spikhalskiy · 2016-04-25T15:44:16Z

The final version, which is running fine in our production: https://github.com/Spikhalskiy/jedis/releases/tag/PP-2

It's master with the related pull requests merged.
The issue can be closed once the PRs are merged upstream.

marcosnils · 2016-04-25T15:46:44Z

~~#1249~~ ~~#1251~~ #1253 #1256

@Spikhalskiy amazing contribution. I've had a rough few weeks lately. As soon as I have some time, I promise to look at those changes.

…uster, fix slots clearing without filling (#1253)
* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling
* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster

…ed with rediscovery at the end (#1256)
* Issue #1252:
  1. New special exception for “No reachable nodes”
  2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
  3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
  4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

sazzad16 · 2017-11-21T11:49:28Z

Resolved by #1253 and #1256

This was referenced Apr 6, 2016

Issue #1252: Fix creating a lot of new Jedis instances on unstable cluster, fix slots clearing without filling #1253

Merged

Introduces ConnectionBrokenDeterminer #1101

Open

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Issue redis#1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling (3798c1e)

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Issue redis#1252: Acquire one long lock for trying all nodes when rediscover cluster (f934dee)

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Issue redis#1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling (a5961d0)

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Issue redis#1252: Acquire one long lock for trying all nodes when rediscover cluster (ec87a52)

Spikhalskiy mentioned this issue Apr 13, 2016

Issue #1252: Random node + rediscovery on connection exception replaced with rediscovery at the end #1256

Merged

jpe42 mentioned this issue Apr 25, 2016

JedisCluster uses maxRedirects retries when cluster master is down #1238

Closed

skyhawk1981 mentioned this issue Dec 3, 2016

JedisCluster throws JedisConnectionException when a cluster master killed #1439

Closed

sazzad16 closed this as completed Nov 21, 2017
