Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowball effect with reconnecting to poor performing node #1252

Closed
Spikhalskiy opened this issue Apr 6, 2016 · 3 comments
Closed

Snowball effect with reconnecting to poor performing node #1252

Spikhalskiy opened this issue Apr 6, 2016 · 3 comments

Comments

@Spikhalskiy
Copy link
Contributor

We have a problem with JedisPool that aggravates perf issues of redis nodes when this issues only start to appear.

What we have:

  • Short timeouts. Like 3ms.
  • Significantly loaded redis that sometimes starts to respond slowly with number of connections like 40000 per node.

If redis is starting to stuck for 4ms, instead of each read, we do

  • read
  • after timeout and marking Jedis as broken, JedisFactory gently sends quit to Redis in destroyObject
  • we establish new connection
  • PING-PONG

and only after that we have new Jedis instance for new read, but... actually nothing changed, we could just continue to use old instance.

So, when our Redis Cluster start to experience some perf issues - we finish it off by invalidating Jedis.

Any thoughts?
Only one from me - maybe we could add an ability to pass some type of "InvalidationStrategy" to Jedis? For example, strategy by default will mark as broken and do everything like now and 3rd party can implement it's own strategy, for example, send PING-PONG before quit. "read with timeout - PING-PONG, give it a chance - read" looks better than current mandatory invalidation flow.

I could implement and provide PR for any solution solving or providing possibility to improve current standard flow.

What do you think?

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016
… cluster, fix slots clearing without filling
Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016
Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016
… cluster, fix slots clearing without filling
Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016
Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 13, 2016
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException
(cherry picked from commit c567161)
@Spikhalskiy
Copy link
Contributor Author

Spikhalskiy commented Apr 25, 2016

Final version which is working in our prod and does it fine: https://github.com/Spikhalskiy/jedis/releases/tag/PP-2

It's master with merged related pull requests.
Issue could be closed after merging PRs to upstream.

@marcosnils
Copy link
Contributor

marcosnils commented Apr 25, 2016

#1249 #1251 #1253 #1256

@Spikhalskiy amazing contribution. I've had a rough weeks lately. As soon as I have some time I promise to look at those changes.

marcosnils pushed a commit that referenced this issue Jul 11, 2016
…uster, fix slots clearing without filling (#1253)

* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling

* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster
Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Jul 18, 2016
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException
marcosnils pushed a commit that referenced this issue Jul 19, 2016
…uster, fix slots clearing without filling (#1253)

* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling

* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster

Conflicts:
	src/main/java/redis/clients/jedis/JedisClusterInfoCache.java
marcosnils pushed a commit that referenced this issue Jul 19, 2016
…uster, fix slots clearing without filling (#1253)

* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling

* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster

Conflicts:
	src/main/java/redis/clients/jedis/JedisClusterInfoCache.java
marcosnils pushed a commit that referenced this issue Jul 19, 2016
…ed with rediscovery at the end (#1256)

* Issue #1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException
marcosnils pushed a commit that referenced this issue Jul 19, 2016
…ed with rediscovery at the end (#1256)

* Issue #1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

Conflicts:
	src/main/java/redis/clients/jedis/BinaryJedisCluster.java
	src/main/java/redis/clients/jedis/JedisCluster.java
marcosnils pushed a commit that referenced this issue Jul 19, 2016
…ed with rediscovery at the end (#1256)

* Issue #1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

Conflicts:
	src/main/java/redis/clients/jedis/BinaryJedisCluster.java
	src/main/java/redis/clients/jedis/JedisCluster.java
@sazzad16
Copy link
Collaborator

sazzad16 commented Nov 21, 2017

Resolved by #1253 and #1256

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants