
Do cluster slot renewal in a background thread #1347

Open

xetorthio opened this issue Jul 8, 2016 · 25 comments
@xetorthio (Contributor)

Instead of doing it as soon as a MOVED is received from Redis.

@marcosnils (Contributor) commented Jul 11, 2016

Hmm I'm wondering if this makes sense.

The only scenario where I believe this could be useful is if you have little traffic to the cache or a specific node and the background renewal happens before the user gets a MOVED response. I still believe this scenario is pretty rare and I don't really see the benefit of doing this.

IMHO slot migration (because of failures or new node additions) shouldn't be something that happens often, so I don't think it's bad to lock if two queries get a MOVED at the same time and wait until the cluster heals. In addition to this, some questions came to my mind and I'm wondering if you considered them:

  • How often will you perform slot renewal by default? Even though you can make this configurable, the default should be good enough to fit "most" cases.
  • I guess you'll still be doing slot renewal on MOVED responses as well? How would you coordinate the background renewal with the one triggered by the MOVED response?

A hybrid idea that might work is to perform slot renewal in a separate thread only when receiving a MOVED response. As it is today, Jedis won't return a response to the user until the slow renewal has finished; maybe we can improve this and perform that renewal in a separate thread so new queries can be processed.

@HeartSaVioR @antirez WDYT?

@xetorthio (Contributor, Author)

I think you got it wrong. It doesn't need to happen in the background before a MOVED to make sense. The whole idea is not to wait for the cache to catch up when there is a MOVED; there is no need. We already know where to go, so why penalize the command?
The idea is to continue and serve the command, and just update the cache in the background. This also avoids unnecessary locks on the cache.
So it not only makes command response latency smoother in general, but also probably makes the code easier to follow.
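
A minimal sketch of that flow (illustrative only: SlotCache, nodeFor, and runOn are hypothetical stand-ins for Jedis internals, not real Jedis APIs; JedisMovedDataException and HostAndPort are the actual Jedis types):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.exceptions.JedisMovedDataException;

class MovedAwareClient {
    interface SlotCache {
        HostAndPort nodeFor(String key); // look up the cached owner of the key's slot
        void renew();                    // full slot table refresh (e.g. via CLUSTER SLOTS)
    }

    private final ExecutorService renewExecutor = Executors.newSingleThreadExecutor();
    private final SlotCache slotCache;

    MovedAwareClient(SlotCache slotCache) {
        this.slotCache = slotCache;
    }

    String get(String key) {
        try {
            return runOn(slotCache.nodeFor(key), key); // fast path: cache is correct
        } catch (JedisMovedDataException moved) {
            // The MOVED reply already names the new owner, so serve the command
            // immediately instead of waiting for a cache rebuild...
            String result = runOn(moved.getTargetNode(), key);
            // ...and refresh the cache off the request path.
            renewExecutor.submit(slotCache::renew);
            return result;
        }
    }

    private String runOn(HostAndPort node, String key) {
        // Execute GET against the given node; connection handling omitted in this sketch.
        throw new UnsupportedOperationException("sketch only");
    }
}

The caller never blocks on the rebuild; at worst it pays the one extra hop the MOVED already cost.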


@marcosnils (Contributor)

The idea is to continue and serve the command, and just update the cache in the background. This also avoids unnecessary locks on the cache.

Ok, I got it wrong then. This is something similar to what I thought about 😄

@HeartSaVioR (Contributor) commented Jul 11, 2016

The idea is to continue and serve the command, and just update the cache in the background. This also avoids unnecessary locks on the cache.

Please correct me if I'm wrong, but then the slot table becomes a shared variable between user threads and the background thread. If we don't take a lock, operations can fail while the background thread reconstructs the slots. We would still have to make operations wait while the slots are being reconstructed.

@marcosnils (Contributor) commented Jul 11, 2016

If we don't take a lock, operations can fail while the background thread reconstructs the slots

@HeartSaVioR how so? Threads that don't see the change will still get the MOVED response and be redirected to the new node, while new threads will go to the correct node. The background thread can only process one batch slot update at a time.

This way, no transaction has to wait until slot renewal finishes.

@HeartSaVioR (Contributor)

We are "moving" the slot information, which means we "delete" the slot, and "add" the slot. It can't be atomic, and it means there's race condition between main thread and background thread.

But I like the idea @xetorthio said. Main thread doesn't need to wait for cache to be updated. If we handle the case what I'm saying properly, I'm OK to change.

@HeartSaVioR (Contributor)

Yeah, I think I made a mistake again. Slots are stored in a ConcurrentHashMap. I'm OK with this. Great idea.

@marcosnils (Contributor)

Yeah, I think I made a mistake again. Slots are stored in a ConcurrentHashMap. I'm OK with this. Great idea.

I was about to say this :D

@xetorthio (Contributor, Author)

Also, the current implementation cleans all slots before filling them again. We could avoid that: just update the slots without doing the cleanup. This way we'd be making better use of the ConcurrentHashMap.
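
For illustration, an in-place update could look like this (a sketch assuming the slot table is a ConcurrentHashMap<Integer, HostAndPort>; the actual Jedis field layout may differ):

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

import redis.clients.jedis.HostAndPort;

class SlotTable {
    // slot number -> current master for that slot
    private final ConcurrentHashMap<Integer, HostAndPort> slots = new ConcurrentHashMap<>();

    // Rebuild from a CLUSTER SLOTS-style response without calling clear() first:
    // concurrent readers always see either the old owner or the new one,
    // never a missing entry.
    void applySlotRanges(List<SlotRange> ranges) {
        for (SlotRange range : ranges) {
            for (int slot = range.first; slot <= range.last; slot++) {
                slots.put(slot, range.master); // overwrite in place
            }
        }
    }

    static class SlotRange {
        final int first, last;
        final HostAndPort master;

        SlotRange(int first, int last, HostAndPort master) {
            this.first = first;
            this.last = last;
            this.master = master;
        }
    }
}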

@xetorthio (Contributor, Author)

We could use a SingleThreadExecutor to make sure there is a single background thread, and if it crashes for whatever reason it will be re-launched.
The only thing missing is a signaling mechanism to tell the thread to "wake up" and renew slots when a MOVED is found.
Still thinking about how to do it. Any ideas?

@HeartSaVioR (Contributor)

AFAIK, we can just submit a new task containing the slot information from the MOVED response to that ExecutorService whenever we see a MOVED, and the background thread runs each new task.

@xetorthio (Contributor, Author)

Ok. That means we never ask for slots again, right?


@marcosnils (Contributor)

@HeartSaVioR but won't tasks queue up if we do this? The idea is that there can only be one task in the queue; it wouldn't make sense to queue up slot renewal tasks.

@xetorthio (Contributor, Author)

I'm not sure I understood your comment, @marcosnils.

@marcosnils (Contributor)

@HeartSaVioR suggests adding a new task to the ExecutorService each time a MOVED response is returned. The SingleThreadExecutor will execute one task at a time, but every time you add a task it gets queued for later execution. What I'm saying is that there shouldn't be any queuing in this process: only one task should be pending at a time, and that's it.
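
One way to get those "at most one pending renewal" semantics is an AtomicBoolean guard in front of the executor (a sketch; the names are illustrative, not Jedis APIs):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class CoalescingRenewer {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicBoolean renewPending = new AtomicBoolean(false);
    private final Runnable renewSlots; // the actual slot cache rebuild

    CoalescingRenewer(Runnable renewSlots) {
        this.renewSlots = renewSlots;
    }

    // Called from any thread that sees a MOVED response. If a renewal is
    // already queued or running, this is a no-op, so a burst of MOVED
    // responses collapses into a single refresh.
    void requestRenewal() {
        if (renewPending.compareAndSet(false, true)) {
            executor.submit(() -> {
                try {
                    renewSlots.run();
                } finally {
                    renewPending.set(false);
                }
            });
        }
    }
}

A MOVED that arrives while a refresh is already running is simply dropped, which is usually fine: the refresh in flight will fetch the same (or newer) topology anyway.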

@xetorthio (Contributor, Author)

But how does your suggestion work? Imagine you get a MOVED at the same time from 2 different Jedis instances. Both need to do something. What do they do? With what @HeartSaVioR suggests, both queue a task in the executor, and eventually the background thread refreshes the cache.

What would your suggestion do?

@HeartSaVioR (Contributor) commented Jul 13, 2016

@xetorthio @marcosnils
I'd like to clarify once more. There are two different strategies for updating slot information when we get a MOVED:

  1. Update the one slot named in the MOVED response.
    Then I'd like to see how much time a thread spends waiting to write to the concurrent map. If it's fairly tiny, we don't even need to make the slot update fire-and-forget; just update it.
  2. Update all slots (say, all 16384 of them).
    We need to pick a strategy for this case.

What we're currently doing is in fact the slowest but clearest way to resolve this scenario: there is no race condition because everything is guarded by a RW lock.
Apart from the race condition, there might be threads that read the slot information before the write lock takes effect and then get a MOVED.

One sketched idea to resolve this is to compare the fetched timestamp (the moment the thread queried the cache) with the slot-updated timestamp (in the cache), and only update the slots when the fetched timestamp is greater than the slot-updated timestamp (which means the thread queried Redis Cluster with the latest slot information and Redis still responded MOVED).

If we don't want to maintain the RW lock but still want to update all slots, the sketched idea could be extended to keep a timestamp per slot (which might be crazy to maintain). It's just a sketch, so there could be better alternatives, or the idea could be improved.

tl;dr: if a borrowed thread gets a MOVED response, check whether it was using the latest information for that slot. If it was, we need to update the slot(s); if not, we don't.
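
A rough sketch of that timestamp check (names are illustrative, and a real version would have to decide where the timestamps live):

import java.util.concurrent.atomic.AtomicLong;

class TimestampedSlotCache {
    private final AtomicLong lastRenewedNanos = new AtomicLong(System.nanoTime());

    // Record when the caller read the slot table, before sending the command.
    long readStamp() {
        return System.nanoTime();
    }

    // Called when that command comes back with MOVED. Only renew if the
    // caller's view was at least as new as the cache: if the cache was
    // already rebuilt after the caller read it, the MOVED is stale and a
    // retry against the current table is enough.
    boolean shouldRenew(long fetchedAtNanos) {
        return fetchedAtNanos > lastRenewedNanos.get();
    }

    void markRenewed() {
        lastRenewedNanos.set(System.nanoTime());
    }
}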

@xetorthio (Contributor, Author)

I think updating only one slot has its benefits, but it could also be bad; it depends on the scenario and on what exactly is happening in the cluster. I'd guess a MOVED shouldn't happen very frequently, so updating all slots when it does sounds pretty good.

Now, this doesn't need to be too strict. If the cache is not up to date, everything still works; we just pay for extra hops.

So the best solution is probably the one that is simplest to understand, code, and maintain.

A background thread that does the update sounds good. What triggers the update is still something I'm not sure of.


@mp911de (Contributor) commented Jul 13, 2016

IMHO updating slot-wise on each MOVED redirection isn't necessarily efficient. Lettuce follows the concept of having multiple strategies:

  1. Regular refresh (polling at time intervals)
  2. Adaptive refresh

The adaptive topology refresh listens to persistent disconnects and to MOVED and ASK redirections (configurable). If the cluster connection runs into a -MOVED error, that is the signal to trigger a topology update (in Lettuce, topology updates are async anyway).

Another point to consider is rate limiting. Busy applications can run into thousands of events per second; you don't want to trigger that many updates.

Integrating threads into a framework always pulls in more stuff than expected. You don't want to launch a thread per connection, as that's expensive, and with shared threads you quickly end up adding configuration facilities. I just wanted to mention that point. Sharing an ExecutorService with a single-thread or pooled implementation would be a good fit when you want to perform updates using worker threads.
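
As a sketch, a simple time-based rate limit on refresh triggers might look like this (the interval is arbitrary and would need to be configurable):

import java.util.concurrent.atomic.AtomicLong;

class RefreshRateLimiter {
    private final long minIntervalNanos;
    private final AtomicLong lastTriggerNanos = new AtomicLong(0);

    RefreshRateLimiter(long minIntervalMillis) {
        this.minIntervalNanos = minIntervalMillis * 1_000_000L;
    }

    // Returns true at most once per interval, however many MOVED/ASK or
    // disconnect events arrive; callers simply skip the refresh otherwise.
    boolean tryAcquire() {
        long now = System.nanoTime();
        long last = lastTriggerNanos.get();
        return now - last >= minIntervalNanos
                && lastTriggerNanos.compareAndSet(last, now);
    }
}

Callers would then do something like if (limiter.tryAcquire()) triggerRefresh(); on each redirection or disconnect event.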

@xetorthio (Contributor, Author)

Thanks for your input @mp911de
Checking the Ruby and Node.js clients, they both do what we were thinking at the beginning of this thread: they handle the MOVED and serve the command, and then trigger the refresh. There is no regular refresh.
Now, one thing I like about ioredis is that it does 2 things when it gets a MOVED: it changes the slot and then triggers the refresh (async, of course). The good thing about this is that any other command that happens to ask for the same slot, while the background thread refreshes the cache, will go to the right place. It's a nice approach we should consider.
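
Sketched in Jedis-like terms (hypothetical names again, reusing the slot map and the CoalescingRenewer from the earlier sketches; only JedisMovedDataException and HostAndPort are real Jedis types), the ioredis-style handler would do both steps:

import java.util.concurrent.ConcurrentHashMap;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.exceptions.JedisMovedDataException;

class MovedHandler {
    private final ConcurrentHashMap<Integer, HostAndPort> slots;
    private final CoalescingRenewer renewer; // from the earlier sketch

    MovedHandler(ConcurrentHashMap<Integer, HostAndPort> slots, CoalescingRenewer renewer) {
        this.slots = slots;
        this.renewer = renewer;
    }

    // Patch the single slot we now know about so that concurrent requests
    // for that slot immediately go to the right node, then kick off a full
    // rebuild in the background.
    void onMoved(JedisMovedDataException moved) {
        slots.put(moved.getSlot(), moved.getTargetNode());
        renewer.requestRenewal();
    }
}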

@DanielYWoo

IMHO, when the client receives a MOVED response we should not treat it as an exception (JedisMovedDataException); it's quite normal during migration. We can just continue to serve the command and update the slot cache in a background thread. Since Redis is not a CP system in CAP terms, we can sacrifice consistency for liveness.
Actually, we don't even have to do it each time we see a MOVED response. During a split-brain, each client would frequently update the slot cache in vain; when the notorious "too many redirects" error occurs, our clients would spend most of their IO resources updating the slot cache, and updating it wrongly.

@jocull commented Jul 12, 2023

This thread is very old, but I don't think the problem has been solved, or at least not solved consistently. I'm not overly familiar with all of the different cluster, connection pool, and configuration options, but I do have a requirement to hold a consistent connection for a particular hash slot across a series of operations. Here is the basic code I worked out after ruling out JedisCluster:

try (Connection conn = connectionProvider.getConnectionFromSlot(JedisClusterCRC16.getSlot(aStringKey))) {
    final Jedis client = new Jedis(conn);
    final String result = client.get(aStringKey); // run commands pinned to this slot's connection
}

If the client sees a MOVED error, that might work; I did not try it. But if the client gets a connection exception because the host has gone offline, it never seems to recover, even after the cluster has performed a failover and healed itself. It will be persistently stuck until you take care of the problem.

One solution is something like this - similar to Lettuce's adaptive refresh:

try (Connection conn = connectionProvider.getConnectionFromSlot(JedisClusterCRC16.getSlot(aStringKey))) {
    try {
        final Jedis client = new Jedis(conn);
        final String result = client.get(aStringKey); // use the result here
    } catch (JedisConnectionException ex) { // only catches connection errors; will likely miss MOVED and others
        connectionProvider.renewSlotCache(); // force a topology refresh from a reachable node
        // enter a retry loop here, obtaining a fresh connection for the slot
    }
}

Another solution, or a companion solution in case you need both, is to do the periodic refresh that was suggested above. Lettuce has this option, and it has become absolutely necessary for stability in production.

// Somewhere in your application setup...
final ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
final ClusterConnectionProvider connectionProvider = new ClusterConnectionProvider(hosts, jedisClientConfig);
scheduledExecutorService.scheduleAtFixedRate(connectionProvider::renewSlotCache, 0, 15, TimeUnit.SECONDS);

@jocull commented Jul 12, 2023

What exception did you see? JedisMovedDataException or something else?

I was primarily testing cluster destruction and failovers where a primary is lost. In that case you see socket exceptions wrapped in a JedisConnectionException, and the cluster topology needs to be refreshed from another source, since the primary for the slot (as of the last slot cache update) is offline.

@jocull commented Jul 12, 2023

I did test this with JedisCluster and it works very well on failovers. I think the implementations differ when you take direct control of a Connection. Perhaps I can work around the requirement to hold a specific connection by using a pipeline or some other transaction mechanism.

I was migrating the test from Lettuce, where the equivalent of JedisCluster::waitReplicas does not accept a key and thus cannot be routed properly by the cluster management. I originally missed that in Jedis this is not the case, which simplifies the implementation greatly! 😄

@github-actions (bot)

This issue is marked stale. It will be closed in 30 days if it is not updated.

@github-actions github-actions bot added the stale label Jul 12, 2024
@sazzad16 sazzad16 removed the stale label Jul 14, 2024