JedisCluster didn't reconnect with new IP when domain changes #2907

Closed
mrpre opened this issue Feb 11, 2022 · 32 comments

@mrpre

mrpre commented Feb 11, 2022

Expected behavior

I use the domain name rather than the IP of the Redis server as the parameter for JedisCluster:

Set<HostAndPort> testjedisClusterNode = new HashSet<HostAndPort>();
testjedisClusterNode.add(new HostAndPort("mydomain.com", 6379));
new JedisCluster(testjedisClusterNode,...)

But when Redis is rebuilt and its IP changes, JedisCluster doesn't seem to be reconfigured with the new IP.

Actual behavior

The user must restart the client manually even after the server has recovered.
Increasingly, servers run on Kubernetes as Pods, and a Pod's IP can change easily.

Steps to reproduce:

Redis / Jedis Configuration

Jedis version:

Latest

Redis version:

6.0

Java version:

8

@mrpre
Author

mrpre commented Feb 11, 2022

Well... after searching the PRs for a while, I found that @yangbodong22011 has already created PR #2643. I'm not clear on the discussion under that PR, but is there any smooth way for client users to recover their clients other than restarting? @sazzad16

@sazzad16
Collaborator

@mrpre Is it possible to recreate JedisCluster when you're rebuilding Redis?

#2643 does address your issue, but it would break many/most existing applications. We're trying to find a solution that doesn't break anything, but this is taking some time.

We'd welcome any PR regarding this.

PS: An exact version number helps more than "Latest".

@yangbodong22011
Collaborator

@sazzad16 Hi, I think JedisCluster should stick to this principle: as long as the user-configured startNodes (physical IP, VIP, or domain) are available, JedisCluster should keep working.

Therefore, in renewClusterSlots, we should prefer the user-configured "rock-solid" startNodes over getShuffledNodesPool. This is a bug fix, not a breaking change.

#2643 does address your issue but it would break many/most existing applications.

I think this is background behavior that won't break users' existing programs. Can you be specific about your concerns?

@sazzad16
Collaborator

@yangbodong22011 Let's consider a cluster with master nodes M1, M2, M3 and their respective replicas R1, R2, R3. We provide [M1] as startNodes in JedisCluster. Now, if M1 goes down, the entire JedisCluster would be unusable, because it would try to connect only through startNodes, which contains only one node, M1, which is unavailable.

The getShuffledNodesPool() approach would not fail in this scenario. I hope this clarifies things.

@yangbodong22011
Collaborator

yangbodong22011 commented Feb 15, 2022

@sazzad16 Thanks for your explanation.
I realize the problems with this change. In our production environment, the vast majority of services use a domain name or VIP instead of a physical IP. What do you think of the following steps?

  1. First use startNodes to call Cluster Slots
  2. If it fails, fallback to getShuffledNodesPool()

These two steps cannot be reversed; otherwise the previous problem would still exist. A rough sketch of this order follows below.
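
For illustration, a rough sketch of that refresh order in Java. The class and helper names here (SlotCacheRefresher, tryDiscover, knownNodes) are hypothetical, not actual Jedis internals; only Jedis, HostAndPort, clusterSlots(), and JedisConnectionException are real Jedis types and calls.

import java.util.List;
import java.util.Set;
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.exceptions.JedisConnectionException;

// Hypothetical sketch of the "start nodes first, then known nodes" refresh order.
class SlotCacheRefresher {
    private final Set<HostAndPort> startNodes;   // user-configured seeds (domain/VIP), follow DNS changes
    private final List<HostAndPort> knownNodes;  // nodes previously discovered from the cluster

    SlotCacheRefresher(Set<HostAndPort> startNodes, List<HostAndPort> knownNodes) {
        this.startNodes = startNodes;
        this.knownNodes = knownNodes;
    }

    void renewSlotCache() {
        // Step 1: prefer the user-configured start nodes.
        for (HostAndPort node : startNodes) {
            if (tryDiscover(node)) return;
        }
        // Step 2: fall back to the currently known nodes (the shuffled-pool behavior).
        for (HostAndPort node : knownNodes) {
            if (tryDiscover(node)) return;
        }
    }

    private boolean tryDiscover(HostAndPort node) {
        try (Jedis jedis = new Jedis(node.getHost(), node.getPort())) {
            jedis.clusterSlots(); // fetch the current topology from this node
            // ... rebuild the slot-to-node mapping from the reply ...
            return true;
        } catch (JedisConnectionException e) {
            return false; // try the next candidate
        }
    }
}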

@mrpre
Author

mrpre commented Feb 15, 2022

Users don't know when they should recreate the JedisCluster (when "connection refused" happens?), and it's too costly for users to code that themselves. It would be preferable for such failures to be handled inside JedisCluster.
I also agree that changes should not break any existing application.

@sazzad16
Collaborator

@yangbodong22011 I was thinking about having modes for the users to choose from (see the enum sketch after this list). Something like:

  1. CLUSTER_NODES (default)
  2. START_NODES (your 1st choice)
  3. CLUSTER_NODES_THEN_START_NODES
  4. START_NODES_THEN_CLUSTER_NODES (your 2nd choice)
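
For illustration only, these modes could be captured by a configuration enum along the lines of the hypothetical sketch below; this is not an actual Jedis API.

// Hypothetical enum mirroring the proposed modes (not part of Jedis).
public enum TopologyRefreshSource {
    CLUSTER_NODES,                   // refresh only from the currently known cluster nodes (default)
    START_NODES,                     // refresh only from the user-configured start nodes
    CLUSTER_NODES_THEN_START_NODES,  // known nodes first, fall back to start nodes
    START_NODES_THEN_CLUSTER_NODES   // start nodes first, fall back to known nodes
}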

@yangbodong22011
Collaborator

@yangbodong22011 I was thinking about having modes for the users to choose from. Something like:

  1. CLUSTER_NODES (default)
  2. START_NODES (your 1st choice)
  3. CLUSTER_NODES_THEN_START_NODES
  4. START_NODES_THEN_CLUSTER_NODES (your 2nd choice)

@sazzad16 I think this background behavior can be done correctly without burdening the user with choices. Let's keep it simple.
P.S. You mean CLUSTER SLOTS, right? Not CLUSTER NODES.

@sazzad16
Collaborator

@yangbodong22011 I meant CLUSTER_NODES (with an underscore) as a mode.

@yangbodong22011
Collaborator

yangbodong22011 commented Feb 16, 2022

@yangbodong22011 I meant CLUSTER_NODES (with an underscore) as a mode.

Do you mean adding CLUSTER NODES as the command used to get the topology? (Currently it is CLUSTER SLOTS.)

EDIT: I think CLUSTER_NODES just means the previous getShuffledNodesPool, right?

@sazzad16
Collaborator

I think CLUSTER_NODES just means the previous getShuffledNodesPool, right?

@yangbodong22011 Yes.

@yangbodong22011
Collaborator

@sazzad16 OK, back to the topic.
I hope we can agree on START_NODES_THEN_CLUSTER_NODES.
I don't want us to expose too many concepts for users to choose from and understand.

@sazzad16
Collaborator

@yangbodong22011 Or, simply, we can start with the first two. We'll consider the others later.

@yangbodong22011
Collaborator

@yangbodong22011 Or, simply, we can start with the first two. We'll consider the others later.

I insist that this strategy should not be exposed to users; we just need to do it silently. This strategy doesn't make sense to users and would only confuse them.

Also, no Redis client exposes such a policy. Lettuce lets the user choose when to refresh the cluster topology view: either call CLUSTER NODES at intervals, or call it when a MOVED response is encountered; Jedis implements the second.

@sazzad16
Collaborator

I insist that this strategy should not be exposed to users; we just need to do it silently. This strategy doesn't make sense to users and would only confuse them.

@yangbodong22011 Sorry, I'm missing what you mean by "this".

Also, if I'm not mistaken, Lettuce provides many options.

@yangbodong22011
Collaborator

Sorry, I'm missing what you mean by "this".

"this" means multiple modes. I think we should only support one correct mode, Currently I think it is: START_NODES_THEN_CLUSTER_NODES

yes, lettuce provides many options, but it doesn't give the user the option to choose START_NODES OR CLUSTER_NODES, the user controls is when they care to perform a cluster topology refresh.

@sazzad16
Collaborator

@dengliming thoughts?

@dengliming
Collaborator

Hey, sorry for the late reply. IMO I agree with @yangbodong22011. Providing more modes will increase the cost of user understanding. We can just do one mode for now and see later if any users need multiple modes for more scenarios. BTW, would it be better if START_NODES => ORIGINAL_NODES?

@sazzad16
Collaborator

Would it be better if START_NODES => ORIGINAL_NODES?

@dengliming Which order do you prefer?

I think doing ORIGINAL_NODES first doesn't make that much sense for the case of @yangbodong22011 and @mrpre.

@dengliming
Collaborator

ORIGINAL_NODES

Just a naming opinion, never mind :)

I agree with START_NODES_THEN_CLUSTER_NODES.

@sazzad16
Collaborator

There won't be any naming as there won't be multiple modes 😃

@sazzad16
Collaborator

@yangbodong22011 Would you like to work on this?

@yangbodong22011
Collaborator

@yangbodong22011 Would you like to work on this?

Of course, I will continue to modify #2643.

@sazzad16
Collaborator

Resolved by #2643

sazzad16 added this to the 4.2.0 milestone Feb 17, 2022
@eldarj

eldarj commented Apr 5, 2023

@sazzad16 @yangbodong22011 Seeing that this PR didn't require a lot of changes but fixes a huge problem, is there any chance of backporting it to a previous Jedis version that doesn't require Spring 3.x?

@sazzad16
Collaborator

sazzad16 commented Apr 5, 2023

Seeing that this PR didn't require a lot of changes but fixes a huge problem, is there any chance of backporting it to a previous Jedis version that doesn't require Spring 3.x?

@eldarj We'll consider it if we see user demand.

@eldarj

eldarj commented Apr 6, 2023

@sazzad16 I'm sure there will be; going forward this will be an issue for most teams, especially considering k8s usage and that upgrading to Spring Boot 3.x isn't straightforward.

Can we be proactive here and do it straight away? I'd volunteer if needed.

@eldarj

eldarj commented Apr 6, 2023

@sazzad16 @yangbodong22011 Actually, I just tried this out with Spring Boot 3.0.1 and Jedis 4.3.2, and it seems that the issue is still there.

Maybe I misunderstood this issue and the PR in the first place, but what I'm trying to accomplish is to have Jedis re-discover Redis nodes after those nodes' IPs change.

What I'm doing and seeing is the following:

  1. Run redis-sentinel in k8s.
  2. Run a sample Java service with Jedis, configuring it with the k8s Redis service host name, i.e. rfs-redis-sentinel-cluster, and master mymaster.
  3. On startup Jedis connects to Redis and resolves all nodes, i.e. IPs.
  4. I delete all pods in redis-sentinel in order to acquire new IPs.
  5. Jedis throws:
redis.clients.jedis.JedisSentinelPool    : Lost connection to Sentinel at rfs-redis-sentinel-cluster:26379. Sleeping 5000ms and retrying.
redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.
...
[ns-pool-evictor] redis.clients.jedis.JedisFactory         : Error while validating pooled Jedis object.
...
  6. After a while I see Jedis log the following, which seems weird:
[-cluster:26379]] redis.clients.jedis.JedisSentinelPool    : Created JedisSentinelPool to master at 127.0.0.1:6379
  7. Any following operation with e.g. redisTemplate fails with:
org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection] with root cause redis.clients.jedis.exceptions.JedisConnectionException: Failed to connect to any host resolved for DNS name.

Did I miss something? Was this supposed to be solved, or was it some other problem?
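
For reference, a minimal sketch of the Sentinel setup described above, assuming Jedis is used directly rather than through redisTemplate. The service name rfs-redis-sentinel-cluster and the master name mymaster are taken from the steps; everything else is an illustrative assumption.

import java.util.Collections;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelReproSketch {
    public static void main(String[] args) {
        // Sentinels are reached through the Kubernetes service name, resolved via DNS.
        Set<String> sentinels = Collections.singleton("rfs-redis-sentinel-cluster:26379");
        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
             Jedis jedis = pool.getResource()) {
            jedis.set("key", "value"); // starts failing once all Sentinel pods are recreated with new IPs
        }
    }
}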

@sazzad16
Collaborator

sazzad16 commented Apr 6, 2023

Did I miss something? Was this supposed to be solved, or was it some other problem?

@eldarj This issue is related to Cluster but yours is related to Sentinel.

@eldarj

eldarj commented Apr 7, 2023

Okay, I'll have to check why Cluster would be any different from Sentinel.

If you can point me in the right direction, that would be great. Why are these two different? @sazzad16

Also, if the same needs to be implemented for Sentinel, do you guys accept PR contributions, or should I create a separate issue then?

@sazzad16
Collaborator

sazzad16 commented Apr 9, 2023

@eldarj I couldn't find a direct comparison on the Redis site. Here are two separate links from there:

Coming to the comparison, there are resources available elsewhere online. For example:

Hope these help.


if the same needs to be implemented for Sentinel, do you guys accept PR contributions

Yes, of course.

should I create a separate issue then?

That helps attract more eyes and may lead to resolving the issue faster and/or more efficiently.

@yangbodong22011
Collaborator

@eldarj This issue is in a closed state; you can open a new issue to continue with the problem. In the new issue, please note:

  1. Code: what does your initialization code look like?
  2. Architecture: how many Sentinel nodes are configured behind rfs-redis-sentinel-cluster? 1, 3, or more?
