[Remote Store] _cat/recovery APIs provides inconsistent results #12047
Comments
Hi, I would like to take this ticket. While conducting my investigation, I welcome your insights and recommendations on specific areas to focus on.
Yes, I believe this should be reproducible on a multi-node setup hosting shards of a few hundred MBs, where we exclude the IP of one node and trigger relocation of the shards on the excluded node.
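To make that repro step concrete, here is a minimal sketch, assuming an integration test derived from `OpenSearchIntegTestCase`; the class and helper names are hypothetical, and it uses the standard `cluster.routing.allocation.exclude._ip` allocation setting to drain the excluded node:

```java
import org.opensearch.common.settings.Settings;
import org.opensearch.test.OpenSearchIntegTestCase;

public class RemoteStoreRecoveryCountIT extends OpenSearchIntegTestCase {

    // Hypothetical helper: exclude a node by IP so that all shards hosted on it
    // are relocated to the remaining nodes, mirroring the manual repro above.
    // In a single-machine test, excluding by node name via
    // "cluster.routing.allocation.exclude._name" may be more practical.
    private void excludeNodeByIp(String nodeIp) {
        client().admin().cluster().prepareUpdateSettings()
            .setTransientSettings(Settings.builder()
                .put("cluster.routing.allocation.exclude._ip", nodeIp))
            .get();
    }
}
```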
@Bukhtawar Thanks! Would the following scenario be a good candidate? Imagine a two-node cluster:

```mermaid
flowchart TB
    Primary_A
    Primary_B
    Primary_C
    subgraph "Node 1"
        Primary_A
        Primary_B
        Primary_C
    end
    subgraph "Node 2"
    end
```
Next, we exclude the IP of Node 1. This should (if I am not mistaken) trigger replication of all shards from Node 1 to Node 2:

```mermaid
flowchart TB
    Primary_A--"Replicating"-->Replica_A
    Primary_B--"Replicating"-->Replica_B
    Primary_C--"Replicating"-->Replica_C
    subgraph "Node 1"
        Primary_A
        Primary_B
        Primary_C
    end
    subgraph "Node 2"
        Replica_A
        Replica_B
        Replica_C
    end
```
Now, while shards are being replicated, we can request both the `_cluster/health` and `_cat/recovery?active_only` APIs and compare the counts.

I assume we need shards of a "larger size" only because the replication activity has to take some time (enough time for us to request the counts and compare them). How about if we instead throttle the amount of data transferred during replication? This means the shards could be quite small, yet replication would still take some time. Do you think this would also reproduce the issue? The point is that if throttling works, we should be able to implement a regular unit test.
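To make the throttling idea concrete, here is a minimal sketch (a hypothetical helper in the same kind of `OpenSearchIntegTestCase`-based test as above; the `50kb` value is only illustrative):

```java
// Hypothetical helper in the same test class: throttle recovery cluster-wide so that
// even small shards stay in an active recovery long enough to query both APIs.
private void throttleRecovery() {
    client().admin().cluster().prepareUpdateSettings()
        .setTransientSettings(Settings.builder()
            // Very low throughput cap; the exact value is an assumption.
            .put("indices.recovery.max_bytes_per_sec", "50kb"))
        .get();
}
```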
Hi @Bukhtawar, I was looking at this and found that the following integration test already covers something very similar:

For example, it has a test called

I was experimenting: I modified some tests and added an `admin.cluster.health` request to get the initializing and relocating shard counts, and so far I have not been able to spot/replicate the count discrepancy. Do you think this could be because the size of the index in the test is quite small (just a couple of hundred KBs)? Note that the test explicitly makes sure the counts are obtained while the recovery process is throttled and the shard recovery stage is not DONE (in other words, the counts are compared while the recovery is still running).

However, there is still another question I wanted to ask. Did you have anything specific in mind when you said:

Can you elaborate on this, please? I will push a modification of the test tomorrow so that you can see what I mean.
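For reference, the kind of comparison described above could look roughly like the sketch below (hypothetical helper in the same test class; the exact relationship between the two counts is part of what is being investigated here):

```java
import java.util.List;

import org.opensearch.action.admin.cluster.health.ClusterHealthResponse;
import org.opensearch.action.admin.indices.recovery.RecoveryResponse;

// Hypothetical helper in the same test class: compare the number of active recoveries
// reported by the recovery API with the shard counts reported by cluster health.
private void assertRecoveryCountsConsistent() {
    ClusterHealthResponse health = client().admin().cluster().prepareHealth().get();
    int expected = health.getInitializingShards() + health.getRelocatingShards();

    RecoveryResponse recoveries = client().admin().indices().prepareRecoveries()
        .setActiveOnly(true)
        .get();
    long active = recoveries.shardRecoveryStates().values().stream()
        .mapToLong(List::size)
        .sum();

    assertEquals(expected, active);
}
```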
This is WIP to drive the discussion further, do not merge it!

Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>
Please see #12792. Can you think of some hints about what to change in order to recreate the issue? For example, do you think the shard recovery state stage has to be
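If it helps the discussion, a small diagnostic sketch like the one below (hypothetical helper in the same test class) could show in which recovery stage the counts actually diverge:

```java
import java.util.List;

import org.opensearch.action.admin.indices.recovery.RecoveryResponse;
import org.opensearch.indices.recovery.RecoveryState;

// Hypothetical diagnostic: log the stage of every ongoing recovery so we can see
// whether the discrepancy only appears during a particular stage.
private void logRecoveryStages() {
    RecoveryResponse recoveries = client().admin().indices().prepareRecoveries()
        .setActiveOnly(true)
        .get();
    for (List<RecoveryState> shardStates : recoveries.shardRecoveryStates().values()) {
        for (RecoveryState state : shardStates) {
            logger.info("shard {} recovery stage: {}", state.getShardId(), state.getStage());
        }
    }
}
```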
Adding @sachinpkale for his thoughts as well. Will take a look shortly.

@sachinpkale @Bukhtawar This PR is waiting on your inputs. Can you bring this to closure?
Describe the bug
Compared to the `_cluster/health` API, the `_cat/recovery?active_only` API shows an inconsistent count of recoveries in progress.

Related component
Storage:Remote

Expected behavior
Consistent API results