Describe the bug
A quorum queue member occasionally crashes after a snapshot installation because it attempts to read a command for an already consumed message. The reproduction steps are highly artificial, but this crash has been seen in the wild a couple of times and could happen when a follower member runs slowly on a node with consumers that come and go.
Reproduction steps
This is easiest to reproduce on 4.0.x but can also happen on 3.13.x.
1. Create a quorum queue "q1" in a 3-node cluster with the leader on rabbit-1 and the quorum_min_checkpoint_interval application config set to 1.
2. Stop the member on rabbit-3, e.g. ra:stop_server(quorum_queues, {'%2F_q1', node()}).
3. Publish 2 messages.
4. Trigger a checkpoint for the leader member: ra:cast_aux_command({'%2F_q1', 'rabbit-1@HOST'}, force_checkpoint).
5. Publish 1 more message.
6. Attach then detach a consumer for the queue, connected to rabbit-3 (no messages should be delivered, but they will show as unacked).
7. Purge the queue.
8. Restart the member on rabbit-3: ra:restart_server(quorum_queues, {'%2F_q1', node()}).
9. Observe a member crash on rabbit-3.
The member may recover after step 9 - this is also, in fact, a bug.
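The Erlang calls in steps 2, 4 and 8, plus the purge in step 7, can be driven from a shell with `rabbitmqctl eval` and `rabbitmqctl purge_queue`. The following is only a sketch: the function names and the `DRY_RUN` switch are hypothetical helpers, the node names follow the `rabbit-N@HOST` pattern used above, and publishing and the consumer attach/detach are left to whatever client is at hand.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the Erlang calls from the steps above.
# Sourcing this file only defines functions; nothing runs until one is called.

# With DRY_RUN=1, print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ra_eval NODE EXPR: evaluate an Erlang expression on NODE.
ra_eval() {
    run rabbitmqctl -n "$1" eval "$2"
}

# Step 2: stop the q1 member on NODE (e.g. rabbit-3@HOST).
stop_member() {
    ra_eval "$1" "ra:stop_server(quorum_queues, {'%2F_q1', node()})."
}

# Step 4: force_checkpoint NODE LEADER, e.g. LEADER=rabbit-1@HOST.
force_checkpoint() {
    ra_eval "$1" "ra:cast_aux_command({'%2F_q1', '$2'}, force_checkpoint)."
}

# Step 7: purge q1 (default vhost assumed).
purge_q1() {
    run rabbitmqctl -n "$1" purge_queue q1
}

# Step 8: restart the q1 member on NODE.
restart_member() {
    ra_eval "$1" "ra:restart_server(quorum_queues, {'%2F_q1', node()})."
}
```

With `DRY_RUN=1` set, each helper prints the command it would run, which makes it easy to check the quoting before pointing it at a real cluster.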
Expected behavior
No crash
Additional context
Currently, a queue that experiences this error can be fixed by removing the faulty member from the quorum queue cluster, waiting a bit, and then re-adding it, using rabbitmq-queues delete_member and rabbitmq-queues add_member.
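As a rough sketch of that workaround, the two commands can be wrapped in a small shell function. The function name, the `DRY_RUN` switch and the default 10 second pause are assumptions for illustration; the report only says to "wait a bit" and gives no duration.

```shell
#!/bin/sh
# Sketch of the workaround: remove the faulty member, wait, re-add it.

# With DRY_RUN=1, print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# readd_member QUEUE NODE [VHOST] [WAIT_SECONDS]
readd_member() {
    queue=$1
    node=$2
    vhost=${3:-/}
    wait_s=${4:-10}   # assumed pause; tune for your cluster

    # Stop early if the member could not be removed.
    run rabbitmq-queues delete_member -p "$vhost" "$queue" "$node" || return 1
    run sleep "$wait_s"
    run rabbitmq-queues add_member -p "$vhost" "$queue" "$node"
}
```

For the queue from the reproduction steps this would be called as `readd_member q1 rabbit-3@HOST`, with the node name replaced by the one hosting the faulty member.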