QQ: crash after snapshot installation #12635

kjnilsson · 2024-11-01T08:50:25Z

Describe the bug

A quorum queue member occasionally crashes after a snapshot installation because it attempts to read a command for a message that has already been consumed. The reproduction steps are highly artificial, but this crash has been seen in the wild a couple of times and could happen if a follower member on a node with consumers that come and go runs slowly.

2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0> **  [{lists,zipwith,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>             [#Fun<rabbit_fifo.60.126061837>,[],
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>              [{1,[7352901|4]},{2,[7352904|4]}],
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>              fail],
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>             [{file,"lists.erl"},{line,844}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {lists,zipwith,4,[{file,"lists.erl"},{line,845}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {rabbit_fifo,'-delivery_effect/3-anonymous-5-',4,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>                   [{file,"rabbit_fifo.erl"},{line,2062}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {ra_server_proc,handle_effect,5,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>                      [{file,"src/ra_server_proc.erl"},{line,1385}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {ra_server_proc,handle_effects,5,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>                      [{file,"src/ra_server_proc.erl"},{line,1301}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {lists,foldl_1,3,[{file,"lists.erl"},{line,2151}]},
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>      {ra_server_proc,handle_effects,5,
2024-11-01 08:46:29.102566+00:00 [error] <0.23583.0>                      [{file,"src/ra_server_proc.erl"},{line,1301}]}]

Reproduction steps

This is easiest to re-create on 4.0.x but can also happen on 3.13.x.

1. Create a quorum queue "q1" in a 3-node cluster with the leader on rabbit-1 and the quorum_min_checkpoint_interval application config set to 1.
2. Stop the member on rabbit-3, e.g. ra:stop_server(quorum_queues, {'%2F_q1', node()}).
3. Publish 2 messages.
4. Trigger a checkpoint for the leader member: ra:cast_aux_command({'%2F_q1', 'rabbit-1@HOST'}, force_checkpoint).
5. Publish 1 more message.
6. Attach then detach a consumer for the queue connected to rabbit-3 (no message should be delivered but they will show as unacked).
7. Purge the queue.
8. Restart the member on rabbit-3: ra:restart_server(quorum_queues, {'%2F_q1', node()}).
9. Observe a member crash on rabbit-3.

The member may recover after step 9 - this is also, in fact, a bug.
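The Erlang calls in the reproduction (stopping the follower, forcing a checkpoint, restarting the follower) and the purge can be sketched as a shell script. This is only a rough sketch under assumptions: it runs the expressions through rabbitmqctl eval, selects the target node with -n, and uses the placeholder node names from this report. The LEADER, FOLLOWER and DRY_RUN variables and the helper function names are hypothetical. Publishing messages and attaching or detaching the consumer are left as manual steps because they need a client application. DRY_RUN defaults to 1, so the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the scriptable reproduction steps. Names below are placeholders.
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only (default)
LEADER="${LEADER:-rabbit-1@HOST}"       # node hosting the leader of q1
FOLLOWER="${FOLLOWER:-rabbit-3@HOST}"   # node hosting the slow follower
SERVER="'%2F_q1'"                       # Ra server name of q1 in vhost "/"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stop_follower() {
    run rabbitmqctl -n "$FOLLOWER" eval "ra:stop_server(quorum_queues, {$SERVER, node()})."
}

force_checkpoint() {
    run rabbitmqctl -n "$LEADER" eval "ra:cast_aux_command({$SERVER, '$LEADER'}, force_checkpoint)."
}

purge_queue() {
    run rabbitmqctl -n "$LEADER" purge_queue q1
}

restart_follower() {
    run rabbitmqctl -n "$FOLLOWER" eval "ra:restart_server(quorum_queues, {$SERVER, node()})."
}

stop_follower
# manual: publish 2 messages
force_checkpoint
# manual: publish 1 more message
# manual: attach then detach a consumer connected to the follower node
purge_queue
restart_follower
```

With DRY_RUN=0 the commands are executed for real, so the manual steps have to be performed between the calls, for example by running the functions one at a time from an interactive shell.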

Expected behavior

No crash

Additional context

Currently, a queue that experiences this error can be fixed by removing the faulty member from the quorum queue cluster, waiting a bit, and then re-adding it using rabbitmq-queues delete_member and rabbitmq-queues add_member.
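As a rough sketch, the workaround could be scripted as follows. The function name readd_member, the DRY_RUN switch and the 10 second pause are assumptions made for illustration; the report only says to "wait a bit". The queue and node names are the placeholders used in this report. DRY_RUN defaults to 1, so the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: remove a faulty quorum queue member and add it back.
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands only (default)
SETTLE_SECONDS="${SETTLE_SECONDS:-10}"   # assumed pause; the report says "wait a bit"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# readd_member QUEUE NODE [VHOST]
readd_member() {
    queue="$1"
    node="$2"
    vhost="${3:-/}"
    # Stop at the first failing command so a failed delete is not followed by an add.
    run rabbitmq-queues delete_member --vhost "$vhost" "$queue" "$node" || return 1
    run sleep "$SETTLE_SECONDS" || return 1
    run rabbitmq-queues add_member --vhost "$vhost" "$queue" "$node"
}

# Example: the faulty member of q1 on rabbit-3.
readd_member q1 rabbit-3@HOST
```

Setting DRY_RUN=0 runs the commands against the cluster. The pause length is a guess and may need adjusting.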

kjnilsson added the bug label Nov 1, 2024

kjnilsson added this to the 4.0.4 milestone Nov 1, 2024

kjnilsson mentioned this issue Nov 1, 2024

QQ: handle case where a stale read request results in member crash. #12636

Merged

michaelklishin added a commit that referenced this issue Nov 1, 2024

Merge pull request #12636 from rabbitmq/gh-12635

ef2c8df

QQ: handle case where a stale read request results in member crash.

michaelklishin closed this as completed in #12636 Nov 1, 2024

mergify bot mentioned this issue Nov 1, 2024

QQ: handle case where a stale read request results in member crash. (backport #12636) #12637

Merged
