DLX data safety guarantees with node restarts with classic queues on RabbitMQ 3.9 #12863
Replies: 3 comments 2 replies
-
RabbitMQ 3.9 has been out of support for many years. Try to reproduce this with RabbitMQ 4.0 (and therefore with classic queues v2, the only version available in 4.0). There's a pretty good chance things will work better just by upgrading. That said, there is indeed no credit flow mechanism for dead lettering between classic queues. You can switch to quorum queues (potentially even in a single-node environment), since quorum queues have more reliable dead lettering in newer versions. We are not going to investigate issues in such old versions.
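For anyone trying the quorum queue route, here is a minimal sketch of the optional queue arguments involved. It assumes the documented argument names (`x-queue-type`, `x-dead-letter-exchange`, `x-dead-letter-strategy`, `x-overflow`). The helper name `quorum_dlx_arguments` and the exchange name `my.dlx` are made up for illustration. The at-least-once strategy is opt-in and, per the quorum queue documentation, needs the overflow behaviour set to `reject-publish`.

```python
def quorum_dlx_arguments(dead_letter_exchange, at_least_once=True):
    """Build optional arguments for a quorum queue that dead-letters messages.

    Hypothetical helper: it only assembles a dict, it does not talk to a broker.
    """
    args = {
        "x-queue-type": "quorum",
        "x-dead-letter-exchange": dead_letter_exchange,
    }
    if at_least_once:
        # Opt in to at-least-once dead lettering (RabbitMQ 3.10 and later).
        args["x-dead-letter-strategy"] = "at-least-once"
        # At-least-once dead lettering requires reject-publish overflow.
        args["x-overflow"] = "reject-publish"
    return args


# "my.dlx" is an assumed exchange name used only for this example.
source_queue_args = quorum_dlx_arguments("my.dlx")
```

The resulting dict would typically be passed as the `arguments` of a durable queue declaration in whatever client library you use (for example `channel.queue_declare(queue=..., durable=True, arguments=source_queue_args)` in pika). The same settings can also be applied through a policy instead of queue arguments.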
-
@usernameisnull DLX is not magic, it is a special kind of publisher. Publishing can and will fail when nodes are restarted. This is why Publisher Confirms exist: so that clients can track such cases and retry (or do something else).
However, DLX is not a regular client and it cannot execute arbitrary code. So there are only two options to choose from in modern versions, and only for quorum queues, which have sufficient data safety guarantees and focus to implement a reasonably safe retry for DLX. Specifically, the source queue must be a quorum one, although using QQs with non-mirrored classic queues for dead lettered messages will negate most data safety characteristics of QQs.
As @mkuratczyk has already explained, the "at least once" DLX strategy was introduced in 3.10. All versions in 3.x are out of community support and will receive no future open source releases. It's time to upgrade, possibly using a Blue/Green deployment strategy given that you are five release series behind.
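To illustrate what a regular client can do with confirms and DLX cannot, here is a rough sketch of a confirm-and-retry loop. It does not use a real client library: `publish_confirmed` is a hypothetical callable, assumed to return `True` when the broker confirms the message, to return `False` on a negative acknowledgement, and to raise `ConnectionError` when the node goes away mid-publish.

```python
import time


def publish_with_retry(publish_confirmed, message, attempts=3, delay=0.0):
    """Retry a publish until the broker confirms it or attempts run out.

    Returns the number of the attempt that was confirmed.
    Raises RuntimeError if no attempt was confirmed.
    """
    for attempt in range(1, attempts + 1):
        try:
            if publish_confirmed(message):
                return attempt
        except ConnectionError:
            # E.g. the node restarted; the message may or may not have been stored.
            pass
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(f"message not confirmed after {attempts} attempts")
```

Note that a retry after a lost confirm can publish the same message twice, which is why this kind of retry gives at-least-once rather than exactly-once behaviour.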
-
I upgraded to 3.10 and haven't experienced data loss in this situation. However, sometimes when an OOM (out of memory) event occurs, the dead letter queue seems to receive more messages than expected. Could this be because quorum queues in 3.10 provide at-least-once dead lettering, so a message might be delivered to the dead letter queue more than once?
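Whatever the cause turns out to be here, at-least-once semantics do permit duplicates in general, so a consumer of the dead letter queue may want to be idempotent. A minimal sketch, assuming publishers set a unique message ID on every message (the class name `DeduplicatingHandler` and the in-memory `set` are illustrative choices only; a real deployment would need bounded or persistent storage):

```python
class DeduplicatingHandler:
    """Wrap a handler so that a repeated message ID is processed only once."""

    def __init__(self, handle):
        self._handle = handle
        self._seen = set()  # unbounded in this sketch

    def on_message(self, message_id, body):
        """Return True if the message was handled, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handle(body)
        # Record the ID only after the handler succeeded.
        self._seen.add(message_id)
        return True
```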