Reactive messaging Emitter stops working correctly in dev and test modes #40118
Comments
/cc @cescoffier (reactive-messaging), @ozangunalp (reactive-messaging) |
@jponge too. Thanks @johnaohara for reporting, I see you have already searched for where it doesn't make progress |
I've tried to reproduce it locally on 2 different machines with no success; I have to try with a new laptop and see if I'm lucky |
I don't believe we changed anything. @ozangunalp any idea? |
@johnaohara Thanks for the detailed explanation. There weren't any changes to the emitter, except the Mutiny update :) In AMQP you need to have a listener for the address for produced messages to get delivered; this may happen if, during a restart, the producer sends messages before the consumer starts listening. |
We have a similar problem where our application's emitted messages are enqueued into the buffer of BufferItemMultiEmitter. After a lot of debugging I believe the issue is that the requested counter stays at 0, so the buffered messages are never drained.
Why the requested counter ends up at 0 I don't know. One way I can reproduce the issue is to suspend the application in the debugger, stop the ActiveMQ broker, let the application run, and start the ActiveMQ broker again. |
I am coming back to this now. In our case there is nothing complicated in the messaging setup atm. I basically have the following configuration in application.properties:
and an injected Emitter.
I am calling the emitter with:
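A minimal sketch of what that typically looks like in Quarkus (the class name, channel name, payload type, and the property shown in the comment are placeholders, not the actual code from our application):

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import org.eclipse.microprofile.reactive.messaging.Channel;
import org.eclipse.microprofile.reactive.messaging.Emitter;

@ApplicationScoped
public class MessageProducer {

    // "events-out" is a placeholder channel name; application.properties maps it to an
    // AMQP address, e.g. mp.messaging.outgoing.events-out.connector=smallrye-amqp
    @Inject
    @Channel("events-out")
    Emitter<String> emitter;

    public void publish(String payload) {
        // send() enqueues the payload; it is only handed to the AMQP client once the
        // connector's subscription requests items (i.e. there is credit to send)
        emitter.send(payload);
    }
}
```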
I don't think this is a recent behaviour. Our CI has been failing intermittently for a while with errors related to this. My machine was working again, but is consistently failing atm. I will spend some more time digging into what is happening. Thanks |
I've been spending some time trying to reproduce this. I can reproduce some reconnection scenarios, and there is indeed one of them which throws an exception. I think #40592 is also a similar problem. BTW I observe that because of the credit-based flow control, when a reconnection happens the order of outgoing messages is no longer respected, with or without our retry mechanism. @johnaohara having requested at 0 would be expected if there aren't any requests from the downstream. |
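To see the requested == 0 behaviour in isolation, here is a standalone Mutiny sketch (assuming the Mutiny 2 Flow-based API; the names and values are made up for illustration): with BackPressureStrategy.BUFFER, emitted items sit in the emitter's buffer until the downstream subscriber requests them.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicReference;

import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.subscription.BackPressureStrategy;

public class RequestedZeroDemo {
    public static void main(String[] args) {
        Multi<String> multi = Multi.createFrom().emitter(e -> {
            e.emit("m1");
            e.emit("m2");
        }, BackPressureStrategy.BUFFER);

        AtomicReference<Flow.Subscription> sub = new AtomicReference<>();
        multi.subscribe().withSubscriber(new Flow.Subscriber<String>() {
            @Override public void onSubscribe(Flow.Subscription s) { sub.set(s); } // no request yet
            @Override public void onNext(String item) { System.out.println("delivered: " + item); }
            @Override public void onError(Throwable failure) { failure.printStackTrace(); }
            @Override public void onComplete() { System.out.println("complete"); }
        });

        // Nothing printed so far: requested == 0, so both items are buffered,
        // just like messages piling up in BufferItemMultiEmitter.queue.
        sub.get().request(2); // once the downstream requests, drain() delivers the buffered items
    }
}
```

In the connector, the AMQP sender's credit ultimately drives those downstream requests, which is why a broker that grants no credit leaves the emitter's buffer full.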
Hi @ozangunalp thank you for looking into this. It is worth noting that this problem does not just occur for me on restart, if I start our quarkus based app in dev mode (with the amq broker instantiated by dev services), the first invocation fails as well |
@johnaohara I can't reproduce the problem with the dev mode, using the code.quarkus.io amqp 1.0 messaging example (which sends messages at startup using an emitter) |
@ozangunalp it does not happen in all environments, and is not always reproducible. For example, my machine has been "working" (i.e. tests in our project work etc.) for the past couple of weeks, but occasionally fails without any changes to the messaging code paths, or upgrades. Our CI occasionally fails with messaging-related tests where messages are not delivered, but if I re-run the tests they pass. It looks like a race condition at startup. |
@ozangunalp I have noticed that we are currently running on an older Quarkus release (3.8.4) and smallrye-reactive-messaging-amqp has been upgraded from 4.18.0 to 4.21.0 between Quarkus 3.8.4 and 3.10.1. The implementation has changed between those versions. I will try to upgrade to Quarkus 3.10.0 and see if we still have the problem. |
I've tested the dev mode both on 3.8.4 and 3.10.1 with a raspberrypi (for a change). I couldn't reproduce it. There were some instances when forcing a restart in dev mode. @johnaohara during your tests, were you able to check on the Artemis UI whether messages are queued and not delivered to the consumer? Or were messages stuck in the emitter buffer? |
@ozangunalp yes I checked the Artemis UI and there are no messages in the queue. I can see the messages are all backed up in memory in BufferItemMultiEmitter.queue |
Trying to recreate this consistently has been tricky. I have a very simple application that exhibits the behaviour on the machine that tends to fail: https://github.com/johnaohara/quarkus-issue-40118 This application works as expected on my laptop. I have not been able to recreate the issue where the emitter fails on the first startup of the application, only on restart.
desktop:
To reproduce:
with error message:
|
I have managed to trigger the condition on my "working" machine (although only once). There appears to be a race condition around credit handling. I can set a debug breakpoint on setCredit to occasionally trigger the condition in the working env. If I enable debug messages, on the working machine I see the following log:
on the failing machine:
|
In theory the proton stuff should always run from within their Netty event loop threads, @gemmellr am I right? |
They do. I am now a little bit further down the stack; it looks like on a restart we are missing a flow event. |
That is correct, it's expressly single threaded, so it can't 'race' unless being mis-used. |
If you are not seeing a flow event, the most likely reason is a flow frame was not sent. I'd be looking at whether you are falling foul of flow control by the broker to block production (by not flow'ing credit to send anything), e.g. due to its current memory and/or disk limits configuration (e.g. the broker defaults to a 90% max-disk-usage limit if not otherwise configured). You can more closely see what is actually sent using the env variable PN_TRACE_FRM=true (on client and/or broker sides since both are using proton-j underneath) to provoke a protocol trace to stdout. |
@gemmellr thank you very much for the info. wrt the "race": in quarkus dev mode, when the application is restarted, the i/o processing moves from one event loop thread to another. At startup all buffer processing is running on a single event loop thread. I have captured the network packets, and I am seeing the flow packet being returned from the broker, with the expected number of credits. However, this is not being propagated to the emitter. I am still currently digging through the code path to understand why this packet is not handled correctly. |
So this issue is caused by flow control :( The disk that was mounted into the broker container was 95% full; the first client connection works, but subsequent connections are not allocated any credits to send messages. @ozangunalp I am wondering if we can capture this state and warn users when this is the case? atm it all appears to work, but there are no credits to send messages. This condition appears to be tested for already in the connector code. |
Thank you very much @gemmellr for pointing me in the right direction! |
@johnaohara So it was the disk mounted to the container that was full? Actually, we already have a message for this (logged at debug; maybe the level can be changed): |
yeah, looking back, it was shown as a debug message:
Maybe this could be a warning? idk in what other situations this could be an expected state. I think if this message had been a warning I would have started looking into it sooner |
Yes, that was the root cause, it was at 95% of capacity |
I agree, let's change that to warn. |
I wouldn't necessarily do that on every check, you may end up repeatedly emitting the warning any time new credit isn't being granted quite as fast as you could send, which may be entirely expected behaviour, at which point you'll instead start getting questions about why it is warning in the course of doing what it's meant to be doing (as it actually is now...). I'd either establish a larger time over which it is warned, or say it is actually info. |
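A minimal sketch of the kind of time-based throttling being suggested (the one-minute interval, class, and method names are illustrative assumptions, not what the connector actually implements):

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

public class NoCreditWarner {

    private static final Logger LOG = Logger.getLogger(NoCreditWarner.class.getName());
    private static final long INTERVAL_NANOS = Duration.ofMinutes(1).toNanos(); // assumed interval

    // Initialised so that the very first check is allowed to warn.
    private final AtomicLong lastWarned = new AtomicLong(System.nanoTime() - INTERVAL_NANOS);

    /** Call on every "no credit" check; warns at most once per interval. */
    public void maybeWarnNoCredit(String channel) {
        long now = System.nanoTime();
        long last = lastWarned.get();
        if (now - last >= INTERVAL_NANOS && lastWarned.compareAndSet(last, now)) {
            LOG.warning("No credit to send messages on channel '" + channel
                    + "'; the broker may be blocking producers (e.g. memory/disk limits)");
        }
    }
}
```

This keeps the signal without warning on every check where credit is merely slow to arrive.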
I think that is a fair point, esp. when running in prod mode. When running in dev/test mode, might we want a different logging level? |
Changed the warning in smallrye/smallrye-reactive-messaging#2632, integrated in #41851 |
Describe the bug
We have an application that uses an AMQ broker for async message processing and were experiencing test failures in our test suite where messages are not being passed to the AMQ broker via the quarkus-smallrye-reactive-messaging-amqp client. Our test suite uses an AMQ broker that is automatically provisioned by dev services.

What I noticed was that when we called org.eclipse.microprofile.reactive.messaging.Emitter.send() with a msg payload, the messages were being enqueued in a buffer, but not delivered to the underlying AMQ client. Therefore the messages were not delivered to the broker.

In order to reproduce this issue, I created a sample application from code.quarkus.io, just selecting the Messaging - AMQP Connector [quarkus-messaging-amqp] extension. I found that if I start that application in dev mode the messages are processed as expected, but if I restart dev mode 3-4 times (by pressing s in the dev console) the messages are no longer delivered to the broker and they are buffered in a queue, in the same way our test suite behaves.

There appears to be a race/bug where the requested counter in io.smallrye.mutiny.operators.multi.builders.BufferItemMultiEmitter is set to 0 during a restart and in test mode, which prevents the call to drain() from emitting the messages.

This does not appear to happen on all machines. I see this issue in Fedora 39 on x86_64, but our CI environment (github) or Mac M2 does not demonstrate this behaviour.
Expected behavior
The messages in the sample app should be output every time quarkus is restarted in dev mode:
Actual behavior
After 1-2 restarts, the messages are no longer dispatched to the AMQ broker:
How to Reproduce?
Create a sample application from code.quarkus.io, selecting the Messaging - AMQP Connector [quarkus-messaging-amqp] extension
Start the application in dev mode
Restart dev mode 3-4 times by pressing s in the terminal

Output of uname -a or ver
Linux fedora 6.8.4-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 4 20:45:21 UTC 2024 x86_64 GNU/Linux
Output of java -version
openjdk version "21.0.1" 2023-10-17 LTS
OpenJDK Runtime Environment Temurin-21.0.1+12 (build 21.0.1+12-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (build 21.0.1+12-LTS, mixed mode, sharing)
Quarkus version or git rev
No response
Build tool (ie. output of mvnw --version or gradlew --version)
3.9.3
Additional information
No response