-
-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to pre-declare all the used message bus queues on service startup (allow users to use non RabbitMQ kombu transports) #3639
Conversation
This is required when using worker model where all the services which consume from queues might not already be online before the producers. Because of the implementation details, this is not required with RabbitMQ transport, but it's requires with some other transports such as Redis one. This feature can be enabled by setting messaging.predeclare_queues st2.conf config option to True (False by default).
less likely to collide with other names.
# Because of the worker model used, this is required with some non-standard transports such as | ||
# Redis one. Even though kombu requires queues to be pre-declared upfront in such scenarios, | ||
# RabbitMQ transport doesn't require that. | ||
QUEUES = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting.
Curious if that approach can fix #3290
Will test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious as well :). Thanks for watching out @armab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checked.
Sadly, #3290 MQ-related bug is still clearly reproducible with the new predeclare_queues=True
Xenial package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Le sigh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for the fun of it - you could also try setting this config option value to True and using Redis transport to see if this makes a difference (if the issue is RabbitMQ related).
This is already a default value right now, but this way we avoid surprised in case default value changes in teh future.
@@ -155,7 +155,11 @@ def register_opts(ignore_errors=False): | |||
cfg.IntOpt('connection_retries', default=10, | |||
help='How many times should we retry connection before failing.'), | |||
cfg.IntOpt('connection_retry_wait', default=10000, | |||
help='How long should we wait between connection retries.') | |||
help='How long should we wait between connection retries.'), | |||
cfg.BoolOpt('predeclare_queues', default=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What queues? :P. I'd prefer us to be explicit and say AMQP queues or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's part of the messaging
config option through so I think it's fine as it is (we also don't include more verbose info in description for other messaging group options :)
st2common/st2common/config.py
Outdated
help='How long should we wait between connection retries.') | ||
help='How long should we wait between connection retries.'), | ||
cfg.BoolOpt('predeclare_queues', default=False, | ||
help=('True to pre-declare all the queues on service setup. This is required ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Set this to True
to pre-declare all AMQP queues on service startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
liveaction.get_status_management_queue(name='st2.preinit', routing_key='init'), | ||
reactor.get_trigger_cud_queue(name='st2.preinit', routing_key='init'), | ||
reactor.get_trigger_instances_queue(name='st2.preinit', routing_key='init'), | ||
reactor.get_sensor_cud_queue(name='st2.preinit', routing_key='init'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these fake queues that will never be consumed? Because the acutal work queues are named differently. What does rabbitmqadmin tell you about the size of these queues? Do they grow? See #3622. I am worried that the queue sizes would grow without consumers. Can you double check, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they will never be consumed - in fact only one queue which will never be used and grow will be created. Nothing publishes to them so they never grow.
We only do that, because that's the only way to make sure exchanges are created.
(virtualenv)vagrant@vagrant-ubuntu-trusty-64:/data/stanley$ sudo rabbitmqctl list_queues
Listing queues ...
...
st2.preinit 0
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least the names are fixed. That's good. Thanks for digging in.
Thank you. |
This pull request adds ability to predeclare all the used message bus queues (and as such, exchanges) on service startup and allows users to use other non-RabbitMQ kombu transports which are not officially supported.
Background
#3635 inspired me to dig in a little to try to find out what is going on. After spending whole morning digging in and tracking what is going on, I managed to track down the root cause on why Redis kombu transport doesn't work.
#3638 fixed a small issue which was just related to retrying and error propagation, but the fix for the actual root cause of things not working is included in this PR.
When trying to use Redis kombu transport instead of RabbitMQ one, you got an error like this on each service start up:
If you inspect Redis keys you actually find out that none of the exchanges were pre-declared and that's why the issue arises and publishing a message to message bus exchange fails.
It turned out that only exchanges for queues which already exist were pre-declared (aka exchanges for which we already have consumers online). That's problematic and obviously won't work in distributed HA workers model.
In this model, producer can be online before the consumer (e.g. API is online before action runner and other services which actually bind a queue to a particular exchange). Another example of late binding is the stream service - we actually only bind to the exchange once first request hits
/v1/stream
endpoint aka stream service.In fact this happens very often in such setups and it's one of the benefits and reasons of using a worker model - you want messages to still be queued even if the consumer is not online yet and delivered / processed when consumer comes online (normal scenario for such model).
In our example this could mean action runner and other services not being online yet / being restarted / similar, but API service is online and user queues action execution. We obviously want to allow user to perform this operation and queue execution to be run which will be picked up and processed when the action runner service is online.
kombu documentation specifically clarifies this needs to be done in such scenarios (http://docs.celeryproject.org/projects/kombu/en/latest/userguide/producers.html), but because of the implementation details this is not actually required with RabbitMQ transport (nasty!) and our code works just fine with RabbitMQ transport (I guess we were just lucky).
Proposed solution
The solution which fixes this problem is to pre-declare all the queues which causes exchanges to be pre-declared in Redis.
After this change, Redis keyspace looks like this after service set up is ran.
After this change I tested basic functionality (run an action, etc.) and it seems to work fine, but who knows how many different edge case related issues could still be hiding underneath.
Having said that - RabbitMQ is still only officially supported and recommended backend by us and using any other backend is at user's own risk. You have been warned - if you use a non-officially supported backend we don't provide support for it and we can't guarantee everything will work. If you are fine with that, go ahead, but don't complain to us if things don't work.
Even though running this pre-declare code shouldn't in any negative way affect other existing stuff, I still decided to put it behind a feature flag (
messaging.predeclare_queues
config option).This whole pre-init / pre-declare phase is also nothing special and follows exactly the same pre-initialization approach which is required in a distributed system and we do for other things.
Resolves #3635.