Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide easy onramp for users of fedmsg.tail_messages() users #173

Open
dustymabe opened this issue May 13, 2019 · 7 comments
Open

provide easy onramp for users of fedmsg.tail_messages() users #173

dustymabe opened this issue May 13, 2019 · 7 comments

Comments

@dustymabe
Copy link

I think one of the easiest (though probably inefficient) ways to consume fedmsg was to just run fedmsg.tail_messages().

#!/usr/bin/python3
import fedmsg

# Set fedmsg logging to not print warnings
import logging
logger = logging.getLogger('fedmsg')
logger.setLevel(logging.ERROR)

def main():

    # Grab messages from fedmsg and process them as we go
    print("Starting listening for fedmsgs..")
    for name, endpoint, topic, msg in fedmsg.tail_messages():
        print(topic)

if __name__ == '__main__':
    main()

With this simple read/only operation one can trigger on pretty much any fedmsg and react to changes in Fedora. I think it will ease the transition to fedora messaging if we provide a very similar way for people to consume fedora-messaging. Here is an example where another user is using tail_messages().

Right now it is recommended that people use the CLI to invoke fedora messaging using a command like fedora-messaging consume --callback=fedora_messaging.example:printer. This works but it's a bit awkward to change a script that was previously being called directly to now be imported and be called by something else. For example if previously I was telling people to call my script (foo.py) in order to run this service now I have to tell them to copy the file into the correct location and call fedora-messaging consume --callback=foo.example:do-something.

It would be great if we could ease the transition by doing a few things:

  • make the shipped config connect to fedora servers by default

It would be nice if I didn't have to cp /etc/fedora-messaging/fedora.toml /etc/fedora-messaging/config.toml. I also didn't do the sed step from https://fedora-messaging.readthedocs.io/en/latest/fedora-broker.html#getting-connected but I imagine if that is valuable if we automate it would make it where not everyone would use the 00000000-0000-0000-0000-000000000000 queue.

  • provide a nice alternative to fedmsg.tail_messages() to be used as an almost drop in replacement

I think I should be able to do this with something like this (note it does still use a callback so not quite the same, but close):

#!/usr/bin/python3
from fedora_messaging import config, api, exceptions
import logging

# Set local logging 
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def process_message(msg):
    print(msg.topic)

def main():
    config.conf.setup_logging()
    # Start consuming messages using our callback. This call will block until
    # a KeyboardInterrupt is raised, or the process receives a SIGINT or SIGTERM
    # signal.
    api.consume(process_message)

if __name__ == '__main__':
    main()

but that currently is hitting a permissions issue and it also requires us to set up logging config.conf.setup_logging() and I think it's missing setting up signal handling.

so maybe we can create a new function in the api that will be a convenience (basically the same thing as the cli.consume()) function, but able to be called from within python like api.consume() can be.

@jeremycline
Copy link
Member

I think one of the easiest (though probably inefficient) ways to consume fedmsg was to just run fedmsg.tail_messages().

#!/usr/bin/python3
import fedmsg

# Set fedmsg logging to not print warnings
import logging
logger = logging.getLogger('fedmsg')
logger.setLevel(logging.ERROR)

def main():

    # Grab messages from fedmsg and process them as we go
    print("Starting listening for fedmsgs..")
    for name, endpoint, topic, msg in fedmsg.tail_messages():
        print(topic)

if __name__ == '__main__':
    main()

With this simple read/only operation one can trigger on pretty much any fedmsg and react to changes in Fedora. I think it will ease the transition to fedora messaging if we provide a very similar way for people to consume fedora-messaging. Here is an example where another user is using tail_messages().

Right now it is recommended that people use the CLI to invoke fedora messaging using a command like fedora-messaging consume --callback=fedora_messaging.example:printer. This works but it's a bit awkward to change a script that was previously being called directly to now be imported and be called by something else. For example if previously I was telling people to call my script (foo.py) in order to run this service now I have to tell them to copy the file into the correct location and call fedora-messaging consume --callback=foo.example:do-something.

It will be a lot easier with #162. Just point the CLI at your script file, no fiddling with Python paths.

It would be great if we could ease the transition by doing a few things:

* **make the shipped config connect to fedora servers by default**

I picked the default because I imagined mostly people would be developing against a local broker. I suppose the RPM could make the Fedora broker one be the default, if that's what most people want.

It would be nice if I didn't have to cp /etc/fedora-messaging/fedora.toml /etc/fedora-messaging/config.toml. I also didn't do the sed step from https://fedora-messaging.readthedocs.io/en/latest/fedora-broker.html#getting-connected but I imagine if that is valuable if we automate it would make it where not everyone would use the 00000000-0000-0000-0000-000000000000 queue.

This would be the same thing as making the Fedora servers the default. I guess it's worth writing a post-install script to run sed because that step is important. All the steps in the documentation are important.

* **provide a nice alternative to `fedmsg.tail_messages()` to be used as an almost drop in replacement**

I think I should be able to do this with something like this (note it does still use a callback so not quite the same, but close):

#!/usr/bin/python3
from fedora_messaging import config, api, exceptions
import logging

# Set local logging 
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def process_message(msg):
    print(msg.topic)

def main():
    config.conf.setup_logging()
    # Start consuming messages using our callback. This call will block until
    # a KeyboardInterrupt is raised, or the process receives a SIGINT or SIGTERM
    # signal.
    api.consume(process_message)

if __name__ == '__main__':
    main()

but that currently is hitting a permissions issue and it also requires us to set up logging config.conf.setup_logging() and I think it's missing setting up signal handling.

It intentionally does not set up logging or signal handling as we don't want to step on API user's pre-existing signal handlers and logging configuration. If you don't want to set up logging or deal with proper signal handling to ensure messages can be processed before halting, you will want to use the CLI.

so maybe we can create a new function in the api that will be a convenience (basically the same thing as the cli.consume()) function, but able to be called from within python like api.consume() can be.

Can you explain why you want all the CLI features without using the CLI? If you don't want to mess with logging and signal handling and so on, the CLI is there to do all that for you.

@abompard
Copy link
Member

I guess it's worth writing a post-install script to run sed because that step is important.

@jeremycline , I wonder if we could use Rabbit's server-named queues. When the client does not provide a queue name, RabbitMQ generates one in the amq.gen-* range, so we could limit the permissions there too.
And it should still be possible to choose a queue name in the config file if you want to have more than one consumer on the same queue.

@dustymabe
Copy link
Author

dustymabe commented May 16, 2019

It will be a lot easier with #162. Just point the CLI at your script file, no fiddling with Python paths.

Yeah that will make it easier to call the CLI. I agree.

It would be great if we could ease the transition by doing a few things:

* **make the shipped config connect to fedora servers by default**

I picked the default because I imagined mostly people would be developing against a local broker. I suppose the RPM could make the Fedora broker one be the default, if that's what most people want.

I represent a casual user who wants to build a service on top of fedora-messaging (consumer). I think as we move forward with automation objectives services like this (read-only fedmsg events, then react by doing some action) will become extremely popular. Do you know what the ratio of people using fedora-messaging to develop a provider of messages vs a consumer of messages would be? That might help us make a judgement call here.

It would be nice if I didn't have to cp /etc/fedora-messaging/fedora.toml /etc/fedora-messaging/config.toml. I also didn't do the sed step from https://fedora-messaging.readthedocs.io/en/latest/fedora-broker.html#getting-connected but I imagine if that is valuable if we automate it would make it where not everyone would use the 00000000-0000-0000-0000-000000000000 queue.

This would be the same thing as making the Fedora servers the default. I guess it's worth writing a post-install script to run sed because that step is important. All the steps in the documentation are important.

Since you said this step is important I wonder if fedora-messaging could detect if the 00000000-0000-0000-0000-000000000000 is in the queue name, assume it's a random consumer and generate a random UUID for the life of this process. It would achieve the goal of not using the same queue name and also make it easier to get started. This might be similar to what @abompard suggested.

* **provide a nice alternative to `fedmsg.tail_messages()` to be used as an almost drop in replacement**

It intentionally does not set up logging or signal handling as we don't want to step on API user's pre-existing signal handlers and logging configuration. If you don't want to set up logging or deal with proper signal handling to ensure messages can be processed before halting, you will want to use the CLI.

so maybe we can create a new function in the api that will be a convenience (basically the same thing as the cli.consume()) function, but able to be called from within python like api.consume() can be.

Can you explain why you want all the CLI features without using the CLI? If you don't want to mess with logging and signal handling and so on, the CLI is there to do all that for you.

I don't know if I would explain it as want all the CLI features without using the CLI. I think I would say that we want all the features that we had in the past. If we were able to do that then honestly we might even be able to come up with some magic to migrate people away from fedmsg.tail_messages() without them even changing their code. Just update the fedmsg package to use fedora-messaging and get rid of the fedmsg infrastructure in a shorter timeframe while consumers update to the new interface.

@jeremycline
Copy link
Member

jeremycline commented May 16, 2019 via email

@jeremycline
Copy link
Member

Since you said this step is important I wonder if fedora-messaging could detect if the 00000000-0000-0000-0000-000000000000 is in the queue name, assume it's a random consumer and generate a random UUID for the life of this process. It would achieve the goal of not using the same queue name and also make it easier to get started. This might be similar to what @abompard suggested.

I'd be more in favor of server-generated queue names than this. While I want this to Just Work out of the box, there isn't a good way to hide the fact that you need a unique, stable queue name. We can do what we're doing now, or we can generate a name on the fly, but neither one will actually Just Work. It'll just look like it's just working until you restart your application or notice you're getting 1/100 messages.

I don't know if I would explain it as want all the CLI features without using the CLI. I think I would say that we want all the features that we had in the past. If we were able to do that then honestly we might even be able to come up with some magic to migrate people away from fedmsg.tail_messages() without them even changing their code. Just update the fedmsg package to use fedora-messaging and get rid of the fedmsg infrastructure in a shorter timeframe while consumers update to the new interface.

People have to change their code to use fedora-messaging. The API isn't the same as fedmsg, and there are good reasons for that. The bridge from AMQP to ZMQ is there so that people don't have to change their code if they don't want to right now.

@jeremycline
Copy link
Member

I feel I should also address why I am dubious of implementing the same "fedmsg.tail_messages" API:

  • Halting gracefully. How are you going to handle that? Your service will get signals to shut down frequently (reboots, restarting the service, whatever). You need to make sure the message is either completely processed, or returned to the queue for later processing.

  • How do you handle applications that want to requeue messages instead of ignoring them? The current approach is have the callback raise an exception, but in an iterator there's no way to handle that exception or provide a different mechanism (off the top of my head, anyway).

  • It breaks the "One, obvious way to do it" rule. Offering multiple synchronous APIs is confusing. Offering an async and sync API the way we currently do is arguably confusing enough. I don't want to add to it.

Basically, there's a lot of edge cases fedmsg did not have to care about because, well, losing messages is not something it worries about.

@dustymabe
Copy link
Author

dustymabe commented May 16, 2019

Basically, there's a lot of edge cases fedmsg did not have to care about because, well, losing messages is not something it worries about.

I think these edge cases are important to address, but not all consumers are sensitive to them. For example, if my service goes down for a day or two I don't want to worry about events that happened two days ago. I care about right now and the future. I guess an analogy would be TCP vs UDP for something like a video stream of a live sporting event. Another API call could be useful (unreliable_consume()), but purely up to you to determine if it's valuable or not.

I'll leave the discussion there and let you determine the best way forward. This issue was mostly a way to share my experience with fedora-messaging and see if you or others had opinions on making the experience easier for people transitioning from the existing fedmsg API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants