
Caching RiveScript Brain for a Web Application #72

Open
jayrav13 opened this issue Jan 6, 2017 · 4 comments


jayrav13 commented Jan 6, 2017

Hello!

Thank you so much for this library - I've been using it quite a bit for the last few months.

My question is about the contents of the Brain. I have a large set of RiveScript files, and I'm wrapping this library with a Flask web server so that clients can make HTTP requests and get replies based on the user calling in. Unfortunately, loading the Brain on each reply takes quite a while.

Question: Has anyone found a good way to cache the parsed content in memory for a large set of RiveScript files (~130 files, 500 - 1000 lines each)?

Examples:

I tried caching many of the RiveScript class's private variables (_global, _var, etc.) in Redis as JSON-encoded strings and loading them with json.loads, but parsing the strings back into dictionaries is rather expensive. Roughly what I did is sketched below.
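A minimal sketch of that approach, for reference (the Redis connection details and the exact attribute list are assumptions; only a few of the parsed dictionaries are shown):

```python
import json
import redis
from rivescript import RiveScript

r = redis.StrictRedis(host="localhost", port=6379, db=0)

# Only a few of the parsed dictionaries are listed here; extend as needed.
ATTRS = ("_global", "_var", "_sorted")

def cache_brain(bot):
    # Serialize each parsed internal dictionary to a JSON string in Redis.
    for attr in ATTRS:
        r.set("rs:" + attr, json.dumps(getattr(bot, attr)))

def restore_brain(bot):
    # The json.loads() calls here are the expensive step described above.
    for attr in ATTRS:
        setattr(bot, attr, json.loads(r.get("rs:" + attr)))
```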

I also tried MongoDB, but the _sorted dictionary of sort buffers exceeds a document's maximum size of 16MB.

Another data point: every private dictionary in RiveScript (minus _handlers and _regexc), stored as one large JSON object in a file, comes to about 200 MB.

Any feedback, either potential solutions or commentary on any relevant architecture decisions, is much appreciated.

kirsle (Member) commented Jan 6, 2017

Since it's Python, have you tried pickle?
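Something like this, as a rough sketch (the file path and directory name are just examples; if the whole instance won't pickle, the individual parsed dictionaries might):

```python
import pickle
from rivescript import RiveScript

# Parse the .rive files once and snapshot the result to disk.
bot = RiveScript()
bot.load_directory("./brain")   # assumed location of the .rive files
bot.sort_replies()

with open("brain.pickle", "wb") as fh:
    # If the whole instance won't pickle (e.g. because of object macro
    # handlers), pickling the individual parsed dictionaries also works.
    pickle.dump(bot, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Later, on startup:
with open("brain.pickle", "rb") as fh:
    bot = pickle.load(fh)
```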

Also, how are you managing your RiveScript instance in your Flask app? If you keep one globally accessible instance, the requests can reuse that instead of needing to load the bot for each message. Here's an example. Depending on your use case it might be acceptable to initialize the bot on server startup and keep it in memory for the lifetime of your application, and not worry about serializing and storing the bot.
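A minimal sketch of the "one instance for the lifetime of the app" idea (the directory name and endpoint shape are assumptions, not from your setup):

```python
from flask import Flask, request, jsonify
from rivescript import RiveScript

app = Flask(__name__)

# Parse and sort the brain once, at startup, instead of on every request.
bot = RiveScript()
bot.load_directory("./brain")   # assumed location of the .rive files
bot.sort_replies()

@app.route("/reply", methods=["POST"])
def reply():
    data = request.get_json()
    # Every request reuses the already-parsed bot.
    return jsonify(reply=bot.reply(data["username"], data["message"]))
```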

@kirsle kirsle added the question label Jan 6, 2017
jayrav13 (Author) commented Jan 6, 2017

Thanks for your reply!

I had run across pickle but didn't end up trying it, in favor of json / rapidjson. I just gave it a try, and it looks like the time to unpack the binary is still a bit lengthy.

The latter is an option I tried and it seemed to work well, but I had read that it's considered bad practice to share global variables across requests. To your point, though, it seems like the best option, rather than having to serialize and store the bot.

Any feedback on that latter point is appreciated, but I'll try implementing that solution and follow up. Thanks again for this library and your help!

kirsle (Member) commented Jan 6, 2017

I haven't battle-tested it (read: sending tons of simultaneous requests to a Flask app configured this way), so it might take a little tweaking to get it to work in a thread-safe way if you find that to be a problem.

Some ideas I have for thread safety:

  • Put the RiveScript instance on an attribute of your Flask app object. Your endpoints can use from flask import current_app and access it by doing current_app.bot.reply(...), for example. This seems to be Flask's preferred way of handling globals, judging from what Flask extensions do.
  • Implement thread locking yourself: create your own get_reply() function that uses a mutex to lock concurrent access (e.g. with threading.Lock). Both ideas are combined in the sketch after this list.
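A rough sketch combining both ideas (the endpoint, directory name, and the get_reply / bot_lock names are illustrative, not part of RiveScript or Flask):

```python
import threading
from flask import Flask, current_app, request, jsonify
from rivescript import RiveScript

def create_app():
    app = Flask(__name__)

    # Put the bot (and a mutex) on the app object; views reach them
    # through current_app rather than a module-level global.
    app.bot = RiveScript()
    app.bot.load_directory("./brain")
    app.bot.sort_replies()
    app.bot_lock = threading.Lock()

    @app.route("/reply", methods=["POST"])
    def get_reply():
        data = request.get_json()
        # Serialize access to the shared bot across worker threads.
        with current_app.bot_lock:
            answer = current_app.bot.reply(data["username"], data["message"])
        return jsonify(reply=answer)

    return app
```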

In production mode with a WSGI runner, Flask apps tend to run on multiple threads or processes. In the latter case, each process would initialize its own copy of the bot and use that for its requests. This can have the side effect of fragmenting user variables (if different processes handle different requests), so you might need something external to synchronize them (e.g. a Redis cache that all procs can use). The Flask example I linked earlier accepts user variables from the client, if you want to offload that responsibility to the client side (e.g. the client sends in all the user variables it knows about, so no matter which Flask proc handles the request, they all see the same user vars).
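As a sketch of that client-supplied-variables idea, reusing the module-level bot and app from the startup sketch above (the request/response shape is an assumption; set_uservar and get_uservars are the RiveScript user-variable API):

```python
@app.route("/reply_with_vars", methods=["POST"])
def reply_with_vars():
    data = request.get_json()
    username = data["username"]

    # Restore whatever user variables the client sent with this request,
    # so it doesn't matter which worker process handles it.
    for name, value in data.get("vars", {}).items():
        bot.set_uservar(username, name, value)

    answer = bot.reply(username, data["message"])

    # Hand the (possibly updated) variables back to the client to resend
    # with its next request.
    return jsonify(reply=answer, vars=bot.get_uservars(username))
```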

jayrav13 (Author) commented

Thanks!

I went ahead and implemented the initial method for our use case (using current_app for thread safety). In the coming days, we will be battle-testing against a large number of requests (we've hit as many as 100,000 in a day, so we'll be aiming big as far as test rate goes).

Appreciate your help, and once again appreciate this project!
