
Caching RiveScript Brain for a Web Application #72

Open
jayrav13 opened this issue Jan 6, 2017 · 4 comments


jayrav13 commented Jan 6, 2017

Hello!

Thank you so much for this library - I've been using it quite a bit for the last few months.

My question is about the contents of the Brain. I have a large set of RiveScript files, and I'm wrapping this library with a Flask web server so that clients can make HTTP requests and get replies based on the user calling in. Unfortunately, loading the Brain on each reply takes quite a while.

Question: Has anyone found a good way to cache the parsed content in memory for a large set of RiveScript files (~130 files, 500 - 1000 lines each)?

Examples:

I tried caching many of the RiveScript class's private variables (_global, _var, etc.) in Redis as JSON-encoded strings and loading them with json.loads, but parsing the strings back into dictionaries is rather expensive. Roughly what I did is sketched below.
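A minimal sketch of that approach, for reference (the Redis connection details and the exact attribute list are assumptions; only a few of the parsed dictionaries are shown):

```python
import json
import redis
from rivescript import RiveScript

r = redis.StrictRedis(host="localhost", port=6379, db=0)

# Only a few of the parsed dictionaries are listed here; extend as needed.
ATTRS = ("_global", "_var", "_sorted")

def cache_brain(bot):
    # Serialize each parsed internal dictionary to a JSON string in Redis.
    for attr in ATTRS:
        r.set("rs:" + attr, json.dumps(getattr(bot, attr)))

def restore_brain(bot):
    # The json.loads() calls here are the expensive step described above.
    for attr in ATTRS:
        setattr(bot, attr, json.loads(r.get("rs:" + attr)))
```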

I also tried MongoDB, but the _sorted dictionary of sort buffers exceeds a document's maximum size of 16MB.

Another data point: every private dictionary in RiveScript (minus _handlers and _regexc), stored as one large JSON object in a file, comes to about 200 MB.

Any feedback, either potential solutions or commentary on any relevant architecture decisions, is much appreciated.

kirsle (Member) commented Jan 6, 2017

Since it's Python, have you tried pickle?
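Something like this, as a rough sketch (the file path and directory name are just examples; if the whole instance won't pickle, the individual parsed dictionaries might):

```python
import pickle
from rivescript import RiveScript

# Parse the .rive files once and snapshot the result to disk.
bot = RiveScript()
bot.load_directory("./brain")   # assumed location of the .rive files
bot.sort_replies()

with open("brain.pickle", "wb") as fh:
    # If the whole instance won't pickle (e.g. because of object macro
    # handlers), pickling the individual parsed dictionaries also works.
    pickle.dump(bot, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Later, on startup:
with open("brain.pickle", "rb") as fh:
    bot = pickle.load(fh)
```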

Also, how are you managing your RiveScript instance in your Flask app? If you keep one globally accessible instance, the requests can reuse that instead of needing to load the bot for each message. Here's an example. Depending on your use case it might be acceptable to initialize the bot on server startup and keep it in memory for the lifetime of your application, and not worry about serializing and storing the bot.
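A minimal sketch of the "one instance for the lifetime of the app" idea (the directory name and endpoint shape are assumptions, not from your setup):

```python
from flask import Flask, request, jsonify
from rivescript import RiveScript

app = Flask(__name__)

# Parse and sort the brain once, at startup, instead of on every request.
bot = RiveScript()
bot.load_directory("./brain")   # assumed location of the .rive files
bot.sort_replies()

@app.route("/reply", methods=["POST"])
def reply():
    data = request.get_json()
    # Every request reuses the already-parsed bot.
    return jsonify(reply=bot.reply(data["username"], data["message"]))
```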

@kirsle kirsle added the question label Jan 6, 2017
jayrav13 (Author) commented Jan 6, 2017

Thanks for your reply!

I had run across pickle but didn't end up trying it, in favor of json / rapidjson. I just gave it a try, and it looks like the time to unpack the binary is still a bit lengthy.

The latter is an option I tried and it seemed to work well, but I had read that it's considered bad practice to share global variables across requests. To your point, though, it seems like the best option, rather than having to serialize and store the bot.

Any feedback on that latter point is appreciated, but I'll try implementing that solution and follow up. Thanks again for this library and your help!

kirsle (Member) commented Jan 6, 2017

I haven't battle-tested it (read: sending tons of simultaneous requests to a Flask app configured this way), so it might take a little tweaking to get it to work in a thread-safe way if you find that to be a problem.

Some ideas I have for thread safety:

  • Put the RiveScript instance on an attribute of your Flask app object. Your endpoints can use from flask import current_app and access it by doing current_app.bot.reply(...), for example. This seems to be Flask's preferred way of handling globals, judging from what Flask extensions do.
  • Implement thread locking yourself: create your own get_reply() function that uses a mutex to lock concurrent access (e.g. with threading.Lock). Both ideas are combined in the sketch after this list.
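A rough sketch combining both ideas (the endpoint, directory name, and the get_reply / bot_lock names are illustrative, not part of RiveScript or Flask):

```python
import threading
from flask import Flask, current_app, request, jsonify
from rivescript import RiveScript

def create_app():
    app = Flask(__name__)

    # Put the bot (and a mutex) on the app object; views reach them
    # through current_app rather than a module-level global.
    app.bot = RiveScript()
    app.bot.load_directory("./brain")
    app.bot.sort_replies()
    app.bot_lock = threading.Lock()

    @app.route("/reply", methods=["POST"])
    def get_reply():
        data = request.get_json()
        # Serialize access to the shared bot across worker threads.
        with current_app.bot_lock:
            answer = current_app.bot.reply(data["username"], data["message"])
        return jsonify(reply=answer)

    return app
```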

In production mode with a WSGI runner, Flask apps tend to run on multiple threads or processes. In the latter case, each process would initialize its own copy of the bot and use that for its requests. This can have the side effect of fragmenting user variables (if different processes handle different requests), so you might need something external to synchronize them (e.g. a Redis cache that all procs can use). The Flask example I linked earlier accepts user variables from the client, if you want to offload that responsibility to the client side (e.g. the client sends in all the user variables it knows about, so no matter which Flask proc handles the request, they all see the same user vars).
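As a sketch of that client-supplied-variables idea, reusing the module-level bot and app from the startup sketch above (the request/response shape is an assumption; set_uservar and get_uservars are the RiveScript user-variable API):

```python
@app.route("/reply_with_vars", methods=["POST"])
def reply_with_vars():
    data = request.get_json()
    username = data["username"]

    # Restore whatever user variables the client sent with this request,
    # so it doesn't matter which worker process handles it.
    for name, value in data.get("vars", {}).items():
        bot.set_uservar(username, name, value)

    answer = bot.reply(username, data["message"])

    # Hand the (possibly updated) variables back to the client to resend
    # with its next request.
    return jsonify(reply=answer, vars=bot.get_uservars(username))
```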

jayrav13 (Author) commented

Thanks!

I went ahead and implemented the initial method for our use case (using current_app for thread safety). In the coming days, we will be battle-testing against a large number of requests (we've hit as many as 100,000 in a day, so we'll be aiming big as far as test rate goes).

Appreciate your help, and once again appreciate this project!
