Other potential backends #77
I think it would be extremely useful to have RDS-like drivers that overcome the serialization bottleneck. Would it be feasible to store unserialized binary blobs instead of RDS files? Can we leverage |
Unserialized binary blobs don't really exist; there is no linear memory map for anything but the simplest structures.
It's ultimately a performance/generality tradeoff. A storr backend that serialises only simple types (atomic types, lists, and therefore data.frames) will choke as soon as something adds an exotic attribute. What someone sufficiently motivated could do is write a replacement for |
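To make the tradeoff concrete, here is a minimal sketch of the check a "simple types only" backend would need before writing an object. `is_simple()`, `allowed_attrs` and `allowed_classes` are hypothetical names, not part of storr, and the whitelist is an assumption chosen for illustration.

```r
# Hypothetical check, not part of storr: could `x` be written by a backend that
# only understands atomic vectors, lists and data.frames?
allowed_attrs <- c("names", "row.names", "class", "dim", "dimnames", "levels")
allowed_classes <- c("data.frame", "factor")

is_simple <- function(x) {
  # An attribute outside the whitelist is an "exotic attribute": give up.
  if (length(setdiff(names(attributes(x)), allowed_attrs)) > 0) return(FALSE)
  # Only explicitly whitelisted S3 classes are accepted.
  cls <- oldClass(x)
  if (!is.null(cls) && !all(cls %in% allowed_classes)) return(FALSE)
  # NULL and atomic vectors can be stored as-is; lists are checked element by element.
  if (is.null(x) || is.atomic(x)) return(TRUE)
  if (is.list(x)) return(all(vapply(x, is_simple, logical(1))))
  # Functions, environments, external pointers, ... need full serialisation.
  FALSE
}
```

Anything that fails this check (a vector carrying a `units` attribute, a `Date`, a fitted model) would have to fall back to `serialize()`, which is exactly the generality such a backend gives up in exchange for speed.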
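What about an RDS-like driver with the ability to choose how individual objects are saved and loaded? We might be able to store the optional custom saving/loading methods in the key files. This could be especially useful in

```r
# Proposed interface: `save` and `load` are not existing storr arguments.
# Wrapping them in functions defers evaluation until the driver needs them.
my_storr$set(
  key = "model",
  value = keras_model,
  save = function(value, file) keras::save_model_hdf5(value, file),
  load = function(file) keras::load_model_hdf5(file)
)
my_storr$get(key = "model")
```

With |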
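A minimal base-R sketch of how such a driver might record the loader in a per-key file, under loud assumptions: `custom_store()` is hypothetical and not part of storr, keys are used directly as file names (storr itself works with hashes), and `saveRDS`/`readRDS` are the defaults when no custom functions are given.

```r
# Hypothetical sketch, not storr's API: set() accepts optional save/load
# functions and stores the loader next to the key.
custom_store <- function(path = tempfile("custom_store_")) {
  dir.create(file.path(path, "keys"), recursive = TRUE, showWarnings = FALSE)
  dir.create(file.path(path, "data"), showWarnings = FALSE)

  set <- function(key, value, save = saveRDS, load = readRDS) {
    file <- file.path(path, "data", key)
    save(value, file)  # custom writer, or saveRDS() by default
    # The key file remembers which function can read the data back.
    saveRDS(list(file = file, load = load), file.path(path, "keys", key))
    invisible(key)
  }

  get <- function(key) {
    meta <- readRDS(file.path(path, "keys", key))
    meta$load(meta$file)  # dispatch to the recorded loader
  }

  list(set = set, get = get, path = path)
}

st <- custom_store()
st$set("text", c("a", "b"),
       save = function(value, file) writeLines(value, file),
       load = function(file) readLines(file))
st$get("text")
```

The cost of this design is that the loader itself (a closure) gets serialised into the key file, so it must still be callable when the store is reopened, for example because the package that provides it is installed.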
This would be possible to implement. We would need to know for each special case:
This requires a bit of fiddling with the current hash functions, but it could be possible. The limitation is that you'd pay a little extra I/O on each deserialisation, because you'd need to check the first few bytes and then read the whole thing. And if two things serialised to formats sharing the same magic number but needing different load functions, you'd be stuffed (for example, if keras saves models in one flavour of HDF5 and something else saves a slightly different flavour of HDF5 with a different load function, it just would not work).

It might be worth thinking about whether you just want to special-case these beasts, though. The extra complexity has to go somewhere: either into a very fiddly configuration of the storr driver, or into something that just goes "oh, you're doing keras stuff, let me save a copy of that into a special |
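The magic-number idea can be sketched in base R: read only the leading bytes with `readBin()`, match them against a registry, then hand the file to the matching loader, which reads it in full (the extra I/O mentioned above). `magic_loaders`, `detect_format()` and `load_any()` are hypothetical names. The HDF5 entry uses the standard 8-byte HDF5 signature, and gzip-compressed RDS files (the `saveRDS()` default) start with the bytes 0x1f 0x8b.

```r
# Hypothetical registry: leading bytes ("magic number") -> loader function.
magic_loaders <- list(
  list(name = "hdf5",
       magic = as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a)),
       load = function(file) keras::load_model_hdf5(file)),
  list(name = "rds-gzip",
       magic = as.raw(c(0x1f, 0x8b)),
       load = readRDS)
)

detect_format <- function(file, registry = magic_loaders) {
  n <- max(vapply(registry, function(r) length(r$magic), integer(1)))
  lead <- readBin(file, what = "raw", n = n)  # first read: only a few bytes
  for (r in registry) {
    m <- length(r$magic)
    # First match wins, so two formats sharing a magic number cannot coexist.
    if (length(lead) >= m && identical(lead[seq_len(m)], r$magic)) return(r)
  }
  stop("No loader registered for the leading bytes of ", file)
}

load_any <- function(file, registry = magic_loaders) {
  detect_format(file, registry)$load(file)  # second read: the whole file
}
```

The "first match wins" loop is where the collision problem shows up: a second HDF5-based format with a different loader would never be reached, so it would need extra bookkeeping (such as a key file) after all.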
Interesting. I was assuming we would need to store a deserialization reference somewhere else, like a key file, but it sounds like those first few bytes could save us some bookkeeping. Any reading material you would recommend on serialization internals?
I have not decided whether to have |
Hmm... my comment just now is quite long and very specific. I will relocate it to a new issue. |
@richfitz, I am coming back to your suggestion from the bottom of #77 (comment). I am proposing a decorated |