Our pickle cache's write operations are extremely inefficient, since we rewrite the entire binary file every time we add a new entry to the dictionary.
We could explore using the append mode of the `open` function: `with open(file, mode='ab') as f`. But I'm not sure this would provide any performance benefit, since we currently store everything inside a single monolithic dictionary.
An alternative would be to test other database solutions, such as HDF5 and LMDB. I think these solutions natively handle expanding databases or at least provide efficient solutions to do so.
HDF5 does cover that, and it is what we're using on the codesearch project. Though it doesn't support dictionaries, only arrays, I'm sure we could still find a way to make it work for us.
That being said, I don't think we should. I think we should go forward with an AWS solution. We had come to an agreement regarding which one to use on the Deep Tagging Discord channel.
Here is the message I sent:
> Our pickle cache's write operations are extremely inefficient, since we rewrite the entire binary file every time we add a new entry to the dictionary.
>
> We could explore using the append mode of the `open` function: `with open(file, mode='ab') as f`, but I'm not sure if this would provide us with performance benefits since we store everything inside a monolithic dictionary right now.
>
> An alternative would be to test other database solutions, such as HDF5 and LMDB. I think these solutions natively handle expanding databases or at least provide efficient solutions to do so.
@cardoso-neto what do you think?