Our pickle cache's write operations are extremely inefficient, since we rewrite the entire binary file every time we add a new entry to the dictionary.
We could explore using the append mode of the `open` function: `with open(file, mode='ab') as f`. But I'm not sure this would provide any performance benefit, since we currently store everything inside a single monolithic dictionary.
An alternative would be to test other database solutions, such as HDF5 and LMDB. I think these solutions natively handle expanding databases or at least provide efficient solutions to do so.
HDF5 does cover that, and it is what we're using on the codesearch project. Though it doesn't support dictionaries, only arrays, I'm sure we could still find a way to make it work for us.
That being said, I don't think we should. I think we should go forward with an AWS solution. We had come to an agreement regarding which one to use on the Deep Tagging Discord channel.
Here is the message I sent:
> Our pickle cache's write operations are extremely inefficient, since we rewrite the entire binary file every time we add a new entry to the dictionary.
>
> We could explore using the append mode of the `open` function: `with open(file, mode='ab') as f`, but I'm not sure if this would provide us with performance benefits since we store everything inside a monolithic dictionary right now.
>
> An alternative would be to test other database solutions, such as HDF5 and LMDB. I think these solutions natively handle expanding databases or at least provide efficient solutions to do so.
@cardoso-neto what do you think?