Improve pickle cache write efficiency #1

Open
martinduartemore opened this issue Jan 5, 2022 · 1 comment

@martinduartemore
Contributor

Our pickle cache's write operations are extremely inefficient: every time we add a new entry to the dictionary, we rewrite the entire binary file.

We could explore opening the file in append mode (`with open(file, mode='ab') as f`), but I'm not sure this would give us any performance benefit, since we currently store everything inside one monolithic dictionary, and appending a freshly pickled copy of that dictionary would still serialize every entry.
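A minimal sketch of how append mode could pay off, assuming we change the on-disk format from one pickled dictionary to a sequence of pickled `(key, value)` records. Each write then serializes only the new entry, and the dictionary is rebuilt on load by reading records until the end of the file. The helper names (`append_entry`, `load_cache`, `compact`) are hypothetical, not part of our current code.

```python
import os
import pickle
from typing import Any, Dict


def append_entry(path: str, key: str, value: Any) -> None:
    """Append a single (key, value) record instead of re-pickling the whole dict."""
    with open(path, mode="ab") as f:
        pickle.dump((key, value), f, protocol=pickle.HIGHEST_PROTOCOL)


def load_cache(path: str) -> Dict[str, Any]:
    """Rebuild the dict by reading records until EOF; later records win."""
    cache: Dict[str, Any] = {}
    if not os.path.exists(path):
        return cache
    with open(path, mode="rb") as f:
        while True:
            try:
                key, value = pickle.load(f)
            except EOFError:  # no more records
                break
            cache[key] = value
    return cache


def compact(path: str) -> None:
    """Optional: rewrite the file with one record per key, dropping stale duplicates."""
    cache = load_cache(path)
    tmp_path = path + ".tmp"
    with open(tmp_path, mode="wb") as f:
        for key, value in cache.items():
            pickle.dump((key, value), f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)  # swap in the compacted file in one step
```

The trade-off is that overwritten keys leave stale records behind until `compact` runs, and a record cut short by a crash would make `load_cache` fail on that record.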

An alternative would be to evaluate other storage solutions, such as HDF5 and LMDB. I think these natively handle databases that grow over time, or at least provide efficient ways to extend them.
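LMDB and HDF5 both need third-party bindings (the `lmdb` and `h5py` packages). As a dependency-free illustration of the same idea, an embedded store that grows one entry at a time, here is a sketch using Python's standard-library `shelve` module. This is a swapped-in technique, not one of the two options proposed above, and the helper names `add_entries` and `lookup` are hypothetical.

```python
import shelve
from typing import Any, Dict


def add_entries(path: str, entries: Dict[str, Any]) -> None:
    """Store entries in a dbm-backed shelf; each value is pickled under its own key,
    so adding entries does not re-serialize the existing ones."""
    with shelve.open(path) as db:  # default flag "c": create the file if missing
        for key, value in entries.items():
            db[key] = value  # shelve keys must be str


def lookup(path: str, key: str, default: Any = None) -> Any:
    """Read one entry without loading the whole cache into memory."""
    with shelve.open(path, flag="r") as db:  # read-only; the shelf must already exist
        return db.get(key, default)
```

Which dbm backend `shelve` uses depends on the platform, so write performance will vary; LMDB or HDF5 would still need their own benchmark.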

@cardoso-neto what do you think?

@cardoso-neto
Contributor

HDF5 does cover that, and it's what we're using in the codesearch project. Although it doesn't support dictionaries, only arrays, I'm sure we could still find a way to make it work for us.
That being said, I don't think we should use it. I think we should go forward with an AWS solution instead. We had already agreed on which one to use in the Deep Tagging Discord channel.
Here is the message I sent:

Memcached:
running it locally: docker run -d -p 11211:11211 memcached
python integration: https://realpython.com/python-memcache-efficient-caching/
max_item_size flag: https://stackoverflow.com/a/31557737/11615853
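A minimal sketch of the Python side, assuming a client object with `get(key)` and `set(key, value)` over bytes, which is what pymemcache's `Client` (used in the Real Python article) provides. The `MemcachedPickleCache` class is hypothetical. It pickles each value under its own key, so adding an entry never rewrites the rest of the cache. It also checks the pickled size against memcached's per-item limit, which defaults to 1 MB and is the limit the max_item_size link is about.

```python
import pickle
from typing import Any, Optional, Protocol

# Memcached's default per-item limit is 1 MB; the server's -I option raises it.
# The server also counts the key and item metadata, so this check is approximate.
DEFAULT_MAX_ITEM_SIZE = 1024 * 1024


class BytesClient(Protocol):
    """The only client methods this sketch relies on."""

    def get(self, key: str) -> Optional[bytes]: ...

    def set(self, key: str, value: bytes) -> Any: ...


class MemcachedPickleCache:
    """Hypothetical wrapper that stores each cache entry under its own key."""

    def __init__(self, client: BytesClient, max_item_size: int = DEFAULT_MAX_ITEM_SIZE) -> None:
        self.client = client
        self.max_item_size = max_item_size

    def put(self, key: str, value: Any) -> bool:
        payload = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        if len(payload) > self.max_item_size:
            # Too large for the server; let the caller decide (skip, compress, split).
            return False
        self.client.set(key, payload)
        return True

    def get(self, key: str, default: Any = None) -> Any:
        payload = self.client.get(key)
        return default if payload is None else pickle.loads(payload)
```

With a local server started via the docker command above, `MemcachedPickleCache(Client(("localhost", 11211)))`, with `Client` imported from `pymemcache.client.base`, should work. Keep in mind that memcached holds data in memory only, so entries are lost when the server restarts.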
