Replace the current server call for offline products to something more scalable #4066
Comments
A solution could be to query the top 10k barcodes first, then download the products as background tasks, in chunks of 1,000 or 100.
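A minimal sketch of the chunking idea, in Python for illustration only (the app itself is not written in Python). The names `chunk_barcodes`, `download_in_chunks` and the `fetch_chunk` callback are hypothetical; the real download call would be whatever the app already uses.

```python
from typing import Callable, Iterator, List, Sequence


def chunk_barcodes(barcodes: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size barcodes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(barcodes), chunk_size):
        yield list(barcodes[start:start + chunk_size])


def download_in_chunks(
    barcodes: Sequence[str],
    chunk_size: int,
    fetch_chunk: Callable[[List[str]], None],  # hypothetical: downloads one chunk
) -> List[List[str]]:
    """Run fetch_chunk on every chunk and return the chunks that failed."""
    failed: List[List[str]] = []
    for chunk in chunk_barcodes(barcodes, chunk_size):
        try:
            fetch_chunk(chunk)
        except Exception:
            # One failing chunk does not abort the others; it can be retried later.
            failed.append(chunk)
    return failed
```

With 10,000 barcodes, a chunk size of 1,000 gives 10 tasks and a chunk size of 100 gives 100 tasks; smaller chunks mean less work lost when one request fails.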
The Mirabelle tool has a very aggressive cache, which suits this kind of usage well: the first request can be slow (maybe more than 30 seconds, depending on the request), but the second one should be very fast. The database and the cache are refreshed every day. Datasette (the engine behind Mirabelle) can export data as CSV or JSON.
See my examples here: openfoodfacts/openfoodfacts-server#6328 (comment). The service is working well, but we should check both the HTTP connection (error code) and the result.
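One way to check both the error code and the result is sketched below in Python. It assumes the JSON export is an array of row objects, each with a `code` field holding the barcode; that shape, the `ExportError` class and the `parse_barcode_export` function are assumptions for illustration, not the documented format of the Mirabelle export.

```python
import json
from typing import List


class ExportError(Exception):
    """Raised when the export response cannot be trusted."""


def parse_barcode_export(status_code: int, body: str, expected_min: int = 1) -> List[str]:
    """Validate an export response and return its barcodes.

    Assumption: the body is a JSON array of objects with a "code" field.
    """
    if status_code != 200:
        raise ExportError(f"unexpected HTTP status {status_code}")
    try:
        rows = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ExportError("body is not valid JSON") from exc
    if not isinstance(rows, list):
        raise ExportError("expected a JSON array of rows")
    barcodes: List[str] = []
    for row in rows:
        code = row.get("code") if isinstance(row, dict) else None
        if not isinstance(code, str) or not code.isdigit():
            raise ExportError(f"row without a numeric barcode: {row!r}")
        barcodes.append(code)
    if len(barcodes) < expected_min:
        raise ExportError(f"only {len(barcodes)} barcodes, expected at least {expected_min}")
    return barcodes
```

Keeping the validation separate from the HTTP call makes it easy to test without touching the network.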
This is what I'm going to implement:
That would also pave the way for something @teolemon will appreciate: refreshing all the locally stored products.
@teolemon We are better off with an additional temporary work table of barcodes, which helps us split long-running and potentially failing HTTP queries into smaller, bulletproof operations. That will also help us share code with the "full refresh" feature: downloading all products from a list of barcodes, either the top 10k or the current local products.
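A rough sketch of the work-table idea, using Python's `sqlite3` for illustration. The table name `work_barcode`, the two functions and the `download_page` callback are hypothetical. A regular table is used here rather than a SQL `TEMPORARY` one, so that pending barcodes survive a restart; that choice is an assumption, not something stated above.

```python
import sqlite3
from typing import Callable, Iterable, List


def fill_work_table(conn: sqlite3.Connection, barcodes: Iterable[str]) -> None:
    """Store the barcodes still to be downloaded (top 10k or local products)."""
    conn.execute("CREATE TABLE IF NOT EXISTS work_barcode (barcode TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO work_barcode (barcode) VALUES (?)",
        ((b,) for b in barcodes),
    )
    conn.commit()


def process_work_table(
    conn: sqlite3.Connection,
    page_size: int,
    download_page: Callable[[List[str]], None],  # hypothetical download call
) -> int:
    """Download page by page; return how many barcodes are still pending."""
    while True:
        rows = conn.execute(
            "SELECT barcode FROM work_barcode ORDER BY barcode LIMIT ?", (page_size,)
        ).fetchall()
        page = [row[0] for row in rows]
        if not page:
            break
        try:
            download_page(page)
        except Exception:
            break  # keep the remaining rows so a later run can resume
        # Only barcodes that were downloaded successfully leave the table.
        conn.executemany("DELETE FROM work_barcode WHERE barcode = ?", ((b,) for b in page))
        conn.commit()
    return conn.execute("SELECT COUNT(*) FROM work_barcode").fetchone()[0]
```

Both the "top 10k" download and the "full refresh" would only differ in what they pass to `fill_work_table`; the paging loop is shared.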
Really not sure what you're worried about, performance-wise: I've just managed to download the top 10K (FR_fr) barcodes in 191 seconds, with 100-item pages (less than 2 seconds per query).
What
Potential solutions
WDYT @stephanegigandet @raphael0202 @alexgarel @CharlesNepote ?
Part of