Replace the current server call for offline products to something more scalable #4066

teolemon · 2023-06-02T14:28:18Z

What

We're finally trying to ship the almost-finished offline products feature, but the current server call does not scale:
- API V2 is slow for 1K products: https://fr.openfoodfacts.org/api/v2/search?page_size=1000&fields=code,product_name,brands,nutrition_grade_fr,quantity,lang,ecoscore_grade,image_front_small_url
- API V2 does not work at all for 10K products: https://fr.openfoodfacts.org/api/v2/search?page_size=10000&fields=code,product_name,brands,nutrition_grade_fr,quantity,lang,ecoscore_grade,image_front_small_url

Potential solutions

- @CharlesNepote has proposed using Mirabelle, which AFAIK doesn't offer knowledge panels but could be useful for checking product existence (just barcode and name). See: Generate on-demand miniaturized dump of the database and images for an offline mode openfoodfacts-server#6328 (comment)
- We could add nightly cron exports for the currently failing 10K/100K queries for all major countries

WDYT @stephanegigandet @raphael0202 @alexgarel @CharlesNepote ?

Part of

📶 🤳 Offline scanning (Tracker) #18

monsieurtanuki · 2023-06-02T14:53:21Z

A solution could be to query the top 10k barcodes first, then to download the products as background tasks, in chunks of 1k or 100.
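The chunked approach described above can be sketched as follows. This is only an illustration in Python, not the app's actual code: `chunked`, `download_in_chunks` and the `fetch_products` callable are hypothetical names, and the real HTTP call is deliberately left out so that the splitting logic stands on its own.

```python
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def download_in_chunks(
    barcodes: Sequence[str],
    fetch_products: Callable[[Sequence[str]], dict],
    chunk_size: int = 100,
) -> tuple[dict, list[str]]:
    """Download products chunk by chunk.

    `fetch_products` is a hypothetical callable that takes a list of barcodes
    and returns a {barcode: product} mapping. A failing chunk is recorded and
    skipped, so one bad request does not abort the whole download.
    """
    products: dict = {}
    failed: list[str] = []
    for chunk in chunked(barcodes, chunk_size):
        try:
            products.update(fetch_products(chunk))
        except Exception:
            # Keep the barcodes of the failed chunk so they can be retried later.
            failed.extend(chunk)
    return products, failed
```

Keeping the failed barcodes separate means a later run only has to retry those, instead of restarting the whole top-10k download.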

CharlesNepote · 2023-06-06T15:21:08Z

The Mirabelle tool has a very aggressive cache, which suits this kind of usage well: the first run of a given request can be slow (maybe more than 30s, depending on the request), but subsequent runs should be very fast. The database and the cache are refreshed every day.

Datasette (the engine of Mirabelle) can export data in CSV or JSON:

CSV: https://mirabelle.openfoodfacts.org/products.csv?_sql=[your request]&_size=max
JSON: see documentation: https://docs.datasette.io/en/stable/json_api.html

See my examples here: openfoodfacts/openfoodfacts-server#6328 (comment)

The service is working well, but we should check the HTTP response (error code) and validate the result.
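As a rough sketch of those two checks, the Python helpers below build the CSV export URL in the form given above and validate a response before using it. The SQL text, the `code` column name and the helper names are assumptions made for illustration; the network call itself is left to the caller.

```python
import csv
import io
from urllib.parse import urlencode

# Base URL taken from the CSV example above.
MIRABELLE_CSV = "https://mirabelle.openfoodfacts.org/products.csv"


def build_csv_url(sql: str) -> str:
    """Return the CSV export URL for a given SQL request (URL-encoded)."""
    return MIRABELLE_CSV + "?" + urlencode({"_sql": sql, "_size": "max"})


def parse_barcodes(status: int, body: str, column: str = "code") -> list[str]:
    """Check the HTTP status and the CSV shape, then return the barcodes.

    `column` is an assumption: it must match a column selected in the SQL.
    """
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    reader = csv.DictReader(io.StringIO(body))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} missing from CSV result")
    # Skip rows where the barcode is empty or absent.
    return [row[column] for row in reader if row[column]]
```

A caller would fetch `build_csv_url(...)` with any HTTP client and pass the status code and decoded body to `parse_barcodes`, treating both exceptions as a reason to retry later.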

monsieurtanuki · 2023-06-11T07:04:51Z

This is what I'm going to implement:

- create a new SQL table: offline_barcode(barcode pk)
- create a new background task that downloads the top 10k products:
  - download the top 10k barcodes, possibly with Mirabelle, possibly by 1k, and populate table offline_barcode
  - download the products in offline_barcode that are not already downloaded, possibly by 1k
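The plan above could look roughly like this, sketched in Python with SQLite. Only the `offline_barcode` table comes from the plan; the `local_product` table, the function names and the `fetch` callable are hypothetical stand-ins for the app's local storage and download code.

```python
import sqlite3
from typing import Callable, Iterable, Sequence


def create_tables(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS offline_barcode (barcode TEXT PRIMARY KEY)")
    # Hypothetical stand-in for the products already stored locally.
    db.execute("CREATE TABLE IF NOT EXISTS local_product (barcode TEXT PRIMARY KEY, json TEXT)")


def store_barcodes(db: sqlite3.Connection, barcodes: Iterable[str]) -> None:
    """Step 1: populate offline_barcode; duplicates are ignored."""
    db.executemany(
        "INSERT OR IGNORE INTO offline_barcode (barcode) VALUES (?)",
        [(b,) for b in barcodes],
    )
    db.commit()


def pending_barcodes(db: sqlite3.Connection, limit: int) -> list[str]:
    """Barcodes listed in offline_barcode that are not downloaded yet."""
    rows = db.execute(
        "SELECT o.barcode FROM offline_barcode o "
        "LEFT JOIN local_product p ON p.barcode = o.barcode "
        "WHERE p.barcode IS NULL ORDER BY o.barcode LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def download_pending(
    db: sqlite3.Connection,
    fetch: Callable[[Sequence[str]], dict],
    page_size: int = 1000,
) -> int:
    """Step 2: download missing products page by page; returns the count stored."""
    total = 0
    while True:
        batch = pending_barcodes(db, page_size)
        if not batch:
            break
        products = fetch(batch)
        rows = [(b, products[b]) for b in batch if b in products]
        if not rows:
            break  # nothing usable came back; stop instead of looping forever
        db.executemany(
            "INSERT OR REPLACE INTO local_product (barcode, json) VALUES (?, ?)", rows
        )
        db.commit()
        total += len(rows)
    return total
```

Because progress is committed after every page, an interrupted task can simply be restarted: `pending_barcodes` only returns what is still missing.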

That would also lead the way to something @teolemon will appreciate: refreshing all the products locally stored.

teolemon · 2023-06-11T07:36:52Z

@monsieurtanuki

Why should we create a new table?
Eventually, we should be able to invalidate local caches when needed (e.g. when we refresh translations for knowledge panels, add a new one, or even, like 2 weeks ago, add clickability of the summary card for allergens).

monsieurtanuki · 2023-06-11T09:52:22Z

@teolemon We are better off with an additional temporary work table of barcodes, which helps us split long-running and potentially failing HTTP queries into smaller, bullet-proof operations.

That will also help us share code with the "full refresh" feature: downloading all products from a list of barcodes, either the top 10k or the current local products.

monsieurtanuki · 2023-06-15T08:01:19Z

I'm really not sure what you're worried about, performance-wise: I've just managed to download the top 10K (FR_fr) barcodes in 191 seconds, using pages of 100 items (less than 2 seconds per query).
The next step is just to download the products from a barcode list, again using pages of 100 items.

teolemon added the 🐛 bug and Offline - Browsing labels Jun 2, 2023

teolemon added this to 🤳🥫 The Open Food Facts mobile app (Android & iOS) Jun 2, 2023

github-project-automation bot moved this to To discuss and validate in 🤳🥫 The Open Food Facts mobile app (Android & iOS) Jun 2, 2023

teolemon mentioned this issue Jun 2, 2023

📶 🤳 Offline scanning (Tracker) #18

Open

monsieurtanuki self-assigned this Jun 11, 2023

monsieurtanuki mentioned this issue Jun 12, 2023

fix: 4066 - top 1K pre-download and full refresh as background tasks #4131

Merged

monsieurtanuki mentioned this issue Jun 15, 2023

fix: 4066 - top n product download split in smaller robust parts #4166

Merged

monsieurtanuki closed this as completed in #4166 Jun 21, 2023

github-project-automation bot moved this from To discuss and validate to Done in 🤳🥫 The Open Food Facts mobile app (Android & iOS) Jun 21, 2023
