Notion integration with BrickLink data. The integration allows you to keep track of your Lego collection in Notion.
Built with Beautiful Soup for scraping, the Notion SDK, and asyncio.
Here are some screenshots of what it looks like in Notion:
python sqlite.py
- CATEGORY values: [sw, sh, hp, avt, hfw, col]
- TYPE values: [minifigs, sets]
Either automatically scrape lists from BrickLink:
python scraping/scrape_BL_init.py CATEGORY TYPE
Or from manually downloaded files (from BL):
python scraping/process_BL_minifigs_downloaded_file.py CATEGORY TYPE
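The scripts above take CATEGORY and TYPE as positional arguments. As a minimal sketch of how such arguments could be validated, assuming an `argparse`-based entry point (the names `build_parser` and `parse_cli` are hypothetical, and the real scripts may read their arguments differently):

```python
import argparse

# Allowed values, copied from the CATEGORY and TYPE lists above.
CATEGORIES = ["sw", "sh", "hp", "avt", "hfw", "col"]
TYPES = ["minifigs", "sets"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; not taken from the project's scripts."""
    parser = argparse.ArgumentParser(description="Scrape a BrickLink list.")
    parser.add_argument("category", choices=CATEGORIES, help="BrickLink category code")
    parser.add_argument("type", choices=TYPES, help="kind of item to scrape")
    return parser


def parse_cli(argv=None) -> argparse.Namespace:
    # argparse prints an error and exits when a value is not in `choices`.
    return build_parser().parse_args(argv)
```

Calling `parse_cli(["sw", "minifigs"])` returns a namespace with `category` and `type` set, while an unknown category makes argparse exit with a usage error.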
python scraping/scrape_BL_set_info.py CATEGORY
and
python scraping/scrape_BL_minifig_info.py CATEGORY
First, rename the file notion/private_secrets_TEMPLATE.py to notion/private_secrets.py and fill in the missing Notion credentials.
python notion/create_db.py ALL
python notion/async_upsert_minifigs_data.py CATEGORY
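The upsert script sends data to Notion asynchronously. As a rough sketch of the general pattern, assuming a semaphore is used to cap the number of concurrent requests (`upsert_page` is a hypothetical placeholder for the real Notion SDK call, and `max_concurrency=3` is an arbitrary illustrative value, not the project's setting):

```python
import asyncio


async def upsert_page(record: dict) -> str:
    """Placeholder for the real Notion call (hypothetical, not the project's code)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return record["id"]


async def upsert_all(records: list[dict], max_concurrency: int = 3) -> list[str]:
    """Upsert every record, running at most `max_concurrency` calls at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(record: dict) -> str:
        async with semaphore:
            return await upsert_page(record)

    # gather returns results in the same order as the input records.
    return await asyncio.gather(*(bounded(r) for r in records))


def run_demo() -> list[str]:
    # Made-up ids, used only to exercise the sketch.
    records = [{"id": f"sw{n:04d}"} for n in range(1, 6)]
    return asyncio.run(upsert_all(records))
```

Bounding concurrency this way keeps the number of in-flight requests predictable, which helps when the remote API enforces rate limits.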
Edit cron jobs with crontab -e and add, for example, a weekly list scrape on Mondays at 09:35 and an info scrape every day at 10:35 and 15:35:
35 9 * * 1 cd /PATH_TO_PROJECT/bricklink-notion-integration && sh cron_weekly_scrape_init.sh
35 10,15 * * * cd /PATH_TO_PROJECT/bricklink-notion-integration && sh cron_daily_scrape_info.sh
- scrape_BL_ids: save ids in sqlite - DONE
  - could even get id, name, category, subcategory
- then read ids from sqlite to scrape info - DONE
- read data from Notion where owned=True to process this data in priority - DONE
- create priority list of minifigs to scrape - DONE
  - 1- owned > 2- wanted > 3- most recent
- add to table: last_scraped_at - DONE
- differentiate when really no price vs BL quota reached - DONE
- Add logic (exponential backoff "delay = (base_delay * 2 ** retries + random.uniform(0, 1))") - DONE
  - logic: do not scrape if today < last_scraped_at + 2 ** failed_count days
  - add column in db: failed_count [default=0]
- In the future, add logic to scrape based on last_scraped_at - DONE
- Only record price when the diff with previous price is significant - DONE
- general refactor (factorize code, create classes, optimize where possible, etc.)
- create async methods to send the data to Notion - DONE
- add to notion_mapping table: last_updated_at - DONE
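
The backoff formula and the skip rule from the list above can be sketched as two small functions. This is an illustration only: the function names, the `base_delay` default, and the choice to treat never-scraped items as always due are assumptions, not the project's actual code.

```python
import random
from datetime import date, timedelta
from typing import Optional


def backoff_delay(retries: int, base_delay: float = 1.0) -> float:
    """Seconds to wait before the next attempt: exponential growth plus jitter."""
    return base_delay * 2 ** retries + random.uniform(0, 1)


def is_due(last_scraped_at: Optional[date], failed_count: int, today: date) -> bool:
    """Return False while today < last_scraped_at + 2 ** failed_count days."""
    if last_scraped_at is None:
        # Assumption: an item that was never scraped is always due.
        return True
    next_allowed = last_scraped_at + timedelta(days=2 ** failed_count)
    return today >= next_allowed
```

With `failed_count = 0` an item becomes due again one day after its last scrape, and each additional failure doubles the wait (2, 4, 8 days, and so on).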