A basic scraper for generating files for https://www.russ.fm/. While this was initially created for personal use, feel free to use it if you find it helpful! Although the documentation is minimal, the code is fairly straightforward.
You can find the repo containing the website files and config at russmckendrick/records. It's a Hugo-powered site, and there are a lot of files.
- Clone the repository to your local machine.
- Install the required dependencies using `pip install -r requirements.txt`.
- Run the `discogs_scraper.py` script to start the scraper.
To customize the scraper for your needs, create a copy of the `secrets.json.example` file, name it `secrets.json`, and fill in the details.
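The copy step above can be sketched in Python. This is only an illustration: `load_secrets` is a hypothetical helper, not a function from the repository, and it makes no assumption about which keys the file contains.

```python
import json
import shutil
from pathlib import Path


def load_secrets(directory="."):
    """Return the parsed secrets.json, creating it from the example if missing.

    Hypothetical helper for illustration; the real script may load its
    settings differently.
    """
    directory = Path(directory)
    example = directory / "secrets.json.example"
    target = directory / "secrets.json"

    if not target.exists():
        # First run: start from the template so no expected key is forgotten.
        shutil.copy(example, target)

    with target.open(encoding="utf-8") as handle:
        return json.load(handle)
```

After the first call you would still need to open `secrets.json` and replace the placeholder values with your own details.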
The scraper fetches data from the Discogs API and processes the information to generate markdown files and download images. This data can then be used to create a static site showcasing your music collection.
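To give a rough idea of what "generating markdown files" can look like for a Hugo site, here is a minimal sketch that turns one release into a page with front matter. The field names (`title`, `artist`, `year`), the file naming, and the helper functions are all assumptions made for this example; the real scraper's output format is defined in `discogs_scraper.py` and may differ.

```python
import json
import re
from pathlib import Path


def slugify(text):
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def render_release(release):
    """Build a markdown page with YAML front matter for one release.

    The keys used here are assumed for illustration only.
    json.dumps is used to quote strings safely for the front matter.
    """
    lines = [
        "---",
        f"title: {json.dumps(release['title'])}",
        f"artist: {json.dumps(release['artist'])}",
        f"year: {int(release['year'])}",
        "---",
        "",
        f"{release['artist']} released {release['title']} in {release['year']}.",
        "",
    ]
    return "\n".join(lines)


def write_release(release, out_dir):
    """Write the rendered page to out_dir and return the file path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    name = slugify(f"{release['artist']} {release['title']}")
    path = out_dir / f"{name}.md"
    path.write_text(render_release(release), encoding="utf-8")
    return path
```

Hugo reads the block between the `---` lines as page metadata, which is why the values from the API end up there rather than in the body.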
The scraper can be run using the following commands:
To process just 10 releases, with the default 2-second delay between requests, run the script without any flags:
$ python3 discogs_scraper.py
You can add the `--all` flag to process all releases in your collection:
$ python3 discogs_scraper.py --all
You can also add the `--num-items` flag to process a specific number of releases:
$ python3 discogs_scraper.py --num-items 100
Finally, you can override the default 2-second delay between requests using the `--delay` flag. This is not recommended, as it may cause issues with the Discogs API, so be careful:
$ python3 discogs_scraper.py --delay 0
You can also combine the flags, for example to process all releases without any delay:
$ python3 discogs_scraper.py --all --delay 0
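For readers curious how flags like these fit together, the sketch below models them with `argparse`. It only mirrors the behaviour described above (10 releases and a 2-second delay by default); the function names are hypothetical, and the rule that `--all` takes precedence over `--num-items` is an assumption, so check `discogs_scraper.py` for the actual handling.

```python
import argparse


def build_parser():
    """Command-line parser mirroring the flags described in this README."""
    parser = argparse.ArgumentParser(description="Sketch of the scraper's flags")
    parser.add_argument("--all", action="store_true",
                        help="process every release in the collection")
    parser.add_argument("--num-items", type=int, default=10,
                        help="number of releases to process (default: 10)")
    parser.add_argument("--delay", type=float, default=2.0,
                        help="seconds to wait between requests (default: 2)")
    return parser


def releases_to_process(args, collection_size):
    """Return how many releases a run would handle.

    Assumption: --all wins if both --all and --num-items are given.
    """
    if args.all:
        return collection_size
    return min(args.num_items, collection_size)


# Example: parse an explicit argument list instead of sys.argv.
example_args = build_parser().parse_args(["--num-items", "100"])
print(releases_to_process(example_args, collection_size=250))
```

Keeping a non-zero default delay is the safer choice, since the API may reject or throttle requests that arrive too quickly.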
If you'd like to contribute or suggest improvements, feel free to submit a pull request or open an issue on GitHub. We appreciate your input!
Enjoy scraping and building your music collection website!
Oh yeah, it was mostly written by ChatGPT, with me debugging it and adding some features.
When reviewing wrong matches, you may need to move a release from your `collection_cache.json` file to the `collection_cache_overrides.json` file.
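The move itself can be sketched as follows. This assumes both files hold a JSON object keyed by release ID, which is a guess about the layout rather than something this README confirms; `move_to_overrides` is a hypothetical helper, so back up both files before trying anything like it.

```python
import json
from pathlib import Path


def move_to_overrides(release_id,
                      cache_path="collection_cache.json",
                      overrides_path="collection_cache_overrides.json"):
    """Move one entry from the cache file to the overrides file.

    Assumes (unverified) that each file is a JSON object keyed by release ID.
    """
    cache_path = Path(cache_path)
    overrides_path = Path(overrides_path)

    cache = json.loads(cache_path.read_text(encoding="utf-8"))
    if overrides_path.exists():
        overrides = json.loads(overrides_path.read_text(encoding="utf-8"))
    else:
        overrides = {}

    key = str(release_id)
    if key not in cache:
        raise KeyError(f"release {key} is not in {cache_path.name}")

    entry = cache.pop(key)
    overrides[key] = entry

    # Write the overrides first so the entry is never lost if the
    # second write fails part-way through.
    overrides_path.write_text(json.dumps(overrides, indent=2), encoding="utf-8")
    cache_path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return entry
```

Once the entry is in the overrides file, you can correct its details there by hand.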