- Create a virtual environment for the project using Python 3.8+.
- Install the requirements with `pip install -r requirements.txt`.
- Update the search URLs in `./imot_bg_crawler/input.yaml`. When done, check with http://www.yamllint.com/ that the input file is valid.
- Run the spider for the desired website. If you do not want logs, add `--nolog` at the end of the command.
- When finished, check the `./imot_bg_crawler/output_files` folder for the results.
- Enjoy.
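As a local alternative to the web linter, the input file can be validated with PyYAML (this assumes `pyyaml` is installed, which the project's Scrapy dependency already requires; the sample YAML contents below are made up, not the real schema of `input.yaml`):

```python
import yaml  # PyYAML


def check_yaml(text: str) -> bool:
    """Return True if the text parses as valid YAML, False otherwise."""
    try:
        yaml.safe_load(text)
        return True
    except yaml.YAMLError:
        return False


# The real file lives at ./imot_bg_crawler/input.yaml; these strings are
# hypothetical examples of valid and invalid YAML syntax.
print(check_yaml("urls:\n  - https://example.com"))  # well-formed
print(check_yaml("urls: [unclosed"))                 # malformed flow sequence
```

To check the actual file, read it first: `check_yaml(open("./imot_bg_crawler/input.yaml").read())`.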
- Imot.bg: `scrapy crawl imot.bg`
- Imoti.com: `scrapy crawl imoti.com`
- `SKIP_EXISTING` - skips items whose data has already been saved (default: `True`)
- `PER_ITEM_RESULT` - saves every item in a separate folder (default: `True`)
- `PER_ITEM_DOWNLOAD_IMAGES` - when `PER_ITEM_RESULT` is enabled, controls whether the crawler downloads item images (default: `True`)
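A minimal sketch of how flags like these typically interact (plain Python, not the project's actual pipeline code; the function names and the dict-based settings store are hypothetical):

```python
from pathlib import Path

# Mirrors the documented defaults.
SETTINGS = {
    "SKIP_EXISTING": True,
    "PER_ITEM_RESULT": True,
    "PER_ITEM_DOWNLOAD_IMAGES": True,
}


def should_save(item_dir: Path, settings: dict = SETTINGS) -> bool:
    """With SKIP_EXISTING on, items whose output folder already exists are skipped."""
    if settings["SKIP_EXISTING"] and item_dir.exists():
        return False
    return True


def should_download_images(settings: dict = SETTINGS) -> bool:
    """Images are only fetched when per-item folders are enabled as well."""
    return settings["PER_ITEM_RESULT"] and settings["PER_ITEM_DOWNLOAD_IMAGES"]
```

Note the dependency: `PER_ITEM_DOWNLOAD_IMAGES` has no effect on its own; it only applies when `PER_ITEM_RESULT` is also enabled.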