18KRedApple/ChicagoSraping

ChicagoCrashes

Python scripts and n8n workflow to scrape and process data from: https://crash.chicagopolice.org

Credentials

The n8n dashboard is accessible via plaintext HTTP on port 5678.

Frontail (https://github.com/mthenw/frontail) instances for monitoring scraping and data export are available on ports 9002 and 9003.

Credentials:

  • Username: user
  • Password: HYOfbhgA6MpUJpEXgbHULfTw

Twilio and SendGrid credentials can be configured in .ini files.
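
As a rough illustration of how such .ini files could be read from Python, here is a minimal sketch using the standard-library configparser module. The section names (twilio, sendgrid) and key names (account_sid, auth_token, from_number, api_key) are assumptions made for this example; the actual layout of the project's .ini files may differ.

```python
import configparser

# Hypothetical file contents; section and key names are assumptions,
# not the project's documented layout.
SAMPLE_INI = """
[twilio]
account_sid = ACxxxxxxxx
auth_token = replace-me
from_number = +15550000000

[sendgrid]
api_key = SG.replace-me
"""


def load_credentials(text: str) -> dict:
    """Parse INI text into a {section: {key: value}} dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


creds = load_credentials(SAMPLE_INI)
print(sorted(creds))
```

For a file on disk, parser.read(path) can be used in place of read_string.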

Vagrant can be used to launch the entire system inside a VM. Terraform can be used to create and provision a VPS on DigitalOcean. There is also a Terraform config file for provisioning proxy servers (proxies.tf), with a helper script (setup_proxies.py). Note that you need to import the n8n workflow from ChicagoCrashesV2.json into the n8n server and enable it via the n8n web interface.

Dolt is used as a version-controlled database by the scraper and the other scripts.
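
To give a sense of what working with Dolt from a script might look like, the sketch below stages all changed tables and records a commit by shelling out to the dolt CLI (dolt add . followed by dolt commit -m). The helper names dolt_command and commit_all are hypothetical, and this is not necessarily how the scraper itself talks to Dolt.

```python
import subprocess
from typing import Callable, List


def dolt_command(*args: str) -> List[str]:
    """Build an argv list for the dolt CLI."""
    return ["dolt", *args]


def commit_all(db_dir: str, message: str, run: Callable = subprocess.run) -> None:
    """Stage every changed table and commit inside the Dolt repo at db_dir.

    `run` is injectable so the command sequence can be inspected without
    having dolt installed; by default it is subprocess.run.
    """
    run(dolt_command("add", "."), cwd=db_dir, check=True)
    run(dolt_command("commit", "-m", message), cwd=db_dir, check=True)
```

Calling commit_all(db_dir, "daily scrape") would run both commands in the given directory. With check=True, a failing dolt command raises subprocess.CalledProcessError, which can happen, for instance, when there is nothing to commit.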

CLI of scraper

usage: scrape.py [-h] --dolt-db-dir DOLT_DB_DIR [--try-next TRY_NEXT] [--try-prev TRY_PREV] [--force-start-rd FORCE_START_RD] [--yesterday] [--disable-proxies]

optional arguments:
  -h, --help            show this help message and exit
  --dolt-db-dir DOLT_DB_DIR
                        directory for Dolt DB
  --try-next TRY_NEXT   number of RDs to try after last known one
  --try-prev TRY_PREV   number of RDs to try before last known one
  --force-start-rd FORCE_START_RD
                        explicit value of initial RD
  --yesterday           use reference RD from yesterdays data
  --disable-proxies     disable proxies
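
The help text above can be approximated with an argparse definition like the following. This is a reconstruction from the usage output only, not the code in scrape.py: the integer types and the default values are assumptions, while the option names and help strings are copied from the output.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild a parser matching the usage text (types and defaults are guesses)."""
    parser = argparse.ArgumentParser(prog="scrape.py")
    # Shown without brackets in the usage line, so treated as required.
    parser.add_argument("--dolt-db-dir", required=True,
                        help="directory for Dolt DB")
    parser.add_argument("--try-next", type=int, default=0,
                        help="number of RDs to try after last known one")
    parser.add_argument("--try-prev", type=int, default=0,
                        help="number of RDs to try before last known one")
    parser.add_argument("--force-start-rd", default=None,
                        help="explicit value of initial RD")
    parser.add_argument("--yesterday", action="store_true",
                        help="use reference RD from yesterdays data")
    parser.add_argument("--disable-proxies", action="store_true",
                        help="disable proxies")
    return parser


# Example invocation with a placeholder directory.
args = build_parser().parse_args(
    ["--dolt-db-dir", "/tmp/crashes", "--try-next", "50", "--disable-proxies"]
)
print(args.dolt_db_dir, args.try_next, args.disable_proxies)
```

On the command line, the equivalent call would be scrape.py --dolt-db-dir /tmp/crashes --try-next 50 --disable-proxies, where /tmp/crashes is a placeholder path.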
