This is the GitHub home for web scraping at the Police Data Accessibility Project.
(What do we mean by web scraping?)
This repo is part of a toolkit for people all over the country to learn about our police systems. Check out our software development roadmap and high-level technical diagram to learn more about our ecosystem.
Right now, running a scraper requires some Python knowledge and patience. We're in the early stages: there's no automated scraper farm or fancy GUI yet. Scrapers can be run locally as needed.
- Install Python. Prefer a differently opinionated guide? Perhaps this is more your speed.
- Clone this repo.
- Find the scraper you wish to run. These are sorted geographically, so start by looking in `/scrapers_library/...`.
- Follow the instructions in the scraper's `README` to get going. (If it's broken or simply out of date, please open an issue in this repo or submit a PR.)
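
If browsing the folders by hand gets tedious, a small helper can list every scraper README in your local clone. This is only a sketch, not part of the repo: the function name `find_scraper_readmes` is hypothetical, and it assumes each scraper's instructions live in a file whose name starts with `README` somewhere under `scrapers_library`.

```python
from pathlib import Path


def find_scraper_readmes(repo_root, library_dir="scrapers_library"):
    """Return the README files found under the scraper library, sorted by path.

    Hypothetical helper: assumes README filenames start with "README"
    (case-insensitive), e.g. README.md or readme.txt.
    """
    library = Path(repo_root) / library_dir
    if not library.is_dir():
        raise FileNotFoundError(f"No {library_dir!r} folder found in {repo_root}")
    readmes = [
        path
        for path in library.rglob("*")
        if path.is_file() and path.name.lower().startswith("readme")
    ]
    return sorted(readmes)


if __name__ == "__main__":
    import sys

    # Pass the path to your local clone, or run from inside it.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for readme in find_scraper_readmes(root):
        print(readme)
```

Run it from the root of your clone to print one README path per line, then open the one for the area you care about.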
If you do something cool or interesting or fun with your shiny new data, share that in our Discord. Want to kick around an idea or share something that doesn't work as expected? Discord's a great place for that, too.
To write a scraper, start with CONTRIBUTING.md. Be sure to check out the /utils folder!
For everything else, start with docs.pdap.io.
Here are some potentially useful tools. If you want to make additions or updates, you can edit the docs in GitHub!