Break data out of PDF prison
This walkthrough demonstrates how to:
- Scrape data from PDF tables using tabulizer
- Manage unwieldy header types and tidy scraped data output using dplyr, tidyr, and stringr
- Abstract steps into a scraper function
- Iterate across multiple tables and PDFs with purrr
- Reshape and bind output into a master tidy dataframe
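
As a rough preview of how these steps fit together, here is a minimal sketch rather than the walkthrough's actual code. The names raw_table, tidy_table, and scrape_pdf are hypothetical, and the small matrix is made-up illustrative data standing in for one table returned by tabulizer::extract_tables(), so the tidying steps can run without a PDF. It assumes the first row of each scraped matrix holds the header and that the first column is named Region.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)

# Hypothetical stand-in for one element of the list returned by
# tabulizer::extract_tables(): a character matrix whose first row is the header.
raw_table <- matrix(
  c("Region", "2019",  "2020",
    "North",  "1,200", "1,350",
    "South",  "980",   "1,010"),
  ncol = 3, byrow = TRUE
)

# Tidy one scraped matrix: promote the header row to column names,
# reshape to long form, and convert the number strings to numeric.
tidy_table <- function(mat) {
  body <- mat[-1, , drop = FALSE]
  colnames(body) <- mat[1, ]
  as_tibble(body) %>%
    pivot_longer(-Region, names_to = "year", values_to = "value") %>%
    mutate(
      year  = as.integer(year),
      value = as.numeric(str_remove_all(value, ","))
    )
}

# Scraper wrapper (defined but not run here: it needs tabulizer, Java,
# and a real PDF path). Assumes every table on the page shares this layout.
scrape_pdf <- function(path, pages = NULL) {
  tabulizer::extract_tables(path, pages = pages, output = "matrix") %>%
    map(tidy_table) %>%
    bind_rows(.id = "table_id")
}

# Iterate over several tables and bind them into one tidy dataframe.
master <- list(raw_table, raw_table) %>%
  map(tidy_table) %>%
  bind_rows(.id = "table_id")
```

With a real file, something like scrape_pdf("report.pdf", pages = 2) would take the place of the mock list, where the file name and page number are placeholders. Real PDF tables usually need more header cleanup than this sketch shows.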