
Downloads PDFs, stores their extracted text in the FOIArchive database, and saves a copy of each PDF in an S3 bucket.


history-lab/foiarchive-pdfloader
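The description above outlines a per-document flow: fetch a PDF, store its text in the database, and keep a copy in S3. The function below is a minimal sketch of that flow and does not reproduce the actual structure of pdf2pgs3.py. It takes each step as an injected callable so it runs without network, database, or AWS access. `load_one_pdf`, `LoadResult`, and the `pdfs/<sha256>.pdf` key scheme are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoadResult:
    url: str
    s3_key: str
    text_chars: int


def load_one_pdf(
    url: str,
    fetch: Callable[[str], bytes],               # download the PDF bytes
    extract_text: Callable[[bytes], str],        # PDF bytes -> plain text
    store_text: Callable[[str, str], None],      # write (url, text) to the database
    upload_copy: Callable[[str, bytes], None],   # write (key, bytes) to the bucket
) -> LoadResult:
    """Hypothetical sketch: one PDF through download, text storage, and S3 copy."""
    pdf_bytes = fetch(url)
    text = extract_text(pdf_bytes)
    store_text(url, text)
    # Hypothetical key scheme: naming the object after its SHA-256 digest
    # means re-running the loader overwrites the same object instead of
    # creating duplicates.
    s3_key = f"pdfs/{hashlib.sha256(pdf_bytes).hexdigest()}.pdf"
    upload_copy(s3_key, pdf_bytes)
    return LoadResult(url=url, s3_key=s3_key, text_chars=len(text))
```

In a real deployment the callables might wrap an HTTP download, a PDF text extractor, a database INSERT (the script name suggests PostgreSQL), and the boto3 S3 client's `put_object`. Injecting them keeps the core flow testable with fakes.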


Installation

  1. clone the repo: git clone https://github.com/benjlis/foiarchive-pdfloader.git
  2. cd foiarchive-pdfloader
  3. create a virtual environment: python3 -m venv env
  4. activate the environment: . env/bin/activate
  5. install the requirements: pip install -r requirements.txt
  6. define the required environment variables and store them in .env
  7. run it in the background with nohup, appending all output to load.log (-u disables Python's output buffering so progress appears in the log immediately): nohup python -u pdf2pgs3.py >> load.log 2>&1 &
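Step 6 does not say which variables are required or how the script reads .env. Many projects use the python-dotenv package for this, but that is not confirmed here. Below is a hedged, stdlib-only sketch of that step. It assumes plain KEY=VALUE lines. `load_dotenv_file`, `check_required`, and the variable names `DATABASE_URL` and `S3_BUCKET` are hypothetical placeholders, so check pdf2pgs3.py for the names it actually reads.

```python
import os
from pathlib import Path

# Hypothetical names: the README does not list the variables pdf2pgs3.py needs.
REQUIRED_VARS = ("DATABASE_URL", "S3_BUCKET")


def load_dotenv_file(path=".env"):
    """Read simple KEY=VALUE lines into os.environ.

    Blank lines and '#' comments are skipped; matching surrounding quotes are
    removed. Variables already set in the environment are not overridden.
    """
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def check_required(names=REQUIRED_VARS):
    """Fail fast, before any download starts, if a required variable is unset."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
```

Calling `check_required()` at startup surfaces a missing setting immediately, instead of after the job has been running unattended under nohup.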
