katharinawuensche/NLPdf

NLPdf

PDF Extractor using Natural Language Processing

Quickstart

  1. Download the repository.
  2. Install the requirements:
     pip3 install -r requirements.txt
  3. Download the English language model for spaCy:
     python3 -m spacy download en
  4. Copy the PDF files to be cleaned into the directory "PDFs".
  5. Run the extraction tool:
     python3 run.py
  6. The output is written to the directory "output".
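The Quickstart's manual file handling (copying PDFs into "PDFs" and checking "output") can be scripted. The sketch below is a hypothetical helper, not part of this repository: `stage_pdfs` and `list_outputs` are illustrative names, and it assumes, beyond what the Quickstart states, that run.py processes every PDF it finds in "PDFs". The names of the files it writes to "output" are not documented here, so the helper only lists whatever is there.

```python
# Hypothetical helper for the NLPdf Quickstart (not part of the repository).
# Assumption: run.py processes every PDF in ./PDFs and writes results to ./output.
import shutil
from pathlib import Path


def stage_pdfs(sources, pdf_dir="PDFs"):
    """Copy each .pdf file in `sources` into `pdf_dir` and return the copied paths."""
    target = Path(pdf_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the input directory if missing
    copied = []
    for src in map(Path, sources):
        # Skip anything whose extension is not .pdf (case-insensitive check).
        if src.suffix.lower() != ".pdf":
            continue
        dest = target / src.name
        shutil.copy2(src, dest)  # copy file contents and metadata
        copied.append(dest)
    return copied


def list_outputs(output_dir="output"):
    """Return the sorted names of files in `output_dir`, or [] if it does not exist."""
    out = Path(output_dir)
    if not out.is_dir():
        return []
    return sorted(p.name for p in out.iterdir() if p.is_file())
```

For example, from the repository root call `stage_pdfs(["paper.pdf"])`, then run `python3 run.py`, then call `list_outputs()` to see what the tool produced. Filtering by extension keeps stray non-PDF files out of the input directory.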
