pdf-text-extraction

Here are 11 public repositories matching this topic...

houking-can / PDFSDK

Python interface based on the Foxit Quick PDF Library

pdf-merge pdf-split pdf-document-processor pdf-sdk pdf-text-extraction

Updated Apr 4, 2020
Python

vijayengineer / PDFTextSpeechConverter

Converts scanned and ordinary documents into MP3 speech using Amazon Polly

pdf text images speech aws-polly audiobook synthesis scanned-documents pdf-text-extraction

Updated Dec 30, 2020
Python

PrathameshDhande22 / PdfTxtBot

A Telegram bot that extracts text from PDFs and also extracts images of PDF pages. Made with Python.

python telegram telegram-bot python3 python-telegram-bot image-extractor python-telegram pdf-text pdf-text-extraction pdf-image

Updated Feb 27, 2023
Python

mamiriqbal1 / rag_book_qa_prompt

A simple demonstration of how you can implement retrieval augmented generation (RAG) for a book.

question-answering rag pdf-text-extraction large-language-models llm chatgpt-web retrieval-augmented-generation

Updated Nov 29, 2023
Jupyter Notebook

Zeeshanahmad4 / NLP-Pdf-Minning-Extracting-text-from-pdf

NLP PDF mining: extracting text from PDF

python pdf pdf-converter text-extraction pdfkit pdf-files extract-text pdftotext pdf-format pdf-document-processor pdftoimage pdftools pdftohtml pdf-text-extraction pdfcon

Updated Apr 2, 2020
Python

VirajMadhu / pdf_key_matcher

Highlights the key matches between a given PDF and the description text

python open-source pdf cv python-script python3 text-extraction terminal-based ats text-compression pdf-text-extraction virajmadhu

Updated Dec 4, 2024
Python

towfique-elahe / pdf-to-structured-csv

A Python-based tool for extracting structured data from PDFs using OCR and regex, and exporting it to CSV. Ideal for processing invoices, logs, or scanned documents into organized, usable datasets.

ocr data-extraction pdf-to-csv document-processing pytesseract pdf2image python-automation pdf-text-extraction structured-data-extraction regex-parsing

Updated Oct 30, 2024
Jupyter Notebook
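
The regex-to-CSV step described for this tool can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it assumes the OCR stage has already produced plain text, and the invoice line layout, the `LINE_RE` pattern, and the `text_to_csv` helper are all hypothetical.

```python
import csv
import io
import re

# Hypothetical line layout (an assumption for this sketch):
# "Invoice <id> <YYYY-MM-DD> <amount>"
LINE_RE = re.compile(
    r"Invoice\s+(?P<invoice_id>[A-Z0-9-]+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<amount>\d+\.\d{2})"
)

FIELDS = ["invoice_id", "date", "amount"]


def text_to_csv(ocr_text: str) -> str:
    """Return CSV text with one row per input line that matches LINE_RE."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for line in ocr_text.splitlines():
        match = LINE_RE.search(line)
        if match:  # lines that do not match (OCR noise, headers) are skipped
            writer.writerow(match.groupdict())
    return buffer.getvalue()


# Stand-in for text that an OCR step would have produced.
sample = (
    "Invoice INV-001 2024-10-30 125.50\n"
    "some unrelated noise\n"
    "Invoice INV-002 2024-11-02 80.00\n"
)
print(text_to_csv(sample))
```

Named groups keep the mapping from pattern to CSV column explicit, so changing the document layout only requires editing the pattern and the field list.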

RealBlueSwan / BSPDFDataExtractor

Extracts data from a provided PDF, using keywords to identify relevant data points. Uses UglyToad PdfPig (great lib btw).

pdf-text-extraction

Updated Jul 20, 2024
C#

A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.