extract text from pdf (a PHP wrapper for pdftotext)

ottosmops/pdftotext

Extract text from a PDF with pdftotext

This package provides a class that extracts text from a PDF file.

For PHP 5.6, use version 1.0.3.

  \Ottosmops\Pdftotext\Extract::getText('/path/to/file.pdf'); // returns the text from the PDF

Requirements

The package uses the pdftotext binary. Make sure that it is installed: which pdftotext

For installation instructions, see poppler-utils.

If the installed binary is not found ("The command "which pdftotext" failed."), you can pass the full path to the constructor (see below) or call putenv('PATH=' . getenv('PATH') . ':/usr/local/bin:/usr/bin') (with the directory where pdftotext lives) before you use the Extract class.
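
The two workarounds above can be sketched as follows. This is a minimal illustration, not part of the package: findBinary() is a hypothetical helper, and the directory list is an assumption about where pdftotext might be installed.

```php
<?php
// Hypothetical helper (not part of ottosmops/pdftotext): return the first
// executable file named $name found in $dirs, or null if there is none.
function findBinary(string $name, array $dirs): ?string
{
    foreach ($dirs as $dir) {
        $candidate = rtrim($dir, '/') . '/' . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Assumed install locations; adjust to wherever pdftotext lives on your system.
$binary = findBinary('pdftotext', ['/usr/local/bin', '/usr/bin', '/opt/homebrew/bin']);

if ($binary !== null) {
    // Option 1: append the binary's directory to PATH so "which pdftotext" can find it.
    putenv('PATH=' . getenv('PATH') . PATH_SEPARATOR . dirname($binary));

    // Option 2: pass the full path to the constructor instead:
    // $text = (new \Ottosmops\Pdftotext\Extract($binary))->pdf('/path/to/file.pdf')->text();
}
```

Either option should be enough on its own. Passing the path to the constructor keeps the change local to one object, while putenv() affects every later process started from the same PHP request.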

Installation

composer require ottosmops/pdftotext

Usage

Extracting text from a PDF:

use Ottosmops\Pdftotext\Extract;

$text = (new Extract())
    ->pdf('file.pdf')
    ->text();

You can pass the path to the pdftotext binary to the constructor, and you can specify options for pdftotext:

$text = (new Extract('/path/to/pdftotext'))
    ->pdf('path/to/file.pdf')
    ->options('-layout')
    ->text();

The default options are: -eol unix -enc UTF-8 -raw
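
To make the options more concrete, the sketch below builds the kind of command line they correspond to. buildPdftotextCommand() is a hypothetical helper written for illustration; it is not the package's implementation, and whether options() replaces or extends the defaults is not stated here. The -f and -l flags (first and last page) are standard pdftotext options.

```php
<?php
// Hypothetical helper for illustration; it is not the package's implementation.
// It builds a pdftotext command line that writes the extracted text to stdout.
function buildPdftotextCommand(
    string $pdf,
    string $options = '-eol unix -enc UTF-8 -raw',
    string $binary = 'pdftotext'
): string {
    // escapeshellarg() quotes the paths so spaces or shell metacharacters are safe.
    // The trailing "-" tells pdftotext to write to stdout instead of a .txt file.
    return escapeshellarg($binary) . ' ' . $options . ' ' . escapeshellarg($pdf) . ' -';
}

// Command line with the default options listed above.
echo buildPdftotextCommand('/path/to/file.pdf'), PHP_EOL;

// Keep the physical layout and convert only pages 1 to 2.
echo buildPdftotextCommand('/path/to/file.pdf', '-layout -f 1 -l 2'), PHP_EOL;
```

Running the printed command in a shell is a quick way to check what a given set of options does before passing it to options().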

License

The MIT License (MIT). Please see License File for more information.
