
Call to undefined method Smalot\PdfParser\Encoding::__toString() #364

Closed
rubas opened this issue Oct 30, 2020 · 9 comments · Fixed by #378

@rubas

rubas commented Oct 30, 2020

We are seeing a lot of uncaught errors when we try to extract the content of some PDFs.


Encoding::__toString()

Call to undefined method Smalot\PdfParser\Encoding::__toString()

You can find the complete stack trace here. The char is \.

if (\strlen($char) < 2 && $this->has('Encoding') && 'WinAnsiEncoding' === $this->get('Encoding')->__toString()) {
    $fallbackDecoded = self::uchr($dec);
}

https://github.com/smalot/pdfparser/blob/master/src/Smalot/PdfParser/Font.php#L104
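The condition above assumes that whatever get('Encoding') returns can be turned into a string. In PHP, explicitly calling __toString() on an object that does not define it throws an Error, which matches the message in the trace. The following is a minimal, self-contained sketch of a more defensive comparison. NameStub, EncodingStub and isWinAnsiEncoding are hypothetical stand-ins for illustration, not PdfParser classes or functions.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a PDF name element that can be cast to a string.
final class NameStub
{
    /** @var string */
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function __toString(): string
    {
        return $this->name;
    }
}

// Hypothetical stand-in for an Encoding object that has no __toString().
final class EncodingStub
{
}

/**
 * Returns true only if $encoding can be converted to a string and equals
 * 'WinAnsiEncoding'. Objects without __toString() are treated as "not WinAnsi"
 * instead of triggering "Call to undefined method ...::__toString()".
 *
 * @param mixed $encoding
 */
function isWinAnsiEncoding($encoding): bool
{
    if (\is_string($encoding)) {
        return 'WinAnsiEncoding' === $encoding;
    }
    if (\is_object($encoding) && method_exists($encoding, '__toString')) {
        return 'WinAnsiEncoding' === (string) $encoding;
    }

    return false;
}

var_dump(isWinAnsiEncoding(new NameStub('WinAnsiEncoding'))); // bool(true)
var_dump(isWinAnsiEncoding(new EncodingStub()));              // bool(false), no Error thrown
```

Treating an unconvertible object as "not WinAnsi" only avoids the crash. It does not decide what the correct string form of an Encoding object should be.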


Header::__toString()

Call to undefined method Smalot\PdfParser\Header::__toString()

You can find the complete stack trace here. The char is !.


Code

Our code is simple.

use Smalot\PdfParser\Parser;

$content = file_get_contents($url);
...
$parser = new Parser();
$pdf    = $parser->parseContent($content);

return $pdf->getText();
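Since PHP 7, calling an undefined method throws an \Error rather than an \Exception. A catch (\Exception $e) block around the parser call will therefore not catch it. The following is a minimal sketch that assumes you want to skip unparseable files instead of aborting. extractTextSafely is a hypothetical helper, and the closure that calls a missing method on stdClass only simulates the failure from this issue.

```php
<?php
declare(strict_types=1);

/**
 * Runs a text-extraction callback and returns null instead of propagating
 * any \Throwable (both \Exception and \Error, e.g. "Call to undefined method").
 * The error description is written to $error when something goes wrong.
 */
function extractTextSafely(callable $extract, ?string &$error = null): ?string
{
    try {
        return $extract();
    } catch (\Throwable $e) {
        $error = \get_class($e) . ': ' . $e->getMessage();

        return null;
    }
}

// Simulates the failure from this issue: stdClass has no __toString() method.
$text = extractTextSafely(function () {
    return (new \stdClass())->__toString();
}, $error);

var_dump($text);   // NULL
echo $error, "\n"; // Error: Call to undefined method stdClass::__toString()
```

In the code above, the closure would instead wrap the parser call, for example function () use ($parser, $content) { return $parser->parseContent($content)->getText(); }. This keeps a batch job running, but it does not fix the underlying bug.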

Test files

10-12.pdf
12-14.pdf
28-32-2.pdf

@k00ni k00ni added the bug label Oct 30, 2020
@k00ni
Collaborator

k00ni commented Oct 30, 2020

Thank you for your detailed bug report.

@clicksistema

clicksistema commented Nov 8, 2020

The __toString function is missing from class Encoding.
I created it to return an implode of the object for testing, and the error stopped.

@k00ni
Collaborator

k00ni commented Nov 9, 2020

Can you paste your fix here?

@clicksistema

clicksistema commented Nov 9, 2020

I've inserted this function into class Encoding:

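    public function __toString()
    {
        // Joins the entries of $this->encoding with commas so that string
        // comparisons such as the one in Font.php no longer call an undefined method.
        return implode(',', $this->encoding);
    }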

Just to be clear: I didn't check what this class is used for. I just created a function that works and that didn't exist before.
I believe that most of the time this class is not returned as an object of the Header class, but when a Header holds an object of this class, the error occurs.
Maybe the error lies deeper in the context. Why is this class sometimes part of the Header class?
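With the implode-based __toString() above, the Font.php check no longer crashes. However, a comma-joined glyph table will in practice never equal 'WinAnsiEncoding', so the fallback branch is not taken for such fonts. The following sketch contrasts that behaviour with a __toString() that returns a base encoding name. Both classes are hypothetical stand-ins, not PdfParser's Encoding class, and whether returning the name is the right fix for the library is only an assumption here.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in mirroring the proposed fix: joins a glyph table.
final class ImplodeEncodingStub
{
    /** @var string[] */
    private $encoding;

    public function __construct(array $encoding)
    {
        $this->encoding = $encoding;
    }

    public function __toString(): string
    {
        return implode(',', $this->encoding);
    }
}

// Hypothetical alternative: reports the base encoding name instead.
final class NamedEncodingStub
{
    /** @var string */
    private $baseEncoding;

    public function __construct(string $baseEncoding)
    {
        $this->baseEncoding = $baseEncoding;
    }

    public function __toString(): string
    {
        return $this->baseEncoding;
    }
}

$joined = new ImplodeEncodingStub(['space', 'exclam', 'quotedbl']);
$named = new NamedEncodingStub('WinAnsiEncoding');

// Same comparison as in Font.php: only the named variant can match.
var_dump('WinAnsiEncoding' === $joined->__toString()); // bool(false)
var_dump('WinAnsiEncoding' === $named->__toString());  // bool(true)
echo $joined, "\n";                                     // space,exclam,quotedbl
```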

@johnyboom

Hi, I have the same issue, but sadly the fix removes the error, not the problem.
One rogue character in a file is no big deal, but some of the files I need to parse are almost entirely unreadable.

pd120320.pdf

TISKOVÁ ZPRÁVA Centrum pro výzkum veejného mínní Sociologický ústav AV R, v.v.i. �������������������� � !"#� ��� $�%�&��'�%��(�)&�&���&* � etc.

Despite this, the majority of files are parsed nicely, so great work.

@k00ni
Collaborator

k00ni commented Nov 10, 2020

@johnyboom: Is the PDF you posted free to use and without obligations? We may add it to our test environment to test potential fixes.

@johnyboom

Well, it is a public document but to be sure I'll ask for consent.

https://cvvm.soc.cas.cz/media/com_form2content/documents/c2/a47/f9/pd120320.pdf

@johnyboom

Ok, we have consent to use it for tests. I've forwarded the details to your email.

@k00ni
Collaborator

k00ni commented Nov 11, 2020

The following consent was given for the mentioned PDF file:

We are giving consent to https://www.pdfparser.org to freely use pdf file https://cvvm.soc.cas.cz/media/com_form2content/documents/c2/a47/f9/pd120320.pdf for testing purposes.
Content of the file is still intellectual property of "CENTRUM PRO VÝZKUM VEŘEJNÉHO MÍNĚNÍ Sociologický ústav AV ČR, v.v.i." and should be handled according to https://cvvm.soc.cas.cz/cz/cvvm/dokumenty/13-pravni-ujednani.

If someone wants to provide a fix and uses this file to check it, please include my quoted consent as it is and add it to the code (the test code) that uses the PDF.
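A regression test using pd120320.pdf needs some way to distinguish readable output from output like the sample posted earlier in this thread. One rough heuristic is to measure the share of U+FFFD replacement characters and C0 control characters in the extracted text. The sketch below implements this heuristic. garbledRatio and looksGarbled are hypothetical helpers, not part of PdfParser, and the 5% threshold is an arbitrary assumption.

```php
<?php
declare(strict_types=1);

/**
 * Returns the fraction of characters in $text that are U+FFFD replacement
 * characters or C0 control characters other than tab, LF and CR.
 * Invalid UTF-8 is treated as entirely unreadable (ratio 1.0).
 */
function garbledRatio(string $text): float
{
    if ($text === '') {
        return 0.0;
    }

    // Split into UTF-8 characters; preg_split returns false on invalid UTF-8.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return 1.0;
    }

    $bad = 0;
    foreach ($chars as $char) {
        $isReplacement = ($char === "\u{FFFD}");
        $isControl = \strlen($char) === 1
            && \ord($char) < 0x20
            && !\in_array($char, ["\t", "\n", "\r"], true);
        if ($isReplacement || $isControl) {
            ++$bad;
        }
    }

    return $bad / \count($chars);
}

/** Hypothetical pass/fail check; the default threshold is an assumption. */
function looksGarbled(string $text, float $threshold = 0.05): bool
{
    return garbledRatio($text) > $threshold;
}
```

In a PHPUnit test, this could be applied to the result of $pdf->getText() for the consented file, for example with assertFalse(looksGarbled($text)).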

k00ni added a commit that referenced this issue Dec 25, 2020
* Implement undefined method in Encoding class. The __toString method was missing/not implemented, even though it is called in some cases. Fixes #364

* Add PHPDoc and fix type error

* Fix style issues

* added test in FontTest to prove the fix is working; tidied up coding style in Font.php

* added EncodingTest with 2 tests for new method "getEncodingClass"

Co-authored-by: Konrad Abicht <hi@inspirito.de>
4 participants