Problem in part of the translation #273

Closed
mauricioros opened this issue Feb 25, 2020 · 3 comments · Fixed by #346
@mauricioros

Hello!
First of all, thanks for making this amazing tool available!
On to the problem: I'm building a small application with PDFParser, but it cannot convert part of a PDF.

Can you help me?

I'm parsing the PDF at the following URL:
https://dje.tjsp.jus.br/cdje/consultaSimples.do?cdVolume=14&nuDiario=2986&cdCaderno=12&nuSeqpagina=2993

At the first occurrence of "REQDO", the output should contain "Guaraná e Ramos Empreendimentos Imobiliários Ltda.", but instead it contains the following set of characters: "<001d0003002a00580044005500440051006900030048000300000000000000000000000000000000000000000000000000000000000000003000000000000000000030000000
440011 ["

The demo (https://www.pdfparser.org/demo) shows the same error.

Can you help me? Please!!!

k00ni added the bug label May 26, 2020
@Connum
Contributor

Connum commented Sep 25, 2020

The hexadecimal string at this position contains a newline character, which causes this issue. I'm working on a fix that strips any newlines from hexadecimal strings before trying to decode them.
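
As a rough illustration of the approach described above (not the actual PHP change in the PR), a minimal Python sketch might look like the following. The function name `decode_pdf_hex_string` is hypothetical, and padding an odd final digit with `0` follows the PDF specification's rule for hexadecimal strings rather than anything stated in this thread. The sketch returns raw bytes only; mapping those bytes to readable text through the font's encoding is a separate step that is not shown.

```python
import re


def decode_pdf_hex_string(token: str) -> bytes:
    """Decode a PDF hexadecimal string token such as '<0048 0065>' to raw bytes.

    Hypothetical helper for illustration; not part of PDFParser.
    """
    body = token.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("not a hexadecimal string token")

    # Strip every whitespace character, including line breaks, before decoding.
    digits = re.sub(r"\s+", "", body[1:-1])

    # Per the PDF specification, a missing final digit is assumed to be 0.
    if len(digits) % 2:
        digits += "0"

    return bytes.fromhex(digits)


# A hex string broken across two lines decodes the same as the unbroken one.
broken = "<001d0003\n002a0058>"
unbroken = "<001d0003002a0058>"
print(decode_pdf_hex_string(broken) == decode_pdf_hex_string(unbroken))
```

Without the whitespace-stripping step, a parser that only recognizes an unbroken run of hex digits between `<` and `>` would fail to match the token and pass it through undecoded, which is consistent with the raw output reported in this issue.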

@Connum
Contributor

Connum commented Sep 25, 2020

I just created a PR with a fix! There is another decoding issue when testing the provided file:

EXEQTE  : Infinit����)�D�V�K�L�R�Q���&�R�P�p�U�F�L�R���H���'�L�V�W�U�L�E�X�L�G�R�U�D���(�L�U�H�O�i
ADVOGADO  : 67978/SP - Cleodilson Luiz Sforzin

which will be fixed as well once #342 is merged.

@k00ni
Collaborator

k00ni commented Sep 30, 2020

#342 was merged.

k00ni closed this as completed Sep 30, 2020
k00ni pushed a commit that referenced this issue Oct 8, 2020
…eaks first (fix #273) (#346)

* process hexadecimal strings containing line breaks, but strip line breaks first (fix #273)

* remove binary symbold from test data string

* code linting
partulaj pushed a commit to partulaj/pdfparser that referenced this issue Dec 21, 2020
…eaks first (fix smalot#273) (smalot#346)

* process hexadecimal strings containing line breaks, but strip line breaks first (fix smalot#273)

* remove binary symbold from test data string

* code linting