bug(pdftojpeg): multilang documents #366

Open
Kristinita opened this issue Aug 19, 2019 · 0 comments

1. Summary

With the pdftojpeg command, sejda-console incorrectly converts documents containing bilingual text to JPEG.

2. Data

  • KiraDebuggingMultilang.pdf: a PDF of my scanned book that I want to convert to images for further processing; the text in the file is in Russian and English

3. Steps to reproduce

I downloaded and installed the BookAntiqua, Century Schoolbook and Century Schoolbook Bold Italic fonts.

I downloaded and unzipped the latest Sejda Console Windows binary (as of 14.8.2019) → added the path of the Sejda bin folder to my PATH environment variable → ran the sejda-console command:

sejda-console pdftojpeg -f KiraDebuggingMultilang.pdf -o .

4. Expected behavior

Poppler output from the command pdftoppm -jpeg KiraDebuggingMultilang.pdf KiraMultilang (output image not edited):

Poppler

English characters are preserved.

5. Actual behavior

D:\SashaDebugging\SejdaToTiff>sejda-console pdftojpeg -f KiraDebuggingMultilang.pdf -o .
Configuring Sejda 3.2.83
Document root element "sejda", must match DOCTYPE root "null".
Document is invalid: no grammar found.
Starting execution with arguments: 'pdftojpeg -f KiraDebuggingMultilang.pdf -o .'
Java version: '1.8.0_221'
Validating parameters.
Starting task (org.sejda.impl.sambox.PdfToMultipleImageTask@22555ebf) execution.
Opening D:\SashaDebugging\SejdaToTiff\KiraDebuggingMultilang.pdf
Created output temporary buffer C:\Users\SASHAC~1\AppData\Local\Temp\sejdaTmp1308333151633555382.tmp
Cannot read JBIG2 image: jbig2-imageio is not installed
Task progress: 100% done
Moving C:\Users\SASHAC~1\AppData\Local\Temp\sejdaTmp1308333151633555382.tmp to D:\SashaDebugging\SejdaToTiff\1_KiraDebuggingMultilang.jpg.
Documents converted to JPEG and saved to org.sejda.model.output.FileOrDirectoryTaskOutput@74fe5c40[D:\SashaDebugging\SejdaToTiff]
Task (org.sejda.impl.sambox.PdfToMultipleImageTask@22555ebf) executed in 1 second
Completed execution

Sejda

English characters are converted incorrectly.
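
To make the comparison easier to repeat, the two commands quoted in sections 3 and 4 can be run from one small script. This is only a sketch, not part of Sejda or Poppler: the helper names build_commands and run_available are hypothetical, the flags are copied from the commands in this report, and it assumes both tools are on PATH (a missing tool is skipped rather than treated as an error).

```python
import shutil
import subprocess
from pathlib import Path

PDF = Path("KiraDebuggingMultilang.pdf")  # sample file from section 2


def build_commands(pdf, out_dir):
    """Return the two command lines quoted in this report, keyed by tool name."""
    return {
        # Section 3: Sejda Console writes the JPEGs into out_dir.
        "sejda-console": ["sejda-console", "pdftojpeg", "-f", str(pdf), "-o", str(out_dir)],
        # Section 4: Poppler writes JPEGs whose names start with the given prefix.
        "pdftoppm": ["pdftoppm", "-jpeg", str(pdf), str(Path(out_dir) / "KiraMultilang")],
    }


def run_available(commands):
    """Run each command whose executable is on PATH.

    Returns a dict of exit codes; a tool that is not found maps to None.
    """
    results = {}
    for tool, argv in commands.items():
        exe = shutil.which(tool)
        if exe is None:
            results[tool] = None
            continue
        results[tool] = subprocess.run([exe] + argv[1:], check=False).returncode
    return results


if __name__ == "__main__":
    for tool, code in run_available(build_commands(PDF, Path("."))).items():
        print(tool, "skipped (not on PATH)" if code is None else f"exit code {code}")
```

Running it in the folder that contains the PDF should leave both sets of JPEG files next to each other, so the English text in the two renderings can be compared directly.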

6. What did not help

I removed the fonts that I installed in section 3 → with fallback fonts I still get the expected behavior from the Poppler command and the unexpected behavior from the Sejda Console command.

7. Environment

  • Windows 10 Enterprise LTSB 64-bit EN
  • Java 1.8.0_221
  • Sejda Console 3.2.83

Thanks.
