Cannot select text for PDF generated by hocr-pdf #6829

Closed
HKWhyIP opened this issue Jan 3, 2016 · 2 comments

Comments

@HKWhyIP

HKWhyIP commented Jan 3, 2016

I used hocr-pdf (https://github.com/tmbdev/hocr-tools/blob/master/hocr-pdf) to generate a PDF (attached). hocr-pdf merges an hOCR file with an image file to produce a PDF; my hOCR file was generated by Tesseract. Adobe Reader and Chrome's built-in PDF viewer both let me select and copy text from the PDF without any problem, but PDF.js does not let me select the text.

Does anyone know why? Is it an issue with hocr-pdf or with PDF.js? Any suggestions on how to resolve it?

tesseract_hocr.pdf
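For context on what "merging" involves: a tool like hocr-pdf has to read each recognised word and its bounding box from the hOCR file, then map that box from image pixels onto the PDF page. The stdlib-only Python sketch below illustrates that step; it is not hocr-pdf's code. `HocrWordParser` and `to_pdf_points` are hypothetical names, it assumes Tesseract-style markup (`class="ocrx_word"` elements with `title="bbox x0 y0 x1 y1; ..."`), and the image DPI is an assumed input.

```python
import re
from html.parser import HTMLParser

# Tesseract writes word boxes as title="bbox x0 y0 x1 y1; x_wconf NN"
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) for every element whose class includes ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []    # (text, (x0, y0, x1, y1)) in image pixels
        self._bbox = None  # bbox of the word element being read, if any
        self._depth = 0    # open tags inside that word element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1  # nested markup such as <strong> inside a word
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            m = BBOX_RE.search(attrs.get("title") or "")
            if m:
                self._bbox = tuple(int(v) for v in m.groups())
                self._depth = 1
                self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:  # closing tag of the word element itself
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, self._bbox))
            self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def to_pdf_points(bbox, page_height_px, dpi):
    """Map a pixel bbox (origin top-left, y down) to PDF points (origin bottom-left, y up)."""
    x0, y0, x1, y1 = bbox
    s = 72.0 / dpi
    return (x0 * s, (page_height_px - y1) * s, x1 * s, (page_height_px - y0) * s)
```

Usage: call `feed()` with the hOCR text, then pass each entry of `words` through `to_pdf_points` with the image height in pixels and its DPI. The y-axis flip is needed because hOCR measures from the top of the image while PDF user space measures from the bottom of the page.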

@Snuffleupagus
Collaborator

Closing as duplicate of issue #4684.

@HKWhyIP
Author

HKWhyIP commented Jan 4, 2016

Thanks for the pointer. For anyone interested in a fix when using hocr-pdf: I simply changed the font name from "invisible" to "Courier", and everything now works fine. PDF.js no longer reports any problem.
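A plausible reading of why this workaround helps (not confirmed in this thread): "Courier" is one of the 14 standard PDF fonts, so a viewer can supply its glyphs and metrics itself, while the text stays invisible through text render mode 3 (no fill, no stroke) rather than through a special embedded font. The stdlib-only Python sketch below illustrates that approach independently of hocr-pdf. `invisible_text_pdf` is a hypothetical helper that writes a one-page PDF whose text layer uses Courier in render mode 3, stretching each word to its box with horizontal scaling (`Tz`). It assumes Latin-1 text and boxes already in PDF points, and it uses the bottom of each box as the baseline, which is only an approximation.

```python
COURIER_ADVANCE = 0.6  # every glyph in the standard Courier font is 600/1000 em wide


def _pdf_escape(s):
    """Escape backslashes and parentheses for a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_pdf(words, width, height):
    """Return a one-page PDF (bytes) holding only an invisible Courier text layer.

    words: iterable of (text, x0, y0, x1, y1) boxes in PDF points.
    """
    ops = ["BT", "3 Tr"]  # render mode 3: neither fill nor stroke, so glyphs are invisible
    for text, x0, y0, x1, y1 in words:
        if not text or x1 <= x0 or y1 <= y0:
            continue
        size = y1 - y0  # font size taken from the box height
        scale = 100.0 * (x1 - x0) / (len(text) * COURIER_ADVANCE * size)
        ops.append(f"/F1 {size:.2f} Tf {scale:.2f} Tz")  # Tz stretches text to the box width
        ops.append(f"1 0 0 1 {x0:.2f} {y0:.2f} Tm ({_pdf_escape(text)}) Tj")
    ops.append("ET")
    stream = "\n".join(ops).encode("latin-1")  # this sketch only handles Latin-1 text

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         f"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>").encode(),
        # A standard 14 font: the viewer supplies it, so nothing is embedded
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Courier /Encoding /WinAnsiEncoding >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

A real tool would also draw the scanned image on the same page underneath this layer; that part is omitted here. The trade-off of a standard font like Courier is coverage: it only provides Latin glyphs, so OCR text in other scripts would need a different font.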
