Text selection causes unnecessary line breaks #2569

Closed
gracile-fr opened this issue Jan 15, 2013 · 12 comments

Comments

@gracile-fr

PDFs created with the "Print Pages to Pdf" add-on, which is based on the open-source library wkhtmltopdf, have an annoying issue when opened with pdf.js: if you copy and paste the text content of the PDF, each character is separated by a line break.

E.g.: With this PDF, if you copy and paste "The Mozilla Manifesto", you'll get:
T
h
e
M
o
z
i
l
l
a
M
a
n
i
f
e
s
t
o

@timvandermeij
Contributor

Another PDF with the same issue: http://archive.cs.uu.nl/mirror/CTAN/macros/latex/contrib/chessboard/chessboard.pdf#page=43&zoom=auto,0,770. Also, spaces are not copied in your PDF if you copy 'The Mozilla Manifesto'.

@timvandermeij
Contributor

@ReporterX

The top post attributes the issue to wkhtmltopdf, but the case above has nothing to do with it.
The example above is a PDF created by Adobe Reader.
It seems the unintended line breaks can happen with any PDF.

The bug has been around for more than a year.
So what is the real culprit? Any clue?

timvandermeij changed the title from "Issue with PDFs created with wkhtmltopdf" to "Text selection causes unnecessary line breaks" on Sep 19, 2014
@timvandermeij
Contributor

@ReporterX I have updated the title of this issue to reflect the current state. The problem is that each character is put in a separate div. When you copy that, you get the behaviour described above. IIRC work is being done to refactor the text layer, i.e., to reduce the number of divs by combining them into one.
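To illustrate the idea of combining per-character divs, here is a minimal sketch of merging positioned single-character runs into one string per line before they are turned into elements. It is not pdf.js's actual text-layer code: the `TextRun` shape, the `mergeRuns` function, and both tolerance values are hypothetical, and the sketch assumes horizontal text with a y axis that grows upward.

```typescript
// Hypothetical shape of a positioned text run (NOT the pdf.js data structure).
interface TextRun {
  text: string;
  x: number;     // left edge of the run
  y: number;     // baseline of the run
  width: number; // horizontal advance of the run
}

// Merge runs that share a baseline into one string per line.
// baselineTolerance and spaceGap are illustrative values, not tuned constants.
function mergeRuns(
  runs: TextRun[],
  baselineTolerance = 0.5,
  spaceGap = 1
): string[] {
  const lines: { y: number; runs: TextRun[] }[] = [];

  // Group runs whose baselines are (almost) equal.
  for (const run of runs) {
    const line = lines.find((l) => Math.abs(l.y - run.y) <= baselineTolerance);
    if (line) {
      line.runs.push(run);
    } else {
      lines.push({ y: run.y, runs: [run] });
    }
  }

  // Assumption: y grows upward, so the largest y is the first line.
  lines.sort((a, b) => b.y - a.y);

  return lines.map((line) => {
    const ordered = [...line.runs].sort((a, b) => a.x - b.x);
    let out = "";
    let prevEnd: number | null = null;
    for (const run of ordered) {
      // A visible horizontal gap between runs is treated as a space.
      if (prevEnd !== null && run.x - prevEnd >= spaceGap) {
        out += " ";
      }
      out += run.text;
      prevEnd = run.x + run.width;
    }
    return out;
  });
}

// Five single-character runs on one baseline, with a gap before "M".
const runs: TextRun[] = [
  { text: "T", x: 0, y: 100, width: 5 },
  { text: "h", x: 5, y: 100, width: 5 },
  { text: "e", x: 10, y: 100, width: 5 },
  { text: "M", x: 18, y: 100, width: 5 },
  { text: "o", x: 23, y: 100, width: 5 },
];
const merged = mergeRuns(runs);
console.log(merged.join("\n")); // prints "The Mo"
```

With one element per merged line instead of one per character, a copy would yield a single line break per line of text, and the gap check would also restore the spaces that are missing in the reported output.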

@abilashs90

Is there any solution to this?

@jasonparallel

@timvandermeij Does it still look like the current refactoring work will alleviate this issue?

@timvandermeij
Contributor

@jasonparallel I'm afraid not. There are some PRs open for text layer alignment, but combining divs is not yet being looked into. If anyone is willing to do that, we welcome PRs anytime.

@rugk

rugk commented Nov 16, 2015

Is this related to #2989?
And has any progress been made on this issue? Copying and pasting currently often works badly in pdf.js.

In any case here is another example: https://www.ietf.org/proceedings/82/slides/rtcweb-13.pdf
When copying normal text, everything works (except that some characters like { and seem to be inserted in some places).
But when you copy text that is displayed non-horizontally, like the HTTPS (ROAP?) on page 6, you get many line breaks:

HTTPS
(R
OA
P?)

@yurydelendik
Contributor

The original PDF issue was fixed by #6590 (?). The remaining problematic PDFs might have different problems -- new issues should be created for them. Closing as fixed.

@rugk

rugk commented Nov 18, 2015

So the HTTPS (ROAP?) thing is fixed there?

And what about #2989? Is this also closed by that PR?

And thirdly I've created a separate issue for the character problem: #6658

@yurydelendik
Contributor

> So the HTTPS (ROAP?) thing is fixed there?

No, it was not. The issue is closed once the reporter's issue is resolved.

> And what about #2989? Is this also closed by that PR?

Issue #2989 is about using span elements instead of div elements and/or using events to replace the clipboard.

> And thirdly I've created a separate issue for the character problem: #6658

Thank you.

@rugk

rugk commented Nov 18, 2015

> > So the HTTPS (ROAP?) thing is fixed there?
>
> No, it was not. The issue is closed once the reporter's issue is resolved.

Okay, I've created a new issue for this: #6659
