Paste from Word: option to parse OpenOffice/LibreOffice Writer inserts #2374

Closed
dryoma opened this issue Sep 4, 2018 · 10 comments
Assignees
Labels
plugin:pastefromword The plugin which probably causes the issue. status:confirmed An issue confirmed by the development team. type:feature A feature request.

Comments

@dryoma

dryoma commented Sep 4, 2018

Type of report

Feature request

Provide description of the new feature

At the moment the Paste from Word plugin automatically detects only content pasted from the MS application. It would be nice to have an option to detect content from other similar programs, namely Writer from the OpenOffice and LibreOffice packages. I understand the documents might have different markup rules, so pasting from them as is could cause bugs (I take it that's why they are not supported officially: it would require extra work on adjusting the code). That's why I think a config option would do, like config.pasteFromWord_detectOpenOffice = true/false. Or even multiple options, as I don't know whether the markup LibreOffice produces is substantially different or whether it has different meta. That way a user could decide if he/she's OK with the possible downsides.

Adding such a detection ability is pretty easy. Here resides the regex pattern:

 /<meta\s*name=(?:\"|\')?generator(?:\"|\')?\s*content=(?:\"|\')?microsoft/gi

which, if changed like so:

 /<meta\s*name=(?:\"|\')?generator(?:\"|\')?\s*content=(?:\"|\')?(?:microsoft|openoffice)/gi

will allow catching content that comes from OOo Writer.

@jacekbogdanski jacekbogdanski self-assigned this Sep 5, 2018
@jacekbogdanski
Member

Hello,

I gave it a try and it seems to work as expected, at least for very simple LibreOffice content. The proposed regex should also contain a libreoffice clause.
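
A minimal sketch of what the extended detection could look like, assuming the regex proposed above with a libreoffice alternative added. The helper name isOfficeSuiteContent is hypothetical and not part of CKEditor. The `g` flag is dropped here because a global regex keeps its `lastIndex` between `test()` calls, which can make repeated checks unreliable.

```javascript
// Hypothetical helper (not CKEditor API): reports whether pasted HTML carries
// a generator meta tag written by MS Word, OpenOffice or LibreOffice.
function isOfficeSuiteContent( html ) {
	// Same pattern as proposed in the issue, plus a `libreoffice` alternative.
	// Only the `i` flag is used, so the regex holds no state between calls.
	var generatorPattern = /<meta\s*name=(?:"|')?generator(?:"|')?\s*content=(?:"|')?(?:microsoft|openoffice|libreoffice)/i;

	return generatorPattern.test( html );
}

// Example inputs: a LibreOffice-style meta tag and plain HTML without one.
var fromLibreOffice = '<meta name="generator" content="LibreOffice 6.0.5.2 (Windows)"/><p>Text</p>';
var plainHtml = '<p>Text</p>';

console.log( isOfficeSuiteContent( fromLibreOffice ) );
console.log( isOfficeSuiteContent( plainHtml ) );
```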

I like the idea of integrating the OpenOffice/LibreOffice format into pastefromword. Apart from the obvious benefit of shipping an additional, nice feature, it may improve pastefromword testing if you don't have access to MS Word.

However, I think it should work out of the box, without an additional config option.

@jacekbogdanski jacekbogdanski added type:feature A feature request. status:confirmed An issue confirmed by the development team. plugin:pastefromword The plugin which probably causes the issue. labels Sep 5, 2018
@jacekbogdanski jacekbogdanski removed their assignment Sep 5, 2018
@dryoma
Author

dryoma commented Sep 5, 2018

Hi @jacekbogdanski

Nice to hear that. Support by default works too.

> it may improve pastefromword testing if you don't have access to MS Word.

Almost my case. Spent hours debugging a custom pastefromword filter script until realizing I was importing from Writer not Word :D

@mlewand
Contributor

mlewand commented Sep 6, 2018

I can confirm that Libre Office adds a meta tag allowing us to identify it, like so:

<meta name="generator" content="LibreOffice 6.0.5.2 (Windows)"/>

Based on what I have tested the markup pasted from LibreOffice works pretty well already in CKE4. I checked font color, bold, underline, font size, lists, lists with roman markers - the formatting is retained.

So we'd like to ask the community what kind of features, that currently are not retained, would be valuable for end users?

@mlewand mlewand added the target:major Any docs related issue that should be merged into a major branch. label Sep 6, 2018
@DonWolli

Wonderful! But what if the browser rejects the paste and answers "Press CTRL-V to insert, your browser does not support paste via button" etc.? I tried FF, Chrome, and Edge, each in its current version.

@dryoma
Author

dryoma commented Sep 27, 2018

@DonWolli it works like that now. Instead of pasting into a dedicated dialog you'll need to ctrl+v into the editor itself. This thread is about expanding the set of programs whose pasted content CKEditor automatically detects.

@DonWolli

and the pasted content is left untouched with all that unnecessary Word stuff ... that's NOT what I want, I want the same functionality as before ...

@dryoma
Author

dryoma commented Oct 3, 2018

@DonWolli you can take the filter/default.js file from the latest pre-4.6 plugin version, save it under a different name and feed it to the plugin with the pasteFromWordCleanupFile setting. That way you'll be able to have the old-way filtering while keeping the ACF lax.
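
As a rough sketch of that setup, assuming the old filter was saved under a made-up path (both the path and the editor1 element ID below are hypothetical):

```javascript
// Sketch of an editor config; the file path is a hypothetical example.
var editorConfig = {
	// Custom cleanup script that the Paste from Word plugin loads
	// instead of its default filter.
	pasteFromWordCleanupFile: '/js/pastefromword-legacy-filter.js'
};

// In a page that loads CKEditor 4, the config would be passed when creating
// the editor ('editor1' is a hypothetical element ID):
// CKEDITOR.replace( 'editor1', editorConfig );
```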

But that is not a bug, nor is it the browser's fault. It's a new behavior of the pastefromword plugin since 4.6: it now relies on ACF settings, not on some internal filtering rules. You can read the details in dedicated threads here, and in the changelog. If you still have problems I think it's best to open a new issue, since this one is about different things, and we better not piss the maintainers off with offtopic ;)

@f1ames
Contributor

f1ames commented Oct 15, 2019

Since we will be starting to work on this issue, let's sum up what's needed. Let's start with Libre Office support only for now (the markup is probably very similar in both, but Libre Office has seen much more development in recent years and seems to be more popular), then we can see how compatible it is with Open Office if needed.

I see that we will be able to reuse some pasting filters we already have in place (assumed based on #2374 (comment)), but I am for introducing it as a new plugin: just as we have pastefromword and pastefromgdocs, we can add pastefromlibreoffice. Reusing some Word filters may require extracting them into common filters (in the pastetools plugin).

As for testing how the plugin performs, it would be good to reuse our two sample documents (ofc copied/recreated in Libre Office) which we have in the Paste from Word / Google Docs samples, as they cover a variety of the most common cases.

@msamsel msamsel self-assigned this Oct 15, 2019
@msamsel
Contributor

msamsel commented Oct 22, 2019

Some trouble appeared when I dived deeper into this issue.
Content may be pasted from Safari or IE11, which do not include the generator meta information identifying the text editor. Such content therefore looks like a fragment of regular HTML. As a result, both the PasteFromWord (PFW) and PasteFromLibreOffice (PFLO) plugins will handle such content one after another. When PFW transforms the content, some information specific to PFLO might be lost.

For example:
PFLO under Safari adds a color style to the paragraph:

<p style="margin-bottom: 0in; line-height: 16px; background-color: transparent; caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: -webkit-standard; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none; background-position: initial initial; background-repeat: initial initial;">Simple text with<span class="Apple-converted-space"> </span><font color="#000000">black</font>color.</p>

However, part of the text also has black color applied: <font color="#000000">black</font>. After transformation with the PFW filter, the color style from the paragraph will be transformed into a <span>. The same will happen with the <font> tag. In such a situation the style related to the paragraph should be removed and the style applied to part of the text should be preserved. Unfortunately, after the PFW transformation both will look the same.

It seems optimal to remove such styles before they are transformed by PFW. However, it's not as simple as adding a new content handler with a higher priority. During a transformation the PFW filter obtains a fresh copy of the clipboard data, so all transformations made before it are lost. That's why a more tightly coupled solution is required: either listening on the pasteFromWord event, or modifying the PFW filter to get data from dataValue if it exists. The latter solution is also not perfect, as some unit tests start to fail after such a change.
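
The effect of a filter re-reading the clipboard can be illustrated with a simplified model. This is only a sketch and not CKEditor code: runHandlers, stripParagraphColor and makeWordLikeFilter are hypothetical names, and the "Word-like" filter merely rewrites <font> tags. It compares a filter that always starts from the raw clipboard data with one that prefers dataValue when it exists.

```javascript
// Simplified model of a paste pipeline; all names here are hypothetical.
function runHandlers( rawClipboardHtml, handlers ) {
	var evt = { raw: rawClipboardHtml, dataValue: rawClipboardHtml };

	// Lower priority number runs first.
	handlers
		.slice()
		.sort( function( a, b ) { return a.priority - b.priority; } )
		.forEach( function( handler ) { handler.handle( evt ); } );

	return evt.dataValue;
}

// Early handler: drops the paragraph-level color added by the browser.
var stripParagraphColor = {
	priority: 1,
	handle: function( evt ) {
		evt.dataValue = evt.dataValue.replace( /<p style="color: rgb\(0, 0, 0\);">/, '<p>' );
	}
};

// Later handler standing in for the PFW filter. With `preferDataValue` set to
// false it starts from the raw clipboard data, discarding earlier changes.
function makeWordLikeFilter( preferDataValue ) {
	return {
		priority: 2,
		handle: function( evt ) {
			var input = preferDataValue && evt.dataValue ? evt.dataValue : evt.raw;

			evt.dataValue = input
				.replace( /<font color="(#[0-9a-f]{6})">/gi, '<span style="color:$1">' )
				.replace( /<\/font>/gi, '</span>' );
		}
	};
}

var pasted = '<p style="color: rgb(0, 0, 0);">Simple text with <font color="#000000">black</font> color.</p>';

// Paragraph style survives because the earlier handler's work is thrown away.
var lost = runHandlers( pasted, [ stripParagraphColor, makeWordLikeFilter( false ) ] );
// Paragraph style is gone because the filter builds on dataValue.
var kept = runHandlers( pasted, [ stripParagraphColor, makeWordLikeFilter( true ) ] );

console.log( lost );
console.log( kept );
```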

Even when this case is fixed, there are still other aspects to address:

  1. There might be situations where content transformed only with PFW generates different output than content transformed with PFW+PFLO.
  2. Consistency should be preserved when only one plugin is added to the editor, so PFLO cannot assume that the data it obtains came from PFW. Some logic duplication will be required so that PFLO can handle such cases on its own when PFW is not present.
  3. Ideally there would be only one filter aggregated from different plugins; however, the current logic in pastetools doesn't allow that. Data are deserialized and serialized with each filter separately, which can additionally impact the editor's performance, especially under Safari and IE11, where 2 or more filters will have to be executed and the data transformation will be run for each of them:
    createFilter: function( options ) {
        // Accept either a single rule factory or an array of them.
        var rules = CKEDITOR.tools.array.isArray( options.rules ) ? options.rules : [ options.rules ],
            additionalTransforms = options.additionalTransforms;

        return function( html, editor ) {
            var writer = new CKEDITOR.htmlParser.basicWriter(),
                filter = new CKEDITOR.htmlParser.filter(),
                fragment;

            if ( additionalTransforms ) {
                html = additionalTransforms( html, editor );
            }

            CKEDITOR.tools.array.forEach( rules, function( rule ) {
                filter.addRules( rule( html, editor, filter ) );
            } );

            // Every filter created this way parses and serializes the HTML on its own.
            fragment = CKEDITOR.htmlParser.fragment.fromHtml( html );
            filter.applyTo( fragment );
            fragment.writeHtml( writer );

            return writer.getHtml();
        };
    }

So there are some cases which have quite a big impact on this issue. The solution would be relatively easy if PFLO did not have to be supported under IE11 and Safari. Supporting all browsers requires covering many more cases with different editor setups.
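
To make the cost mentioned in point 3 concrete, here is a toy comparison of per-plugin filters versus one aggregated filter. It is only a sketch: parse, serialize and the two rules are stand-ins invented for illustration, not pastetools code. It counts how many times the content is parsed in each setup.

```javascript
// Toy stand-ins for HTML parsing and serialization; all names are hypothetical.
var parseCount = 0;

function parse( html ) {
	parseCount++;
	return { text: html };
}

function serialize( fragment ) {
	return fragment.text;
}

// One filter per plugin: each call parses and serializes separately.
function makeSeparateFilter( rule ) {
	return function( html ) {
		var fragment = parse( html );
		rule( fragment );
		return serialize( fragment );
	};
}

// One aggregated filter: parse once, apply every rule, serialize once.
function makeAggregatedFilter( rules ) {
	return function( html ) {
		var fragment = parse( html );
		rules.forEach( function( rule ) { rule( fragment ); } );
		return serialize( fragment );
	};
}

// Two trivial rules standing in for the PFW and PFLO transformations.
function trimRule( fragment ) { fragment.text = fragment.text.trim(); }
function upperCaseRule( fragment ) { fragment.text = fragment.text.toUpperCase(); }

var input = '  <p>text</p>  ';

parseCount = 0;
var separateOutput = makeSeparateFilter( upperCaseRule )( makeSeparateFilter( trimRule )( input ) );
var separateParses = parseCount;

parseCount = 0;
var aggregatedOutput = makeAggregatedFilter( [ trimRule, upperCaseRule ] )( input );
var aggregatedParses = parseCount;

console.log( separateParses, aggregatedParses );
```

Both setups produce the same output here; the difference is only in how often the content is parsed and serialized.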

@f1ames f1ames removed the target:major Any docs related issue that should be merged into a major branch. label Nov 12, 2019
@f1ames f1ames modified the milestones: 4.14.0, Iteration 2019-1 Dec 10, 2019
@f1ames f1ames modified the milestones: Iteration 2019-1, Next Dec 10, 2019
@Comandeer
Member

This feature was introduced in #3624 and will be released in 4.14.0.
