Improve importing images from Microsoft Word #4291

Merged
merged 41 commits into from
Oct 16, 2020
Commits
7a903bb
Add unit tests for extracting images from document with headers and f…
Comandeer Sep 18, 2020
abd0b67
Add manual test for extracting images from files with headers and foo…
Comandeer Sep 18, 2020
208148c
Parse fragments of RTF to get rid of headers and footers.
Comandeer Sep 18, 2020
ca2d2ab
Refactor a little code for parsing RTF and add some explanation in co…
Comandeer Sep 18, 2020
d6969cc
Remove unnecessary Word temp file.
Comandeer Sep 20, 2020
664b007
Fix incorrect file name in manual test.
Comandeer Sep 20, 2020
0d9263b
Add unit tests for unsupported images formats.
Comandeer Sep 20, 2020
a49df5b
Add manual test for unsupported images formats.
Comandeer Sep 20, 2020
cef84aa
Always put images in results table, even with unknown image format. O…
Comandeer Sep 20, 2020
e83f0d5
Remove unnecessary Word temp file. AGAIN
Comandeer Sep 20, 2020
ed2e48f
Use of images ids to track which images are already extracted.
Comandeer Sep 20, 2020
dc52541
Skip WordArt shapes.
Comandeer Sep 20, 2020
c59a402
Update unit test for helpers.
Comandeer Sep 20, 2020
716e9aa
Unify image handling between different Pf* plugins.
Comandeer Sep 23, 2020
f5eb7c0
Move RTF helpers directly to the common filter.
Comandeer Sep 23, 2020
3c7f44a
Enhance RTF parser.
Comandeer Sep 24, 2020
16fcf1c
Implement extracting content from RTF groups and use it to replace th…
Comandeer Sep 24, 2020
e738304
Handle images after Word default filter.
Comandeer Sep 25, 2020
8935980
Always treat images without ids as unique.
Comandeer Sep 25, 2020
aa394fb
Fix failing tests for image filter.
Comandeer Sep 25, 2020
9d9f9f5
Rephrase manual test expected section.
Comandeer Sep 26, 2020
854efb6
Add error for unsupported image formats.
Comandeer Sep 26, 2020
e85b04b
Add error for unsupported images and helper for getting image type.
Comandeer Sep 27, 2020
48b219f
Extend expected in unsupported image types manual test.
Comandeer Sep 28, 2020
9a2ad76
Add error for the case, when there are different amount of images ext…
Comandeer Sep 28, 2020
c4b1703
Add unit tests for duplicated images.
Comandeer Sep 28, 2020
c1af458
Try to differentiate between duplicated images and the same image in…
Comandeer Sep 28, 2020
a301d51
Add manual test for duplicated images.
Comandeer Sep 28, 2020
db845ae
Make CKEDITOR.pasteFilters the real alias of CKEDITOR.plugins.pasteto…
Comandeer Sep 28, 2020
7a226e1
Update API docs.
Comandeer Sep 28, 2020
7098e74
Remove unnecessary Word temporary file.
Comandeer Oct 2, 2020
a322b3c
Add additional tests for PfLO.
Comandeer Oct 2, 2020
8230a16
Add autogrow plugin to manual tests.
Comandeer Oct 2, 2020
2c5b100
Inlined one var in unit tests.
f1ames Oct 8, 2020
f614889
Extract recognizable image types to the separate property.
Comandeer Oct 10, 2020
5d7d89a
Update API docs for RTF helpers.
Comandeer Oct 10, 2020
2daaeae
Fix wrong indentation.
Comandeer Oct 10, 2020
1d952c0
Add additional tests for RTF helpers.
Comandeer Oct 10, 2020
4c46207
Add additional unit tests.
Comandeer Oct 14, 2020
15e9904
Strip off all drawn objects.
Comandeer Oct 14, 2020
597c28f
Changelog entry.
f1ames Oct 16, 2020
14 changes: 14 additions & 0 deletions CHANGES.md
@@ -3,6 +3,20 @@ CKEditor 4 Changelog

## CKEditor 4.16

New Features:

* [#2800](https://github.com/ckeditor/ckeditor4/issues/2800): Unsupported image formats are now gracefully handled by the [Paste from Word](https://ckeditor.com/cke4/addon/pastefromword) plugin on paste, additionally showing descriptive error messages.
* [#2800](https://github.com/ckeditor/ckeditor4/issues/2800): Unsupported image formats are now gracefully handled by the [Paste from LibreOffice](https://ckeditor.com/cke4/addon/pastefromlibreoffice) plugin on paste, additionally showing descriptive error messages.

Fixed Issues:

* [#2800](https://github.com/ckeditor/ckeditor4/issues/2800): Fixed: No images are imported from Microsoft Word when content is pasted with the [Paste from Word](https://ckeditor.com/cke4/addon/pastefromword) plugin and at least one image is in an unsupported format.

API Changes:

* [#3782](https://github.com/ckeditor/ckeditor4/issues/3782): Moved [`CKEDITOR.plugins.pastetools.filters.word.images`](https://ckeditor.com/docs/ckeditor4/latest/api/CKEDITOR_plugins_pastetools_filters_word_images.html) filters to the [`CKEDITOR.plugins.pastetools.filters.image`](https://ckeditor.com/docs/ckeditor4/latest/api/CKEDITOR_plugins_pastetools_filters_image.html) namespace.
* [#4297](https://github.com/ckeditor/ckeditor4/issues/4297): All [`CKEDITOR.plugins.pastetools.filters`](https://ckeditor.com/docs/ckeditor4/latest/api/CKEDITOR_plugins_pastetools_filters.html) are now available under the [`CKEDITOR.pasteFilters`](https://ckeditor.com/docs/ckeditor4/latest/api/CKEDITOR.html#property-pasteFilters) alias.
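
Commit db845ae makes `CKEDITOR.pasteFilters` a real alias of `CKEDITOR.plugins.pastetools.filters`, and the Paste from Word plugin code further down calls the image filter through it (`CKEDITOR.pasteFilters.image`). A minimal sketch of why a real alias (a second reference to the same object) matters, using a hypothetical stub instead of the actual `CKEDITOR` global: filters registered after the alias is created remain reachable through it, which would not hold for a copy.

```javascript
// Hypothetical stub standing in for the real CKEDITOR global.
var CKEDITOR = { plugins: { pastetools: { filters: {} } } };

// A real alias: both names reference one and the same object.
CKEDITOR.pasteFilters = CKEDITOR.plugins.pastetools.filters;

// For contrast, a shallow copy taken at the same moment.
var snapshot = Object.assign( {}, CKEDITOR.plugins.pastetools.filters );

// A filter registered later under the full namespace
// (e.g. when filter/image.js is loaded)...
CKEDITOR.plugins.pastetools.filters.image = function( html ) {
	return html;
};

// ...is visible through the alias, but not through the copy.
console.log( typeof CKEDITOR.pasteFilters.image ); // function
console.log( typeof snapshot.image ); // undefined
```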

## CKEditor 4.15

New features:
109 changes: 33 additions & 76 deletions plugins/pastefromword/filter/default.js
@@ -1520,82 +1520,6 @@
};
List = plug.lists;

/**
* Namespace containing a set of image helper methods.
*
* @private
* @since 4.13.0
* @member CKEDITOR.plugins.pastetools.filters.word
*/
plug.images = {
/**
* Parses RTF content to find embedded images. Please be aware that this method should only return `png` and `jpeg` images.
*
* @private
* @since 4.13.0
* @param {String} rtfContent RTF content to be checked for images.
* @returns {Object[]} An array of images found in the `rtfContent`.
* @returns {String} return.hex Hexadecimal string of an image embedded in `rtfContent`.
* @returns {String} return.type A string representing the image type. Allowed values: 'image/png', 'image/jpeg'.
* @member CKEDITOR.plugins.pastetools.filters.word.images
*/
extractFromRtf: function( rtfContent ) {
var ret = [],
rePictureHeader = /\{\\pict[\s\S]+?\\bliptag\-?\d+(\\blipupi\-?\d+)?(\{\\\*\\blipuid\s?[\da-fA-F]+)?[\s\}]*?/,
rePicture = new RegExp( '(?:(' + rePictureHeader.source + '))([\\da-fA-F\\s]+)\\}', 'g' ),
wholeImages,
imageType;

wholeImages = rtfContent.match( rePicture );
if ( !wholeImages ) {
return ret;
}

for ( var i = 0; i < wholeImages.length; i++ ) {
if ( rePictureHeader.test( wholeImages[ i ] ) ) {
if ( wholeImages[ i ].indexOf( '\\pngblip' ) !== -1 ) {
imageType = 'image/png';
} else if ( wholeImages[ i ].indexOf( '\\jpegblip' ) !== -1 ) {
imageType = 'image/jpeg';
} else {
continue;
}

ret.push( {
hex: imageType ? wholeImages[ i ].replace( rePictureHeader, '' ).replace( /[^\da-fA-F]/g, '' ) : null,
type: imageType
} );
}
}

return ret;
},

/**
		 * Extracts an array of `src` attributes in `<img>` tags from the given HTML. `<img>` tags belonging to VML shapes are removed.
*
* CKEDITOR.plugins.pastefromword.images.extractTagsFromHtml( html );
* // Returns: [ 'http://example-picture.com/random.png', 'http://example-picture.com/another.png' ]
*
* @private
* @since 4.13.0
* @param {String} html A string representing HTML code.
* @returns {String[]} An array of strings representing the `src` attribute of the `<img>` tags found in `html`.
* @member CKEDITOR.plugins.pastetools.filters.word.images
*/
extractTagsFromHtml: function( html ) {
var regexp = /<img[^>]+src="([^"]+)[^>]+/g,
ret = [],
item;

while ( item = regexp.exec( html ) ) {
ret.push( item[ 1 ] );
}

return ret;
}
};

/**
* Namespace containing methods used to process the pasted content using heuristics.
*
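
The `extractFromRtf()` helper removed from `default.js` in this hunk skips (`continue`) every picture that is neither `\pngblip` nor `\jpegblip`. Commit cef84aa instead keeps images with an unknown format in the results. A minimal sketch of that idea, reusing the two regexes from the removed helper; `KNOWN_TYPES`, `getImageType()` and the sample RTF string are hypothetical, and the actual code in the `filter/image.js` file loaded from `pastetoolsPath` is not shown on this page and may differ.

```javascript
// Regexes copied from the removed extractFromRtf() helper.
var rePictureHeader = /\{\\pict[\s\S]+?\\bliptag\-?\d+(\\blipupi\-?\d+)?(\{\\\*\\blipuid\s?[\da-fA-F]+)?[\s\}]*?/;
var rePicture = new RegExp( '(?:(' + rePictureHeader.source + '))([\\da-fA-F\\s]+)\\}', 'g' );

// Hypothetical table of RTF blip control words and their MIME types.
var KNOWN_TYPES = [
	{ marker: '\\pngblip', type: 'image/png' },
	{ marker: '\\jpegblip', type: 'image/jpeg' }
];

function getImageType( picture ) {
	for ( var i = 0; i < KNOWN_TYPES.length; i++ ) {
		if ( picture.indexOf( KNOWN_TYPES[ i ].marker ) !== -1 ) {
			return KNOWN_TYPES[ i ].type;
		}
	}

	return null;
}

function extractFromRtf( rtfContent ) {
	var pictures = rtfContent.match( rePicture ) || [];

	// Every picture gets an entry (unknown formats get nulls), so the
	// result stays aligned position-by-position with the <img> tags.
	return pictures.map( function( picture ) {
		var type = getImageType( picture );

		return {
			hex: type ? picture.replace( rePictureHeader, '' ).replace( /[^\da-fA-F]/g, '' ) : null,
			type: type
		};
	} );
}

// One PNG picture followed by one Windows metafile picture.
var rtf = '{\\rtf1' +
	'{\\pict\\pngblip\\picw10\\pich10\\bliptag255{\\*\\blipuid 0123abcd}89504e47}' +
	'{\\pict\\wmetafile8\\picw10\\pich10\\bliptag-42{\\*\\blipuid 00ff}d7cdc69a}' +
	'}';

console.log( JSON.stringify( extractFromRtf( rtf ) ) );
// [{"hex":"89504e47","type":"image/png"},{"hex":null,"type":null}]
```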
@@ -1901,10 +1825,41 @@
* @property {Object} images
* @private
* @deprecated 4.13.0
* @removed 4.16.0
* @since 4.8.0
* @member CKEDITOR.plugins.pastefromword
*/

/**
* See {@link CKEDITOR.plugins.pastetools.filters.image}.
*
* @property {Object} images
* @private
* @removed 4.16.0
* @since 4.13.0
* @member CKEDITOR.plugins.pastetools.filters.word
*/

/**
* See {@link CKEDITOR.plugins.pastetools.filters.image#extractFromRtf}.
*
* @property {Function} extractFromRtf
* @private
* @removed 4.16.0
* @since 4.13.0
* @member CKEDITOR.plugins.pastetools.filters.word.images
*/

/**
* See {@link CKEDITOR.plugins.pastetools.filters.image#extractTagsFromHtml}.
*
* @property {Function} extractTagsFromHtml
* @private
* @removed 4.16.0
* @since 4.13.0
* @member CKEDITOR.plugins.pastetools.filters.word.images
*/

/**
* See {@link CKEDITOR.plugins.pastetools.filters.word.heuristics}.
*
@@ -1925,6 +1880,8 @@
* @member CKEDITOR.plugins.pastefromword
*/



/**
* See {@link #pasteTools_removeFontStyles}.
*
59 changes: 10 additions & 49 deletions plugins/pastefromword/plugin.js
@@ -25,6 +25,7 @@
configInlineImages = editor.config.pasteFromWord_inlineImages === undefined ? true : editor.config.pasteFromWord_inlineImages,
defaultFilters = [
CKEDITOR.getUrl( pastetoolsPath + 'filter/common.js' ),
CKEDITOR.getUrl( pastetoolsPath + 'filter/image.js' ),
CKEDITOR.getUrl( path + 'filter/default.js' )
];

@@ -102,6 +103,15 @@
if ( forceFromWord || confirmCleanUp() ) {
pfwEvtData.dataValue = CKEDITOR.cleanWord( pfwEvtData.dataValue, editor );

// Paste From Word Image:
// RTF clipboard is required for embedding images.
					// If img tags are not allowed, there is no point in processing images.
// Also skip embedding images if image filter is not loaded.
if ( CKEDITOR.plugins.clipboard.isCustomDataTypesSupported && configInlineImages &&
CKEDITOR.pasteFilters.image ) {
pfwEvtData.dataValue = CKEDITOR.pasteFilters.image( pfwEvtData.dataValue, editor, dataTransferRtf );
}

editor.fire( 'afterPasteFromWord', pfwEvtData );

data.dataValue = pfwEvtData.dataValue;
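
The condition added above runs the shared image filter only when custom clipboard data types (needed for the `text/rtf` flavor) are supported, inline images are enabled via `config.pasteFromWord_inlineImages`, and `CKEDITOR.pasteFilters.image` exists, i.e. `filter/image.js` has been loaded. A minimal sketch of that guard; `maybeEmbedImages()` and `createStub()` are hypothetical, the fake filter only appends a marker comment, and `dataTransferRtf` is passed as a plain string here, which is an assumption.

```javascript
// Hypothetical helper mirroring the condition added to plugins/pastefromword/plugin.js.
function maybeEmbedImages( CKEDITOR, html, editor, dataTransferRtf, configInlineImages ) {
	if ( CKEDITOR.plugins.clipboard.isCustomDataTypesSupported && configInlineImages &&
		CKEDITOR.pasteFilters.image ) {
		return CKEDITOR.pasteFilters.image( html, editor, dataTransferRtf );
	}

	// No RTF flavor, inline images disabled, or filter/image.js not loaded:
	// the cleaned-up HTML is passed on unchanged.
	return html;
}

// Stub standing in for the CKEDITOR global; the fake filter just marks its output.
function createStub( rtfSupported, hasImageFilter ) {
	var stub = {
		plugins: { clipboard: { isCustomDataTypesSupported: rtfSupported } },
		pasteFilters: {}
	};

	if ( hasImageFilter ) {
		stub.pasteFilters.image = function( html ) {
			return html + '<!-- images embedded -->';
		};
	}

	return stub;
}

var pasted = '<p>Word content</p>';

console.log( maybeEmbedImages( createStub( true, true ), pasted, {}, '{\\rtf1}', true ) );
// <p>Word content</p><!-- images embedded -->
console.log( maybeEmbedImages( createStub( true, false ), pasted, {}, '{\\rtf1}', true ) );
// <p>Word content</p>
console.log( maybeEmbedImages( createStub( false, true ), pasted, {}, '{\\rtf1}', true ) );
// <p>Word content</p>
```

Note that in the hunk above the filter now runs before `afterPasteFromWord` is fired, whereas the code removed further down in this file attached image handling as an `afterPasteFromWord` listener.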
@@ -127,56 +137,7 @@
}
}
} );

// Paste From Word Image:
// RTF clipboard is required for embedding images.
			// If img tags are not allowed, there is no point in processing images.
if ( CKEDITOR.plugins.clipboard.isCustomDataTypesSupported && configInlineImages ) {
editor.on( 'afterPasteFromWord', imagePastingListener );
}

function imagePastingListener( evt ) {
var pfw = CKEDITOR.plugins.pastefromword && CKEDITOR.plugins.pastefromword.images,
imgTags,
hexImages,
newSrcValues = [],
i;

// If pfw images namespace is unavailable or img tags are not allowed we simply skip adding images.
if ( !pfw || !evt.editor.filter.check( 'img[src]' ) ) {
return;
}

function createSrcWithBase64( img ) {
return img.type ? 'data:' + img.type + ';base64,' + CKEDITOR.tools.convertBytesToBase64( CKEDITOR.tools.convertHexStringToBytes( img.hex ) ) : null;
}

imgTags = pfw.extractTagsFromHtml( evt.data.dataValue );
if ( imgTags.length === 0 ) {
return;
}

hexImages = pfw.extractFromRtf( evt.data.dataTransfer[ 'text/rtf' ] );
if ( hexImages.length === 0 ) {
return;
}

CKEDITOR.tools.array.forEach( hexImages, function( img ) {
newSrcValues.push( createSrcWithBase64( img ) );
}, this );

// Assuming there is equal amount of Images in RTF and HTML source, so we can match them accordingly to the existing order.
if ( imgTags.length === newSrcValues.length ) {
for ( i = 0; i < imgTags.length; i++ ) {
// Replace only `file` urls of images ( shapes get newSrcValue with null ).
if ( ( imgTags[ i ].indexOf( 'file://' ) === 0 ) && newSrcValues[ i ] ) {
evt.data.dataValue = evt.data.dataValue.replace( imgTags[ i ], newSrcValues[ i ] );
}
}
}
}
}

} );
} )();
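
The `imagePastingListener()` that this hunk removes pairs `<img>` tags with RTF pictures purely by position and does nothing when the two counts differ. Because the removed `extractFromRtf()` skipped unsupported formats, a single such image made the counts differ and no image was embedded at all, which matches the #2800 bug described in the changelog above. A minimal Node.js sketch of that matching rule; `embedImages()`, `toDataUri()` and the sample data are hypothetical, and Node's `Buffer` replaces the `CKEDITOR.tools` hex/base64 helpers used by the original listener.

```javascript
// Regex copied from the removed extractTagsFromHtml() helper.
function extractTagsFromHtml( html ) {
	var regexp = /<img[^>]+src="([^"]+)[^>]+/g,
		ret = [],
		item;

	while ( ( item = regexp.exec( html ) ) ) {
		ret.push( item[ 1 ] );
	}

	return ret;
}

// Buffer stands in for CKEDITOR.tools.convertHexStringToBytes()
// and CKEDITOR.tools.convertBytesToBase64().
function toDataUri( img ) {
	if ( !img.type ) {
		return null;
	}

	return 'data:' + img.type + ';base64,' + Buffer.from( img.hex, 'hex' ).toString( 'base64' );
}

// Same rule as the removed listener: replace by position, and only
// when HTML and RTF report the same number of images.
function embedImages( html, rtfImages ) {
	var srcs = extractTagsFromHtml( html ),
		dataUris = rtfImages.map( toDataUri );

	if ( srcs.length !== dataUris.length ) {
		return html;
	}

	for ( var i = 0; i < srcs.length; i++ ) {
		// Only local `file://` URLs are replaced; unsupported images (null) keep their src.
		if ( srcs[ i ].indexOf( 'file://' ) === 0 && dataUris[ i ] ) {
			html = html.replace( srcs[ i ], dataUris[ i ] );
		}
	}

	return html;
}

var html = '<img src="file:///tmp/image001.png" alt="a"><img src="file:///tmp/image002.wmz" alt="b">';
var png = { hex: '89504e47', type: 'image/png' };

// The WMF picture is dropped from the RTF list: 2 tags vs 1 image, nothing is embedded.
console.log( embedImages( html, [ png ] ) === html ); // true

// Keeping a placeholder for the WMF picture restores the alignment.
console.log( embedImages( html, [ png, { hex: null, type: null } ] ) );
// <img src="data:image/png;base64,iVBORw==" alt="a"><img src="file:///tmp/image002.wmz" alt="b">
```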
