Fixed multibyte string encoding #128

edgardmessias · 2021-01-14T19:01:34Z

Q	A
Documentation	no
Bugfix	yes
BC Break	no
New Feature	no
RFC	no
QA	no

Description

The problem occurs when the filename is a multibyte string and needs to be split using parameter continuation.
str_split does not work with multibyte strings, such as Chinese characters.
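A minimal sketch of the failure, assuming a UTF-8 source file and PHP 7.4+ (or symfony/polyfill-mbstring ^1.12.0) for mb_str_split; the filename is a made-up example. str_split counts bytes, so a chunk boundary can fall in the middle of a 3-byte Chinese character, while mb_str_split with an explicit encoding counts characters:

```php
<?php
// Assumes a UTF-8 source file and PHP >= 7.4 (or symfony/polyfill-mbstring ^1.12.0).
$filename = "文件名.txt"; // hypothetical filename: 7 characters, 13 bytes

// str_split() works on bytes: each Chinese character is 3 bytes in UTF-8,
// so 2-byte chunks cut characters in half and produce invalid UTF-8.
$byteChunks = str_split($filename, 2);

// mb_str_split() works on characters when the encoding is given explicitly.
$charChunks = mb_str_split($filename, 2, 'UTF-8');

var_dump(mb_check_encoding($byteChunks[0], 'UTF-8')); // bool(false)
echo implode('|', $charChunks), PHP_EOL;              // 文件|名.|tx|t
```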

Related to glpi-project/glpi#8495

snapshotpl · 2021-01-14T19:05:23Z

@edgardmessias please provide a test case that covers this.

glensc · 2021-01-14T19:37:57Z

I think it's much better to use only mb_str_split and add the https://github.com/symfony/polyfill-mbstring polyfill as a dependency.

The maintainers need to weigh in on that.

glensc · 2021-01-14T19:45:39Z

Also, I think you must pass the $encoding parameter to mb_str_split, because the default comes from php.ini: it was ISO-8859-1 in older versions and changed to UTF-8 in PHP 5.6, but it is still configurable in php.ini. A library should not rely on such global behavior.

How does it even work for you? You say your encoding is "Chinese" (which one? Big5? ISO-2022-JP? ...?), but then you write code that assumes UTF-8?

In the bug report you complain that the existing code doesn't handle multibyte strings, but then you add a function that relies on autodetection or an unknown default?
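A small sketch of why the explicit $encoding argument matters. Here mb_internal_encoding() is set to ISO-8859-1 to simulate an unexpected php.ini value (an assumption for illustration only); the call without an encoding then treats every byte as a character, while the call that passes 'UTF-8' splits on real characters:

```php
<?php
// Sketch: pass the encoding explicitly instead of relying on mb_internal_encoding(),
// whose default comes from php.ini and can also be changed at runtime.
$value = "hērē"; // 4 characters, 6 bytes in UTF-8 (each "ē" is 2 bytes)

// Simulate an unexpected global setting (assumption for illustration).
mb_internal_encoding('ISO-8859-1');

// No $encoding argument: the global default is used, so every byte is a "character".
$implicit = mb_str_split($value, 1);

// Explicit encoding: splits on UTF-8 characters regardless of the global setting.
$explicit = mb_str_split($value, 1, 'UTF-8');

mb_internal_encoding('UTF-8'); // restore the global setting

echo count($implicit), PHP_EOL; // 6
echo count($explicit), PHP_EOL; // 4
```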

edgardmessias · 2021-01-14T20:16:02Z

Looking at the dependencies here, true/punycode already requires symfony/polyfill-mbstring.

glensc · 2021-01-14T20:26:22Z

Great, then you only need to ensure the proper version is used:

composer require symfony/polyfill-mbstring:^1.12.0

symfony/polyfill#181 added the new PHP 7.4 functions, and v1.12.0 was the first tag to include that PR.

edgardmessias · 2021-01-15T01:38:03Z

@glensc, I have made a full rework.

src/Header/ContentDisposition.php

glensc · 2021-01-15T10:04:03Z

Add a test where the filename is not quoted, as I'm sure quotes are optional:

-filename=\"$multibyteFilename\""
+filename=$multibyteFilename

Are single quotes also allowed there?
I need to consult the RFC; so far this is just a suspicion.

test/Header/ContentDispositionTest.php

edgardmessias · 2021-01-15T12:26:01Z

> Are single quotes also allowed there?
> I need to consult the RFC; so far this is just a suspicion.

I think this is a discussion for another topic.

I have fixed the problems mentioned above.

glensc · 2021-01-15T12:30:33Z

Some other notes.

Do not force-push commits while the review is in progress. Add git fixup commits instead and rebase later. It's difficult to review when every push overwrites the previous changes, because the new changes are hard to see. Use Draft status to indicate that a PR is not ready for merge. I recommend https://github.com/keis/git-fixup and https://github.com/MitMaro/git-interactive-rebase-tool to help with this process.
Avoid adding references such as "Related to glpi-project/glpi#8495" in commit messages; keep them in the PR body instead. This avoids creating spam and bogus references on the other side: each force push creates noise there with no real value (see "GLPI 9.5.3 mailcollector drops some attached documents depending on the file name", glpi-project/glpi#8495).

glensc · 2021-01-15T12:33:57Z

> I think this is a discussion for another topic.

I'm not so sure about that, as your new code assumes an exact match of =" (line 148). If the value is not quoted, will parsing fail? I have not tested how the code behaved previously; I only looked at your diff and saw that you expect a quote immediately after the equals sign. That looks fragile, so you should also test with optional spacing: = "...", =\t"..." (tab).

edgardmessias · 2021-01-15T12:45:23Z

@glensc First, thanks for the tips.

About the quote: I did not modify the parser function (lines 43-95), only the encoder function (getFieldValue, lines 108-173).

The problem in getFieldValue occurs when the filename is a multibyte string and needs to be split into continuation fields.

glensc · 2021-01-15T17:28:44Z

@edgardmessias Alright, I only had a really quick look at the diff, so I didn't notice that it's generator logic, not parser logic.

Ocramius · 2021-01-17T17:18:21Z

test/Header/ContentDispositionTest.php

@@ -194,8 +202,20 @@ public function setDispositionProvider(): array
            'UTF-8 continuation' => [
                'attachment',
                ['filename' => 'this-file-name-is-so-long-that-it-does-not-even-fit-on-a-whole-line-by-itself-so-we-need-to-split-it-with-value-continuation.also-UTF-8-characters-hērē.txt'],
-                "attachment;\r\n filename*0=\"this-file-name-is-so-long-that-it-does-not-even-fit-on-a-\";\r\n filename*1=\"whole-line-by-itself-so-we-need-to-split-it-with-value-co\";\r\n filename*2=\"ntinuation.also-UTF-8-characters-hērē.txt\"",
-                "Content-Disposition: attachment;\r\n filename*0=\"=?UTF-8?Q?this-file-name-is-so-long-that-it-does-not-ev?=\";\r\n filename*1=\"=?UTF-8?Q?en-fit-on-a-whole-line-by-itself-so-we-need-t?=\";\r\n filename*2=\"=?UTF-8?Q?o-split-it-with-value-continuation.also-UTF-8?=\";\r\n filename*3=\"=?UTF-8?Q?-characters-h=C4=93r=C4=93.txt?=\"",
+                "attachment;\r\n filename*0=\"this-file-name-is-so-long-that-it-does-not-even-fit-on-a-whole-\";\r\n filename*1=\"line-by-itself-so-we-need-to-split-it-with-value-continuation.a\";\r\n filename*2=\"lso-UTF-8-characters-hērē.txt\"",


Some byte-level changes happened here; they should be investigated (these two lines, specifically).

edgardmessias · 2021-01-17

Yes, because I changed the function to generate lines of exactly 78 characters.

cedric-anne · 2021-01-28T07:54:21Z

I reviewed the changes and test cases, and everything seems OK to me.

Indeed, filenames that do not use multibyte strings are now split into lines of 78 characters (instead of a fickle length based on the ceil(0.6 * $maxValueLength) operation), and multibyte UTF-8 filenames are handled correctly.
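For illustration only, a hypothetical helper that mimics the behavior described here (splitContinuation is not part of laminas-mail, and this is not the PR's actual getFieldValue code). It walks the value character by character with mb_str_split, so a multibyte character is never cut, and starts a new filename*N continuation segment whenever the line, including the leading space and the trailing ";, would exceed 78 bytes. Measuring with strlen (bytes) rather than characters is a conservative choice of this sketch:

```php
<?php
// Hypothetical helper, not the laminas-mail API: splits a UTF-8 parameter value
// into RFC 2231-style continuation lines (` name*0="..."`, ` name*1="..."`, ...).
function splitContinuation(string $name, string $value, int $maxLineLength = 78): array
{
    $lines = [];
    $segment = '';
    $index = 0;

    // Iterate over characters, not bytes, so a UTF-8 sequence is never cut in half.
    foreach (mb_str_split($value, 1, 'UTF-8') as $char) {
        // Fixed cost of this line: leading space, `name*N="`, and the closing `";`
        // (the `;` comes from the separator and is reserved even on the last line).
        $overhead = strlen(" {$name}*{$index}=\"") + strlen('";');
        if ($segment !== '' && $overhead + strlen($segment . $char) > $maxLineLength) {
            // Adding this character would overflow the line: close the segment.
            $lines[] = " {$name}*{$index}=\"{$segment}\"";
            $segment = '';
            $index++;
        }
        $segment .= $char;
    }
    if ($segment !== '') {
        $lines[] = " {$name}*{$index}=\"{$segment}\"";
    }

    return $lines;
}

$filename = 'this-file-name-is-so-long-that-it-does-not-even-fit-on-a-whole-line-by-itself-so-we-need-to-split-it-with-value-continuation.also-UTF-8-characters-hērē.txt';
$lines = splitContinuation('filename', $filename);

echo "attachment;\r\n" . implode(";\r\n", $lines), PHP_EOL;
```

With the 'UTF-8 continuation' filename from the test diff, the printed value (apart from the trailing newline) matches the new expected string: three segments, the first two lines exactly 78 characters long including the ";. For an all-Chinese name, each segment stays valid UTF-8 because splitting only happens between characters.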

src/Header/ContentDisposition.php

glensc

LGTM

Related to glpi-project/glpi#8495
Signed-off-by: Edgard <edgardmessias@gmail.com>

edgardmessias force-pushed the patch-1 branch from 87d737b to 6be60b2 on January 14, 2021 19:04

edgardmessias force-pushed the patch-1 branch 3 times, most recently from 4b20ec7 to 62e782f on January 15, 2021 01:32

glensc reviewed Jan 15, 2021

src/Header/ContentDisposition.php (outdated)

glensc reviewed Jan 15, 2021

test/Header/ContentDispositionTest.php

edgardmessias force-pushed the patch-1 branch from 62e782f to a58790b on January 15, 2021 12:23

edgardmessias mentioned this pull request Jan 15, 2021

Fixed mailcollector with multibyte filename (#8495) glpi-project/glpi#8531

Merged

Ocramius reviewed Jan 17, 2021

edgardmessias requested a review from Ocramius January 28, 2021 00:21

edgardmessias force-pushed the patch-1 branch 2 times, most recently from ece12a0 to 35e0828 on January 28, 2021 13:58

glensc reviewed Jan 28, 2021

src/Header/ContentDisposition.php (outdated)

edgardmessias force-pushed the patch-1 branch from 35e0828 to a58790b on January 28, 2021 18:23

glensc approved these changes Jan 28, 2021

weierophinney added this to the 2.13.1 milestone Feb 12, 2021

weierophinney added the Bug Something isn't working label Feb 12, 2021

Fixed multibyte string encoding

dd51ff2

Related to glpi-project/glpi#8495
Signed-off-by: Edgard <edgardmessias@gmail.com>

weierophinney force-pushed the patch-1 branch from a58790b to dd51ff2 on February 12, 2021 17:34

weierophinney merged commit cc5a038 into laminas:2.13.x Feb 12, 2021

weierophinney mentioned this pull request Feb 12, 2021

Fix str_split warning in ContentDisposition.php #115

Closed

github-actions bot mentioned this pull request Feb 12, 2021

Merge release 2.13.1 into 2.14.x #137

Merged

glensc mentioned this pull request Mar 21, 2021

Update laminas/laminas-mail (2.12.5 => 2.14.0) eventum/eventum#1016

Merged
