Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed multibyte string encoding #128

Merged
merged 1 commit into from
Feb 12, 2021
Merged

Conversation

edgardmessias
Copy link
Contributor

@edgardmessias edgardmessias commented Jan 14, 2021

Q A
Documentation no
Bugfix yes
BC Break no
New Feature no
RFC no
QA no

Description

The problem occurs when the filename is multibyte string and this needs continuation.
str_split don't work with multibyte strings, like chineses characteres.

Related to glpi-project/glpi#8495

@snapshotpl
Copy link
Member

@edgardmessias please provide covered test case

@glensc
Copy link
Contributor

glensc commented Jan 14, 2021

I think it's a lot better to use only mb_str_split and add https://github.com/symfony/polyfill-mbstring polyfill dependency.

maintainers need to say their word on that.

@glensc
Copy link
Contributor

glensc commented Jan 14, 2021

also, I think you must specify $encoding parameter to mb_str_split, as the default comes from php.ini, is ISO8859-1 in older versions, changed to UTF-8 in PHP 5.6 but yet still configurable in php.ini settings. a library should not rely on such global behavior.

how does it even work for you? you say your encoding is "Chinese" (which one of them? Big5? ISO 2022-JP?. ...?), but then write code that assumes utf-8?

in the bug report, you complain that the existing code doesn't respect multibyte, but then you add a function that relies on autodetect or unknown default?

@edgardmessias
Copy link
Contributor Author

Looking here the dependencies, the true/punycode require symfony/polyfill-mbstring

@glensc
Copy link
Contributor

glensc commented Jan 14, 2021

great, then you only need to ensure the proper version is used:

composer require symfony/polyfill-mbstring:^1.12.0

@edgardmessias edgardmessias force-pushed the patch-1 branch 3 times, most recently from 4b20ec7 to 62e782f Compare January 15, 2021 01:32
@edgardmessias
Copy link
Contributor Author

@glensc , I made a full rework

@glensc
Copy link
Contributor

glensc commented Jan 15, 2021

Add test where filename is not quoted. as I'm sure quotes are optional:

-filename=\"$multibyteFilename\""
+filename=$multibyteFilename

are using single quotes also allowed there?
need to consult RFC, so far this is just suspiction.

@edgardmessias
Copy link
Contributor Author

are using single quotes also allowed there?
need to consult RFC, so far this is just suspiction.

I think this is a discussion for another topic.

I solved the problems mentioned above

@glensc
Copy link
Contributor

glensc commented Jan 15, 2021

Some other notes.

  1. do not force push commits when the review is in progress. add git fixup commits instead and rebase later. it's difficult to review if every push you make is overwriting previous changes, difficult to see new changes. use Draft status to indicate PR is not ready for merge. I recommend https://github.com/keis/git-fixup and https://github.com/MitMaro/git-interactive-rebase-tool to help you in the process.
  2. avoid adding references in commit messages, like Related to glpi-project/glpi#8495. keep them in PR body instead. this avoids creating spam and bogus references on the other side. each force push creates noise there and it has no real value: GLPI 9.5.3 mailcollector drops some attached documents depending on the file name glpi-project/glpi#8495 (reference)

@glensc
Copy link
Contributor

glensc commented Jan 15, 2021

I think this is a discussion for another topic.

I'm not certain of that, as your new code assumes an exact match of =" (line 148), so if it's not quoted, parsing will fail? I have not tested how code behaved previously, only looked at your diff and saw that you expect quote following immediately after equal sign. you should also test with optional spacing: = "...", =\t"..." (tab), so it looks fragile.

@edgardmessias
Copy link
Contributor Author

@glensc First, thanks for the tips.

About the quote, I not modified the parser function (lines 43-95), but the encoder function (getFieldValue 108-173)

The problem with getFieldValue is when the filename is multibyte string and this needs continuation field.

@glensc
Copy link
Contributor

glensc commented Jan 15, 2021

@edgardmessias alright, I had a really quick look on the diff only, so didn't look it closely it's a generator, not parser logic.

@@ -194,8 +202,20 @@ public function setDispositionProvider(): array
'UTF-8 continuation' => [
'attachment',
['filename' => 'this-file-name-is-so-long-that-it-does-not-even-fit-on-a-whole-line-by-itself-so-we-need-to-split-it-with-value-continuation.also-UTF-8-characters-hērē.txt'],
"attachment;\r\n filename*0=\"this-file-name-is-so-long-that-it-does-not-even-fit-on-a-\";\r\n filename*1=\"whole-line-by-itself-so-we-need-to-split-it-with-value-co\";\r\n filename*2=\"ntinuation.also-UTF-8-characters-hērē.txt\"",
"Content-Disposition: attachment;\r\n filename*0=\"=?UTF-8?Q?this-file-name-is-so-long-that-it-does-not-ev?=\";\r\n filename*1=\"=?UTF-8?Q?en-fit-on-a-whole-line-by-itself-so-we-need-t?=\";\r\n filename*2=\"=?UTF-8?Q?o-split-it-with-value-continuation.also-UTF-8?=\";\r\n filename*3=\"=?UTF-8?Q?-characters-h=C4=93r=C4=93.txt?=\"",
"attachment;\r\n filename*0=\"this-file-name-is-so-long-that-it-does-not-even-fit-on-a-whole-\";\r\n filename*1=\"line-by-itself-so-we-need-to-split-it-with-value-continuation.a\";\r\n filename*2=\"lso-UTF-8-characters-hērē.txt\"",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some byte-level changes happened here: should be investigated (these two lines, specifically)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because I edited the function to generate exactly size of 78 characters

@cedric-anne
Copy link
Contributor

I reviewed changes and tests cases and all seems ok form me.

Indeed, now filenames that are not using multibytes strings are now splitted on lines that are 78 chars length (instead of fickle value based on ceil(0.6 * $maxValueLength) operation), and multibyte UTF-8 filenames are correctly handled.

@edgardmessias edgardmessias force-pushed the patch-1 branch 2 times, most recently from ece12a0 to 35e0828 Compare January 28, 2021 13:58
Copy link
Contributor

@glensc glensc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@weierophinney weierophinney added this to the 2.13.1 milestone Feb 12, 2021
@weierophinney weierophinney added the Bug Something isn't working label Feb 12, 2021
Related to glpi-project/glpi#8495

Signed-off-by: Edgard <edgardmessias@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants