Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problems with smarty_mb_str_replace PHP >= 7.1 #549

Closed
modGTB opened this issue Jul 30, 2019 · 21 comments
Closed

problems with smarty_mb_str_replace PHP >= 7.1 #549

modGTB opened this issue Jul 30, 2019 · 21 comments

Comments

@modGTB
Copy link

modGTB commented Jul 30, 2019

There ist a problem with the "replace" modifier in combination with ISO-8859-1 and PHP >= 7.1
You get an empty string back in case the string contains german umlauts (öäü).

@AnrDaemon
Copy link
Contributor

Could you please fork https://github.com/AnrDaemon/test-001 and reproduce your issue in a branch cloned from the master?

@cmanley
Copy link

cmanley commented Sep 24, 2019

I'd like to add to this (after having issues with latin encoded German characters in Smarty 3.1.33)....
The issue is not related to the PHP version used (maybe the default encoding happened to be different with your new PHP version), so I suggest changing the subject a little.
What's happening is that "replace" calls "smarty_mb_str_replace" if the MB extension is loaded, but the latter function does not take the encoding set in Smarty::$_CHARSET into account, assuming that the text is encoded in UTF-8 which causes the line with "mb_split(preg_quote(..." to return false. One of the reasons is that mb_split() assumes the encoding as returned by mb_regex_encoding() which Smarty sets nowhere.
So the fix is fairly simple: make smarty_mb_str_replace use Smarty::$CHARSET and be very careful and considerate of encodings when using mb_split and preg* functions and read all the warnings in below the documentation of mb_regex_encoding().

I noticed a few more minor issues related to encoding in Smarty while digging into this:

  1. In Smarty.class.php, the comment above the define of SMARTY_RESOURCE_CHAR_SET says "deprecated in favor of Smarty::$_CHARSET". The problem with that is, is that Smarty code accesses Smarty::$_CHARSET using the hard-coded class name prefix instead of static::$_CHARSET, which makes overriding troublesome.
  2. The constructor in Smarty.class.php modifies the applications internal encoding (this is a no-no) by calling mb_internal_encoding() instead of adapting itself to that encoding.
  3. Custom child classes of Smarty that set Smarty::$_CHARSET in their constructors have to do so before calling parent::__construct() or else Smarty::$_UTF8_MODIFIER will be set incorrectly.
  4. Because the Smarty encoding related properties are statics, it makes it impossible to use 2 Smarty instances in 2 different encodings simultaneously (e.g. one for legacy SMS templates in latin1, and one for UTF-8 encoded HTML pages). Although rare, a (very) good Smarty design will not fail then.

@modGTB
Copy link
Author

modGTB commented Sep 24, 2019

I have fixed the problem with adding these lines:
if (is_callable('mb_regex_encoding')) { mb_regex_encoding(Smarty::$_CHARSET); }

after
if (is_callable('mb_internal_encoding')) { mb_internal_encoding(Smarty::$_CHARSET); }

@cmanley
Copy link

cmanley commented Sep 24, 2019

That's almost a good solution, except for 1 problem: mb_regex_encoding() supports much less encodings than mb_internal_encoding(). See the comments below the mb_regex_encoding documentation. I'm afraid, if you want to use mb_split(), then you'll have to translate the given input into utf-8 (or whatever encoding has the widest coverage), operate on it, and then translate the result back into the expected encoding. Perhaps writing a alternative mb_trim-like function that supports an encoding parameter is better. It's a nasty problem, like treading in a code minefield.

@cmanley
Copy link

cmanley commented Sep 24, 2019

I think this will fix it, by replacing the last "else" block in plugins/shared.mb_str_replace.php with this:

        } else {
            $want_encoding = 'UTF-8';
            $must_transcode = function_exists('mb_convert_encoding') && strcasecmp(Smarty::$_CHARSET, $want_encoding);
            if ($must_transcode) {
                $search  = mb_convert_encoding($search , $want_encoding, Smarty::$_CHARSET);
                $replace = mb_convert_encoding($replace, $want_encoding, Smarty::$_CHARSET);
                $subject = mb_convert_encoding($subject, $want_encoding, Smarty::$_CHARSET);
            }
            $parts = preg_split('/' . preg_quote($search, '/') . '/u', $subject);
            $count = count($parts) - 1;
            $subject = implode($replace, $parts);
            if ($must_transcode) {
                $subject = mb_convert_encoding($subject, Smarty::$_CHARSET, $want_encoding);
            }
        }

It needs testing though.
Edit: Slightly optimized update using preg_replace instead of preg_split + count + implode:

        } else {
            $want_encoding = 'UTF-8';
            $must_transcode = function_exists('mb_convert_encoding') && strcasecmp(Smarty::$_CHARSET, $want_encoding);
            if ($must_transcode) {
                $search  = mb_convert_encoding($search , $want_encoding, Smarty::$_CHARSET);
                $replace = mb_convert_encoding($replace, $want_encoding, Smarty::$_CHARSET);
                $subject = mb_convert_encoding($subject, $want_encoding, Smarty::$_CHARSET);
            }
            $subject = preg_replace('/' . preg_quote($search, '/') . '/u', $replace, $subject, -1, $count);
            if ($must_transcode) {
                $subject = mb_convert_encoding($subject, Smarty::$_CHARSET, $want_encoding);
            }
        }

The function should do better argument type checking b.t.w. to handle objects, nulls, etc. being passed in.

@AnrDaemon
Copy link
Contributor

You can use {} as RE delimiters, avoiding much of the "quote this, quote that" issue.
But without a test case, all your code is for nothing.

@cmanley
Copy link

cmanley commented Sep 25, 2019

...until you get a search string with either { or } in it.
It's not for nothing for who does what I did: copy + fix the function and load it before Smarty loads.
Sorry, I have no time to create demos/test-cases due to heavy work load. Someone else can give it a try if they want.

@AnrDaemon
Copy link
Contributor

Sorry, I did mean to say "quote this, don't quote that".
preg_match('{' . preg_quote($string) . '}u'); is perfectly safe.

@AnrDaemon
Copy link
Contributor

It's not for nothing for who does what I did: copy + fix the function and load it before Smarty loads.

Without a clear test case you can't be sure if the fix is working, or if there's no sideeffects of your fix. Or that your fix won't be accidentally dropped in a future refactoring.

@cmanley
Copy link

cmanley commented Apr 14, 2020

Still broken in 3.1.36
Update: and still broken in 3.1.40.

@Tomcraft1980
Copy link

Hmmm, that is really sad!

cmanley pushed a commit to cmanley/test-001 that referenced this issue May 1, 2020
@cmanley
Copy link

cmanley commented May 1, 2020

It's not for nothing for who does what I did: copy + fix the function and load it before Smarty loads.

Without a clear test case you can't be sure if the fix is working, or if there's no sideeffects of your fix. Or that your fix won't be accidentally dropped in a future refactoring.

Here's the forked test case demonstrating the issue and the fix:
https://github.com/cmanley/test-001

@AnrDaemon
Copy link
Contributor

AnrDaemon commented May 3, 2020

I see the issue, however, the fix is not appropriate. The real problem is that

  1. Smarty is changing mb_internal_encoding, consequently breaking entire environment;
  2. mb_split is using two encodings, and it does that inconsistently. Bad stuff.

This can be fixed in two ways. One is quick and dirty. One is correct and very, very tedious.

@Tomcraft1980
Copy link

Good news! So there is a chance to see this issue be fixed in an upcoming release? ;-)

@AnrDaemon
Copy link
Contributor

I'm tinkering with the code as we speak. May be, just may be, I will have a fix. I already have a regression test case, so this should be covered.

@Tomcraft1980
Copy link

Thanks in advance! =)

@AnrDaemon
Copy link
Contributor

Another problematic point is that you can not actually rely on Smarty::$_CHARSET as it can be changed AFTER initialization.

@AnrDaemon
Copy link
Contributor

plugins/modifier.escape.php needs an investigation and coverage analysis.

@AnrDaemon
Copy link
Contributor

It seems I'm going to fix A LOT of stuff in process. >.< There's lots of PHP incompatibilities and the test suite is not configured to run on minimum advertised PHP version.

AnrDaemon added a commit to AnrDaemon/smarty that referenced this issue May 4, 2020
`mb_split` will fail if `$pattern` or `$string` contains byte sequences not invalid for `mb_regex_encoding()`.

Convert both strings and set regex encoding to `UTF-8` before calling mb_split.

Fixes smarty-php#549
AnrDaemon added a commit to AnrDaemon/smarty that referenced this issue May 4, 2020
`mb_split` will fail if `$pattern` or `$string` contains byte sequences not valid for `mb_regex_encoding()`.

Convert both strings and set regex encoding before calling `mb_split()` if needed.

Fixes smarty-php#549
AnrDaemon added a commit to AnrDaemon/smarty that referenced this issue May 5, 2020
`mb_split` will fail if `$pattern` or `$string` contains byte sequences not valid for `mb_regex_encoding()`.

Convert both strings and set regex encoding before calling `mb_split()` if needed.

Fixes smarty-php#549
@bLeveque42
Copy link

I was running into an obscure issue in the PrestaShop back-office, and nailed it down to the replace modifier.
Every time I'd call |replace, I'd just get a blank page after that, even with smarty_debugging and error_reporting turned on.

It seems it was this exact same issue, and it was resolved once I installed php7.3-mbstring.

Thank you for the fix, this is gonna help a lot of people!

AnrDaemon added a commit to AnrDaemon/smarty that referenced this issue Mar 4, 2022
`mb_split` will fail if `$pattern` or `$string` contains byte sequences not valid for `mb_regex_encoding()`.

Convert both strings and set regex encoding before calling `mb_split()` if needed.

Fixes smarty-php#549
AnrDaemon added a commit to AnrDaemon/smarty that referenced this issue Mar 4, 2022
`mb_split` will fail if `$pattern` or `$string` contains byte sequences not valid for `mb_regex_encoding()`.

Convert both strings and set regex encoding before calling `mb_split()` if needed.

Fixes smarty-php#549
@wisskid
Copy link
Contributor

wisskid commented Jan 31, 2023

This was fixed in #740.

@wisskid wisskid closed this as completed Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants