This repository has been archived by the owner on Dec 11, 2020. It is now read-only.

realText generates string with broken last character in some cases #1305

Closed
apotheosis91 opened this issue Sep 25, 2017 · 1 comment

Comments

@apotheosis91

When realText generates a string that ends with certain characters (for example, р (U+0440) in the ru_RU or uk_UA locale, or є (U+0454) in the uk_UA locale), that last character gets corrupted.
When I try to insert such a generated string into a MySQL database, I get an error:
SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xD1.'
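The value MySQL rejects ends with a lone 0xD1 byte, which is not valid UTF-8 on its own. A minimal sketch of how such a string could be detected before insertion, assuming only PHP's bundled PCRE extension: preg_match() with the u modifier returns false (rather than 0 or 1) when the subject is not well-formed UTF-8. The helper name isValidUtf8 is hypothetical and not part of Faker.

```php
<?php
// Hypothetical helper (not part of Faker): true only if $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false on a malformed UTF-8 subject.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("характер."));     // bool(true)
var_dump(isValidUtf8("характе\xD1."));  // bool(false): lone lead byte D1
```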

Here is how to reproduce it:

```php
require 'vendor/autoload.php'; // assumes Faker was installed via Composer

$faker = Faker\Factory::create('ru_RU');
$faker->seed(208);
$title = $faker->realText(32);  // returns 'Какой-то этакой характе<broken_char_here>.'
```

This appears to be caused by the rtrim($text, ',— ') call in the appendEnd method.
The em dash is not an ASCII character, so it causes problems there.

rtrim doesn't treat the em dash as one character but as three bytes (E2 80 94), and it trims any of those bytes individually.
So when the text ends with the character р (UTF-8 bytes D1 80), rtrim cuts off the trailing 80 byte, i.e. half of the character, leaving a lone D1 byte that is no longer valid UTF-8. The same happens to є (D1 94), which loses its 94 byte.
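The byte-wise behaviour of rtrim described above can be reproduced without Faker. This sketch simply calls rtrim with the same character list as appendEnd on a word ending in р and on the single letter є, then prints the remaining bytes in hex.

```php
<?php
// 'р' (U+0440) is encoded in UTF-8 as the two bytes D1 80.
$text = "характер";
// rtrim() treats its character list as a set of single bytes. The em dash
// (U+2014) contributes the bytes E2, 80 and 94, so a trailing 0x80 is stripped.
$trimmed = rtrim($text, ',— ');

echo bin2hex(substr($text, -2)), "\n";      // d180
echo bin2hex(substr($trimmed, -1)), "\n";   // d1: only the lead byte is left
var_dump(strlen($text) - strlen($trimmed)); // int(1)

// 'є' (U+0454) is D1 94; its trailing 0x94 byte is stripped the same way.
echo bin2hex(rtrim("є", ',— ')), "\n";      // d1
```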

I think replacing rtrim with preg_replace (using the u modifier, so that the pattern and the subject are treated as UTF-8) will solve the issue.
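A minimal sketch of the preg_replace approach, assuming the intent is the same as the original rtrim call (strip trailing commas, em dashes and spaces). The u modifier matters: without it PCRE also works byte by byte, and a character class containing the em dash would cut bytes exactly like rtrim does. The function name trimEndUtf8 is hypothetical, and this is not necessarily the patch that was merged for this issue.

```php
<?php
// Hypothetical helper: strip trailing ',', em dash (U+2014) and ' ' characters.
// The /u modifier makes PCRE treat pattern and subject as UTF-8, so the em dash
// is matched as one code point rather than as the separate bytes E2 80 94.
function trimEndUtf8(string $text): string
{
    $result = preg_replace('/[,— ]+$/u', '', $text);
    if ($result === null) {
        // preg_replace() returns null on failure, e.g. for invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

var_dump(trimEndUtf8("характер"));      // string(16) "характер"
var_dump(trimEndUtf8("характер, — "));  // string(16) "характер"
```

Because matching happens per code point, a word ending in р or є is left intact, while genuine trailing punctuation is still removed.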

apotheosis91 added a commit to apotheosis91/Faker that referenced this issue Sep 25, 2017
@fzaninotto
Owner

I believe it's a UTF-8 handling problem: rtrim isn't UTF-8-safe.

apotheosis91 added a commit to apotheosis91/Faker that referenced this issue Sep 29, 2017
iamraccoon added a commit to iamraccoon/Faker that referenced this issue Oct 18, 2017
iamraccoon added a commit to iamraccoon/Faker that referenced this issue Oct 18, 2017
fzaninotto added a commit that referenced this issue Nov 13, 2017
Fix #1305 realText in some cases breaks last character