Skip to content
This repository has been archived by the owner on Dec 11, 2020. It is now read-only.

Support an asterisk in bothify and optimize #701

Merged
merged 3 commits into from
Sep 14, 2015
Merged

Support an asterisk in bothify and optimize #701

merged 3 commits into from
Sep 14, 2015

Conversation

nineinchnick
Copy link
Contributor

Base::bothify() now replaces an asterisk ('*') with either a random number or a random letter.

I also replaced preg_replace_callback in numerify and lexify as it is slower than iterating over a range of chars found by using strpos and strrpos.

Benchmarks are available here: https://gist.github.com/nineinchnick/84895a560910e3ead9cc

The biggest question is should we worry about UTF-8? I guess in a worst case it could replace the wrong byte. There are a few SO questions how to iterate over UTF-8 strings efficiently.

Should the new helper method Base::replaceWildcard be public?

*
* @param string $string String that needs to bet parsed
* @return string
*/
public static function bothify($string = '## ??')
{
$string = self::replaceWildcard($string, '*', function () {
return mt_rand(0, 1) ? '#' : '?';
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you directly pass randomDigit() or randomAscii() here? That's save one function call for each placeholder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it save a function call? For randomLetter(), it will be called once anyway and randomDigit() is not called at all, there's an optimization already that gets many digits from a big number.

@fzaninotto
Copy link
Owner

Hi, thanks for the patch!

Being utf8-safe also explains why the current version may be slower. Although faster, your method drops an important ability (UTF-8), which is a stopper. I think preg_replcae_callback() with a /u is a must.

@nineinchnick
Copy link
Contributor Author

I've been thinking about it. We can still use strpos to extract a shorter string to pass to preg_replace_callback(). It also allows to skip the call completely in bothify if there are no '#' or '?'.

I'll benchmark that version and the one with custom utf-8 string iteration.

@nineinchnick
Copy link
Contributor Author

Btw there should be an unit test to see it fail for special utf-8 strings. Maybe I'll try it with minimaxir/big-list-of-naughty-strings.

@nineinchnick
Copy link
Contributor Author

There's an answer on SO that says it should be fine to search for an ascii char in an utf-8 string.

@fzaninotto
Copy link
Owner

Then you'll need a test to prove it :)

@nineinchnick
Copy link
Contributor Author

Done and it seems to work. Done with https://gist.github.com/nineinchnick/917e644df42ccd62db5c
for #, * and ? on every possible utf-8 char and those files:

@fzaninotto
Copy link
Owner

I mean a unit test un Faker.

@nineinchnick
Copy link
Contributor Author

Done.

@fzaninotto
Copy link
Owner

I meant testing Faker, not PHP. You have to make sure numerify, lexify, and bothify work correctly with UTF-8 strings.

@nineinchnick
Copy link
Contributor Author

I'm not sure how such a test would look like. Should I run it on that big utf-8 string, replace the single char back to the placeholder and compare?

@fzaninotto
Copy link
Owner

No, just use a sample UTF-8 string with a couple special chars AND placeholders, and check that the result matches the expected with a regular expression (using /u)

@nineinchnick
Copy link
Contributor Author

Done. Note that it doesn't matter if you use /u or not.

fzaninotto added a commit that referenced this pull request Sep 14, 2015
Support an asterisk in bothify and optimize
@fzaninotto fzaninotto merged commit bedaf7d into fzaninotto:master Sep 14, 2015
@fzaninotto
Copy link
Owner

Great, thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants