[8.x] increase performance of Str::before by over 60%. #34642

lupinitylabs · 2020-10-02T20:22:48Z

This PR proposes a change to the Str::before function that improves performance by over 60% compared to the old code (see benchmarks below). Even with an empty search string, the new code is slightly faster.

The string cast of the $search parameter is necessary to remain compatible with the test in tests/Support/SupportStrTest.php, which checks that integer values are accepted as the $search parameter. To me, this is a little confusing, because the PHPDoc block declares $search as string. I didn't change this, but I think the type should be changed to mixed for $search if this is really the expected behavior.

Alternatively, the test could be changed, but since this would introduce a breaking change, this is not advisable.

As a side note, the new code would be another 10% faster without the string cast. Not using a shorthand if in the return line would further improve performance, but is slightly less readable.
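For readers who want to compare the two variants outside the benchmark one-liners, here is a minimal sketch. The function names before_old and before_new are hypothetical stand-ins, and the bodies are modeled on the benchmark loops in this PR rather than copied from the merged commit.

```php
<?php

// Hypothetical stand-in for the old body: split on $search, keep the first piece.
function before_old(string $subject, $search): string
{
    return $search === '' ? $subject : explode($search, $subject)[0];
}

// Hypothetical stand-in for the proposed body: take everything before the first match.
function before_new(string $subject, $search): string
{
    // Keep the empty-needle guard so the result does not depend on how
    // strstr() treats an empty needle.
    if ($search === '') {
        return $subject;
    }

    // The cast keeps integer needles working as their decimal string form.
    $result = strstr($subject, (string) $search, true);

    // strstr() returns false when the needle is absent; fall back to the subject.
    return $result === false ? $subject : $result;
}

var_dump(before_new('vendor/laravel', '/'));
var_dump(before_new('han2nah', 2));
```

As far as I know, before PHP 8.0 a non-string needle passed to strstr was interpreted as the ordinal value of a character, which would explain why the test for integer values only passes with the cast in place.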

Benchmarks

Current code:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
0.95517802238464
0.96321487426758
0.95738410949707
0.95915079116821
0.96855711936951

New code:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
0.58452105522156
0.58884692192078
0.5809018611908
0.58099007606506
0.58521318435669

Current code with empty search:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = ''; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
0.16867280006409
0.16797685623169
0.16919302940369
0.16927814483643
0.16803288459778

New code with empty search:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = ''; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
0.1334171295166
0.13449788093567
0.13518381118774
0.13393306732178
0.13339805603027

New code without string cast:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, $search, true); if ($result === false) { $subject; } else { $result; } } } echo microtime(true)-$s . "\n"; }
0.50580191612244
0.50802898406982
0.51281690597534
0.52838587760925
0.51457190513611
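The one-liners above are hard to read, so here is a multi-line sketch of the same measurement idea. The bench helper is hypothetical and not part of the PR. Because it calls a closure on every iteration, it adds overhead that the inline loops do not have, so absolute numbers will differ from the ones posted here.

```php
<?php

// Hypothetical helper: run $fn $iterations times and return the elapsed seconds.
function bench(callable $fn, int $iterations): float
{
    $start = microtime(true);

    for ($i = $iterations; $i--;) {
        $fn();
    }

    return microtime(true) - $start;
}

$subject = 'vendor/laravel';
$search = '/';

// Old approach: split the whole string and keep the first piece.
$old = bench(function () use ($subject, $search) {
    return $search === '' ? $subject : explode($search, $subject)[0];
}, 100000);

// New approach: take everything before the first match.
$new = bench(function () use ($subject, $search) {
    if ($search === '') {
        return $subject;
    }
    $result = strstr($subject, (string) $search, true);

    return $result === false ? $subject : $result;
}, 100000);

printf("explode: %.4fs, strstr: %.4fs\n", $old, $new);
```

The iteration count is reduced to 100000 here only to keep the run short; raise it to reproduce the scale of the figures above.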

browner12 · 2020-10-02T20:40:40Z

looks like you could get rid of the initial empty string check when switching to this function. strstr will throw a warning, but it does allow an empty needle, where explode does not.

$result = strstr($subject, (string) $search, true);

return $result === false ? $subject : $result;

and then couldn't we single line it?

return strstr($subject, (string) $search, true) ?: $subject;

GrahamCampbell · 2020-10-02T20:44:36Z

Does this actually make things faster for real apps?

lupinitylabs · 2020-10-02T22:59:30Z

> looks like you could get rid of the initial empty string check when switching to this function. strstr will throw a warning, but it does allow an empty needle, where explode does not.
>
> $result = strstr($subject, (string) $search, true);
>
> return $result === false ? $subject : $result;

Not a fan of having a warning thrown, tbh.

> and then couldn't we single line it?
>
> return strstr($subject, (string) $search, true) ?: $subject;

Consider this case:

$subject = '0x10'; 
$search = 'x'; 

strstr($subject, (string) $search, true) ?: $subject;

This will return 0x10, but it would be expected to return 0.

It would be possible to write it as

return ($result = strstr($subject, (string) $search, true)) === false ? $subject : $result;

but I refrained from doing that for increased readability.
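The difference between the two forms can be checked with a short script. This is only an illustrative sketch; the variable names $short and $strict are made up for the comparison.

```php
<?php

$subject = '0x10';
$search = 'x';

// strstr() returns '0' here. PHP treats the string '0' as falsy,
// so the ?: shorthand discards it and falls back to $subject.
$short = strstr($subject, (string) $search, true) ?: $subject;

// Comparing strictly against false only falls back when the needle is absent.
$result = strstr($subject, (string) $search, true);
$strict = $result === false ? $subject : $result;

var_dump($short);  // string(4) "0x10"
var_dump($strict); // string(1) "0"

// The same problem appears when the needle is the first character:
// the correct result is an empty string, which is also falsy.
$leading = strstr('/laravel', '/', true) ?: '/laravel';
var_dump($leading); // string(8) "/laravel"
```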

lupinitylabs · 2020-10-02T23:11:33Z

> Does this actually make things faster for real apps?

Depends on the app and the data, I would say. It will certainly make a difference iterating over very large datasets. I noticed it myself when optimizing such a script and found out that strstr was making a difference of several seconds in runtime.

And I was being kind with the benchmarks. To visualize how bad it can really get, imagine a case where the needle is in the haystack multiple times:

New code:

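>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/b/b//b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b'; $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }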
0.56986904144287
0.57503294944763
0.57185816764832
0.57861685752869
0.5752580165863

Current code:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/b/b//b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b'; $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
20.957284927368
21.176103115082
20.958166122437
20.968662977219
21.130389928818

That is ~ 35x faster. So, yes, I would say it does have an impact in some cases. Frankly, I wouldn't want to leave the function that way, and I don't see a benefit of not patching it.
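A likely explanation for the gap is that explode has to walk the entire subject and build an array element for every segment, while strstr only needs the part up to the first match. The following sketch uses a shortened, made-up subject to show how much work explode does that is then thrown away.

```php
<?php

// Made-up subject: 'vendor/laravel' followed by 50 extra '/a' segments.
$subject = 'vendor/laravel' . str_repeat('/a', 50);

// explode() produces one array element per segment, although only
// element 0 is used by the old implementation.
$pieces = explode('/', $subject);
echo count($pieces), "\n"; // 52

// strstr() with $before_needle = true yields the same first segment
// without building the other 51.
$first = strstr($subject, '/', true);
var_dump($first === $pieces[0]); // bool(true)
```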

Same goes for large strings, which is a more realistic use-case:

Current:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'test/' . str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
8.6145751476288
9.2526910305023
8.4428129196167
9.2436029911041
8.5641269683838

New:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'test/' . str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
0.57037281990051
0.5638210773468
0.56512999534607
0.56515693664551
0.57639598846436

And this is only a 10 kB string.

Of course, if there is a needle anywhere in the haystack, strstr will always win over explode. But even if the needle does not occur in the haystack, the new code is slightly faster:

Current:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
1.6434171199799
1.6519260406494
1.7897970676422
1.6361329555511
1.6360969543457

New:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
1.4887380599976
1.4843230247498
1.483482837677
1.4862279891968
1.4922170639038

Even if this might be a little constructed and might not happen very often in a real-world application, as a developer I would hope that the Laravel helpers would not only offer a clean and pleasant interface, but also strive for the best possible solution under the hood. I like the idea of having the peace of mind that Laravel is not only a joy to use, but it does the job in the most performant way possible.

mfn · 2020-10-03T11:20:03Z

I wonder how

strtok('before?after', '?')

fares with the micro benchmarks.

Same behaviour as with strstr, if not found it returns false

lupinitylabs · 2020-10-03T14:16:13Z

> I wonder how
>
> strtok('before?after', '?')
>
> fares with the micro benchmarks.
>
> Same behaviour as with strstr, if not found it returns false

Slightly worse than strstr for small strings and MUCH worse than explode on long strings.

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/b/b//b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b'; $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strtok($subject, (string) $search); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
0.70224785804749
0.70644283294678
0.70995903015137
0.70787596702576
0.70642018318176

>>> $s = microtime(true); $subject = str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strtok($subject, (string) $search); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n";
77.940736055374
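Apart from speed, strtok is not a drop-in replacement, because its second argument is a set of delimiter characters and it skips leading delimiters. The sketch below compares the two functions on a few made-up inputs; the $cases array exists only for this comparison.

```php
<?php

// Each entry holds [subject, search].
$cases = [
    ['before-after', '?'], // needle absent
    ['/vendor', '/'],      // needle is the first character
    ['a:b::c', '::'],      // multi-character needle
];

foreach ($cases as [$subject, $search]) {
    // strtok() with two arguments starts a fresh tokenization of $subject.
    var_dump(strtok($subject, $search));
    var_dump(strstr($subject, $search, true));
}

// Results, in order:
// 'before-after' vs false : strtok returns the whole string when nothing matches
// 'vendor'       vs ''    : strtok skips the leading delimiter
// 'a'            vs 'a:b' : strtok splits on any single ':' character
```

So for an absent needle strtok already returns the subject rather than false, and the other two cases would give different results from the current Str::before behaviour.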

mfn · 2020-10-05T04:00:35Z

Thank you! => 🗑️ 😄

lupinitylabs added 2 commits October 2, 2020 21:47

[8.x] increase performance of Str::before by over 50%.

712e785

[8.x] refactor code for readability and a further 10% speed gain.

5d3b96f

taylorotwell merged commit 576dd3d into laravel:8.x Oct 5, 2020
