Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x] increase performance of Str::before by over 60%. #34642

Merged
merged 2 commits into from
Oct 5, 2020

Conversation

lupinitylabs
Copy link
Contributor

@lupinitylabs lupinitylabs commented Oct 2, 2020

This PR proposes a change to the Str::before function that improves performance by over 60% compared to the old code (see benchmarks below). Even with an empty search string, the new code is slightly faster.

The string cast of the $search parameter is necessary to remain compatible with the tests/Support/SupportStrTest.php test, which tests for acceptance of integer values as $search parameter. To me, this is a little confusing, because the PHPdoc block declares $search as string. I didn't change this, but I think the type should be changed to mixed for $search, if this is really expected behavior.

Alternatively, the test could be changed, but since this would introduce a breaking change, this is not advisable.

As a side node, the new code would be another 10% faster without the string cast. Not using a shorthand if in the return line would further improve performance, but is slightly less readable.

Benchmarks

Current code:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
0.95517802238464
0.96321487426758
0.95738410949707
0.95915079116821
0.96855711936951

New code:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
0.58452105522156
0.58884692192078
0.5809018611908
0.58099007606506
0.58521318435669

Current code with empty search:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = ''; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
0.16867280006409
0.16797685623169
0.16919302940369
0.16927814483643
0.16803288459778

New code with empty search:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = ''; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
0.1334171295166
0.13449788093567
0.13518381118774
0.13393306732178
0.13339805603027

New code without string cast:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel'; $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, $search, true); if ($result === false) { $subject; } else { $result; } } } echo microtime(true)-$s . "\n"; }
0.50580191612244
0.50802898406982
0.51281690597534
0.52838587760925
0.51457190513611

@browner12
Copy link
Contributor

looks like you could get rid of the initial empty string check when switching to this function. strstr will throw a warning, but it does allow an empty needle, where explode does not.

$result = strstr($subject, (string) $search, true);

return $result === false ? $subject : $result;

and then couldn't we single line it?

return strstr($subject, (string) $search, true) ?: $subject;

@GrahamCampbell
Copy link
Member

Does this actually make things faster for real apps?

@lupinitylabs
Copy link
Contributor Author

lupinitylabs commented Oct 2, 2020

looks like you could get rid of the initial empty string check when switching to this function. strstr will throw a warning, but it does allow an empty needle, where explode does not.

$result = strstr($subject, (string) $search, true);

return $result === false ? $subject : $result;

Not a fan of having a warning thrown, tbh.

and then couldn't we single line it?

return strstr($subject, (string) $search, true) ?: $subject;

Consider this case:

$subject = '0x10'; 
$search = 'x'; 

strstr($subject, (string) $search, true) ?: $subject;

This will return 0x10, but it would be expected to return 0.

It would be possible to write it as

return ($result = strstr($subject, (string) $search, true)) === false ? $subject : $result;

but I refrained from doing that for increased readability.

@lupinitylabs
Copy link
Contributor Author

lupinitylabs commented Oct 2, 2020

Does this actually make things faster for real apps?

Depends on the app and the data, I would say. It will certainly make a difference iterating over very large datasets. I noticed it myself when optimizing such a script and found out that strstr was making a difference of several seconds in runtime.

And I was being kind with the benchmarks. To visualize how bad it can really get, imagine a case where the needle is in the haystack multiple times:

New code:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/b/b//b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b'; $search = '
/'; for ($i = 10000000;
0.56986904144287
0.57503294944763
0.57185816764832
0.57861685752869
0.5752580165863

Current code:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/b/b//b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b'; $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
20.957284927368
21.176103115082
20.958166122437
20.968662977219
21.130389928818

That is ~ 35x faster. So, yes, I would say it does have an impact in some cases. Frankly, I wouldn't want to leave the function that way, and I don't see a benefit of not patching it.

Same goes for large strings, which is a more realistic use-case:

Current:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'test/' . str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
8.6145751476288
9.2526910305023
8.4428129196167
9.2436029911041
8.5641269683838

New:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'test/' . str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true
)-$s . "\n"; }
0.57037281990051
0.5638210773468
0.56512999534607
0.56515693664551
0.57639598846436

And this is only a 10 kB string.

Of course, if there is a needle anywhere in the haystack, strstr will always win over explode. But even if the needle does not occur in the haystack, the new code is slightly faster:

Current:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
1.6434171199799
1.6519260406494
1.7897970676422
1.6361329555511
1.6360969543457

New:

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
1.4887380599976
1.4843230247498
1.483482837677
1.4862279891968
1.4922170639038

Even if this might be a little constructed and might not happen very often in a real-world application, as a developer I would hope that the Laravel helpers would not only offer a clean and pleasant interface, but also strive for the best possible solution under the hood. I like the idea of having the peace of mind that Laravel is not only a joy to use, but it does the job in the most performant way possible.

@mfn
Copy link
Contributor

mfn commented Oct 3, 2020

I wonder how

strtok('before?after', '?')

fares with the micro benchmarks.

Same behvaiour was with strstr, if not found it returns false

@lupinitylabs
Copy link
Contributor Author

lupinitylabs commented Oct 3, 2020

I wonder how

strtok('before?after', '?')

fares with the micro benchmarks.

Same behvaiour was with strstr, if not found it returns false

Slightly worse than strstr for small strings and MUCH worse than explode on long strings.

>>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/b/b//b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b'; $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strtok($subject, (string) $search); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
0.70224785804749
0.70644283294678
0.70995903015137
0.70787596702576
0.70642018318176
>>> $s = microtime(true); $subject = str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strtok($subject, (string) $search); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n";
77.940736055374

@mfn
Copy link
Contributor

mfn commented Oct 5, 2020

Thank you! => 🗑️ 😄

@taylorotwell taylorotwell merged commit 576dd3d into laravel:8.x Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants