-
Notifications
You must be signed in to change notification settings - Fork 11.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[8.x] increase performance of Str::before by over 60%. #34642
Conversation
looks like you could get rid of the initial empty string check when switching to this function. $result = strstr($subject, (string) $search, true);
return $result === false ? $subject : $result; and then couldn't we single line it? return strstr($subject, (string) $search, true) ?: $subject; |
Does this actually make things faster for real apps? |
Not a fan of having a warning thrown, tbh.
Consider this case: $subject = '0x10';
$search = 'x';
strstr($subject, (string) $search, true) ?: $subject; This will return 0x10, but it would be expected to return 0. It would be possible to write it as return ($result = strstr($subject, (string) $search, true)) === false ? $subject : $result; but I refrained from doing that for increased readability. |
Depends on the app and the data, I would say. It will certainly make a difference iterating over very large datasets. I noticed it myself when optimizing such a script and found out that strstr was making a difference of several seconds in runtime. And I was being kind with the benchmarks. To visualize how bad it can really get, imagine a case where the needle is in the haystack multiple times: New code: >>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/b/b//b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b'; $search = '
/'; for ($i = 10000000;
0.56986904144287
0.57503294944763
0.57185816764832
0.57861685752869
0.5752580165863 Current code: >>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/b/b//b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b'; $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
20.957284927368
21.176103115082
20.958166122437
20.968662977219
21.130389928818 That is ~ 35x faster. So, yes, I would say it does have an impact in some cases. Frankly, I wouldn't want to leave the function that way, and I don't see a benefit of not patching it. Same goes for large strings, which is a more realistic use-case: Current: >>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'test/' . str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
8.6145751476288
9.2526910305023
8.4428129196167
9.2436029911041
8.5641269683838 New: >>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'test/' . str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true
)-$s . "\n"; }
0.57037281990051
0.5638210773468
0.56512999534607
0.56515693664551
0.57639598846436 And this is only a 10 kB string. Of course, if there is a needle anywhere in the haystack, Current: >>> for ($a = 5; $a--;) { $s = microtime(true); $subject = str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { $search === '' ? $subject : explode($search, $subject)[0]; } echo microtime(true)-$s . "\n"; }
1.6434171199799
1.6519260406494
1.7897970676422
1.6361329555511
1.6360969543457 New: >>> for ($a = 5; $a--;) { $s = microtime(true); $subject = str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strstr($subject, (string) $search, true); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
1.4887380599976
1.4843230247498
1.483482837677
1.4862279891968
1.4922170639038 Even if this might be a little constructed and might not happen very often in a real-world application, as a developer I would hope that the Laravel helpers would not only offer a clean and pleasant interface, but also strive for the best possible solution under the hood. I like the idea of having the peace of mind that Laravel is not only a joy to use, but it does the job in the most performant way possible. |
I wonder how strtok('before?after', '?') fares with the micro benchmarks. Same behvaiour was with |
Slightly worse than >>> for ($a = 5; $a--;) { $s = microtime(true); $subject = 'vendor/laravel/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/b/b//b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b'; $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strtok($subject, (string) $search); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n"; }
0.70224785804749
0.70644283294678
0.70995903015137
0.70787596702576
0.70642018318176 >>> $s = microtime(true); $subject = str_repeat('A', 10000); $search = '/'; for ($i = 10000000; $i--;) { if ($search === '') { $subject; } else { $result = strtok($subject, (string) $search); $result === false ? $subject : $result; } } echo microtime(true)-$s . "\n";
77.940736055374 |
Thank you! => 🗑️ 😄 |
This PR proposes a change to the
Str::before
function that improves performance by over 60% compared to the old code (see benchmarks below). Even with an empty search string, the new code is slightly faster.The string cast of the
$search
parameter is necessary to remain compatible with thetests/Support/SupportStrTest.php
test, which tests for acceptance of integer values as$search
parameter. To me, this is a little confusing, because the PHPdoc block declares$search
asstring
. I didn't change this, but I think the type should be changed tomixed
for$search
, if this is really expected behavior.Alternatively, the test could be changed, but since this would introduce a breaking change, this is not advisable.
As a side node, the new code would be another 10% faster without the string cast. Not using a shorthand if in the return line would further improve performance, but is slightly less readable.
Benchmarks
Current code:
New code:
Current code with empty search:
New code with empty search:
New code without string cast: