[5.7] Optimize Collection::mapWithKeys #24656
Conversation
This would change the behavior from a merge to an append. Right now it's possible for a later pair to overwrite an earlier one with the same key.
Please add tests for overwriting keys. Beware of numeric keys, and of the order of overwriting. Working solutions might be:

```php
$result = $callback($value, $key) + $result;
$result = array_replace($result, $callback($value, $key));
```

Note: on principle, why not change to one of these?
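To illustrate the difference between the two suggested one-liners (this is a standalone sketch, not Laravel source; the sample values are made up):

```php
<?php
// How the two suggested fixes behave when the callback produces
// a key that already exists in $result.

$result = ['a' => 1];
$new    = ['a' => 2, 'b' => 3]; // what $callback($value, $key) might return

// Union operator: keys already present in the LEFT operand win,
// so putting $new on the left lets the newer pair overwrite.
$union = $new + $result;

// array_replace: values from later arguments win, and numeric keys
// are preserved (unlike array_merge, which renumbers them).
$replaced = array_replace($result, $new);

// The numeric-key pitfall the comment warns about: array_merge
// appends instead of overwriting when the keys are numeric.
$merged = array_merge([0 => 'x'], [0 => 'y']); // [0 => 'x', 1 => 'y'] – no overwrite!
```

Both `$union` and `$replaced` end up as `['a' => 2, 'b' => 3]`, i.e. later-wins semantics, which is what `mapWithKeys` needs.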
Refs #16564 and #16552 (comment). The proposals here seem to support a "callback returning several rows" as well.
More importantly: #16552 (comment)
That would be quite a subtle BC break... I haven't checked whether the proposal here is affected by it. Tests should be added for this, if there aren't any yet.
The added test seems worth it, but I guess your new code is slower than the current code, isn't it?
@vlakoff @tillkruss Added tests for overwriting keys. Benchmark (100,000 iterations): 28-35% faster now.
@vlakoff I made some changes, so my new code is now faster than the previous code, but it uses more memory.
I see, about memory usage: this should be measured as well. With your code here, we get a huge intermediate array of mapped rows. Have you tried my simple `array_replace` suggestion?
While the current implementation's docblock says "The callback should return an associative array with a single key/value pair", it actually supports a callback returning multiple entries. Is this something that is still supported, or are we dropping that support?
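To show why multiple entries currently work, here is a simplified sketch of the existing loop (`mapWithKeysSketch` is a hypothetical standalone name, not the Laravel method itself): the inner `foreach` folds every pair the callback returns into `$result`, not just the first one.

```php
<?php
// Simplified sketch of the current mapWithKeys loop.
function mapWithKeysSketch(array $items, callable $callback): array
{
    $result = [];
    foreach ($items as $key => $value) {
        // Every key/value pair returned by the callback is folded in,
        // so a callback returning several rows works too.
        foreach ($callback($value, $key) as $mapKey => $mapValue) {
            $result[$mapKey] = $mapValue;
        }
    }
    return $result;
}

$out = mapWithKeysSketch(['ab', 'cd'], function ($value, $key) {
    // Two entries per item, despite the docblock's "single pair".
    return [$value => $key, strtoupper($value) => $key];
});
// $out === ['ab' => 0, 'AB' => 0, 'cd' => 1, 'CD' => 1]
```

Any replacement based on `array_replace(...$values)` happens to preserve this, since each element of `$values` may itself contain several pairs.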
https://gist.github.com/dmitrybubyakin/6bdcc7c0ee7eaffdae699d61017214ee
I tried to reproduce the performance increase, but on my machine your code is actually a bit slower...

```php
$nb = 100;
$data = range(1, 1000);
$callback = function ($value, $key) {
    return [$key => $value];
};

// Current implementation: nested foreach, one row at a time.
$t1 = microtime(true);
for ($ii = $nb; $ii--; ) {
    $result = [];
    foreach ($data as $key => $value) {
        $assoc = $callback($value, $key);
        foreach ($assoc as $mapKey => $mapValue) {
            $result[$mapKey] = $mapValue;
        }
    }
}
$t2 = microtime(true);

// Proposed implementation: map everything, then merge in one call.
for ($ii = $nb; $ii--; ) {
    $values = array_map($callback, $data, array_keys($data));
    $result = array_replace(...$values);
}
$t3 = microtime(true);

echo $t2 - $t1;
echo "\n";
echo $t3 - $t2;
echo "\n\n";
echo ($t2 - $t1) == 0 ? 'N/A' : ($t3 - $t2) * 100 / ($t2 - $t1);
```

Both methods seem to be O(n).
@vlakoff on my machine (php 7.2.6):
Yeah, I had tested on an old PHP 5 box. On PHP 7, both versions are much faster, and yours does indeed seem to be faster.
Again on my box, above a certain number of items, your code suddenly gets slower. I guess there is some memory exhaustion that breaks scalability.
I can confirm on 3v4l.org that on large data (try 10000, 50000...), your code performs slower and uses more memory. There seems to be a threshold at which execution speed drops sharply, and the memory usage difference gradually increases with data size. For me, scalability is by far the most important factor, even if it means slower (yet acceptable) execution for smaller datasets. Memory usage shouldn't be neglected either.
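A minimal sketch of how the memory side could be measured (my own harness, not from the thread; absolute numbers depend on the box, and since `memory_get_peak_usage` is monotonic, each variant should really be timed in a separate run):

```php
<?php
// The array_map + array_replace(...$values) variant materialises one
// small array per item, all alive at once, so peak memory grows with
// the dataset; the foreach variant only holds one row at a time.

$data = range(1, 50000);
$callback = function ($value, $key) {
    return [$key => $value];
};

$before = memory_get_peak_usage();
$values = array_map($callback, $data, array_keys($data)); // 50,000 arrays alive at once
$result = array_replace(...$values);
printf("peak memory delta: %.1f MB\n", (memory_get_peak_usage() - $before) / 1048576);
```

The spread into `array_replace()` also passes 50,000 arguments at once, which is another per-call cost the nested-foreach version never pays.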
Just as a reminder, some possibly subtle BC breaks have been highlighted in this thread, and unit tests should be added for them.
@dmitrybubyakin I suggest you open a new PR, for just adding this unit test concerning numeric keys overwriting. The other cases I mentioned (callback returning multiple rows, preservation of array order) are actually already covered by tests. |
@vlakoff Ok, I'll submit a new PR. |
Benchmark (100,000 iterations):
https://gist.github.com/dmitrybubyakin/6bdcc7c0ee7eaffdae699d61017214ee
15-25% faster.