
[5.7] Optimize Collection::mapWithKeys #24656

Closed · wants to merge 5 commits into from

Conversation

dmitrybubyakin

Benchmark (100,000 iterations):

Old method: 4.133s
New method: 3.175s
---
Old method: 3.754s
New method: 2.975s
---
Old method: 3.773s
New method: 3.246s

https://gist.github.com/dmitrybubyakin/6bdcc7c0ee7eaffdae699d61017214ee

15-25% faster.
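
The PR diff itself isn't shown in this thread. A rough sketch of the two approaches being benchmarked, written as standalone functions (the names and signatures are illustrative, not the actual Collection method):

<?php

// Current approach: copy each key/value pair the callback returns into the result.
function mapWithKeysOld(array $items, callable $callback): array
{
    $result = [];

    foreach ($items as $key => $value) {
        foreach ($callback($value, $key) as $mapKey => $mapValue) {
            $result[$mapKey] = $mapValue;
        }
    }

    return $result;
}

// Proposed direction: map all items first, then merge the returned arrays at once.
function mapWithKeysNew(array $items, callable $callback): array
{
    $mapped = array_map($callback, $items, array_keys($items));

    return $mapped ? array_replace(...$mapped) : [];
}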

derekmd (Contributor) commented Jun 21, 2018

This would change the behavior from a merge to an append. Right now it's possible for a later mapWithKeys() iteration to overwrite the key of a previous iteration. This would make the first keyed value always be used.
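
A minimal sketch of the overwrite behavior described above, assuming Illuminate\Support\Collection is autoloaded (the data is made up):

<?php

use Illuminate\Support\Collection;

$result = (new Collection([
    ['id' => 1, 'name' => 'first'],
    ['id' => 1, 'name' => 'second'], // maps to the same key as the previous item
]))->mapWithKeys(function ($item) {
    return [$item['id'] => $item['name']];
});

// Current behavior: the later pair wins, so this prints [1 => 'second'].
// An append-style rewrite (e.g. the + operator) would keep 'first' instead.
var_dump($result->all());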

tillkruss (Contributor) left a comment

Please add tests for overwriting keys.

vlakoff (Contributor) commented Jun 22, 2018

Beware of numeric keys, and order of overwriting.

Working solutions might be:

$result = $callback($value, $key) + $result;
$result = array_replace($result, $callback($value, $key));

Note that array_replace() seems to be the same as + with the overwriting order reversed, but it is probably significantly slower (function call overhead).

So, in principle, why not change to the + operator, provided it's properly benchmarked and tested.
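
A plain-PHP illustration of the differences in overwrite order and numeric-key handling (the array contents are arbitrary):

<?php

$result = [0 => 'old', 1 => 'kept'];
$new    = [0 => 'new', 2 => 'added'];

// + keeps the left operand's value on key collision.
var_dump($new + $result);               // [0 => 'new', 2 => 'added', 1 => 'kept']
var_dump($result + $new);               // [0 => 'old', 1 => 'kept', 2 => 'added']

// array_replace() keeps the last argument's value, like the current per-pair assignment.
var_dump(array_replace($result, $new)); // [0 => 'new', 1 => 'kept', 2 => 'added']

// array_merge() would renumber numeric keys, so it isn't usable here.
var_dump(array_merge($result, $new));   // [0 => 'old', 1 => 'kept', 2 => 'new', 3 => 'added']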

vlakoff (Contributor) commented Jun 22, 2018

Refs #16564 and #16552 (comment)

The proposals here seem to support "callback returning several rows" as well.

vlakoff (Contributor) commented Jun 22, 2018

More importantly: #16552 (comment)

Also take care of preserving the array order (not the same thing as the numerical indexes). To verify this, a foreach on the result should iterate the items in the same order.

That would be quite a subtle BC break... I haven't checked the proposal here against this. And tests should be added for this, if there are none yet.
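
A small illustration of the ordering concern, assuming an accumulator of the form $result = $callback(...) + $result (hypothetical data):

<?php

$data = ['x' => 1, 'y' => 2, 'z' => 3];

// Accumulating with + prepends each new pair, reversing the input order.
$result = [];
foreach ($data as $key => $value) {
    $result = [$key => $value] + $result;
}
var_dump(array_keys($result)); // ['z', 'y', 'x']

// The current per-pair assignment keeps the input order.
$result = [];
foreach ($data as $key => $value) {
    $result[$key] = $value;
}
var_dump(array_keys($result)); // ['x', 'y', 'z']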

vlakoff (Contributor) commented Jun 22, 2018

The added test seems to be worth it, but I guess your new code is now slower than the current code, isn't it?

dmitrybubyakin (Author)

@vlakoff @tillkruss Added tests for overwriting keys.

Benchmark (100,000 iterations):

Old method: 3.782s
New method: 2.757s
---
Old method: 4.678s
New method: 3.075s
---
Old method: 4.173s
New method: 3.014s

28-35% faster now.

dmitrybubyakin (Author)

@vlakoff I made some changes, so my new code is faster than the previous code, but it uses more memory.

vlakoff (Contributor) commented Jun 22, 2018

I see, about memory usage: this should be measured as well. With your code here, we end up with one huge array_replace() call with many arguments!

Have you tried my simple $result = $callback($value, $key) + $result;?

sisve (Contributor) commented Jun 22, 2018

While the current implementation says "The callback should return an associative array with a single key/value pair", it supports a callback returning multiple entries. Is this something that is still supported, or are we dropping that support?
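
A short example of the multi-entry usage in question, assuming Illuminate\Support\Collection is autoloaded (the field names are made up):

<?php

use Illuminate\Support\Collection;

$result = (new Collection([
    ['id' => 1, 'first' => 'John', 'last' => 'Doe'],
]))->mapWithKeys(function ($item) {
    // Two key/value pairs from a single item.
    return [
        'first_'.$item['id'] => $item['first'],
        'last_'.$item['id']  => $item['last'],
    ];
});

// ['first_1' => 'John', 'last_1' => 'Doe']
var_dump($result->all());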

dmitrybubyakin (Author) commented Jun 22, 2018

@vlakoff Yes, I have. $result = $callback($value, $key) + $result; is much slower than the old code:

Old method: 3.786s
New method: 27.701s

@sisve We are not dropping anything.

dmitrybubyakin (Author) commented Jun 22, 2018

@vlakoff

https://gist.github.com/dmitrybubyakin/6bdcc7c0ee7eaffdae699d61017214ee

100000 iterations over 100 items, strlen 10:
Old method: time 4.066s, memory: 2.00Mb
New method: time 2.844s, memory: 2.00Mb
----
100000 iterations over 100 items, strlen 100:
Old method: time 4.356s, memory: 2.00Mb
New method: time 3.294s, memory: 2.00Mb
----
100000 iterations over 100 items, strlen 1000:
Old method: time 4.206s, memory: 2.00Mb
New method: time 3.136s, memory: 2.00Mb
----

100 iterations over 1000 items, strlen 1000:
Old method: time 45ms, memory: 4.00Mb
New method: time 34ms, memory: 4.00Mb
----
100 iterations over 10000 items, strlen 1000:
Old method: time 651ms, memory: 22.00Mb
New method: time 647ms, memory: 26.00Mb
----
100 iterations over 100000 items, strlen 1000:
Old method: time 10.186s, memory: 181.50Mb
New method: time 9.437s, memory: 221.51Mb
----

vlakoff (Contributor) commented Jun 22, 2018

I tried to reproduce the performance increase, but on my machine your code is actually a bit slower...

<?php

$nb = 100;

$data = range(1, 1000);

$callback = function ($value, $key) {
    return [$key => $value];
};

// Old method: copy each returned pair into the result inside a foreach.
$t1 = microtime(true);
for ($ii = $nb; $ii--; ) {

    $result = [];
    foreach ($data as $key => $value) {
        $assoc = $callback($value, $key);
        foreach ($assoc as $mapKey => $mapValue) {
            $result[$mapKey] = $mapValue;
        }
    }

}

// New method: map all items first, then merge with array_replace().
$t2 = microtime(true);
for ($ii = $nb; $ii--; ) {

    $values = array_map($callback, $data, array_keys($data));
    $result = array_replace(...$values);

}
$t3 = microtime(true);

// Elapsed time of each method, then the new method as a percentage of the old.
echo $t2 - $t1;
echo "\n";
echo $t3 - $t2;

echo "\n\n";
echo $t2 - $t1 == 0 ? 'N/A' : ($t3 - $t2) * 100 / ($t2 - $t1);

Both methods seem to be O(n).

dmitrybubyakin (Author)

@vlakoff On my machine (PHP 7.2.6):

0.01982307434082
0.012885093688965

65.000481093043%

vlakoff (Contributor) commented Jun 22, 2018

Yeah, I had tested on an old PHP 5 box. On PHP 7, both versions are much faster, and yours does indeed seem to be faster.

vlakoff (Contributor) commented Jun 22, 2018

Again on my box, above a certain number of items, your code suddenly gets slower. I guess there is some memory exhaustion, which breaks scalability.

vlakoff (Contributor) commented Jun 22, 2018

I can confirm on 3v4l.org that on large data sets (try 10000, 50000, ...) your code runs slower and uses more memory. There seems to be a threshold at which execution speed drops sharply, and the memory usage difference gradually increases with data size.

For me, scalability is by far the most important factor, even if it means slower (yet acceptable) execution for smaller datasets. Memory usage shouldn't be neglected either.

vlakoff (Contributor) commented Jun 22, 2018

Just as a reminder, some possible subtle BC breaks have been highlighted in this thread, and unit tests could be added for these.

vlakoff (Contributor) commented Jun 24, 2018

@dmitrybubyakin I suggest you open a new PR that just adds the unit test covering numeric-key overwriting.

The other cases I mentioned (callback returning multiple rows, preservation of array order) are actually already covered by tests.

dmitrybubyakin (Author)

@vlakoff Ok, I'll submit a new PR.
