
[Idea] Implement using Iterators instead of arrays #1

Open
lovenunu opened this issue Nov 17, 2016 · 3 comments

Comments

@lovenunu

Hello,

I think we can save some time and memory on large data sets by using iterators instead of arrays here:
https://github.com/functional-php/parallel/blob/master/src/functions.php#L75

If we provide an iterator (from a DB statement, for example), we have to traverse it once in the iterator_to_array call, and then array_chunk makes us traverse the data set once more.
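For illustration, the double traversal spelled out (a sketch; $collection and $chunkSize stand in for the actual variables in functions.php):

$array  = iterator_to_array($collection); // first full pass: materialize the iterator
$chunks = array_chunk($array, $chunkSize); // second full pass: copy the data into chunks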

Using iterators would not be a problem with arrays either, as SPL already provides the \ArrayIterator class.

Moreover, instead of traversing two arrays here: https://github.com/functional-php/parallel/blob/master/src/functions.php#L59, we could traverse the data only once and apply the two functions at the same time.
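Something like this (a sketch; $mapFn and $filterFn are made-up names, assuming the two functions are a map and a filter):

// Two passes over the data:
$result = array_filter(array_map($mapFn, $input), $filterFn);

// One pass, applying both functions per element:
$result = [];
foreach ($input as $value) {
    $mapped = $mapFn($value);
    if ($filterFn($mapped)) {
        $result[] = $mapped;
    }
}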

BTW, a major part of the problem has already been solved in this (great) lib from nikic: https://github.com/nikic/iter/ (chunk, map and filter).
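For example (a sketch of how the nikic/iter functions could be used here; $fn and $chunkSize are placeholders, and the exact integration is up to the lib):

use function iter\map;
use function iter\chunk;

// Both are lazy: nothing is traversed until the chunks are consumed.
$chunks = chunk(map($fn, $collection), $chunkSize);
foreach ($chunks as $chunk) {
    // each $chunk arrives on demand
}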

In the worst case, we could save 3 out of 4 iterations. :-)

What do you think of this idea? :-)

@krtek4
Member

krtek4 commented Feb 9, 2017

Hi there,

Sorry for the really late answer; GitHub seems to have forgotten to send me a notification about this issue ;)

I am not sure I understand your idea. The _chunk function is called just before the work is distributed to the various threads, so we need to split the iterator into multiple ones, each containing some part of the original. I am not sure this is possible in PHP, but I might be wrong.
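For what it's worth, SPL's \LimitIterator can express that kind of split, with the caveat that the inner iterator must be rewindable or seekable, which a generator is not (a sketch):

$it = new ArrayIterator(range(1, 10));
$first  = new LimitIterator($it, 0, 5); // elements 0..4
$second = new LimitIterator($it, 5, 5); // elements 5..9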

I am aware of iter, but it solves a different issue and I don't see how it could be leveraged here.

I am all for a performance improvement, as that is the main point of this library, but it seems I am missing the point of your idea, sorry :/

@NeoVance

I think he is actually suggesting using Generators rather than making a copy of the input as chunks.

function fold($threads, callable $callable, $collection, $initial)
{
    // iterable so the closure accepts both the generator chunks
    // and the plain array of per-thread results below.
    $func = function(iterable $chunk) use($callable, $initial) {
        $next = $initial;
        foreach ($chunk as $element) {
            $next = $callable($next, $element);
        }
        return $next;
    };
    $results = _parallel($threads, $func, $collection);
    // Fold the partial results once more to get the final value.
    return $func($results);
}

function _generator(&$input, $start, $end)
{
    // Lazily yield the slice [$start, $end) without copying it.
    for ($j = $start; $j < $end; $j++) {
        yield $input[$j];
    }
}

function _chunks(&$input, $size)
{
    $chunks = [];
    $chunkSize = (int) ceil(count($input) / $size);
    for ($i = 0; $i < $size; $i++) {
        $start = $i * $chunkSize;
        if ($i === $size - 1) {
            // The last chunk takes whatever remains.
            $end = count($input);
        } else {
            $end = $start + $chunkSize;
        }
        $chunks[] = _generator($input, $start, $end);
    }
    return $chunks;
}
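
For reference, a quick usage sketch of the corrected _chunks() above:

$data = range(1, 10);
foreach (_chunks($data, 3) as $i => $chunk) {
    echo "chunk $i: ", implode(',', iterator_to_array($chunk)), "\n";
}
// chunk 0: 1,2,3,4
// chunk 1: 5,6,7,8
// chunk 2: 9,10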

@krtek4
Member

krtek4 commented Oct 17, 2018

Ok, I get it.

I am not quite sure I want to introduce generators in there, because it adds complexity and there is no proven gain right now. This is exactly the kind of optimization that should be done by PHP itself, in my opinion.

Also, your code uses generators, but it still creates a new array: _chunks returns an array, as is currently the case, so there won't be any gain there.

If you want to go forward with the idea, may I propose that you open a PR and provide some kind of benchmark proving it is more efficient?
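Something as simple as this would already be telling (a skeleton; run each variant in a separate process, since peak memory never resets within a single run):

$input = range(1, 1000000);

// Variant A: eager copy.
$chunks = array_chunk($input, 250000);
// Variant B: lazy generators (the _chunks() sketch above).
// $chunks = _chunks($input, 4);

printf("peak memory: %d bytes\n", memory_get_peak_usage(true));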

I don't have much time to work on this lib right now, so it will be a great help.

Best,
