[Idea] Implement using Iterators instead of arrays #1
Comments
Hi there, sorry for the really late answer, GitHub seems to have forgotten to send me a notification about this issue ;) I am not sure I understand your idea. The […] I am aware of. I am all in for a performance improvement, as it is the main point of this library, but it seems I am missing the point of your idea, sorry :/
I think he is actually suggesting using something like this:

function fold($threads, callable $callable, $collection, $initial)
{
    // Accepts any iterable: a chunk generator, or the per-chunk results
    // (assuming _parallel returns them as an array).
    $func = function (iterable $chunk) use ($callable, $initial) {
        $next = $initial;
        foreach ($chunk as $element) {
            $next = $callable($next, $element);
        }
        return $next;
    };
    $results = _parallel($threads, $func, $collection);
    return $func($results);
}

// Lazily yields the elements of $input at positions [$start, $end).
function &_generator(&$input, $start, $end)
{
    for ($j = $start; $j < $end; $j++) {
        yield $input[$j];
    }
}

// Builds $size generators, each covering one slice of $input.
function _chunks(&$input, $size)
{
    $chunks = [];
    $count = count($input);
    $chunkSize = (int) ceil($count / $size);
    for ($i = 0; $i < $size; $i++) {
        $start = $i * $chunkSize;
        // Clamp the end so a rounded-up chunk size never reads past the input.
        $end = min($start + $chunkSize, $count);
        $chunks[] = _generator($input, $start, $end);
    }
    return $chunks;
}
Ok, I get it. I am not quite sure I want to introduce generators in there, because it adds complexity and there is no proven gain right now. This is exactly the kind of optimization that should be done by PHP itself, in my opinion. Also, your code uses generators, but it still creates a new array. If you want to go forward with the idea, may I suggest that you open a PR and provide some kind of benchmark proving it is more efficient? I don't have much time to work on this lib right now, so it would be a great help. Best,
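A benchmark of the kind requested above could start from something like the sketch below. It is only an illustration: `measure`, `sum_with_array_chunks` and `sum_with_generators` are hypothetical names, not part of the library, and summing integers stands in for whatever callable the library would really run. It times both strategies on the same input and makes no claim about which one wins.

```php
<?php
declare(strict_types=1);

// Hypothetical harness: runs $work once and reports its result and wall time.
function measure(callable $work): array
{
    $startTime = hrtime(true);
    $result = $work();
    return [
        'result' => $result,
        'ms' => (hrtime(true) - $startTime) / 1e6,
    ];
}

// Baseline: materialise every chunk as a new array with array_chunk.
function sum_with_array_chunks(array $input, int $parts): int
{
    $total = 0;
    $size = max(1, (int) ceil(count($input) / $parts));
    foreach (array_chunk($input, $size) as $chunk) {
        $total += array_sum($chunk);
    }
    return $total;
}

// Alternative: walk index ranges with generators, without copying slices.
function sum_with_generators(array $input, int $parts): int
{
    $total = 0;
    $count = count($input);
    $size = max(1, (int) ceil($count / $parts));
    for ($start = 0; $start < $count; $start += $size) {
        $end = min($start + $size, $count);
        $slice = (function () use ($input, $start, $end) {
            for ($j = $start; $j < $end; $j++) {
                yield $input[$j];
            }
        })();
        foreach ($slice as $value) {
            $total += $value;
        }
    }
    return $total;
}

$data = range(1, 50000);
$viaArrays = measure(fn () => sum_with_array_chunks($data, 4));
$viaGenerators = measure(fn () => sum_with_generators($data, 4));
printf(
    "array_chunk: %.2f ms, generators: %.2f ms\n",
    $viaArrays['ms'],
    $viaGenerators['ms']
);
```

A single run like this is noisy. A real comparison would repeat each measurement, vary the input size, and also record memory (for example with `memory_get_peak_usage()`) in separate processes.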
Hello,
I think we can save some time and memory on large data sets by using iterators instead of arrays here:
https://github.com/functional-php/parallel/blob/master/src/functions.php#L75
If we provide an iterator (from a DB statement, for example), we have to traverse it once in the iterator_to_array statement, and then array_chunk makes us traverse the data set once more.
Using iterators would not be a problem for arrays, as SPL already provides the \ArrayIterator class.
Moreover, instead of traversing two arrays here: https://github.com/functional-php/parallel/blob/master/src/functions.php#L59 , we could traverse the data only once and apply the two functions at the same time.
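To make the idea concrete, here is a minimal sketch of single-pass processing. `lazy_chunks` and `map_then_filter` are hypothetical helpers written for this example, not functions from the library or from nikic/iter, and the two callables are placeholders for whichever two functions the library applies.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: splits any iterable into arrays of at most $size
 * elements while traversing the input only once.
 */
function lazy_chunks(iterable $input, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }
    $chunk = [];
    foreach ($input as $element) {
        $chunk[] = $element;
        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = [];
        }
    }
    if ($chunk !== []) {
        yield $chunk; // trailing chunk, possibly shorter than $size
    }
}

/**
 * Hypothetical helper: applies $map, then keeps only the values accepted
 * by $keep, within the same traversal.
 */
function map_then_filter(iterable $input, callable $map, callable $keep): Generator
{
    foreach ($input as $element) {
        $mapped = $map($element);
        if ($keep($mapped)) {
            yield $mapped;
        }
    }
}

// An ArrayIterator stands in for a DB statement here.
$rows = new ArrayIterator([1, 2, 3, 4, 5]);
$oddSquares = map_then_filter(
    $rows,
    fn ($x) => $x * $x,      // placeholder transformation
    fn ($x) => $x % 2 === 1  // placeholder predicate
);
$chunks = iterator_to_array(lazy_chunks($oddSquares, 2), false);
```

One design consequence: an iterator generally does not know its length up front, so this sketch takes a fixed chunk size instead of deriving it from the number of threads, which requires a count.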
BTW, a major part of the problem has already been solved in this (great) lib from nikic: https://github.com/nikic/iter/ (chunk, map and filter).
In the worst case, we could save 3 of 4 iterations. :-)
What do you think of this idea ? :-)