
[5.1] Fix for chunk() #9681

Closed
wants to merge 2 commits

Conversation

arrilot
Contributor

@arrilot arrilot commented Jul 18, 2015

Closes #9649

This one is relatively hard to review. Sorry about that.

As you can see in the issue, limits and offsets are completely ignored when chunk() is used instead of get().
This happens because chunk() overrides those properties internally.

Although it's a rare use case, in my opinion it's a bug and should therefore be fixed in 5.1.
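
A rough sketch of the behaviour being reported (the User model and the skip/take values here are illustrative, not taken from the PR):

$users = User::skip(100)->take(50)->get();
// get() respects skip()/take(): only rows 101–150 are returned.

User::skip(100)->take(50)->chunk(10, function ($users) {
    // chunk() sets its own limit/offset for every page it fetches,
    // so the skip(100)/take(50) above are silently discarded and
    // all users are iterated in pages of 10.
});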

@taylorotwell
Member

Chunk is meant to override limits and offsets. The original issue you posted shouldn't even be using chunk.

@arrilot
Contributor Author

arrilot commented Jul 19, 2015

Okay, you can treat this PR as a proposal then.

Why is throwing away limits and offsets any better than actually handling them the way get() does?
Are you sure no one will ever want to do something like Model::skip(100000)->take(500000)->chunk(100, ...)?

@kidtronnix

@taylorotwell Why is it that "chunk is meant to override limits and offsets"?

I can't see any reason why an intentional override would be necessary.

@Shkeats

Shkeats commented Jul 31, 2015

@taylorotwell I also can't see any reason why chunk shouldn't internally respect a limit and offset that have already been set. It's not clear that chunk is meant to override limits and offsets; it seems to me that it merely happens to override them because of how it's implemented. I think wanting to chunk through a limited or offset set of records is a valid use case.

Say I want to do some processing on 40,000 records with 4 worker threads, each handling 10,000 records in batches of 500. Currently you need to calculate the limits and offsets yourself, set up a loop, and issue the queries while incrementing skip() and take() as you go.

Chunk could handle this easily if I could run the following on each worker.

Worker 1: Model::take(10000)->chunk(500, ...)
Worker 2: Model::skip(10000)->take(10000)->chunk(500, ...)
Worker 3: Model::skip(20000)->take(10000)->chunk(500, ...)
Worker 4: Model::skip(30000)->take(10000)->chunk(500, ...)

I think the above lines are conceptually sound and very readable:
"With this model, skip the first 10,000 rows, then take the next 10,000 in batches of 500, passing them to this closure."

@kidtronnix

^ +1

@arrilot arrilot deleted the chunk_fixes branch August 19, 2015 17:01
@arrilot arrilot restored the chunk_fixes branch August 19, 2015 17:01
@WillMac29

@taylorotwell
Model::skip(10000)->take(10000)->chunk(500,..)
I actually just wrote this code intuitively, hoping it would do what is described above.
It wasn't working, so I did some searching and came across this pull request. Maybe another method, or an optional parameter overriding chunk's default behavior, would be nice.

^ +1

@ammont

ammont commented May 23, 2016

^ +1

@hhsadiq

hhsadiq commented Jan 2, 2018

Any workaround for this?

@NicksonYap

+1

we need this!

@NicksonYap

A workaround is to just count the rows as you loop:

$skip_amount = 100;
$done_amount = 0;

DB::table('users')->orderBy('id')->chunk(100, function ($users) use ($skip_amount, &$done_amount) {
    foreach ($users as $user) {
        $done_amount += 1;

        if ($done_amount > $skip_amount) {
            // rows up to $skip_amount have been skipped; process $user here
        }
    }
});

@jerrebm

jerrebm commented May 4, 2018

Here's an Eloquent workaround:

Model::skip(100)->take(100)->get()->chunk(10)->each(function ($chunk) {
    // chunk
});

This will not chunk the queries, but it does allow you to chunk the results.

@chriscalip

chriscalip commented Jan 4, 2019

Here's an Eloquent workaround:

Model::skip(100)->take(100)->get()->chunk(10)->each(function ($chunk) {
    // chunk
});

This will not chunk the queries, but it does allow you to chunk the results.

At the point of ->get() we still load a large collection of models, which could be troublesome. An alternative is to first get an array of model IDs, split it into batch sets with array_chunk(), and then do a Model::find() by IDs for each subset:

$idModels = [..]; // array of model IDs
// where N is an integer representing the desired number of elements per subset
$throughput = N;
$batchSets = array_chunk($idModels, $throughput);
unset($idModels);

foreach ($batchSets as $setId => $batchIds) {
    $models = Model::find($batchIds);
    // process $models for this batch
}

@Stevemoretz

I was just reading this. For future readers: none of the above workarounds are good; don't use them, as they all have performance issues. If you want to do something like this, just re-implement chunking yourself with limit, skip, and a simple loop based on the total count.
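
A minimal sketch of that approach, assuming an Eloquent Model and placeholder $offset / $limit / $chunkSize values:

$offset    = 100000; // where to start (placeholder)
$limit     = 500000; // total number of rows to cover (placeholder)
$chunkSize = 100;

$processed = 0;

while ($processed < $limit) {
    $rows = Model::orderBy('id')
        ->skip($offset + $processed)
        ->take(min($chunkSize, $limit - $processed))
        ->get();

    if ($rows->isEmpty()) {
        break;
    }

    // handle $rows here ...

    $processed += $rows->count();
}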

@DrDynamic

@taylorotwell Model::skip(10000)->take(10000)->chunk(500,..) I actually just wrote this code intuitively hoping it would do what is indicated above. It wasn't working so I decided to do some searching and I came across this pull request. Maybe another method or optional override parameter of the default behavior of chunk would be nice.

^ +1

The same happened to me right now. I'd really love to have this.
