[5.2] Speed up chunk for large sets of data #12861
Conversation
This only works if we are actually ordering by id.
Also, the each function is still forced to use the poorly performing chunk method.
Yeah, it's def a pretty specific use case (large set, ordered by id), but that might be common enough. I mean, for people using an auto-incrementing id, using a

Fwiw, for my needs I ended up implementing a method using
Yeah, it's common enough to order by ID that it's probably worth the improvement here. Thanks!
Ok, great! So far I've kept the changed functionality completely separated in a

Come to think of it, I guess

Sound good?
I would just keep them separate, as you have them now.
I just merged this but noticed the query builder version is all messed up. The query builder doesn't return a collection at all in 5.2. I guess you only wrote tests for the Eloquent version?
Yeah, sorry! I was expecting to need to put a little more work in -- should've put a 'WIP' in the PR title or something. This evening I can send another PR to polish up the query builder side and add tests.
Could there be a similar thing for pagination?
@@ -1315,6 +1315,23 @@ public function forPage($page, $perPage = 15)
    }

    /**
     * Set the limit and query for the next set of results, given the
     * last scene ID in a set.
Scene or seen?
@kevindoole Did you perhaps also benchmark chunking with a
I've been working on a project that involves sifting through some pretty large databases, and the query builder's chunk method was not really doing it for me because it slowed down so much as it worked through the table.
As chunk gets deeper and deeper into a table, it asks MySQL to count through more and more rows. When chunk says to MySQL,
select whatever from wherever limit 800000, 1000
MySQL counts through 800,000 rows without using an index or nuthin. Then, for the next page, it again counts through all of the previous rows to find the next set. Poor MySQL... It gets really, really slow (details below).

I realize this may be too much of an edge case, so I didn't want to spend too much time working on the code; just looking for feedback at this point.
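For anyone curious what that looks like in code, here's a rough sketch of offset-based chunking (simplified for illustration, not the framework's exact implementation; $query, $count and $callback are placeholder names):

```php
// Rough sketch of offset-based chunking (placeholders: $query is a query
// builder instance, $count the chunk size, $callback the per-row closure).
$page = 1;

do {
    // forPage($page, $count) becomes "LIMIT <offset>, <count>",
    // e.g. "LIMIT 800000, 1000" by the time we reach page 801.
    $results = $query->forPage($page, $count)->get();

    foreach ($results as $row) {
        $callback($row);
    }

    $page++;
} while (count($results) === $count);
```

Every iteration bumps the offset by one page, so MySQL has to walk past all previously processed rows before it can return the next chunk.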
If you're still reading, here are some scrappy benchmarks. :)
After 4 seconds, chunk is very optimistic:
187500/862443 [▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░] 21% 4 secs/18 secs 10.0 MiB
However, in the end the process has taken substantially longer:
862443/862443 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100% 2 mins/2 mins 10.0 MiB
As you can imagine, the problem gets progressively worse as the database gets larger.
If we instead query by 'id' > $theLastIdFromTheLastSet, the chunking is much faster.

Again, after 4 seconds:
387000/862443 [▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░] 44% 4 secs/9 secs 10.0 MiB
And the same set takes only 21s in the end.
862443/862443 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100% 21 secs/21 secs 10.0 MiB
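For comparison, here's a minimal sketch of chunking by the last seen ID under the same assumptions ($query, $count and $callback are placeholders, and the table is assumed to have an auto-incrementing primary key named id). It illustrates the idea rather than the exact code in this PR:

```php
// Minimal sketch of keyset-style chunking by the last seen ID.
// $query, $count and $callback are placeholders; the table is assumed to
// have an auto-incrementing primary key column named "id".
$lastId = 0;

do {
    // "WHERE id > ? ORDER BY id LIMIT ?" lets MySQL seek straight to the
    // next chunk via the primary key instead of counting past earlier rows.
    $results = (clone $query)
        ->where('id', '>', $lastId)
        ->orderBy('id', 'asc')
        ->limit($count)
        ->get();

    foreach ($results as $row) {
        $callback($row);
    }

    if (count($results) > 0) {
        $last = end($results);   // get() returns an array of rows here
        $lastId = $last->id;     // remember where to resume the next chunk
    }
} while (count($results) === $count);
```

The trade-off, as noted above, is that this only works when the results are ordered by id (or another indexed, monotonically increasing column), since the WHERE id > ? filter is what lets the index skip straight to the next chunk.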