
Using Cursor on large number of results causing memory issues #14919

Closed
jamesryanbell opened this issue Aug 21, 2016 · 9 comments

@jamesryanbell

jamesryanbell commented Aug 21, 2016

When using cursors to loop through database results, PHP runs out of memory on very large result sets. This is due to the default option of MYSQL_ATTR_USE_BUFFERED_QUERY being true.

This is a MySQL-specific attribute available within PDO: http://php.net/manual/en/ref.pdo-mysql.php#pdo.constants.mysql-attr-use-buffered-query.

To prevent this, I am currently setting this attribute manually via the code below, but it would be good to have this as the default:

DB::connection()->getPdo()->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
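
For anyone who wants this applied once rather than before every query, here is a minimal sketch of my workaround (assuming a stock Laravel app with the usual AppServiceProvider; this is not framework behaviour):

<?php

namespace App\Providers;

use Illuminate\Support\Facades\DB;
use Illuminate\Support\ServiceProvider;
use PDO;

class AppServiceProvider extends ServiceProvider
{
    public function boot()
    {
        // Disable client-side result buffering on the default connection so
        // cursor() streams rows instead of buffering the whole result set.
        DB::connection()->getPdo()->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
    }
}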

Thanks

@themsaid
Member

I don't think this can be considered a framework bug. Querying over large data sets might result in memory problems, dealing with this differs from one situation to another, and it's also highly related to the database driver being used.

I'm going to close this issue for now, but please feel free to ping me if you ever find a specific bug in the framework that causes such a memory issue.

Thank you :)

@rickshawhobo

@jamesryanbell Where do you see that the default is true? Is Laravel making it true by default? What did you end up doing? Is there any downside to making the default false?

@atodd

atodd commented Apr 21, 2017

@jamesryanbell setting that attribute to false fixed my issue as well. Thanks!

@tmcgee123

tmcgee123 commented Dec 6, 2017

@jamesryanbell setting this fixed my issue as well.

@themsaid Any time I've tried to use the cursor (only a few hundred results, 415 to be exact), I always get a 502 from nginx. With chunking, it takes too long and I get a 504 timeout. Although I cannot prove that this is indeed a framework bug, it appears as though something holds up the cursor between iterations when MYSQL_ATTR_USE_BUFFERED_QUERY is true (probably the buffer).

Either way, I do think it would be good to note in the documentation why this happens, because the data sets I am querying are not very large. Without setting the MySQL attribute, cursor is useless for me in all applications.

The documentation is slightly misleading: the chunking portion says "If you need to process thousands of Eloquent records,..." while the cursor portion states "When processing large amounts of data, the cursor method may be used to greatly reduce your memory usage". Are there any metrics for this so we can reliably decide when to set the attribute? I can interpret "thousands of records" as a large row count, but what designates a large data set?
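
For comparison, here is a rough sketch of how I understand the two approaches (using the same models as my code below; the chunk size is arbitrary):

// chunk(): repeated LIMIT/OFFSET queries; memory is bounded by the chunk
// size, but every extra round trip adds latency (hence my 504 above).
MyModelItems::where('status', '!=', 'transferred')
    ->chunk(200, function ($modelItems) {
        foreach ($modelItems as $modelItem) {
            // process one chunk at a time
        }
    });

// cursor(): a single query consumed through a PHP generator, so only one
// Eloquent model is hydrated at a time. With MYSQL_ATTR_USE_BUFFERED_QUERY
// set to true (the PDO MySQL default), however, the driver still buffers
// the entire raw result set client-side, which is where the memory goes.
foreach (MyModelItems::where('status', '!=', 'transferred')->cursor() as $modelItem) {
    // process one row at a time
}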

I've attached my code below from when I had to replace chunking with the cursor. On the first try, without the MySQL attribute, this endpoint would send back a 502 and there was quite a bit of lag between cursor iterations. After changing the attribute as specified, the iterations went down to microseconds.

It would be good to know, as we are using Laravel to build microservices that need to scale reliably, and some type of information would be better than none.

function getReport(Request $request, $idFkey)
{
    //had to add this for the cursor
    DB::connection()->getPdo()->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
    $entries = [];

    // get all order forms for a season
    $models = MyModel::select('id')
        ->where('idFkey', $idFkey)
        ->where('idFkey2', $request->get('Fkey2'))
        ->get();

    foreach ($models as $model) {
        // get all orders for those order forms
        foreach (MyModelItems::select('items', 'idFkey')
            ->where('relatedItems', 'LIKE', '%"idItem":'.$model->id.'%')
            ->where('status', '!=', 'transferred')
            ->cursor() as $modelItem) {
            // get all discount program entries for those orders
            $items = json_decode($modelItem->items);

            foreach ($items as $item) {

                if (property_exists($item, 'programs')) {
                    foreach ($item->programs as $program) {
                        $entries[] = [
                            'idModel'  => $item->idModel,
                            'idProgram'  => $program->id,
                            'idCustomer'  => $modelItem->idCustomer,
                            'programName' => $program->name,
                            'discount'    => $program->discount,
                            'quantity'    => $item->quantity,
                            'price'       => $item->price
                        ];
                    }
                } elseif (property_exists($item, 'idProgram') && !empty($item->idProgram)) {
                    $program = DiscountProgram::find($item->idProgram);

                    $entries[] = [
                        'idSolution'  => $item->idSolution,
                        'idDiscount'  => $program->id,
                        'idCustomer'  => $modelItem->idCustomer,
                        'programName' => $program->name,
                        'discount'    => $program->discount,
                        'quantity'    => $item->quantity,
                        'price'       => $item->price
                    ];
                }
            }
        }
    }

    return response(['data' => $entries]);

}

@coreation

Thanks @jamesryanbell, using that option indeed does the trick; otherwise PHP runs out of memory. Considering this a "bug" would mean the behaviour was not intended; since Laravel simply uses a default setting, it's probably not a bug but more of a suboptimal practice? As cursors are used to scroll through large amounts of results, though, the default option in the framework should help as much as possible to allow this behaviour... which in this case it doesn't. @themsaid

@garygreen
Contributor

garygreen commented Mar 14, 2019

@themsaid I believe this issue should be re-opened because ->cursor() is broken with large data sets at the moment, which in a way goes completely against why you would even use it in the first place.

If you can load all results into memory with buffering enabled, then you don't really need to use cursor at all. The only useful case for cursor at the moment is to start consuming results from the database sooner (due to using generators).

Laravel should fix this by using a new separate connection with buffering turned off; otherwise ->cursor() will always give memory problems, even on relatively small data sets. See here for more info: http://php.net/manual/en/mysqlinfo.concepts.buffering.php

Following these characteristics buffered queries should be used in cases where you expect only a limited result set or need to know the amount of returned rows before reading all rows. Unbuffered mode should be used when you expect larger results.
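
To illustrate with plain PDO, outside Laravel (a sketch; the connection details and table name are placeholders):

<?php

// Placeholder credentials for illustration only.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'secret');

// Buffered (the PDO MySQL default): the driver reads the WHOLE result set
// into PHP memory before the first fetch() returns.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);

// Unbuffered: rows stream from the server as you fetch them, so memory
// stays flat no matter how many rows the query returns.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$stmt = $pdo->query('SELECT * FROM users');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Process one row at a time. Note: no other query may run on this
    // connection until the statement has been fully read or closed.
}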

@mfn
Contributor

mfn commented Mar 14, 2019

Laravel should fix this by using a new separate connection with buffering turned off

But this is very MySQL-specific, and real cursors in other DBs work differently (e.g. in Postgres they require a transaction), so it's not that easy.

@garygreen
Contributor

@mfn Ah, true. Would you happen to know how, in Laravel, you can create a new connection instance and use setAttribute to turn buffering off (just for MySQL), and then use that new connection with the query builder?

I've been digging through the core trying to work out how to do it, but it doesn't appear possible.

@garygreen
Contributor

garygreen commented Mar 14, 2019

For anyone looking at this issue, I've come up with some steps to solve this:

Fix for ->cursor() when using MySQL and getting an out-of-memory issue

All you need to do is create a dedicated MySQL connection configuration that turns buffering off:

config/database.php

'mysql-unbuffered' => [
    'driver'    => 'mysql',
    'host'      => env('DB_HOST'),
    'database'  => env('DB_DATABASE'),
    'username'  => env('DB_USERNAME'),
    'password'  => env('DB_PASSWORD'),
    'charset'   => 'utf8mb4',
    'collation' => 'utf8mb4_unicode_ci',
    'prefix'    => '',
    'strict'    => false,
    'options'   => [
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,
    ],
],

Then in your code whenever you want to use ->cursor() over large data sets:

foreach (DB::connection('mysql-unbuffered')->table('users')->cursor() as $user) {
    // ....
}

You can also switch the connection for Eloquent models using ->on():

foreach (User::on('mysql-unbuffered')->cursor() as $user) {
    // ....
}

The reason you need to use a separate connection is that in MySQL you cannot mix buffered and unbuffered queries on the same connection; if you do, you may get this error:

SQLSTATE[HY000]: General error: 2014 Cannot execute queries while other unbuffered queries are active.

So it's important to use a separate connection when using unbuffered queries.
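
For example (a sketch; the orders table is made up), this is the pattern that triggers it, and the fix:

// BAD: a second query on the SAME unbuffered connection while the
// cursor's result set is still open raises the 2014 error above.
foreach (DB::connection('mysql-unbuffered')->table('users')->cursor() as $user) {
    DB::connection('mysql-unbuffered')->table('orders')->where('user_id', $user->id)->count();
}

// OK: run the inner query on the default (buffered) connection instead.
foreach (DB::connection('mysql-unbuffered')->table('users')->cursor() as $user) {
    DB::table('orders')->where('user_id', $user->id)->count();
}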

Hopefully that helps.
