-
-
Notifications
You must be signed in to change notification settings - Fork 436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimized count queries by removing all joins if there are no where #2210
Optimized count queries by removing all joins if there are no where #2210
Conversation
How to test this? |
Here @colinmollenhour said the issue is not confirmed #1335 (comment). If we can have some clues about testing we can check it. |
The code can be confirmed to work by just visiting the various grids and viewing the counts to ensure they are correct. The originally reported issue is what I could not reproduce before this patch or after. However, it makes a huge performance difference on large tables when joins are used. The proper way to add columns in the grid is to add them statically to the _grid table but many times people don't or can't and this helps a lot. May also help in other areas like APIs, reporting, any third party modules that use joins in grids, etc. |
I was thinking about the splitting the query with AND/OR, what if that query has a and/or within quotes like in part of a text? isn't it a bit dangerous to split that way? I don't know if it could be done in another way. |
what about group by and having ? should it also be taken into account? |
The parentheses are stripped out so it is really just looking for a match of the table name anywhere in the where clause so I think nested conditions should be fine.
Group by is already considered on line 245. I haven't taken "having" into account.. I don't think I've ever seen that used in a grid collection but it could be added easily to line 245. EDIT: Having shouldn't be used without group by anyway so I think it is implicitly handled in line 245, right? |
Not sure if "having" was required ... adding product SKUs to Sales Order Grid? |
Yes but I believe "having" can only be used in conjunction with "group by" which is already considered so "having" is not an issue. |
…auses using joined tables and all joins are left joins.
6d33cbb
to
d4a3805
Compare
I tested with customers grid with our million customers. Line 266 of the PR, I added
It looks good... but without the PR the webpage doesn't take 6s to load. I profiled the customers grid with Blackfire without the PR, and the SQL used for counting is: So I think there is an optimisation elsewhere. |
so this PR adds 6 seconds to your query? auch... well.. |
I think he's saying that when he does a real page load without the PR the page is loading in less than 6 seconds so he thinks there must be an optimization somewhere else already making this one have no effect in this case. The unoptimized join query is definitely slower than the optimized query. |
Yes, he said that. It seems that for customer grid the PR is useless, because the SQL request is optimized by someone else. |
@luigifab Can you provide some more info here? I'm struggling to see how that is possible.. Also is it the case with other grids? However, I'm not sure it matters, this is still a good optimization that could benefit other areas besides just grids. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a way to test this with a custom table:
$result = Mage::getResourceModel('finance/ledger_collection')
->joinBillingData() // 3 left joins
->addFieldToFilter('main_table.name', 'prepaid')
->getSize();
It's about 60% faster with the PR.
👍 Can we replace |
For what reason? As we say in the South, "If it ain't broke, don't fix it." 😄 |
|
I can't imagine it making any difference in this case, there will never be more than a few elements in the arrays being filtered (number of tables and where clauses is "fixed" and "small") and this code runs once per collection so in most cases just once per page. So since there is no performance gain to be had beyond a few nanoseconds, the argument as to which is better highly subjective.. Given the code as written is already highly tested I don't think it it worthwhile to rewrite, but I do appreciate your enthusiasm for optimizing performance. 👍 |
@fballiano wtf? Is this you playground now? At other PRs we discuss about "performance", but here ... 👎 |
it was approved, and I though @colinmollenhour explanation was enough, I genuinely didn't think you have this much problem with a foreach. |
"it was approved", but it had open questions ....
Maybe you dont see it, but with faster progress a lot of errors were introduced. Simple PRs got approved/merged without being tested by contributor and/or reviewers .... Is this they way you want to work? |
this PR is like 10 months old, the moment it's decided to merge you complain about a minor thing, I wait for a long explanation by the author but still is not good enough for you. anyway, I've said it since the first day: openmage is not magento1, we don't have the manpower to do things in that same way, we'll do our best but I'll not babysit a dieing project which never want to change anything because "maybe a foreach is better than array_filter". the benefits we had in the last couple of years are huge, also if we had a few problems, wth, we're held to a standard which is impossible for projects with 100 times the manpower, it does not make any sense. we're in a "-rc" phase since how many months now? doesn't seem to me we're recklessly releasing whatever... last thing: please rebase your PRs, if I do it I can do it only squashing commits and you will for sure find a reason to complain about that too. |
While I have not had too much time lately to contribute, I have to step in here and defend @fballiano. As he said, it is not right for someone to have no participation in this PR for nearly a year, and only after two approvals are left, expect to be able to block a PR over such a minor thing such as foreach. First, there is no speed benefit to changing it to a foreach in this case. Are we forbidden from ever using array_filter in openmage? If that's someone's opinion, they they should make a code style RFC that says so. Second, since this is Colin's PR, he is free to keep it as he wrote it. That is exactly what PR authors get to do. Person makes PR, two people approve it, and it gets merged. Or people don't approve it and it doesn't get merged... |
You are mixing things up! Of course i know Collins work, but he isn't perfect too. Asking for I recently changed WHY do introduce new code, knowing we can do better? |
Lets do some math: 20 - number of array elements in this case. 0.069692134857178 - 0.018320083618164 = 0.05137205123 (delta per 1000000 elements) 0.05137205123 / 1000000 * 20 = 0.00000102744s So the performance gain, in this case, is 0.00000102744 seconds. In contrast, this patch is often improving the page load by a few seconds. My take: The time we would spend reviewing and testing the patch after changes would be much better used by spending on other patches. |
how many times did you answer "will do it in another PR" just to get something merged and then improve it later? I'll answer: many, many times. so why the heck you want to ruin it for everybody now? some things takes us ages to get reviewed and approved, so:
when we merge something that's not perfect (although obviously we all try not to be reckless) we are improving, showing that we are alive, creating a discussion about a new feature. maybe it generates a bug? ok, whatever, we will fix it together (what the heck is a community project anyway? and lately we're extremely responsive with bug reports). since we're not a corporation we can't do otherwise anyway. |
In the history of OpenMage, many times it happened to merge a PR and make subsequent changes or even revert it. We should implement a rule to avoid such contradictory discussions. In the present case, the merging process has been completed. I agree optimization of the code to save execution time was requested before but it got an answer from the author. If someone wants it please open a PR to be discussed, tested and approved. In my opinion, nothing can be perfect in the material world and I guess that many times it doesn't work out for God either. |
I found a small issue, if there are any bound parameters used in the removed Found this when doing a clean install just now, a data upgrade script in magento-lts/app/code/core/Mage/Directory/Model/Resource/Region/Collection.php Lines 63 to 68 in e3c88f1
|
To replicate error reported by @elidrissidev $collection = Mage::getResourceModel('directory/region_collection');
$collection->addBindParam('id', 'MY');
$collection->getSelect()->where('main_table.country_id = :id');
$collection->getSize(); // throws SQLSTATE[HY093]: Invalid parameter number: number of bound variables does not match number of tokens No error when inserting |
An ugly patch: // insert before the PR Code
if (count($this->_bindParams)) {
return $countSelect;
}
// ... PR code ... May be @colinmollenhour has a better fix. |
Actually, that seems to be a second issue. The bound params are also not cleared when a removed For my original issue, the code below seems is enough to reproduce it, but only in PHP 8.x, with PHP 7.4 it doesn't throw any error for me 🧐 Mage::getResourceModel('directory/region_collection')->getSize(); |
…es bind parameters in join clauses. Refs OpenMage#2210
I pushed #3279 which I believe is all that is needed.
There are no where conditions removed by this optimization.. The reason that |
@colinmollenhour I didn't search, and I didn't have more info except the test I did, but yes this is a good optimisation, just not sure how to show it, but now @kiatng found a way. |
…auses using joined tables and all joins are left joins.
This has been used for a few years now, first mentioned to OpenMage at #1335 (comment).
Joins are absolute performance killers for large count queries in some cases so this can make loading some grids much faster.