Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely slow reindexing when using a MariaDb platform. #1621

Closed
reviskar opened this issue Nov 15, 2019 · 19 comments
Closed

Extremely slow reindexing when using a MariaDb platform. #1621

reviskar opened this issue Nov 15, 2019 · 19 comments

Comments

@reviskar
Copy link

The issue only occurs when using a mariaDb platform, and I've found that it only happens with catalog_product_entity_int reindexing in the catalogsearch_fulltext indexer.
Because it works in a loop, the effective sync speeds take over an hour to finish in our production server, meanwhile are done in under a minute in a local machine.

Preconditions

MariaDB: 10.2.26

Magento Version : 2.3.2

ElasticSuite Version : 2.8.1/2.8.3

Steps to reproduce

  1. Run $ bin/magento indexer:reindex catalogsearch_fulltext
  2. If using mariaDb 10.2.26 grab some popcorn

Expected result

  1. Normal EXPLAIN when using a MySql 5.7 system:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE entity NULL range PRIMARY PRIMARY 4 NULL 1000 100.00 Using where; Using index
1 SIMPLE t_default NULL ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 4 rekaubamaja.entity.entity_id 21 39.13 Using index condition
1 SIMPLE t_store NULL eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 magento_table.entity.entity_id,magento_table.t_default.attribute_id,const 1 100.00 NULL

3 rows (0.008 s),

Actual result

  1. Using MariaDb 10.2.* system:
id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
1 SIMPLE entity range PRIMARY PRIMARY 4 NULL 1000 1000.00 100.00 100.00 Using where; Using index
1 SIMPLE t_default range CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 NULL 8000 3017.00 0.12 0.10 Using index condition; Using where
1 SIMPLE t_store eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 magento_table.entity.entity_id,magento_table.t_default.attribute_id,const 1 0.66 100.00 100.00  

3 rows (16.280 s),

Analysis

The problem lies somewhat in MariaDb failing to use the primary key index instead of the unique one in catalog_product_entity_int table, but I think the main issue is in the non optimal SQL query executed.

\Smile\ElasticsuiteCatalog\Model\ResourceModel\Eav\Indexer\Fulltext\Datasource\AbstractAttributeData::getAttributesRawData finds attributes using the entity_id's, and for some reason selects them from the main table first only to join the entity tables by the entity_id value.

I've fixed our problem by ditching the Inner join done here and selecting all Id's straight from the t_default (catalog_product_entity_int) table.

So instead of doing ...

        $select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
            ->joinInner(
                ['t_default' => $tableName],
                new \Zend_Db_Expr("entity.{$linkField} = t_default.{$linkField}"),
                ['attribute_id']
            )
            ->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
            ->where('t_default.store_id=?', 0)
            ->where('t_default.attribute_id IN (?)', $attributeIds)
            ->where("entity.{$entityIdField} IN (?)", $entityIds)
            ->columns(['value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);

... the query works perfectly well written as:

        $select->from(['t_default' => $tableName], [$entityIdField])
            ->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
            ->where('t_default.store_id=?', 0)
            ->where('t_default.attribute_id IN (?)', $attributeIds)
            ->where("t_default.{$entityIdField} IN (?)", $entityIds)
            ->columns(['t_default.attribute_id', 'value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);

I would suggest fixing this query :)

@romainruaud
Copy link
Collaborator

@reviskar for me your query will not work under a Magento Commerce because you get rid of the new \Zend_Db_Expr("entity.{$linkField} = t_default.{$linkField}") which is the only way to properly match the list of $entityIds when using Magento Commerce (and having the staging modules enabled) because there is no entity_id column in the catalog_product_entity_* tables.

@reviskar
Copy link
Author

True, did not see that as a possibility. We have entity_id fields in our entity table so the fix works well for us at the moment. But as to solving the issue, I'm not sure what the best approach would be here. I would imagine it's doable either way without the double-join.

@romainruaud
Copy link
Collaborator

I'm not sure the double-join can be avoided because catalog_product_entity table is the only location where you can fetch the row_id <-> entity_id mapping if using Magento Commerce.

@romainruaud
Copy link
Collaborator

@reviskar do you have any new insights on this one ? did you try to upgrade MariaDB to another version ?

I saw some issues on the MariaDB tracker that looks like yours : https://jira.mariadb.org/browse/MDEV-15339?attachmentViewMode=list

@reviskar
Copy link
Author

@romainruaud Thanks for asking. Seems to be exactly the issue we had. I've solved it as stated above, as it works for us very well and we had limited resources to solve it at that time.

@reviskar
Copy link
Author

@romainruaud our service provider will upgrade our staging and production servers to MariaDb 10.4 in a few days time. So we will find out soon enough, if the newer version is smarter in this regard.

@romainruaud
Copy link
Collaborator

Great, thank you for keeping us in touch about this (very annoying I admit it) issue.

Regards

@Quazz
Copy link

Quazz commented Nov 9, 2020

I recently upraded MariaDB to 10.4 (as it is supported by Magento 2.4.1) and found I had to disable an optimizer_switch or indexation would be very slow when part of the result set was empty. This impacts more than just Elasticsuite (Magento itself also suffers), just figured I'd share what I found.

Rowid_filter is new in MariaDB 10.4, but is about 100 times slower when result set is empty, causing indexation to be 10 times slower in my case (since I use a lot of global attributes, so store view values are empty)
set global optimizer_switch='rowid_filter=off';

For my.cnf
optimizer_switch=rowid_filter=off

EDIT: Looks like that made it into the Magento documentation!

https://devdocs.magento.com/guides/v2.4/performance-best-practices/configuration.html#indexers

@Quazz
Copy link

Quazz commented Nov 12, 2020

There's also this: magento/magento2#27129 to consider and potentially patch here.

Improves index times even further.

@romainruaud
Copy link
Collaborator

Hi @Quazz , thank you for the update.

On our side, we've had good results so far on a MariaDB 10.3 with the following parameters :

in_predicate_conversion_threshold = 0
optimizer_switch = 'optimize_join_buffer_size=on'

@Quazz
Copy link

Quazz commented Nov 23, 2020

@romainruaud Thank you for those. I believe 'optimize_join_buffer_size=on' is default from 10.4.3 onwards now.

@Wohlie
Copy link

Wohlie commented Mar 17, 2021

Hi, we're also still facing to this problem with the following configuration:

  • Percona 5.7.33-36
  • Magento 2.3.4
  • elasticsuite 2.8.8

Problem

Profiling:
getAttributesRawData take about 40sec
image

EXPLAIN of the query shows missing index usage for EAV tables:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE entity NULL range CATALOG_PRODUCT_ENTITY_CREATED_IN,CATALOG_PRODUCT_ENTITY_UPDATED_IN,CATALOG_PRODUCT_ENTITY_ENTITY_ID_CREATED_IN_UPDATED_IN CATALOG_PRODUCT_ENTITY_ENTITY_ID_CREATED_IN_UPDATED_IN 12 NULL 323 100 Using where; Using index
1 SIMPLE attr NULL index PRIMARY EAV_ATTRIBUTE_PIM_ID 195 NULL 1017 42.38 Using where; Using index; Using join buffer (Block Nested Loop)
1 SIMPLE t_default NULL eq_ref CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_STORE_ID CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID 8 magento.entity.row_id,magento.attr.attribute_id,const 1 100 NULL
1 SIMPLE t_store NULL eq_ref CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_STORE_ID CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID 8 magento.entity.row_id,magento.attr.attribute_id,const 1 100 NULL

Possible solution

I don't found a performant variant of this query that also uses indexes, so I decide to split the query into two parts and join the result back in php which increase the speed.

After the patch getAttributesRawData take about 4sec
image

I hope the following patch works like the intended behavior of the original implementation:

  • filter out null values
  • store values will override default values

The following patch is currently not tested in production:

Index: a/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php b/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php
--- a/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php	(date 1615973628940)
+++ b/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php	(date 1615973628940)
@@ -16,6 +16,7 @@

 use Magento\Framework\App\ResourceConnection;
 use Magento\Framework\EntityManager\MetadataPool;
+use Magento\Store\Model\Store;
 use Magento\Store\Model\StoreManagerInterface;
 use Smile\ElasticsuiteCatalog\Model\ResourceModel\Eav\Indexer\Indexer;
 use Magento\Eav\Model\ResourceModel\Entity\Attribute\Collection as AttributeCollection;
@@ -104,49 +105,45 @@
      */
     public function getAttributesRawData($storeId, array $entityIds, $tableName, array $attributeIds)
     {
-        $select = $this->connection->select();
-
         // The field modelizing the link between entity table and attribute values table. Either row_id or entity_id.
         $linkField = $this->getEntityMetaData($this->getEntityTypeId())->getLinkField();

         // The legacy entity_id field.
         $entityIdField = $this->getEntityMetaData($this->getEntityTypeId())->getIdentifierField();

-        $joinDefaultValuesCondition = [
-            new \Zend_Db_Expr("entity.$linkField = t_default.$linkField"),
-            't_default.attribute_id = attr.attribute_id',
-            $this->connection->quoteInto('t_default.store_id = ?', \Magento\Store\Model\Store::DEFAULT_STORE_ID),
+        //Define store related conditions, keep the order of the array elements!
+        $storeConditions = [
+            'default' => $this->connection->quoteInto('t_attribute.store_id = ?', Store::DEFAULT_STORE_ID),
+            'store'   => $this->connection->quoteInto('t_attribute.store_id = ?', $storeId),
         ];
-        $joinDefaultValuesCondition = implode(' AND ', $joinDefaultValuesCondition);

-        $joinStoreValuesConditionClauses = [
-            new \Zend_Db_Expr("entity.$linkField = t_store.$linkField"),
-            't_store.attribute_id = attr.attribute_id',
-            $this->connection->quoteInto('t_store.store_id = ?', $storeId),
-        ];
-        $joinStoreValuesCondition = implode(' AND ', $joinStoreValuesConditionClauses);
+        $result = [];
+        foreach ($storeConditions as $condition) {
+            $joinAttributeValuesCondition = [
+                new \Zend_Db_Expr("entity.$linkField = t_attribute.$linkField"),
+                $condition
+            ];
+            $joinAttributeValuesCondition = implode(' AND ', $joinAttributeValuesCondition);

-        $select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
-            ->joinInner(
-                ['attr' => $this->getTable('eav_attribute')],
-                $this->connection->quoteInto('attr.attribute_id IN (?)', $attributeIds),
-                ['attribute_id']
-            )
-            ->joinLeft(
-                ['t_default' => $tableName],
-                $joinDefaultValuesCondition,
-                []
-            )
-            ->joinLeft(
-                ['t_store' => $tableName],
-                $joinStoreValuesCondition,
-                []
-            )
-            ->where("entity.{$entityIdField} IN (?)", $entityIds)
-            ->having('value IS NOT NULL')
-            ->columns(['value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);
+            $select = $this->connection->select();
+            $select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
+                ->joinLeft(
+                    ['t_attribute' => $tableName],
+                    $joinAttributeValuesCondition,
+                    ['attribute_id', 'value']
+                )
+                ->where("entity.{$entityIdField} IN (?)", $entityIds)
+                ->where("t_attribute.attribute_id IN (?)", $attributeIds)
+                ->where("t_attribute.value IS NOT NULL");

-        return $this->connection->fetchAll($select);
+            //Get the result and override values from a previous loop
+            foreach ($this->connection->fetchAll($select) as $row) {
+                $key = "{$row['entity_id']}-{$row['attribute_id']}";
+                $result[$key] = $row;
+            }
+        }
+
+        return array_values($result);
     }

     /**

Feel free to merge the patch into the core or leave it as optional patch for huge catalogs.

@Quazz
Copy link

Quazz commented Mar 17, 2021

About 15-25% improvement (for the full catalogsearch reindex) on a 125K ish catalog with 3 store views in a quick test. Can't fully speak to whether the results are the same, though the number of documents in the index seems to match at least.

@romainruaud
Copy link
Collaborator

Wow, good catch, we'll have to test it with caution, especially with Magento Enterprise where things are handled differently (row_id/entity_id).

@romainruaud
Copy link
Collaborator

@Wohlie did you test it in production ?

By the way, are you using Magento Commerce or Open Source ?

@Wohlie
Copy link

Wohlie commented May 5, 2021

Hi @romainruaud, we are using Magento Commerce 2.3.4. We are currently not live. Go live for this project is end of May. This patch is included since mid-March in our code base.

We don't test the patch for Magento open source but I re-use the original code for this patch. So the chance is high, it will also work with Magneto open source.

@romainruaud
Copy link
Collaborator

@Wohlie

how time flies :)

Are you still using this patch on production ? If yes, that's probably enough tested now :)

Regards

@Wohlie
Copy link

Wohlie commented Feb 6, 2023

Hi @romainruaud, yes we still use this patch in production. :)

@romainruaud
Copy link
Collaborator

Ok so this could probably be integrated into the core if this does help that much with performances.

I'll prioritize this.

Regards

romainruaud added a commit to romainruaud/elasticsuite that referenced this issue Jun 23, 2023
romainruaud added a commit that referenced this issue Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants