WL#15517: [5] HeatWave Support for InnoDB Partitions - Support for
Loading Partitions

This patch enables loading a set of InnoDB partitions into HeatWave.
Although a partition load modifies an existing relation and therefore has
DML semantics, for performance reasons it is implemented by (ab)using the
regular load codepath. When the user issues
ALTER TABLE <table> SECONDARY_LOAD PARTITION (p0, ..., pn), the
ha_rpd::load_table function is triggered. It extracts the requested
partitions, checks which of them are already loaded and proceeds with the
ones that are not. In contrast to a regular table load, if the operation
fails the table remains AVAILABLE. While a partition is being loaded, the
table enters a newly introduced state, PARTITION_LOADING. This state is a
hybrid between AVAILABLE and LOADING: while the table is in it, Change
Propagation cannot propagate updates, yet queries that do not target the
partition currently being loaded are still allowed to offload.

~~~~~~~~~~ Related Bugs ~~~~~~~~~~~~

Bug#35976012: Load/Unload partition hangs and crashes after Ctrl C

Issue:
When persistence is enabled, the fake binlog buffer of a partition
load/unload is removed by Change Propagation only once the operation has
completed and the corresponding data has been persisted. In case of
errors that prevent persistence, we have to remove the fake binlog buffer
manually; otherwise the next partition load/unload hangs, waiting for the
previous DMLs on this table to complete. The InvalidatePartLoadBufferSync
function was created for this purpose. However, this error-handling
mechanism was used only for partition loads, leaving unloads exposed to
the issue. If we abort the stuck DDL without removing the fake binlog
buffer of the failed unload, we hit the following assertion the next time
AddRemoveTables is executed, resulting in a crash:

  AddRemoveTables(): Assertion `!buffers.empty() &&
  buffers.front()->buffer_id == req.m_scn' failed.

Fix:
Properly clearing the fake binlog buffers of failed partition unloads
resolves both issues.

Other changes:
- Add a test case for a failed partition unload
- Finalize the partition load before changing state: push the fake binlog
  finalization down into load_end and finalize the buffer before changing
  the table's state.
- Fix the case where a binlog buffer contains multiple events for
  partitions which are not loaded; in that case we have to clean up
  multiple buffers manually.

---------------------------------------------------------------------

Bug#36032221: Crash Condition: "rapid::InSet(tab->tstate_rpdrgstab,
{LOADING_RPDGSTABSTATE, PA

Issue:
If loadStart fails, a scope guard that restores the relation state
appropriately is executed. The assertion in this scope guard was missing
a case:
- Consider table t1 with partition p0 loaded and state AVAILABLE.
- While loading p1, loadStart fails before setting the state to
  PARTITION_LOADING.
- The scope guard finds t1 in GS, and t1 is in the AVAILABLE state.
  However, the code assumed that t1 could only be in LOADING or
  PARTITION_LOADING.

Fix:
Enhance the assertion so that it also accounts for the AVAILABLE case.
Add a test case that deterministically reproduces the issue.
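The shape of the enhanced check can be pictured with a small
self-contained C++ sketch; the enum, the InSet helper and the guard below
are simplified stand-ins for the actual rpdserver types, not the real
implementation:

    #include <algorithm>
    #include <cassert>
    #include <functional>
    #include <initializer_list>

    // Simplified stand-ins for the *_RPDGSTABSTATE values and rapid::InSet.
    enum class TabState { LOADING, PARTITION_LOADING, AVAILABLE };

    static bool InSet(TabState s, std::initializer_list<TabState> set) {
      return std::find(set.begin(), set.end(), s) != set.end();
    }

    // Runs its action on scope exit unless dismissed, i.e. only on the
    // error path of loadStart.
    struct ScopeGuard {
      explicit ScopeGuard(std::function<void()> f) : fn(std::move(f)) {}
      ~ScopeGuard() {
        if (active) fn();
      }
      void dismiss() { active = false; }
      std::function<void()> fn;
      bool active = true;
    };

    bool loadStart(TabState &state, bool fail_before_transition) {
      ScopeGuard guard([&state] {
        // Previously only LOADING/PARTITION_LOADING were admitted here; a
        // failure before the state transition leaves an already-loaded
        // table in AVAILABLE, so that state must be accepted as well.
        assert(InSet(state, {TabState::LOADING, TabState::PARTITION_LOADING,
                             TabState::AVAILABLE}));
        if (state == TabState::PARTITION_LOADING) state = TabState::AVAILABLE;
      });
      if (fail_before_transition) return false;  // guard restores the state
      state = TabState::PARTITION_LOADING;
      guard.dismiss();  // success: keep PARTITION_LOADING
      return true;
    }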
---------------------------------------------------------------------

Bug#36044925: alter table tbl secondary_unload PARTITION and update
queries stuck

Issue 1:
The fake binlogs of partition load/unload are of the IGNORABLE event
type. However, we do not add any extra info to these fake binlogs, and as
a result every MySQL IGNORABLE event is treated as a fake binlog by
Change Propagation. For such false-positive fake binlogs the finalization
step is never invoked, the corresponding binlog buffer is never removed
from CP, and this leads to a hang.

Fix:
For partition load/unload, mark the fake binlog buffers with a specific
signature in order to distinguish them from other IGNORABLE events.

Issue 2:
When finishing a partition load, mark_transaction_end receives
rel->rec_type in order to iterate the super-partitions. However, in case
of an offloaded load this field is not initialized for the final
relation, so we pass garbage to populate_part_info. If this uninitialized
value happens to be 3 (downsize), we iterate the wrong partition list,
last_persisted_scn is not updated as expected and CP hangs.

Fix:
When setting up the final relation for an offloaded load, copy into its
metadata the rec_type that the temporary relation received from the
plugin.

---------------------------------------------------------------------

Bug#36033627: Query in heatwave gets hung in partitions table load/unload

Issue:
There are three separate issues that make queries/DDLs hang:
1. wrong persistence modes
2. wrong persistence contexts
3. deadlock in elasticity

~~~~~~~~~~~ Issues 1 & 2: After a downsize, partition load and unload hang

Partition unload was not setting a specific value for the persistence
mode of the relation. Since after a downsize some of the super-partitions
would have DNSIZE_RPDMDL_LD_PRST_MODE, the table would still be
considered in recovery, the rpdserver would not ack the plugin, and the
plugin would wait forever. In the case of partition load, we were
updating the mode in the persistence context of the temporary relation we
use, leaving the context of the actual relation with an outdated mode and
producing the same symptom as in unload.

Fix:
- Unload: Since partition unload conceptually has DML semantics, at the
  beginning of a partition unload we follow the same logic as insert_begin
  and change the persistence mode to CP_RPDMDL_LD_PRST_MODE.
- Load: Set the appropriate info in the persistence context of the actual
  base relation. Partition temporary relations should have a NULL
  persistence context.

Other changes:
- Fix partition unload to use the right modIndex map
- Enhance the elasticity test to run partition load/unload after the
  resizing action

~~~~~~~~~~ Issue 3:
- A partition unload starts and acquires the elasticity m_UnloadMutex lock
- In the meantime, an elasticity job starts and pauses CP

We then have a deadlock:
- The partition unload is stuck because, with CP stopped, PropagateSync in
  AssignSCN will never unblock
- The elasticity job cannot make progress because it needs m_UnloadMutex
  to redistribute data

Fix:
Introduce a new mutex in the resize handler that synchronizes with
partition unload (see the sketch below). If a partition unload takes the
lock, elasticity does not start until the unload has finished; similarly,
if elasticity grabs the lock, the partition unload has to wait for
elasticity to finish.
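A minimal sketch of that synchronization, assuming a hypothetical
ResizeSync helper inside the resize handler (the real class and member
names differ):

    #include <mutex>

    // Whichever of elasticity or partition unload acquires the lock first
    // runs to completion; the other blocks until then, so the CP-pause
    // vs. m_UnloadMutex cycle described above can no longer form.
    class ResizeSync {
     public:
      // Held by a partition unload for its whole duration.
      std::unique_lock<std::mutex> EnterPartitionUnload() {
        return std::unique_lock<std::mutex>(m_part_unload_sync);
      }
      // Held by an elasticity (resize) job while it redistributes data.
      std::unique_lock<std::mutex> EnterElasticity() {
        return std::unique_lock<std::mutex>(m_part_unload_sync);
      }

     private:
      std::mutex m_part_unload_sync;  // illustrative name
    };

With the later change for Bug#36161701 below, the unload side no longer
blocks on this lock but aborts if elasticity already holds it.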
---------------------------------------------------------------------

Bug#36161701: Elasticity stuck with parallel load/unload of partition
tables

Issue:
If elasticity hangs for some reason, a partition unload also hangs. Since
the partition unload holds an exclusive MDL, all operations on that table
hang.

Fix:
Change the synchronization mechanism between elasticity and partition
unload:
- Elasticity still cannot start while a partition unload is in progress
- A partition unload, before starting, checks whether elasticity is in
  progress; if it is, the unload is aborted

Other fixes:
If a partition load aborts while elasticity is in progress, just clear
the load metadata in the plugin and do not send an actual unload command
(a load abort due to elasticity happens before load_begin is sent).

---------------------------------------------------------------------

Bug#36177655: Wl15517-Recovery operation with persistence hangs

Issue:
During object-store recovery a node could go down. This could happen for
two reasons: i) memory corruption and ii) networking errors.

i) CP packet recovery happens in batches. For a partition load, the
chunks we allocate are freed by the loading threads, whereas the memory
allocated by the recovery framework is aligned with the recovery batch
size. If a partition load spans multiple batches, this can corrupt
memory.
Fix: If there is at least one partition operation, allocate new chunks
for each batch.

ii) To figure out whether a load comes from recovery, load_end was
checking the temporary relation, which has an invalid persistence
context. Hence it always appeared that the load came from the plugin, and
the rpdserver tried to open a connection to an invalid address.
Fix: For persistence-related operations (e.g., inspecting the persistence
mode), always use the base relation.

---------------------------------------------------------------------

Bug#36161660: Wrong value for NROWS in partition tables during CP
load/unload

Issue:
In partially loaded tables, where a partition load has been applied, the
NROWS metric in PFS shows the number of loaded rows. However, on
partition unload we do not update the table's stats, so a workload of
repeated unload-load operations continuously inflates NROWS.

Fix:
Since partition unload is treated as a delete on the rpdserver side,
after a successful partition unload update the delete stats of the table.
To know the number of rows per partition, extend the table stats that the
network context populates during propagation with stats at partition
granularity.

---------------------------------------------------------------------

Bug#36217109: Wl15517-rapid_bootstrap + LOAD/UNLOAD/DML hangs [noclose]

Issue:
Although persistence happens inline with a partition load/unload,
last_persisted_scn is only updated the next time CP tries to remove
persisted buffers. If the plugin downgrades in the meantime, upon
recovery we will try to re-scan and re-propagate the fake binlog.
However, as there is no connection thread to finalize this operation, it
remains in progress forever.

Fix:
Update last_persisted_scn during the finalization of the fake binlog
buffer. This prevents rescanning after a successful partition
load/unload.

Other changes:
- Change the synchronization mechanism to not rely on the size of
  buffers_to_persist
- Prevent partition unloads when the plugin is in the SUSPENDED state
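The intent of that fix, as a rough C++ sketch (all names here are
illustrative stand-ins, and synchronization with the rest of CP is
omitted):

    #include <cstdint>

    // Illustrative stand-in for a fake binlog buffer of a partition
    // load/unload whose data was already persisted inline.
    struct FakeBinlogBuffer {
      uint64_t scn;
      bool persisted_inline;
    };

    // Advancing last_persisted_scn at finalization time, instead of on
    // the next "remove persisted buffers" pass, means a plugin downgrade
    // right after a successful partition load/unload no longer triggers
    // a re-scan of this buffer during recovery.
    void FinalizePartitionOpBuffer(const FakeBinlogBuffer &buf,
                                   uint64_t &last_persisted_scn) {
      if (buf.persisted_inline && buf.scn > last_persisted_scn)
        last_persisted_scn = buf.scn;
    }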
---------------------------------------------------------------------

Bug#36221083: Wl15517-rpd_tables.LOAD_PROGRESS shows wrong value for part
table in progress

Issue:
If we query the load progress while a partition is being loaded, the
reported percentage is wrong. The denominator of this percentage is
total_expected_rows, which by default equals the total number of rows in
the table. So, when a user loads a specific partition, the progress is
always a significant underestimation.

Fix:
When loading a table, in case of a partition load, set
total_expected_rows to the number of rows that we actually expect to
load.

---------------------------------------------------------------------

Bug#36234507: WL15517-Elasticity Hang Clears when we issue Cancel_resize

Issue:
When a partition load is aborted due to ongoing elasticity, elasticity
hangs in WaitForPropagatedSCNs. The reason is that the last propagated
SCN for a table was updated in GS during loadStart, not after actual
propagation. Hence, for an aborted load we were falsely inflating the
propagated SCN and waiting for the persisted SCN to reach a value that
could never be reached.

Fix:
Update rpd_scn_rpdrgstab only after changes are propagated, i.e., after a
successful load_end.

Other changes:
- Cover this case with a test
- Add a test case for partition unload and elasticity: a partition unload
  should be aborted when elasticity is in progress
- Separate elasticity and recovery tests; clean up the tests
- Abort a partition load when elasticity cancel is in progress

---------------------------------------------------------------------

Bug#36246176: Sig 11 in rpdmdlj_des_rcv at rpdmdlj.c:1404

Issue:
In the following scenario we were skipping the persistence of the
partition load command, so during recovery we were sending data chunks to
the load actors while no loading context existed.

Scenario: Assume 3 superpartitions on 3 rpdservers. After a downsize,
server2 is shut down, server0 remains unchanged and server1 downloads the
superpartition of server2. Server1 now has 2 superpartitions: one with
persistence mode = LOAD and one with persistence mode = DOWNSIZE. Next, a
new partition load arrives from the plugin; because of the DOWNSIZE
persistence mode, we do not persist the load command for that
superpartition. If we then downsize to 1 node, we hit the crash while
downloading the superpartition that originally lived on server2.

Fix:
Rewrite the condition based on which we decide whether the load comes
from recovery.

---------------------------------------------------------------------

Bug#36217109: Wl15517-rapid_bootstrap + LOAD/UNLOAD/DML hangs

Issue:
Assume a failed partition load followed by a plugin downgrade to
SUSPENDED. It may happen that CP has not removed the corresponding binlog
in time. In that case, when the plugin is back ON, it may re-scan it,
mark a partition load/unload operation as in progress, and never finalize
it. Hence, all subsequent partition loads/unloads are ignored.

Fix:
To prevent re-scanning, when finalizing invalidated fake binlog buffers
we should also clear their cache.

Other fixes:
- Fix the table state and last_propagated_SCN in case of a failed
  partition load
- Add a test case for the above scenario
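A rough sketch of the intended cleanup, with hypothetical containers
standing in for CP's real bookkeeping:

    #include <cstdint>
    #include <map>

    struct FakeBinlogBuffer { /* payload of the fake binlog event */ };

    // Hypothetical stand-ins: the in-flight buffers and the cached
    // copies that a later re-scan would pick up, both keyed by SCN.
    std::map<uint64_t, FakeBinlogBuffer> active_buffers;
    std::map<uint64_t, FakeBinlogBuffer> scanned_cache;

    // When finalizing the invalidated buffer of a failed partition
    // load/unload, dropping only the in-flight buffer is not enough:
    // after a downgrade to SUSPENDED and back to ON, the scanner would
    // find the cached copy again and mark a partition operation as in
    // progress forever. Erase the cached copy as well.
    void FinalizeInvalidatedBuffer(uint64_t scn) {
      active_buffers.erase(scn);
      scanned_cache.erase(scn);
    }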
---------------------------------------------------------------------

Potential deadlock during recovery of multiple tables

Issue:
Assume two tables T1 and T2.
- T1: recovery of T1 is in progress and is waiting for a status from the
  DL actor --> the CP persistence handler (CPH) is waiting for DL
- T2: recovery of T2 has completed and the user attempts a new partition
  load. Since mark_transaction_end for a partition load runs in the main
  DL thread, the partition load blocks there --> it runs in DL and waits
  until CPH processes the list --> DL is kept busy waiting for CPH

Fix:
Prevent the user from issuing new partition loads/unloads while the
system is in the CLUSTERREADY state.
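The guard can be pictured as a simple pre-check at the entry of partition
load/unload; the state enum and the error reporting below are
illustrative only:

    #include <cstdio>

    // Illustrative system states; the real code consults the plugin's
    // global status. While still in CLUSTERREADY, recovery traffic may
    // be in flight between the DL actor and the CP persistence handler,
    // so new partition load/unload requests are rejected up front rather
    // than queued on the main DL thread, which closes the DL <-> CPH
    // wait cycle described above.
    enum class SystemState { CLUSTERREADY, READY };

    bool PartitionOpAllowed(SystemState state) {
      if (state == SystemState::CLUSTERREADY) {
        std::fprintf(stderr,
                     "partition load/unload rejected: recovery in progress\n");
        return false;
      }
      return true;
    }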
Change-Id: Ic75b6b675d3d889fd97de2760be714447bb4082e