WL#15517: [5] HeatWave Support for InnoDB Partitions - Support for
Loading Partitions

This patch enables loading a set of InnoDB partitions to HeatWave.
While partition load modifies an existing relation and has DML
semantics, for performance reasons, it is implemented by (ab)using the
regular load codepath.

When the user issues ALTER TABLE ... SECONDARY_LOAD PARTITION
(p0, ..., pn), the ha_rpd::load_table function is triggered. It extracts
the requested partitions, checks whether any of them are already loaded,
and proceeds with the ones that are not. Unlike a regular table load, if
the operation fails, the table remains AVAILABLE.

While a partition is being loaded, the table enters a newly introduced
state, PARTITION_LOADING. This state is a hybrid between AVAILABLE and
LOADING: while the table is in it, Change Propagation cannot propagate
updates, yet queries that do not target the partition currently being
loaded are still allowed to offload.
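
A hypothetical reduction of these state rules, for illustration only:

    // Hypothetical reduction of the state machine described above.
    enum class TableState { AVAILABLE, LOADING, PARTITION_LOADING };

    // Change Propagation may apply updates only when the table is AVAILABLE.
    static bool can_propagate(TableState s) {
      return s == TableState::AVAILABLE;
    }

    // Queries may still offload during PARTITION_LOADING, as long as they
    // do not touch the partition that is currently being loaded.
    static bool can_offload(TableState s, bool touches_loading_partition) {
      if (s == TableState::AVAILABLE) return true;
      if (s == TableState::PARTITION_LOADING) return !touches_loading_partition;
      return false;  // LOADING: no offload at all
    }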

~~~~~~~~~~  Related Bugs ~~~~~~~~~~~~

Bug#35976012: Load/Unload partition hangs and crashes after Ctrl C

Issue: When persistence is enabled, the fake binlog buffer of a
partition load/unload is removed by Change Propagation only once
the operation is completed and the corresponding data persisted.
On errors that prevent persistence, we must manually remove the
fake binlog buffer; otherwise, the next partition load/unload will
hang, as it will wait for previous DMLs on this table to complete.
For this purpose, we created the InvalidatePartLoadBufferSync
function. However, this error-handling mechanism was used only for
partition loads, leaving unloads exposed to the issue.

If we abort the stuck DDL without removing the fake binlog buffer
of the failed unload, then we hit the following assertion the
next time AddRemoveTables is executed, resulting in a crash:
AddRemoveTables(): Assertion `!buffers.empty() &&
buffers.front()->buffer_id == req.m_scn' failed.

Fix: Properly clearing the fake binlog buffers for failed partition
unloads resolves both issues.

Other changes:
- Add a test case for a failed partition unload
- Finalize the partition load before changing state: push fake
binlog finalization down into load_end and finalize the buffer
before changing the table's state
- Fix the case where a binlog buffer contains multiple events for
partitions that are not loaded; in that case, we must manually
clean up multiple buffers

---------------------------------------------------------------------

Bug#36032221: Crash Condition: "rapid::InSet(tab->tstate_rpdrgstab,
{LOADING_RPDGSTABSTATE, PA

Issue: If loadStart fails, a scope guard that appropriately restores
the relation state is executed. The assertion in this scope guard
misses a case:
- Consider table t1 with partition p0 loaded and state AVAILABLE
- While loading p1, loadStart fails before setting the state to
PARTITION_LOADING
- The scope guard will find t1 in GS still in the AVAILABLE state.
However, the code assumed that t1 could only be in LOADING or
PARTITION_LOADING

Fix: Enhance the assertion so that it also accounts for the
AVAILABLE case. Add a test case that deterministically reproduces
the issue.
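
A sketch of the enhanced scope-guard assertion, with hypothetical state
names mirroring the ones above:

    #include <cassert>

    enum class TState { AVAILABLE, LOADING, PARTITION_LOADING };

    // Hypothetical restore logic run by the scope guard on loadStart failure.
    static void restore_state_on_load_failure(TState &state) {
      // Before the fix the guard asserted only LOADING / PARTITION_LOADING.
      // loadStart can fail *before* the transition to PARTITION_LOADING, in
      // which case the table is still AVAILABLE (e.g. p0 loaded, p1 failed).
      assert(state == TState::LOADING || state == TState::PARTITION_LOADING ||
             state == TState::AVAILABLE);
      if (state == TState::PARTITION_LOADING) state = TState::AVAILABLE;
    }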

---------------------------------------------------------------------

Bug#36044925: alter table tbl secondary_unload PARTITION and update
queries stuck

Issue 1: The fake_binlogs of partition load/unload are of the
IGNORABLE event type. However, we do not add any extra info to the
fake binlogs, and as a result every MySQL IGNORABLE event is treated
as a fake binlog by Change Propagation.
Since finalization is never invoked for such false-positive fake
binlogs, the corresponding binlog buffer is never removed from CP,
which leads to a hang.

Fix: For partition load/unload, mark the fake binlog buffers
with a specific signature in order to distinguish them from other
IGNORABLE events.
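
A sketch of the signature check; the actual signature bytes and event
layout are internal and assumed here:

    #include <cstring>

    // Hypothetical marker; the real signature differs.
    static const char kPartOpSignature[] = "RPD_PART_OP";

    struct IgnorableEvent { const char *payload; std::size_t len; };

    // Only events stamped with the signature are treated as fake binlogs
    // of a partition load/unload; other IGNORABLE events pass through.
    static bool is_partition_fake_binlog(const IgnorableEvent &ev) {
      const std::size_t n = sizeof(kPartOpSignature) - 1;
      return ev.len >= n && std::memcmp(ev.payload, kPartOpSignature, n) == 0;
    }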

Issue 2: When finishing a partition load, mark_transaction_end
receives rel->rec_type in order to iterate the super-partitions.
However, for an offloaded load this field is not initialized on the
final relation, so we pass garbage to populate_part_info. If this
uninitialized value happens to be 3 (downsize), we iterate the wrong
partition list, last_persisted_scn is not updated as expected, and
CP hangs.

Fix: When setting up the final relation for an offloaded load, copy
into its metadata the rec_type that the temporary relation received
from the plugin.
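
A minimal illustration of the fix, with hypothetical relation metadata:

    // Hypothetical relation metadata; only rec_type matters here.
    struct Relation { int rec_type = 0; /* ... */ };

    // The final relation must inherit the rec_type that the plugin handed
    // to the temporary relation; otherwise mark_transaction_end reads an
    // uninitialized value (3 == downsize walks the wrong partition list).
    static void setup_final_relation(Relation &final_rel,
                                     const Relation &tmp_rel) {
      final_rel.rec_type = tmp_rel.rec_type;
    }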

---------------------------------------------------------------------

Bug#36033627: Query in heatwave gets hung in partitions table
load/unload

Issue: There are three separate issues that make queries/DDLs
hang:
1. wrong persistence modes
2. wrong persistence contexts
3. deadlock in elasticity

~~~~~~~~~~~
Issues 1&2: After a downsize, partition load and unload hang

Partition unload wasn't setting a specific value for the
persistence mode of the relation. Since, after a downsize, some of
the super-partitions would have DNSIZE_RPDMDL_LD_PRST_MODE,
the table would still be considered in recovery: the rpdserver
wouldn't ack the plugin and the plugin would wait forever.

In the case of partition load, we were updating the mode in
the persistence context of the temporary relation we use,
leaving the context of the actual relation with an outdated mode
and producing the same symptom as in unload.

Fix:
- Unload: As partition unload conceptually has DML semantics, at
the beginning of a partition unload we follow the same logic as
insert_begin and change the persistence mode to
CP_RPDMDL_LD_PRST_MODE.

- Load: Set the appropriate info in the persistence context of
the actual base relation. Partition temporary relations should
have a NULL persistence context.

Other changes:
- Fix partition unload to use the right modIndex map
- Enhance the elasticity test to perform a partition load/unload
after the resizing action

~~~~~~~~~~
Issue 3:
- A partition unload starts and takes the elasticity m_UnloadMutex
lock
- In the meantime, an elasticity job starts and pauses CP

Then we have a deadlock:
- Partition unload gets stuck because, with CP stopped,
PropagateSync in AssignSCN will never unblock
- The elasticity job cannot make progress because it needs
m_UnloadMutex to redistribute data

Fix: Introduce a new mutex in the resize handler that syncs with
partition unload. If partition unload takes the lock, elasticity
will not start until partition unload has finished. Similarly,
if elasticity grabs the lock, partition unload will have to
wait for elasticity to finish.
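
A sketch of this synchronization, using a plain std::mutex as a stand-in
for the resize-handler lock:

    #include <mutex>

    // Hypothetical sketch: one mutex serializes elasticity against
    // partition unload, so whichever side takes it first runs to
    // completion before the other may start.
    class ResizeHandler {
     public:
      void run_elasticity_job() {
        std::lock_guard<std::mutex> g(m_part_unload_sync);
        // ... pause CP and redistribute data ...
      }
      void run_partition_unload() {
        std::lock_guard<std::mutex> g(m_part_unload_sync);
        // ... safe: CP is running, so PropagateSync in AssignSCN can finish.
      }
     private:
      std::mutex m_part_unload_sync;  // the newly introduced mutex
    };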

---------------------------------------------------------------------

Bug#36161701: Elasticity stuck with parallel load/unload of partition
tables

Issue: If elasticity hangs for some reason, a partition unload
also hangs. As partition unload holds an exclusive MDL, all
operations on that table then hang.

Fix: Change the synchronization mechanism between elasticity and
partition unload (see the sketch below):
- Elasticity still cannot start while a partition unload is in
  progress
- A partition unload, before starting, checks whether elasticity is
  in progress; if it is, the unload is aborted

Other fixes: If a partition load aborts while elasticity is in
progress, just clear the load metadata in the plugin and don't send
an actual unload command (a load abort due to elasticity happens
before load_begin is sent).
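
A sketch of the revised scheme (superseding the blocking mutex shown
earlier), again with hypothetical names:

    #include <mutex>

    // Partition unload no longer blocks behind elasticity; it aborts
    // immediately if elasticity holds the lock, so a hung elasticity job
    // cannot pin the table's exclusive MDL.
    class ResizeSync {
     public:
      void run_elasticity_job() {
        std::lock_guard<std::mutex> g(m_sync);  // still waits for unloads
        // ... redistribute data ...
      }
      bool try_begin_partition_unload() {
        if (!m_sync.try_lock()) return false;  // elasticity running: abort
        // ... perform the unload ...
        m_sync.unlock();
        return true;
      }
     private:
      std::mutex m_sync;
    };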

---------------------------------------------------------------------

Bug#36177655: Wl15517-Recovery operation with persistence hangs

Issue: During object-store recovery a node could go down. This
could happen for two reasons: (i) memory corruption and (ii)
networking errors.

i) CP packet recovery happens in batches. For a partition load, the
chunks we allocate are freed by the loading threads. However, the
memory allocated in the recovery framework is aligned with the
recovery batch size. If a partition load spans multiple batches,
this can corrupt memory.

Fix: If there is at least one partition operation, allocate new
chunks for each batch.
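
A sketch of the per-batch allocation, with hypothetical recovery types:

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Chunks for a partition load are freed by the loading threads, so a
    // chunk must never outlive the batch allocation it came from;
    // allocating fresh chunks per batch avoids the corruption.
    struct Chunk { std::vector<char> bytes; };

    static void recover_batches(std::size_t nbatches, bool has_partition_op) {
      std::shared_ptr<std::vector<Chunk>> chunks;
      for (std::size_t b = 0; b < nbatches; ++b) {
        if (has_partition_op || !chunks)
          chunks = std::make_shared<std::vector<Chunk>>();  // fresh per batch
        // ... fill *chunks from the CP packets of batch b and hand off ...
      }
    }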

ii) To determine whether a load comes from recovery, load_end was
checking the temporary relation, which has an invalid persistence
context. Hence, the load always appeared to come from the plugin,
and the rpdserver would try to open a connection to an invalid
address.

Fix: For persistence-related operations (e.g., inspecting the
persistence mode), we should always use the base relation.
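
A minimal illustration, with hypothetical relation types:

    // Persistence state lives on the base relation; the temporary
    // relation used during a partition load has no valid context.
    struct PersistenceCtx { int mode = 0; };
    struct Rel {
      PersistenceCtx *pctx = nullptr;  // NULL on partition temp relations
      Rel *base = nullptr;             // set on temporary relations
    };

    // Persistence-related checks must always consult the base relation.
    static const PersistenceCtx *persistence_ctx_of(const Rel &rel) {
      const Rel *r = rel.base != nullptr ? rel.base : &rel;
      return r->pctx;
    }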

---------------------------------------------------------------------

Bug#36161660: Wrong value for NROWS in partition tables during CP
load/unload

Issue: For partially loaded tables, where a partition load has been
applied, the NROWS metric in PFS shows the number of loaded rows.
However, on partition unload we do not update the table's stats,
so a workload of alternating unload-load operations continuously
increases NROWS.

Fix: Since partition unload is treated as a delete on the rpdserver
side, after a successful partition unload we update the table's
delete stats.
To know the number of rows per partition, we extend the table stats
that the network context populates during propagation with stats at
partition granularity.
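
A sketch of the extended stats bookkeeping, with hypothetical structures:

    #include <cstdint>
    #include <map>
    #include <string>

    // Propagation now records rows per partition, so a successful
    // partition unload can be accounted as a delete.
    struct TableStats {
      std::uint64_t nrows = 0, deleted = 0;
      std::map<std::string, std::uint64_t> rows_per_partition;  // new

      void on_partition_unloaded(const std::string &part) {
        auto it = rows_per_partition.find(part);
        if (it == rows_per_partition.end()) return;
        deleted += it->second;  // unload is a delete on the rpdserver side
        nrows -= it->second;    // NROWS no longer grows across unload-load
        rows_per_partition.erase(it);
      }
    };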

---------------------------------------------------------------------

Bug#36217109: Wl15517-rapid_bootstrap + LOAD/UNLOAD/DML hangs [noclose]

Issue: While persistence happens inline with partition load/unload,
last_persisted_scn is updated only the next time CP tries to remove
persisted buffers. If the plugin downgrades in the meantime, upon
recovery we will try to re-scan and re-propagate the fake binlog.
However, as there is no connection thread to finalize this
operation, it will remain in progress forever.

Fix: Update last_persisted_scn during the finalization of the fake
binlog buffer. This will prevent rescanning in case of successful
partition load/unload.
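
A sketch of the finalization-time update, using std::atomic as a stand-in
for the real synchronization:

    #include <atomic>
    #include <cstdint>

    // Bump last_persisted_scn as soon as the fake binlog buffer is
    // finalized, instead of waiting for the next CP buffer-removal pass,
    // so a downgrade/recovery cycle will not re-scan it.
    static std::atomic<std::uint64_t> last_persisted_scn{0};

    static void finalize_fake_binlog_buffer(std::uint64_t buffer_scn) {
      // ... persist / release the buffer ...
      std::uint64_t prev = last_persisted_scn.load();
      while (prev < buffer_scn &&
             !last_persisted_scn.compare_exchange_weak(prev, buffer_scn)) {
      }
    }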

Other changes:
- Change the synchronization mechanism to not rely on the size of
buffers_to_persist
- Prevent partition unloads when the plugin is in the SUSPENDED state

---------------------------------------------------------------------

Bug#36221083: Wl15517-rpd_tables.LOAD_PROGRESS shows wrong value for
part table in progress

Issue: If we query load progress while a partition is being loaded,
the reported percentage is wrong. The denominator of this percentage
is total_expected_rows, which by default equals the total number of
rows in the table. So, if a user loads a specific partition, the
progress will always be a significant underestimate.

Fix: When loading a table, in case of partition load, set the number
of total_expected_rows to the number of rows that we expect to be
loaded.
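
A sketch of the corrected denominator, with hypothetical helpers:

    #include <cstdint>
    #include <string>
    #include <vector>

    static std::uint64_t rows_in_partition(const std::string &) {
      return 0;  // stub for the real per-partition row count
    }

    // For a partition load, the denominator is the row count of the
    // requested partitions, not of the whole table.
    static std::uint64_t total_expected_rows(
        bool is_partition_load, std::uint64_t table_rows,
        const std::vector<std::string> &parts) {
      if (!is_partition_load) return table_rows;
      std::uint64_t expected = 0;
      for (const auto &p : parts) expected += rows_in_partition(p);
      return expected;  // LOAD_PROGRESS = loaded_rows / expected
    }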

---------------------------------------------------------------------

Bug#36234507: WL15517-Elasticity Hang Clears when we issue
Cancel_resize

Issue: When a partition load is aborted due to ongoing elasticity,
elasticity hangs in WaitForPropagatedSCNs. The reason is that the
last propagated SCN for a table was updated in GS during loadStart
rather than after actual propagation. Hence, for an aborted load, we
were falsely inflating the propagated SCN and waiting for the
persisted SCN to reach a value it could never reach.

Fix: Update rpd_scn_rpdrgstab only after changes are propagated,
i.e., after a successful load_end.
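
A minimal illustration of the new ordering, with hypothetical GS metadata:

    #include <cstdint>

    // rpd_scn_rpdrgstab (the table's last propagated SCN in GS) moves
    // forward only after a successful load_end, never in loadStart, so an
    // aborted load cannot inflate it.
    struct GsTable { std::uint64_t rpd_scn_rpdrgstab = 0; };

    static bool load_end(GsTable &tab, std::uint64_t scn,
                         bool load_succeeded) {
      if (!load_succeeded) return false;  // SCN untouched for aborted loads
      tab.rpd_scn_rpdrgstab = scn;  // changes are now actually propagated
      return true;
    }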

Other changes:
- Cover this case with a test
- Add a test case for partition unload and elasticity: partition unload
should be aborted when elasticity is in progress
- Separate elasticity and recovery tests. Clean up the tests
- Abort partition load when elasticity cancel is in progress

---------------------------------------------------------------------

Bug#36246176: Sig 11 in rpdmdlj_des_rcv at rpdmdlj.c:1404

Issue: In the scenario below, we were skipping persistence of the
partition load command. So, during recovery, we were sending data
chunks to the load actors while no loading context existed.

Scenario: Assume we have 3 superpartitions on 3 rpdservers.
After a downsize, server2 is shut down, server0 remains unchanged,
and server1 downloads the superpartition of server2. So, server1 now
has 2 superpartitions: 1 with persistence mode = LOAD and 1 with
persistence mode = DOWNSIZE. Now, a new partition load arrives from
the plugin. However, because of the DOWNSIZE persistence mode, we do
not persist the load command for this superpartition. If we next
downsize to 1 node, we hit the crash while downloading the
superpartition that originally lived on server2.

Fix: Rewrite the condition based on which we decide whether the load
comes from recovery.
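
One possible shape of the rewritten condition; the actual predicate is
internal, so this only illustrates the idea of keying on the command's
origin rather than on a single super-partition's mode:

    // After a downsize, LOAD and DOWNSIZE persistence modes coexist on
    // one server, so "does this load come from recovery?" can no longer
    // be inferred from one super-partition's mode.
    enum class PrstMode { LOAD, CP, DOWNSIZE };

    static bool is_recovery_load(bool plugin_initiated, PrstMode /*mode*/) {
      // Before: decided per super-partition from its mode, so a DOWNSIZE
      // super-partition wrongly skipped persisting a plugin-initiated load.
      return !plugin_initiated;  // decide from the command's origin instead
    }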

---------------------------------------------------------------------

Bug#36217109: Wl15517-rapid_bootstrap + LOAD/UNLOAD/DML hangs

Issue: Assume a failed partition load that is followed by a plugin
downgrade to SUSPENDED. It may happen that CP has not removed the
corresponding binlog in time. In that case, when the plugin is back
ON, it may re-scan the binlog, mark a partition load/unload
operation as in progress, and never finalize it. Hence, all
subsequent partition loads/unloads will be ignored.

Fix: To prevent re-scanning, when finalizing invalidated fake binlog
buffers, we should also clear their cache.
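
A sketch of the cache cleanup, with hypothetical bookkeeping:

    #include <cstdint>
    #include <map>

    // Finalizing an invalidated fake binlog buffer must also drop its
    // cache entry, or the plugin may re-scan it after a SUSPENDED -> ON
    // cycle and mark a phantom operation in progress.
    struct FakeBinlogCache {
      std::map<std::uint64_t, int> entries;  // scn -> cached payload handle

      void finalize_invalidated(std::uint64_t scn) {
        // ... release the buffer itself ...
        entries.erase(scn);  // the fix: clear the cache to prevent re-scan
      }
    };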

Other fixes:
- Fix table state and last_propagated_SCN in case of a failed partition
load
- Add a test-case for the above scenario

---------------------------------------------------------------------

Potential deadlock during recovery of multiple tables

Issue:
Assume two tables T1 and T2.
- T1: recovery of T1 is in progress and is waiting for a status
to be received from the DL actor --> the CP persistence handler
(CPH) is waiting for DL
- T2: recovery of T2 has completed and the user attempts a new
partition load. Since mark_transaction_end for partition load runs
in the main DL thread, there is a blocking wait for the partition
load --> it runs in DL and waits until CPH processes the list -->
DL is busy waiting for CPH

Fix: Prevent the user from issuing new partition loads/unloads while
the system is in the CLUSTERREADY state.
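
A sketch of the admission check, with hypothetical state names:

    // While the system is in CLUSTERREADY (recovery work may still be
    // draining through DL/CPH), new partition loads/unloads are rejected
    // instead of queued, so they cannot wedge the main DL thread.
    enum class SysState { CLUSTERREADY, NORMAL };

    static bool may_start_partition_op(SysState s) {
      return s != SysState::CLUSTERREADY;
    }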

Change-Id: Ic75b6b675d3d889fd97de2760be714447bb4082e
Evangelos Danias authored and dahlerlend committed Mar 4, 2024
1 parent 3ac3baf commit 951ffba
Showing 1 changed file with 20 additions and 0 deletions.
20 changes: 20 additions & 0 deletions sql/sql_table.cc
@@ -16980,6 +16980,26 @@ bool mysql_alter_table(THD *thd, const char *new_db, const char *new_name,
}
}

  if ((alter_info->flags & Alter_info::ALTER_DROP_PARTITION) != 0U) {
    auto mdl_type = mdl_ticket->get_type();
    auto downgrade_guard = create_scope_guard(
        [mdl_ticket, mdl_type] { mdl_ticket->downgrade_lock(mdl_type); });

    if (thd->mdl_context.upgrade_shared_lock(mdl_ticket, MDL_EXCLUSIVE,
                                             thd->variables.lock_wait_timeout))
      return true;

    const dd::Table *table_def = nullptr;
    if (thd->dd_client()->acquire(table_list->db, table_list->table_name,
                                  &table_def))
      return true;

    table_list->partition_names = &alter_info->partition_names;
    if (secondary_engine_unload_table(
            thd, table_list->db, table_list->table_name, *table_def, false))
      return true;
  }

/*
Store all columns that are going to be dropped, since we need this list
when removing column statistics later. The reason we need to store it here,
