Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PXC-4173: PXC node stalls with parallel replication workers executing DDLs via async node #1927

Open
wants to merge 1 commit into
base: 8.0
Choose a base branch
from

Conversation

venkatesh-prasad-v
Copy link
Contributor

@venkatesh-prasad-v venkatesh-prasad-v commented Jun 12, 2024

https://perconadev.atlassian.net/browse/PXC-4173

Problem
=======
PXC replica could enter into a deadlock with multi threaded replication when
--replica-preserve-commit-order is enabled.

The deadlock is more likely to occur on a replica with
--replica-preserve-commit-order enabled when the source server's workload
consists of concurrent executions of DMLs and DDLs. In such a scenario, the
replica shall try to commit them in the same order as that of the source which
will conflict with galera's ordering resulting in a deadlock.

MySQL ordering: Commits are serialized as per their commit order on source server.
Galera ordering: Commits are serialized in a FIFO manner.

Scenario
========
Let's assume we have the following two transactions in the relay log with same
last_committed.

T1: DML, seqno=10
T2: DDL, seqno=11

+-----------+--------------+-------------------------------------------------------+
| Timestamp |    Thread    |                        Activity                       |
+-----------+--------------+-------------------------------------------------------+
|     t0    | Co-ordinator | Assigns T1 to Applier-1                               |
+-----------+--------------+-------------------------------------------------------+
|     t1    | Co-ordinator | Assigns T2 to Applier-2                               |
+-----------+--------------+-------------------------------------------------------+
|     t2    |   Applier-1  | DML starts with seqno = 10                            |
+-----------+--------------+-------------------------------------------------------+
|     t3    |   Applier-2  | DDL starts with seqno = 11                            |
+-----------+--------------+-------------------------------------------------------+
|     t4    |   Applier-2  | DDL calls TOI_begin(), acquires commit monitor        |
+-----------+--------------+-------------------------------------------------------+
|     t5    |   Applier-2  | DDL executes                                          |
+-----------+--------------+-------------------------------------------------------+
|     t5    |   Applier-1  | DML executes                                          |
+-----------+--------------+-------------------------------------------------------+
|     t6    |   Applier-2  | DDL reaches commit_order_manager, finds that          |
|           |              | it needs to wait until Applier-1 is committed         |
+-----------+--------------+-------------------------------------------------------+
|     t7    |   Applier-1  | DML reaches commit, calls commit_order_enter_local()  |
|           |              | and will wait for Applier-2 to release commit monitor |
+-----------+--------------+-------------------------------------------------------+

In the end it will result in
Applier-1: DML executing Xid_log_event::commit waiting for DDL to release commit order
Applier-2: DDL executing ALTER TABLE waiting for applier-1 to commit first

Solution
========
This commit introduces "Async Monitor" at server layer to ensure that the
ordering in galera is done in the same order in the relay log.

Async Monitor is similar to Commit Monitor used in galera, but differs in how
seqnois are tracked within the monitor.

The Async Monitor has the following methods.
1. schedule() - Schedules the transactions in as per their order in the
                the relay log.
2. enter() - Ensures that threads are serialized as per their order.
3. leave() - Ensures that threads are removed from the monitor in the serialized order.
4. skip()  - Provides some flexibilty to the monitor to skip transactions. This
             will be required for handling skipped and filtered transactions.

Lifetime of the Async Monitor:
Async Monitor is created on START REPLICA and is destroyed on STOP REPLICA, or
whenever applier thread exits.

@it-percona-cla
Copy link

it-percona-cla commented Sep 18, 2024

CLA assistant check
All committers have signed the CLA.

@venkatesh-prasad-v venkatesh-prasad-v force-pushed the 8.0-PXC-4173-monitor branch 4 times, most recently from 63a4df8 to 25c09e8 Compare September 19, 2024 14:53
sql/wsrep_async_monitor.cc Outdated Show resolved Hide resolved
sql/wsrep_mysqld.cc Outdated Show resolved Hide resolved
sql/wsrep_mysqld.cc Outdated Show resolved Hide resolved
sql/wsrep_mysqld.cc Outdated Show resolved Hide resolved
sql/wsrep_async_monitor.cc Outdated Show resolved Hide resolved
sql/wsrep_async_monitor.cc Outdated Show resolved Hide resolved
… DDLs via async node

https://perconadev.atlassian.net/browse/PXC-4173

Problem
=======
PXC replica could enter into a deadlock with multi threaded replication when
--replica-preserve-commit-order is enabled.

The deadlock is more likely to occur on a replica with
--replica-preserve-commit-order enabled when the source server's workload
consists of concurrent executions of DMLs and DDLs. In such a scenario, the
replica shall try to commit them in the same order as that of the source which
will conflict with galera's ordering resulting in a deadlock.

MySQL ordering: Commits are serialized as per their commit order on source server.
Galera ordering: Commits are serialized in a FIFO manner.

Scenario
========
Let's assume we have the following two transactions in the relay log with same
last_committed.

T1: DML, seqno=10
T2: DDL, seqno=11

+-----------+--------------+-------------------------------------------------------+
| Timestamp |    Thread    |                        Activity                       |
+-----------+--------------+-------------------------------------------------------+
|     t0    | Co-ordinator | Assigns T1 to Applier-1                               |
+-----------+--------------+-------------------------------------------------------+
|     t1    | Co-ordinator | Assigns T2 to Applier-2                               |
+-----------+--------------+-------------------------------------------------------+
|     t2    |   Applier-1  | DML starts with seqno = 10                            |
+-----------+--------------+-------------------------------------------------------+
|     t3    |   Applier-2  | DDL starts with seqno = 11                            |
+-----------+--------------+-------------------------------------------------------+
|     t4    |   Applier-2  | DDL calls TOI_begin(), acquires commit monitor        |
+-----------+--------------+-------------------------------------------------------+
|     t5    |   Applier-2  | DDL executes                                          |
+-----------+--------------+-------------------------------------------------------+
|     t5    |   Applier-1  | DML executes                                          |
+-----------+--------------+-------------------------------------------------------+
|     t6    |   Applier-2  | DDL reaches commit_order_manager, finds that          |
|           |              | it needs to wait until Applier-1 is committed         |
+-----------+--------------+-------------------------------------------------------+
|     t7    |   Applier-1  | DML reaches commit, calls commit_order_enter_local()  |
|           |              | and will wait for Applier-2 to release commit monitor |
+-----------+--------------+-------------------------------------------------------+

In the end it will result in
Applier-1: DML executing Xid_log_event::commit waiting for DDL to release commit order
Applier-2: DDL executing ALTER TABLE waiting for applier-1 to commit first

Solution
========
This commit introduces "Async Monitor" at server layer to ensure that the
ordering in galera is done in the same order in the relay log.

Async Monitor is similar to Commit Monitor used in galera, but differs in how
seqnois are tracked within the monitor.

The Async Monitor has the following methods.
1. schedule() - Schedules the transactions in as per their order in the
                the relay log.
2. enter() - Ensures that threads are serialized as per their order.
3. leave() - Ensures that threads are removed from the monitor in the serialized order.
4. skip()  - Provides some flexibilty to the monitor to skip transactions. This
             will be required for handling skipped and filtered transactions.

Lifetime of the Async Monitor:
Async Monitor is created on START REPLICA and is destroyed on STOP REPLICA, or
whenever applier thread exits.
sql/log_event.cc Show resolved Hide resolved
sql/rpl_gtid_execution.cc Show resolved Hide resolved
sql/rpl_rli.h Show resolved Hide resolved
sql/wsrep_async_monitor.cc Show resolved Hide resolved
sql/wsrep_async_monitor.cc Show resolved Hide resolved
sql/wsrep_mysqld.cc Show resolved Hide resolved
sql/wsrep_mysqld.cc Show resolved Hide resolved
sql/log_event.cc Show resolved Hide resolved
sql/wsrep_async_monitor.cc Show resolved Hide resolved

// Wait until this transaction is at the head of the scheduled queue
m_cond.wait(lock, [this, seqno] {
// Remove skipped transactions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this can happen? Could you add a code comment?
I suspect this is the case when A, B, C have been scheduled, then B skipped, but skip() removes only from the front of scheduled_seqno queue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Imagine a scenario where scheduled seqnos is 1,2,3 and threads enter in an out of order manner.

  • 3 enters the monitor first, since it is not in the front of the queue, it goes to cond_wait
  • 2 enters the monitor, adds 2 to skipped_seqnos
  • 1 enters the monitor, since it is in the front, it acquires the monitor, does its job and leaves the monitor by removing itself from the scheduled_seqnos and signals 3.
  • 3 wakes up, but it will see that 2 in the front of the queue. To proceed, it needs to remove all the skipped seqnos in the scheduled_seqnos queue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment to the code explaining the purpose of this part?

@kamil-holubicki
Copy link
Contributor

The commit message says about ordering transactions basing on their seqnos i not way this = prev + 1. But the code does it in a different way (I can understand that logically it is still the same). It would be good to add more details about how the current implementation actually works.

@venkatesh-prasad-v
Copy link
Contributor Author

@kamil-holubicki Description is updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants