Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed files erroneously resurrected on git import. #1

Merged
merged 1 commit into from
Feb 17, 2015

Conversation

akopytov
Copy link
Contributor

No description provided.

akopytov added a commit that referenced this pull request Feb 17, 2015
Removed files erroneously resurrected on git import.
@akopytov akopytov merged commit 6500ca1 into percona:5.5 Feb 17, 2015
laurynas-biveinis referenced this pull request in laurynas-biveinis/percona-server Feb 24, 2015
The RELEASE_LOCK() implementation introduced by the multiple user level
locks patch can result in a deadlock under the following conditions:

- connection #1 calls RELEASE_LOCK() for a previously acquired lock. In
  which case MDL_lock::remove_ticket() is called, which write-locks
  MDL_lock::m_rwlock corresponding to the user-level MDL object. With
  that lock held, MDL_map_partition::remove() is called which locks
  MDL_map_partition::m_mutex protecting the hash of MDL locks belonging
  to an MDL partition.

- connection #2 calls RELEASE_LOCK() simultaneously for the same lock
  being released by connection #1. Since connection #2 did not own the
  lock, it calls MDL_map_partition::get_lock_owner() to check if 0 or
  NULL should be returned (i.e. if the lock exists). That function also
  locks both MDL_map_partition::m_mutex and MDL_lock::m_rwlock(), but in
  the reverse order as compared to connection #1.

With the right timing for the above events we get each thread waiting
for a lock acquired by the other thread, i.e. a deadlock.

Fixed by avoiding to lock MDL_lock::m_rwlock with
MDL_map_partition::m_mutex locked. There is already infrastructure to
release the latter and acquire the former and guarantee that a reference
to an MDL_lock object is valid at the same time. It is implemented in
MDL_map_partition::move_from_hash_to_lock_mutex(), so the fix utilizes
it to remove the deadlock condition.

(cherry picked from commit 4c1eeb4)

Conflicts:
	sql/mdl.cc
akopytov added a commit to akopytov/percona-server that referenced this pull request Jul 28, 2015
https://blueprints.launchpad.net/percona-server/+spec/backup-safe-binlog-info

One inefficiency of the backup locks feature is that even though LOCK
TABLES FOR BACKUP as a light-weight FTWRL alternative does not affect
DML statements updating InnoDB tables, LOCK BINLOG FOR BACKUP does
affect them by blocking commits.

XtraBackup uses LOCK BINLOG FOR BACKUP to:

1. retrieve consistent binary log coordinates with SHOW MASTER
   STATUS. More precisely, binary log coordinates must be consistent
   with the REDO log copy and non-transactional tables. Therefore, no
   updates can be done to non-transactional tables (this is achieved by
   an active LOCK TABLES FOR BACKUP lock), and no commits can be
   performed between SHOW MASTER STATUS and finalizing the redo log
   copy, which is achieved by LOCK BINLOG FOR BACKUP.

2. retrieve consistent master connection information for a replication
   slave. More precisely, the binary log coordinates on the master as
   reported by SHOW SLAVE STATUS must be consistent with the REDO log
   copy, so LOCK BINLOG FOR BACKUP also block the I/O replication thread.

3. For a GTID-enabled PXC node, the last binary log file must be
   included into an SST snapshot. Which is a rather artificial
   limitation on the WSREP side, but still XtraBackup obeys it by
   blocking commits with LOCK BINLOG FOR BACKUP to ensure the integrity
   of the binary log file copy.

Depending on the write rate on the server, finalizing the REDO log copy
may take a long time, so blocking commits for that duration may still
affect server availability considerably.

This task is to make the necessary server-side change to make it
possible for XtraBackup to avoid LOCK BINLOG FOR BINLOG in case percona#1, when
cases percona#2 and percona#3 do not apply, i.e. when no --slave-info is requested by
the XtraBackup options and the server is not a GTID-enabled PXC node.

Lifting limitations for cases percona#2 and percona#3 is also possible, but that is
outside the scope of this task.

The idea of the optimization is that even though InnoDB provides a
transactional storage for the binary log information (i.e. current file
name and offset), it cannot be fully trusted by XtraBackup, because that
information is only updated on an InnoDB commit operation. Which means
if the last operation before LOCK TABLES FOR BACKUP was an update to a
non-transactional storage engine, and no InnoDB commits occur before the
backup is finalized by XtraBackup, the InnoDB system header will contain
stale binary log coordinates.

One way to fix that would be to force binlog coordinates update in the
InnoDB system header on each update, regardless of the involved storage
engine(s). This is what a Galera node does to ensure XID consistency
which is stored in the same way as binary log coordinates: it forces XID
update in the InnoDB system header on each TOI operation, in particular
on each non-transactional update.

Another approach is less invasive: XtraBackup blocks all
non-transactional updates with LOCK TABLES FOR BACKUP anyway, so instead
of having all non-transactional updates flush binlog coordinates to
InnoDB unconditionally, LTFB can be modified to flush (and redo-log) the
current binlog coordinates to InnoDB. In which case binlog coordinates
provided by InnoDB will be consistent with REDO log under any
circumstances.

This patch implements the latter approach.
laurynas-biveinis referenced this pull request in laurynas-biveinis/percona-server Oct 23, 2015
laurynas-biveinis referenced this pull request in laurynas-biveinis/percona-server Oct 23, 2015
…TO SELF

Problem: If a multi-column update statement fails when updating one of
the columns in a row, it will go on and update the remaining columns
in that row before it stops and reports an error. If the failure
happens when updating a JSON column, and the JSON column is also
referenced later in the update statement, new and more serious errors
can happen when the update statement attempts to read the JSON column,
as it may contain garbage at this point.

The fix is twofold:

1) Field_json::val_str() currently returns NULL if an error happens.
This is correct for val_str() functions in the Item class hierarchy,
but not for val_str() functions in the Field class hierarchy. The
val_str() functions in the Field classes instead return a pointer to
an empty String object on error. Since callers don't expect it to
return NULL, this caused a crash when a caller unconditionally
dereferenced the returned pointer. The patch makes
Field_json::val_str() return a pointer to an empty String on error to
avoid such crashes.

2) Whereas #1 fixes the immediate crash, Field_json::val_str() may
still read garbage when this situation occurs. This could lead to
unreliable behaviour, and both valgrind and ASAN warn about it. The
patch therefore also makes Field_json::store() start by clearing the
field, so that it will hold an empty value rather than garbage after
an error has happened.

Fix #2 is sufficient to fix the reported problems. Fix #1 is included
for consistency, so that Field_json::val_str() behaves the same way as
the other Field::val_str() functions.

The query in the bug report didn't always crash. Since the root cause
was that it had read garbage, it could be lucky and read something
that looked like a valid value. In that case, Field_json::val_str()
didn't return NULL, and the crash was avoided.

The patch also makes these changes:

- It removes the Field_json::store_dom() function, since it is only
  called from one place. It is now inlined instead.

- It corrects information about return values in the comment that
  describes the ensure_utf8mb4() function.
laurynas-biveinis referenced this pull request in laurynas-biveinis/percona-server Oct 23, 2015
Background:

WAIT_FOR_EXECUTED_GTID_SET waits until a specified set of GTIDs is
included in GTID_EXECUTED. SET GTID_PURGED adds GTIDs to
GTID_EXECUTED. RESET MASTER clears GTID_EXECUTED.

There were multiple issues:

 1. Problem:

    The change in GTID_EXECUTED implied by SET GTID_PURGED did
    not cause WAIT_FOR_EXECUTED_GTID_SET to stop waiting.

    Analysis:

    WAIT_FOR_EXECUTED_GTID_SET waits for a signal to be sent.
    But SET GTID_PURGED never sent the signal.

    Fix:

    Make GTID_PURGED send the signal.

    Changes:
    - sql/rpl_gtid_state.cc:Gtid_state::add_lost_gtids
    - sql/rpl_gtid_state.cc: removal of #ifdef HAVE_GTID_NEXT_LIST
    - sql/rpl_gtid.h: removal of #ifdef HAVE_GTID_NEXT_LIST

 2. Problem:

    There was a race condition where WAIT_FOR_EXECUTED_GTID_SET
    could miss the signal from a commit and go into an infinite
    wait even if GTID_EXECUTED contains all the waited-for GTIDs.

    Analysis:

    In the bug, WAIT_FOR_EXECUTED_GTID_SET took a lock while
    taking a copy of the global state. Then it released the lock,
    analyzed the copy of the global state, and decided whether it
    should wait.  But if the GTID to wait for was committed after
    the lock was released, WAIT_FOR_EXECUTED_GTID_SET would miss
    the signal and go to an infinite wait even if GTID_EXECUTED
    contains all the waited-for GTIDs.

    Fix:

    Refactor the code so that it holds the lock all the way from
    before it reads the global state until it goes to the wait.

    Changes:

    - sql/rpl_gtid_state.cc:Gtid_state::wait_for_gtid_set:
      Most of the changes in this function are to fix this bug.

    Note:

    When the bug existed, it was possible to create a test case
    for this by placing a debug sync point in the section where
    it does not hold the lock.  However, after the bug has been
    fixed this section does not exist, so there is no way to test
    it deterministically.  The bug would also cause the test to
    fail rarely, so a way to test this is to run the test case
    1000 times.

 3. Problem:

    The function would take global_sid_lock.wrlock every time it has
    to wait, and while holding it takes a copy of the entire
    gtid_executed (which implies allocating memory).  This is not very
    optimal: it may process the entire set each time it waits, and it
    may wait once for each member of the set, so in the worst case it
    is O(N^2) where N is the size of the set.

    Fix:

    This is fixed by the same refactoring that fixes problem #2.  In
    particular, it does not re-process the entire Gtid_set for each
    committed transaction. It only removes all intervals of
    gtid_executed for the current sidno from the remainder of the
    wait-for-set.

    Changes:
    - sql/rpl_gtid_set.cc: Add function remove_intervals_for_sidno.
    - sql/rpl_gtid_state.cc: Use remove_intervals_for_sidno and remove
      only intervals for the current sidno. Remove intervals
      incrementally in the innermost while loop, rather than recompute
      the entire set each iteration.

 4. Problem:

    If the client that executes WAIT_FOR_EXECUTED_GTID_SET owns a
    GTID that is included in the set, then there is no chance for
    another thread to commit it, so it will wait forever.  In
    effect, it deadlocks with itself.

    Fix:

    Detect the situation and generate an error.

    Changes:
    - sql/share/errmsg-utf8.txt: new error code
      ER_CANT_WAIT_FOR_EXECUTED_GTID_SET_WHILE_OWNING_A_GTID
    - sql/item_func.cc: check the condition and generate the new error

 5. Various simplfications.

    - sql/item_func.cc:Item_wait_for_executed_gtid_set::val_int:
      - Pointless to set null_value when generating an error.
      - add DBUG_ENTER
      - Improve the prototype for Gtid_state::wait_for_gtid_set so
        that it takes a Gtid_set instead of a string, and also so that
        it requires global_sid_lock.
    - sql/rpl_gtid.h:Mutex_cond_array
      - combine wait functions into one and make it return bool
      - improve some comments
    - sql/rpl_gtid_set.cc:Gtid_set::remove_gno_intervals:
      - Optimize so that it returns early if this set becomes empty

@mysql-test/extra/rpl_tests/rpl_wait_for_executed_gtid_set.inc
- Move all wait_for_executed_gtid_set tests into
  mysql-test/suite/rpl/t/rpl_wait_for_executed_gtid_set.test

@mysql-test/include/kill_wait_for_executed_gtid_set.inc
@mysql-test/include/wait_for_wait_for_executed_gtid_set.inc
- New auxiliary scripts.

@mysql-test/include/rpl_init.inc
- Document undocumented side effect.

@mysql-test/suite/rpl/r/rpl_wait_for_executed_gtid_set.result
- Update result file.

@mysql-test/suite/rpl/t/rpl_wait_for_executed_gtid_set.test
- Rewrote the test to improve coverage and cover all parts of this bug.

@sql/item_func.cc
- Add DBUG_ENTER
- No point in setting null_value when generating an error.
- Do the decoding from text to Gtid_set here rather than in Gtid_state.
- Check for the new error
  ER_CANT_WAIT_FOR_EXECUTED_GTID_SET_WHILE_OWNING_A_GTID

@sql/rpl_gtid.h
- Simplify the Mutex_cond_array::wait functions in the following ways:
  - Make them one function since they share most code. This also allows
    calling the three-argument function with NULL as the last
    parameter, which simplifies the caller.
  - Make it return bool rather than 0/ETIME/ETIMEOUT, to make it more
    easy to use.
- Make is_thd_killed private.
- Add prototype for new Gtid_set::remove_intervals_for_sidno.
- Add prototype for Gtid_state::wait_for_sidno.
- Un-ifdef-out lock_sidnos/unlock_sidnos/broadcast_sidnos since we now
  need them.
- Make wait_for_gtid_set return bool.

@sql/rpl_gtid_mutex_cond_array.cc
- Remove the now unused check_thd_killed.

@sql/rpl_gtid_set.cc
- Optimize Gtid_set::remove_gno_intervals, so that it returns early
  if the Interval list becomes empty.
- Add Gtid_set::remove_intervals_for_sidno. This is just a wrapper
  around the already existing private member function
  Gtid_set::remove_gno_intervals.

@sql/rpl_gtid_state.cc
- Rewrite wait_for_gtid_set to fix problems 2 and 3. See code
  comment for details.
- Factor out wait_for_sidno from wait_for_gtid.
- Enable broadcast_sidnos/lock_sidnos/unlock_sidnos, which were ifdef'ed out.
- Call broadcast_sidnos after updating the state, to fix issue #1.

@sql/share/errmsg-utf8.txt
- Add error message used to fix issue #4.
laurynas-biveinis referenced this pull request in laurynas-biveinis/percona-server Oct 23, 2015
An assert failure is seen in some queries which have a semijoin and
use the materialization strategy.

The assertion fails if either the length of the key is zero or the
number of key parts is zero. This could indicate two different
problems.

1) If the length is zero, there may not be a problem, as it can
legitimately be zero if, for example, the key is a zero-length string.

2) If the number of key parts is zero, there is a bug, as a key must
have at least one part.

The patch fixes issue #1 by removing the length check in the
assertion.

Issue #2 happens if JOIN::update_equalities_for_sjm() doesn't
recognize the expression selected from a subquery, and fails to
replace it with a reference to a column in a temporary table that
holds the materialized result. This causes it to not recognize it as a
part of the key later, and keyparts could end up as zero. The patch
fixes it by calling real_item() on the expression in order to see
through Item_refs that may wrap the expression if the subquery reads
from a view.
laurynas-biveinis pushed a commit that referenced this pull request Dec 8, 2015
The RELEASE_LOCK() implementation introduced by the multiple user level
locks patch can result in a deadlock under the following conditions:

- connection #1 calls RELEASE_LOCK() for a previously acquired lock. In
  which case MDL_lock::remove_ticket() is called, which write-locks
  MDL_lock::m_rwlock corresponding to the user-level MDL object. With
  that lock held, MDL_map_partition::remove() is called which locks
  MDL_map_partition::m_mutex protecting the hash of MDL locks belonging
  to an MDL partition.

- connection #2 calls RELEASE_LOCK() simultaneously for the same lock
  being released by connection #1. Since connection #2 did not own the
  lock, it calls MDL_map_partition::get_lock_owner() to check if 0 or
  NULL should be returned (i.e. if the lock exists). That function also
  locks both MDL_map_partition::m_mutex and MDL_lock::m_rwlock(), but in
  the reverse order as compared to connection #1.

With the right timing for the above events we get each thread waiting
for a lock acquired by the other thread, i.e. a deadlock.

Fixed by avoiding to lock MDL_lock::m_rwlock with
MDL_map_partition::m_mutex locked. There is already infrastructure to
release the latter and acquire the former and guarantee that a reference
to an MDL_lock object is valid at the same time. It is implemented in
MDL_map_partition::move_from_hash_to_lock_mutex(), so the fix utilizes
it to remove the deadlock condition.
laurynas-biveinis added a commit that referenced this pull request Dec 8, 2015
vlad-lesin referenced this pull request in vlad-lesin/percona-server Feb 28, 2016
Problem:

The binary log group commit sync is failing when committing a group of
transactions into a non-transactional storage engine while other thread
is rotating the binary log.

Analysis:

The binary log group commit procedure (ordered_commit) acquires LOCK_log
during the #1 stage (flush). As it holds the LOCK_log, a binary log
rotation will have to wait until this flush stage to finish before
actually rotating the binary log.

For the percona#2 stage (sync), the binary log group commit only holds the
LOCK_log if sync_binlog=1. In this case, the rotation has to wait also
for the sync stage to finish.

When sync_binlog>1, the sync stage releases the LOCK_log (to let other
groups to enter the flush stage), holding only the LOCK_sync. In this
case, the rotation can acquire the LOCK_log in parallel with the sync
stage.

For commits into transactional storage engine, the binary log rotation
checks a counter of "flushed but not yet committed" transactions,
waiting until this counter to be zeroed before closing the current
binary log file.  As the commit of the transactions happen in the percona#3
stage of the binary log group commit, the sync of the binary log in
stage percona#2 always succeed.

For commits into non-transactional storage engine, the binary log
rotation is checking the "flushed but not yet committed" transactions
counter, but it is zero because it only counts transactions that
contains XIDs. So, the rotation is allowed to take place in parallel
with the percona#2 stage of the binary log group commit. When the sync is
called at the same time that the rotation has closed the old binary log
file but didn't open the new file yet, the sync is failing with the
following error: 'Can't sync file 'UNOPENED' to disk (Errcode: 9 - Bad
file descriptor)'.

Fix:

For non-transactional only workload, binary log group commit will keep
the LOCK_log when entering percona#2 stage (sync) if the current group is
supposed to be synced to the binary log file.
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
…down

Use ha_rocksdb::open() should check table->found_next_number_field,
not table->next_number_field (like InnoDB does)
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
…cona#1

Two testcases are empty because RocksDB-SE doesn't support the feature:
-autoincrement.test
-autoinc_secondary.test
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
Summary:
This adds two features in RocksDB.
1. Supporting START TRANSACTION WITH CONSISTENT SNAPSHOT
2. Getting current binlog position in addition to percona#1.
With these features, mysqldump can take consistent logical backup.

The second feature is done by START TRANSACTION WITH
CONSISTENT ROCKSDB SNAPSHOT. This is Facebook's extension, and
it works like existing START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT.

This diff changed some existing codebase/behaviors.

- Original Facebook-MySQL always started InnoDB transaction
regardless of engine clause. For example, START TRANSACTION WITH
CONSISTENT MYISAM SNAPSHOT was accepted but it actually started
InnoDB transaction, not MyISAM. This patch does not allow
setting engine that does not support consistent snapshot.

mysql> start transaction with consistent myisam snapshot;
ERROR 1105 (HY000): Consistent Snapshot is not supported for this engine

Currently only InnoDB and RocksDB support consistent snapshot.
To check engines, I modified sql/sql_yacc.yy, trans_begin()
and ha_start_consistent_snapshot() to pass handlerton.

- Changed constant name from
  MYSQL_START_TRANS_OPT_WITH_CONS_INNODB_SNAPSHOT to
MYSQL_START_TRANS_OPT_WITH_CONS_ENGINE_SNAPSHOT, because it's no longer
InnoDB dependent.

- When not setting engine, START TRANSACTION WITH CONSISTENT SNAPSHOT
takes both InnoDB and RocksDB snapshots, and both InnoDB and RocksDB
participate in transaction. When executing COMMIT, both InnoDB and
RocksDB modifications are committed. Remember that XA is not supported yet,
so mixing engines is not recommended anyway.

- When setting engine, START TRANSACTION WITH CONSISTENT.. takes
snapshot for the specified engine only. But it starts both
InnoDB and RocksDB transactions.

Test Plan: mtr --suite=rocksdb,rocksdb_rpl, --repeat=3

Reviewers: hermanlee4, jonahcohen, jtolmer, tian.xia, maykov, spetrunia

Reviewed By: spetrunia

Subscribers: steaphan

Differential Revision: https://reviews.facebook.net/D32355
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
Summary:
This adds two features in RocksDB.
1. Supporting START TRANSACTION WITH CONSISTENT SNAPSHOT
2. Getting current binlog position in addition to percona#1.
With these features, mysqldump can take consistent logical backup.

The second feature is done by START TRANSACTION WITH
CONSISTENT ROCKSDB SNAPSHOT. This is Facebook's extension, and
it works like existing START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT.

This diff changed some existing codebase/behaviors.

- Original Facebook-MySQL always started InnoDB transaction
regardless of engine clause. For example, START TRANSACTION WITH
CONSISTENT MYISAM SNAPSHOT was accepted but it actually started
InnoDB transaction, not MyISAM. This patch does not allow
setting engine that does not support consistent snapshot.

mysql> start transaction with consistent myisam snapshot;
ERROR 1105 (HY000): Consistent Snapshot is not supported for this engine

Currently only InnoDB and RocksDB support consistent snapshot.
To check engines, I modified sql/sql_yacc.yy, trans_begin()
and ha_start_consistent_snapshot() to pass handlerton.

- Changed constant name from
  MYSQL_START_TRANS_OPT_WITH_CONS_INNODB_SNAPSHOT to
MYSQL_START_TRANS_OPT_WITH_CONS_ENGINE_SNAPSHOT, because it's no longer
InnoDB dependent.

- When not setting engine, START TRANSACTION WITH CONSISTENT SNAPSHOT
takes both InnoDB and RocksDB snapshots, and both InnoDB and RocksDB
participate in transaction. When executing COMMIT, both InnoDB and
RocksDB modifications are committed. Remember that XA is not supported yet,
so mixing engines is not recommended anyway.

- When setting engine, START TRANSACTION WITH CONSISTENT.. takes
snapshot for the specified engine only. But it starts both
InnoDB and RocksDB transactions.

Test Plan: mtr --suite=rocksdb,rocksdb_rpl, --repeat=3

Reviewers: hermanlee4, jonahcohen, jtolmer, tian.xia, maykov, spetrunia

Reviewed By: spetrunia

Subscribers: steaphan

Differential Revision: https://reviews.facebook.net/D32355
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
Summary:
This adds two features in RocksDB.
1. Supporting START TRANSACTION WITH CONSISTENT SNAPSHOT
2. Getting current binlog position in addition to percona#1.
With these features, mysqldump can take consistent logical backup.

The second feature is done by START TRANSACTION WITH
CONSISTENT ROCKSDB SNAPSHOT. This is Facebook's extension, and
it works like existing START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT.

This diff changed some existing codebase/behaviors.

- Original Facebook-MySQL always started InnoDB transaction
regardless of engine clause. For example, START TRANSACTION WITH
CONSISTENT MYISAM SNAPSHOT was accepted but it actually started
InnoDB transaction, not MyISAM. This patch does not allow
setting engine that does not support consistent snapshot.

mysql> start transaction with consistent myisam snapshot;
ERROR 1105 (HY000): Consistent Snapshot is not supported for this engine

Currently only InnoDB and RocksDB support consistent snapshot.
To check engines, I modified sql/sql_yacc.yy, trans_begin()
and ha_start_consistent_snapshot() to pass handlerton.

- Changed constant name from
  MYSQL_START_TRANS_OPT_WITH_CONS_INNODB_SNAPSHOT to
MYSQL_START_TRANS_OPT_WITH_CONS_ENGINE_SNAPSHOT, because it's no longer
InnoDB dependent.

- When not setting engine, START TRANSACTION WITH CONSISTENT SNAPSHOT
takes both InnoDB and RocksDB snapshots, and both InnoDB and RocksDB
participate in transaction. When executing COMMIT, both InnoDB and
RocksDB modifications are committed. Remember that XA is not supported yet,
so mixing engines is not recommended anyway.

- When setting engine, START TRANSACTION WITH CONSISTENT.. takes
snapshot for the specified engine only. But it starts both
InnoDB and RocksDB transactions.

Test Plan: mtr --suite=rocksdb,rocksdb_rpl, --repeat=3

Reviewers: hermanlee4, jonahcohen, jtolmer, tian.xia, maykov, spetrunia

Reviewed By: spetrunia

Subscribers: steaphan

Differential Revision: https://reviews.facebook.net/D32355
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
Summary:
Don't update secondary indexes that do not have columns that
are included in table->write_map at query start.

Test Plan: mtr

Reviewers: maykov, yoshinorim, hermanlee4, jonahcohen

Reviewed By: jonahcohen

Differential Revision: https://reviews.facebook.net/D33471
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
Summary:
Inside index_next_same() call, we should
1. first check whether the record matches the index
   lookup prefix,
2. then check pushed index condition.

If we try to check percona#2 without checking percona#1 first, we may walk
off the index lookup prefix and scan till the end of the index.

Test Plan: Run mtr

Reviewers: hermanlee4, maykov, jtolmer, yoshinorim

Reviewed By: yoshinorim

Differential Revision: https://reviews.facebook.net/D38769
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
Summary:
Inside index_next_same() call, we should
1. first check whether the record matches the index
   lookup prefix,
2. then check pushed index condition.

If we try to check percona#2 without checking percona#1 first, we may walk
off the index lookup prefix and scan till the end of the index.

Test Plan: Run mtr

Reviewers: hermanlee4, maykov, jtolmer, yoshinorim

Reviewed By: yoshinorim

Differential Revision: https://reviews.facebook.net/D38769
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
Summary:
MyRocks had two bugs when calculating index scan cost.
1. block_size was not considered. This made covering index scan cost
(both full index scan and range scan) much higher
2. ha_rocksdb::records_in_range() may have estimated more rows
than the estimated number of rows in the table. This was wrong,
and MySQL optimizer decided to use full index scan even though
range scan was more efficient.

This diff fixes percona#1 by setting stats.block_size at ha_rocksdb::open(),
and fixes percona#2 by reducing the number of estimated rows if it was
larger than stats.records.

Test Plan:
mtr, updating some affected test cases, and new test case
rocksdb_range2

Reviewers: hermanlee4, jkedgar, spetrunia

Reviewed By: spetrunia

Subscribers: MarkCallaghan, webscalesql-eng

Differential Revision: https://reviews.facebook.net/D55869
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
Summary:
MyRocks had two bugs when calculating index scan cost.
1. block_size was not considered. This made covering index scan cost
(both full index scan and range scan) much higher
2. ha_rocksdb::records_in_range() may have estimated more rows
than the estimated number of rows in the table. This was wrong,
and MySQL optimizer decided to use full index scan even though
range scan was more efficient.

This diff fixes percona#1 by setting stats.block_size at ha_rocksdb::open(),
and fixes percona#2 by reducing the number of estimated rows if it was
larger than stats.records.

Test Plan:
mtr, updating some affected test cases, and new test case
rocksdb_range2

Reviewers: hermanlee4, jkedgar, spetrunia

Reviewed By: spetrunia

Subscribers: MarkCallaghan, webscalesql-eng

Differential Revision: https://reviews.facebook.net/D55869
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 9, 2016
…down

Use ha_rocksdb::open() should check table->found_next_number_field,
not table->next_number_field (like InnoDB does)
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 9, 2016
…cona#1

Two testcases are empty because RocksDB-SE doesn't support the feature:
-autoincrement.test
-autoinc_secondary.test
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 9, 2016
dlenev pushed a commit that referenced this pull request Aug 30, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
dlenev pushed a commit that referenced this pull request Aug 30, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix terminology in replication tests)

https://perconadev.atlassian.net/browse/PS-9218

mysql/mysql-server@44a77b5
inikep pushed a commit that referenced this pull request Sep 23, 2024
This worklog introduces dynamic offload of Queries to RAPID in following
ways:

When system variable rapid_use_dynamic_offload is 0/false , then we
fall back to normal cost threshold classifier, which also implies that
when use secondary engine is set to forced, eligible queries will go to
secondary engine, regardless of cost threshold or this classifier.

When rapid_use_dynamic_offload is 1/true, then we proceed with looking
for optimal execution engine for this queries, if secondary engine is
found more optimal, then query is offloaded, otherwise it is sent back
to mysql. This is handled in following scenarios:

1. Static Scenario: When there's no Change propagation or Queue on RAPID
side, this introduces decision tree which has > 85 % precision in
training which queries should be faster on mysql or which queries
should be faster on mysql, and accepts or rejects queries.  the decision
tree takes around 20-100 microseconds for fast queries, hence
minimal overhead, for bigger queries this introduces overhead of
upto maximum observed 700 microseconds, these end up with long execution
time, hence not a problem. For very fast queries, defined here by having
cost < 10 and of the form point select, dynamic offload is not applied,
since 100 % of these queries  (out of 16667 samples) are faster on
MySQL. Additionally, routing these "very fast queries" through dynamic
offload leads to performance regressions due to 3 phase optimisation.

2. Dynamic Scenario: When there's CP or queuing on RAPID, this worklog
 introduces dynamic feature normalization to factor into account
 extra catch up time RAPID needs, and factoring in that, attempts to
 verify if RAPID is still the best engine for execution. If queue is
 too long or CP is too long, this mechanism wants to progressively start
 shifting queries to mysql, moving gradually towards the heavier queries

The steps in this worklog with respect to query lifecycle in server with
secondary_engine = ON, are described below:

query
   |
Primary Tentatively optimisation -> mysql optimises for Innodb
   |
secondary_engine_pre_prepare_hook -> following Rapid function called:
   |  RapidCachePrimaryInfoAtPrimaryTentativelyStep
   |  If dynamic offload is enabled and query is not "very fast":
   |   This caches features from mysql plan in rapid_statement_context
   |   to be used for dynamic offload.
   |  If dynamic offload is disabled or the query is "very fast":
   |   This function invokes standary mysql cost threshold classifier,
   |   which decides if query needs further RAPID optimisation.
   |
   |
   |-> if returns False, then query proceeds to Innodb for execution
   |-> if returns true, step below is called
   |
 Secondary optimisation -> mysql optimises for RAPID
   |
prepare_secondary_engine -> following Rapid function is called:
   |   RapidPrepareEstimateQueryCosts
   |     In this function, Dynamic offload combines mysql plan features
   |      retrieved from rapid_statement_context
   |     and RAPID info such as rapid base table cardinality,
   |     dict encoding projection, varlen projection size, rapid queue
   |     size in to decide if query should be offloaded to RAPID.
   |
   |->if returns True, then query proceeds to Innodb for execution
   |->if returns False, step below is called
   |
optimize_secondary_engine -> following Rapid function is called
   |    RapidOptimize
   |     In this function, Dynamic offload retrieves info from
   |     rapid_statement_context and additionally looks at Change
   |     propagation lag to decide if query should be offloaded to rapid
   |
   |->if returns True, then query proceeds to Innodb for execution
   |->if returns False, then query goes to Rapid Execution.

Following new MYSQL ERR log messages are printed with this WL, when
dynamic offload is enabled, and query is not a "very fast query".

1. SelOffload allow decision 1 : as secondary not forced 1 and enable
 var value 1 and transactional enabled 1 and( big shape detected 0
  or small shape detected 1 ) inno: 10737418240 , rpd: 4294967296 ,
   no lh table: 1

   Message such as this shows if dynamic offload is used to classify
   this query or not. If not, why not, using each of the conditions.
   1 = pass, 0 = not pass.

2. myqid=65 Selective offload classifier #1#1#1
    f_mysql_total_ts_nrows <= 2105.5 : 0.173916, f_MySQLCost <=
    68.3899040222168 : 0.028218, f_count_all_base_tables = 0 ,
    f_count_ref_index_ts = 0 ,f_BaseTableSumNrows <= 278177.5 :
    0.173916 are_all_ts_index_ref = true outcome=0

   Line such as this serialises what leg of decision tree decided
   outcome of this query 0 -> back to mysql 1 -> keep on rapid.
   each leg is uniquely searchable via identifier such as #1#1#1 here.

This worklog additionally introduces python scripts to run queries on
mysql client with multiple queries and multiple dmls at once, in
various modes such as simulator mode and standard benchmark modes.

By Default this WL is enabled, but before release it will be disabled.
This is tracked via BUG#36343189 #no-close.

Perf mode unittests will be enabled on jenkins after this wl.
Further cleanup will be done via BUG#36368437 #no-close.

Bugs tackled via this WL: 	BUG#35738194, Enh#34132523, Bug#36343208

Unrelated bugs fixed: BUG#35987975

Old gerrit review : 25567 (abandoned due to 1000 update limit reached)

Change-Id: Ie5f9fdcd8b55a669d04b389d3aec5f6b33f0fe2e
inikep pushed a commit that referenced this pull request Sep 23, 2024
… for connection xxx'.

The new iterator based explains are not impacted.

The issue here is a race condition. More than one thread is using the
query term iterator at the same time (whoch is neithe threas safe nor
reantrant), and part of its state is in the query terms being visited
which leads to interference/race conditions.

a) the explain thread

uses an iterator here:

   Sql_cmd_explain_other_thread::execute

is inspecting the Query_expression of the running query
calling master_query_expression()->find_blocks_query_term which uses
an iterator over the query terms in the query expression:

   for (auto qt : query_terms<>()) {
       if (qt->query_block() == qb) {
           return qt;
       }
   }

the above search fails to find qb due to the interference of the
thread b), see below, and then tries to access a nullpointer:

    * thread #36, name = ‘connection’, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  frame #0: 0x000000010bb3cf0d mysqld`Query_block::type(this=0x00007f8f82719088) const at sql_lex.cc:4441:11
  frame #1: 0x000000010b83763e mysqld`(anonymous namespace)::Explain::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:792:50
  frame #2: 0x000000010b83cc4d mysqld`(anonymous namespace)::Explain_join::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:1487:21
  frame #3: 0x000000010b837c34 mysqld`(anonymous namespace)::Explain::prepare_columns(this=0x00007000020611b8) at opt_explain.cc:744:26
  frame #4: 0x000000010b83ea0e mysqld`(anonymous namespace)::Explain_join::explain_qep_tab(this=0x00007000020611b8, tabnum=0) at opt_explain.cc:1415:32
  frame #5: 0x000000010b83ca0a mysqld`(anonymous namespace)::Explain_join::shallow_explain(this=0x00007000020611b8) at opt_explain.cc:1364:9
  frame #6: 0x000000010b83379b mysqld`(anonymous namespace)::Explain::send(this=0x00007000020611b8) at opt_explain.cc:770:14
  frame #7: 0x000000010b834147 mysqld`explain_query_specification(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, query_term=0x00007f8f82719088, ctx=CTX_JOIN) at opt_explain.cc:2088:20
  frame #8: 0x000000010bd36b91 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f82719088) at sql_union.cc:1519:11
  frame #9: 0x000000010bd36c68 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f8271d748) at sql_union.cc:1526:13
  frame #10: 0x000000010bd373f7 mysqld`Query_expression::explain(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00) at sql_union.cc:1591:7
  frame #11: 0x000000010b835820 mysqld`mysql_explain_query_expression(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2392:17
  frame #12: 0x000000010b835400 mysqld`explain_query(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2353:13
 * frame #13: 0x000000010b8363e4 mysqld`Sql_cmd_explain_other_thread::execute(this=0x00007f8fba585b68, thd=0x00007f8fbb111e00) at opt_explain.cc:2531:11
  frame #14: 0x000000010bba7d8b mysqld`mysql_execute_command(thd=0x00007f8fbb111e00, first_level=true) at sql_parse.cc:4648:29
  frame #15: 0x000000010bb9e230 mysqld`dispatch_sql_command(thd=0x00007f8fbb111e00, parser_state=0x0000700002065de8) at sql_parse.cc:5303:19
  frame #16: 0x000000010bb9a4cb mysqld`dispatch_command(thd=0x00007f8fbb111e00, com_data=0x0000700002066e38, command=COM_QUERY) at sql_parse.cc:2135:7
  frame #17: 0x000000010bb9c846 mysqld`do_command(thd=0x00007f8fbb111e00) at sql_parse.cc:1464:18
  frame #18: 0x000000010b2f2574 mysqld`handle_connection(arg=0x0000600000e34200) at connection_handler_per_thread.cc:304:13
  frame #19: 0x000000010e072fc4 mysqld`pfs_spawn_thread(arg=0x00007f8fba8160b0) at pfs.cc:3051:3
  frame #20: 0x00007ff806c2b202 libsystem_pthread.dylib`_pthread_start + 99
  frame #21: 0x00007ff806c26bab libsystem_pthread.dylib`thread_start + 15

b) the query thread being explained is itself performing LEX::cleanup
and as part of the iterates over the query terms, but still allows
EXPLAIN of the query plan since

   thd->query_plan.set_query_plan(SQLCOM_END, ...)

hasn't been called yet.

     20:frame: Query_terms<(Visit_order)1, (Visit_leaves)0>::Query_term_iterator::operator++() (in mysqld) (query_term.h:613)
     21:frame: Query_expression::cleanup(bool) (in mysqld) (sql_union.cc:1861)
     22:frame: LEX::cleanup(bool) (in mysqld) (sql_lex.h:4286)
     30:frame: Sql_cmd_dml::execute(THD*) (in mysqld) (sql_select.cc:799)
     31:frame: mysql_execute_command(THD*, bool) (in mysqld) (sql_parse.cc:4648)
     32:frame: dispatch_sql_command(THD*, Parser_state*) (in mysqld) (sql_parse.cc:5303)
     33:frame: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in mysqld) (sql_parse.cc:2135)
     34:frame: do_command(THD*) (in mysqld) (sql_parse.cc:1464)
     57:frame: handle_connection(void*) (in mysqld) (connection_handler_per_thread.cc:304)
     58:frame: pfs_spawn_thread(void*) (in mysqld) (pfs.cc:3053)
     65:frame: _pthread_start (in libsystem_pthread.dylib) + 99
     66:frame: thread_start (in libsystem_pthread.dylib) + 15

Solution:

This patch solves the issue by removing iterator state from
Query_term, making the query_term iterators thread safe. This solution
labels every child query_term with its index in its parent's
m_children vector.  The iterator can therefore easily compute the next
child to visit based on Query_term::m_sibling_idx.

A unit test case is added to check reentrancy.

One can also manually verify that we have no remaining race condition
by running two client connections files (with \. <file>) with a big
number of copies of the repro query in one connection and a big number
of EXPLAIN format=json FOR <connection>, e.g.

    EXPLAIN FORMAT=json FOR CONNECTION 8\G

in the other. The actual connection number would need to verified
in connection one, of course.

Change-Id: Ie7d56610914738ccbbecf399ccc4f465f7d26ea7
inikep pushed a commit that referenced this pull request Sep 23, 2024
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 #2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 #3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 #4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 #5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 #6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
inikep pushed a commit that referenced this pull request Sep 23, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow #3: fetch ttl value from ttl column

The change is to merge #1 and #3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
inikep pushed a commit that referenced this pull request Sep 23, 2024
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep added a commit that referenced this pull request Sep 23, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
inikep pushed a commit that referenced this pull request Sep 23, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix terminology in replication tests)

https://perconadev.atlassian.net/browse/PS-9218

mysql/mysql-server@44a77b5
inikep pushed a commit that referenced this pull request Sep 25, 2024
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 #2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 #3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 #4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 #5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 #6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
inikep pushed a commit that referenced this pull request Sep 25, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow #3: fetch ttl value from ttl column

The change is to merge #1 and #3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
inikep added a commit that referenced this pull request Sep 25, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep pushed a commit that referenced this pull request Sep 25, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.
inikep pushed a commit that referenced this pull request Sep 25, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep pushed a commit that referenced this pull request Oct 10, 2024
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 #2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 #3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 #4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 #5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 #6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
inikep pushed a commit that referenced this pull request Oct 10, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow #3: fetch ttl value from ttl column

The change is to merge #1 and #3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
inikep added a commit that referenced this pull request Oct 10, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep pushed a commit that referenced this pull request Oct 10, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.
inikep pushed a commit that referenced this pull request Oct 10, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
dlenev pushed a commit that referenced this pull request Oct 28, 2024
… and .6node3rpl

Issue #1
 Problem:
   Test fail in 4node4rpl (1 node group).
 Solution:
   Skip test when there is only one NG.

Issue #2
  Problem:
    Test fail in 6node3rpl (2 node group) with timeout.
    Test idea is to restart, with nostart option, *ALL* nodes
    in same node group to check if QMGR handles it wrongly as
    "node group is missing".
    In the test only two nodes in same node group are restarted,
    it works for 2 replica setups but, for 4 replica, test
    hangs waiting cluster to enter a noStart state.
  Solution:
   Instead of restart exactly 2 nodes, restart ALL nodes in a
   given node group.

Change-Id: Iafb0511992a553723013e73593ea10540cd03661
dlenev pushed a commit that referenced this pull request Oct 28, 2024
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 #2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 #3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 #4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 #5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 #6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
dlenev pushed a commit that referenced this pull request Oct 28, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow #3: fetch ttl value from ttl column

The change is to merge #1 and #3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
dlenev pushed a commit that referenced this pull request Oct 28, 2024
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
dlenev pushed a commit that referenced this pull request Oct 28, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
dlenev pushed a commit that referenced this pull request Oct 28, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix terminology in replication tests)

https://perconadev.atlassian.net/browse/PS-9218

mysql/mysql-server@44a77b5
inikep pushed a commit that referenced this pull request Oct 30, 2024
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 #2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 #3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 #4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 #5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 #6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
inikep pushed a commit that referenced this pull request Oct 30, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow #3: fetch ttl value from ttl column

The change is to merge #1 and #3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
inikep added a commit that referenced this pull request Oct 30, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9369: Fix currently processed query comparison in audit_log

https://perconadev.atlassian.net/browse/PS-9369

The audit_log uses stack to keep track of table access operations being
performed in scope of one query. It compares last known table access query
string stored on top of this stack with actual query in audit event being
processed at the moment to decide if new record should be pushed to stack
or it is time to clean records from the stack.

Currently audit_log simply compares char* variables to decide if this is
the same query string. This approach doesn't work. As a result plugin looses
control of the stack size and it starts growing with the time consuming
memory. This issue is not noticable on short term server connections
as memory is freed once connection is closed. At the same time this
leads to extra memory consumption for long running server connections.

The following is done to fix the issue:
- Query is sent along with audit event as MYSQL_LEX_CSTRING structure.
  It is not correct to ignore MYSQL_LEX_CSTRING.length comparison as
  sometimes MYSQL_LEX_CSTRING.str pointer may be not iniialised
  properly. Added string length check to make sure structure contains
  any valid string.
- Used strncmp to compare actual strings instead of comparing char*
  variables.
inikep pushed a commit that referenced this pull request Oct 30, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.
inikep pushed a commit that referenced this pull request Oct 30, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant