Ps 8.0.20 merge myrocks tokudb #2

Open
wants to merge 17 commits into
base: ps-8.0.20-merge

george-lorch
No description provided.

percona-ysorokin and others added 17 commits May 5, 2020 22:38
…ace encryption)

https://jira.percona.com/browse/PS-6789

Temporarily reverted PS-3822 "InnoDB system tablespace encryption"
https://jira.percona.com/browse/PS-3822
(commit 78b6114)
to make parallel doublewrite part of the upstream 8.0.20 merge easier.

Temporarily disabled the following MTR test cases:
- 'innodb.percona_parallel_dblwr_encrypt'
- 'innodb.percona_sys_tablespace_encrypt'
- 'innodb.percona_sys_tablespace_encrypt_dblwr'
- 'sys_vars.innodb_parallel_dblwr_encrypt_basic'
- 'sys_vars.innodb_sys_tablespace_encrypt_basic'
…b_doublewrite file when innodb_doublewrite is disabled)

https://jira.percona.com/browse/PS-6789

Temporarily reverted PS-3411 "LP #1570682: Parallel doublewrite buffer file created when skip-innodb_doublewrite is set"
https://jira.percona.com/browse/PS-3411
(commit 14318e4)
to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…must crash server on I/O error)

https://jira.percona.com/browse/PS-6789

Temporarily reverted PS-5678 "Parallel doublewrite must crash server on I/O error"
https://jira.percona.com/browse/PS-5678
(commit 0f810d7)
to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…rotation. ALPHA)

https://jira.percona.com/browse/PS-6789

Temporarily reverted 'buf0dblwr.cc' part of the PS-3829 "Innodb key rotation. ALPHA"
https://jira.percona.com/browse/PS-3829
(commit c7f44ee)
to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…d to set O_DIRECT on xb_doublewrite when running MTR test cases)

https://jira.percona.com/browse/PS-6789

Temporarily reverted PS-1068 "Fix bug 1669414 (Failed to set O_DIRECT on xb_doublewrite when running MTR test cases)"
https://jira.percona.com/browse/PS-1068
(commit 7f41824)
to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…lel doublewrite memory not freed with innodb_fast_shutdown=2)

https://jira.percona.com/browse/PS-6789

Temporarily reverted PS-1707 "LP #1578139: Parallel doublewrite memory not freed with innodb_fast_shutdown=2"
https://jira.percona.com/browse/PS-1707
(commit 8a53ed7)
to make parallel doublewrite part of the upstream 8.0.20 merge easier.
… implementation (Implement parallel doublewrite)

https://jira.percona.com/browse/PS-6789

Reverted 'parallel-doublewrite' blueprint implementation "Implement parallel doublewrite"
https://blueprints.launchpad.net/percona-server/+spec/parallel-doublewrite
(commit 4596aaa)
to make parallel doublewrite part of the upstream 8.0.20 merge easier.

Temporarily disabled the following MTR test cases:
- 'sys_vars.innodb_parallel_doublewrite_path_basic'
- 'innodb.percona_doublewrite'
https://jira.percona.com/browse/PS-6789

***
Updated man pages from MySQL Server 8.0.20 source tarball.

***
Updated 'scripts/fill_help_tables.sql' from MySQL Server 8.0.20 source
tarball.
https://jira.percona.com/browse/PS-6789

***
Reverted our fix for PS-6094
"Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception"
(https://jira.percona.com/browse/PS-6094)
(commit 31b5c73)
in favor of the upstream fix for the Bug #30561920 / #97682
"Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception"
(https://bugs.mysql.com/bug.php?id=97682)
(commit mysql/mysql-server@72c6171).

***
Reverted our fix for PS-3630
"LP #1660255: Test innodb.innodb_mysql is unstable"
(https://jira.percona.com/browse/PS-3630)
(commit e0b5050)
in favor of the upstream fix for the Bug #30810572
"FIX INNODB-MYSQL TEST"
(commit mysql/mysql-server@2692669).

***
Reverted our 8.0.17 merge postfix
"PS-5363 (Merge MySQL 8.0.17): fixed regexps in the rpl.rpl_perfschema_threads_processlist_status MTR test case"
(https://jira.percona.com/browse/PS-5363)
(commit 8d7dd4a)
affecting 'rpl.rpl_perfschema_threads_processlist_status' MTR test case
in favor of the changes made by upstream in WL#3549
"Binlog Compression"
(commit mysql/mysql-server@1e5ae34).

***
Reverted our 8.0.18 merge postfix
"PS-5674: gen_lex_token generator reworked"
(https://jira.percona.com/browse/PS-5674)
(commit 214212a)
in favor of the changes made by upstream Bug #30765691
"FREE TOKEN SLOTS ARE EXHAUSTED IN GEN_LEX_TOKEN.CC"
(commit mysql/mysql-server@17ca03f).
'SYM_PERCONA()' macro preserved and made a synonym for upstream's 'SYM()'.
Percona Server 5.7-specific tokens
- CHANGED_PAGE_BITMAPS_SYM
- CLIENT_STATS_SYM
- CLUSTERING_SYM
- COMPRESSION_DICTIONARY_SYM
- INDEX_STATS_SYM
- TABLE_STATS_SYM
- THREAD_STATS_SYM
- USER_STATS_SYM
- ENCRYPTION_KEY_ID_SYM
explicitly assigned values starting from 1300. The same values were assigned
to them implicitly in Percona Server 8.0.19.
Percona Server 8.0-specific tokens
- EFFECTIVE_SYM
- SEQUENCE_TABLE_SYM
explicitly assigned values starting from 1350. This group has different values
than in Percona Server 8.0.19.

***
Similarly to other 'innodb.log_encrypt_<n>' MTR test cases, 'innodb.log_encrypt_7'
coming from upstream 8.0.20 was cloned into two: 'innodb.log_encrypt_7_mk' and
'innodb.log_encrypt_7_rk'.

***
Similarly to other 'innodb.table_encrypt_<n>' MTR test cases, 'innodb.table_encrypt_6'
coming from upstream 8.0.20 was cloned into three: 'innodb.table_encrypt_6',
'keyring_vault.table_encrypt_6' and 'keyring_vault.table_encrypt_6_directory'.

***
VERSION raised to "8.0.20-11".
univ.i version raised to "11".
https://jira.percona.com/browse/PS-6789

In the fix for Bug #30508721
"MTR DOESN'T KEEP TRACK OF THE STATE OF INNODB MONITORS"
(commit mysql/mysql-server@abd33c2)
Oracle extended MTR 'check-testcase' procedure with additional comparison of
data from InnoDB metrics state. They also introduced
'mysql-test/include/innodb_monitor_restore.inc' MTR include file that is
supposed to reset InnoDB monitors to their default state.

'mysql-test/include/innodb_monitor_restore.inc' was extended to also enable the
Percona-specific monitors that are enabled by default (those defined with the
'MONITOR_DEFAULT_ON' flag).

Similarly to what was done in the upstream patch
  "SET GLOBAL innodb_monitor_enable=default;"
  "SET GLOBAL innodb_monitor_disable=default;"
  "SET GLOBAL innodb_monitor_reset_all=default;"
statement sequences were substituted with
'--source include/innodb_monitor_restore.inc' all over the test code.

As a result, fixed the following MTR test cases:
- 'innodb.innodb_idle_flush_pct'
- 'innodb.lock_contention_big'
- 'innodb.monitor'
- 'innodb.percona_ahi_partitions'
- 'innodb.percona_changed_page_bmp_flush_5446'
- 'innodb.transportable_tbsp-debug'
- 'innodb_zip.transportable_tbsp_debug_zip'
- 'sys_vars.innodb_monitor_disable_basic'
- 'sys_vars.innodb_monitor_enable_basic'
- 'sys_vars.innodb_monitor_reset_all_basic'
- 'sys_vars.innodb_monitor_reset_basic'
- 'sys_vars.innodb_purge_run_now_basic'
- 'sys_vars.innodb_purge_stop_now_basic'
…ated MTR test cases

https://jira.percona.com/browse/PS-6789

The following MTR test cases re-recorded because of the 'filesort' improvements
introduced in the fix for Oracle's Bug #30776132
"MAKE FILESORT KEYS CONSISTENT BETWEEN FIELDS AND ITEMS"
(commit mysql/mysql-server@6d587a6)
- 'main.pool_of_threads'
- 'main.pool_of_threads_high_prio_tickets'.

The following MTR test cases re-recorded because of the changed execution plan
(more hash joins instead of block nested loops) introduced in these improvements:
Bug #30528604
"DELETE THE PRE-ITERATOR EXECUTOR"
(commit mysql/mysql-server@ef166f8),
Bug #30473261
"CONVERT THE INDEX SUBQUERY ENGINES INTO USING THE ITERATOR EXECUTOR"
(commit mysql/mysql-server@cb4116e)
(commit mysql/mysql-server@629b549)
(commit mysql/mysql-server@5a41fba)
(commit mysql/mysql-server@31bd903)
(commit mysql/mysql-server@75bbe1b)
(commit mysql/mysql-server@6226c1a)
(commit mysql/mysql-server@0b45e96)
(commit mysql/mysql-server@8e45d7e)
(commit mysql/mysql-server@7493ae4)
(commit mysql/mysql-server@a5f60bf)
(commit mysql/mysql-server@609b86e),
Bug #30912972
"ASSERTION `KEYLEN == M_START_KEY.LENGTH' FAILED"
(commit mysql/mysql-server@b28bea5)
- 'audit_log.audit_log_filter_db'
- 'main.pool_of_threads'
- 'main.pool_of_threads_high_prio_tickets'
- 'main.percona_expand_fast_index_creation'
- 'main.percona_sequence_table'
https://jira.percona.com/browse/PS-6789

Re-recorded 'main.bug74778' MTR test case because of the new 'SHOW_ROUTINE'
privilege implemented by Oracle in WL #9049
"Add a dynamic privilege for stored routine backup"
(https://dev.mysql.com/worklog/task/?id=9049)
(commit mysql/mysql-server@3e41e44)
… MTR test case

https://jira.percona.com/browse/PS-6789

Re-recorded 'main.backup_locks_mysqldump' MTR test case because of the new default
'mysqldump' network timeout introduced in the fix for Oracle Bug #30755992 / #98203
"mysql dump sufficiently long network timeout too short"
(https://bugs.mysql.com/bug.php?id=98203)
(commit mysql/mysql-server@1f90fad)
https://jira.percona.com/browse/PS-6789

Re-recorded 'main.bug88797' MTR test case because of the new deprecation
warning introduced in the implementation of WL #13325
"Deprecate VALUES syntax in INSERT ... ON DUPLICATE KEY UPDATE"
(https://dev.mysql.com/worklog/task/?id=13325)
(commit mysql/mysql-server@6f3b9df)
- Changed use of (Field*)::real_maybe_null to (Field*)::is_nullable due to
  changes in upstream at c5f8a62
- Added macro trickery to undefine and redefine ZSTD macro required for rocksdb
  around including sql/sql_class.h in order to prevent collision with new
  binlog compression type ZSTD.
- chmod +x to mysql-test/mysql-test-run.pl
- Re-recorded test(s) due to binlog offset changes:
  - rocksdb.read_only_tx
- Re-recorded test(s) due to new EXPLAIN result:
  - rocksdb.type_enum_indexes
- Re-recorded test(s) due to new "hash join" EXPLAIN result:
  - rocksdb.index_merge_rocksdb2
- Re-recorded test(s) due to new deprecation warning
  "Warning  1287  'VALUES function' is deprecated ...":
  - rocksdb.rocksdb
  - rocksdb.insert_with_keys
- Re-recorded test(s) due to new column 'Require_table_primary_key_check'
  appearing in 'SHOW CREATE TABLE mysql.slave_relay_log_info;':
  - rocksdb_rpl.rpl_rocksdb_stm_mixed_crash_safe
  - rocksdb_rpl.rpl_rocksdb_row_crash_safe
- Re-recorded test(s) due to changes to common test in 'extra/rpl_tests':
  - rocksdb_rpl.rpl_rocksdb_stm_mixed_crash_safe
  - rocksdb_rpl.rpl_rocksdb_row_crash_safe
- Fixed linking issue with PerconaFT by adding new minchassis dependency to
  tokudbdump and advancing git submodule commit pointer.
- Changed use of (Field*)::real_maybe_null to (Field*)::is_nullable due to
  changes in upstream at c5f8a62
- Re-recorded test(s) due to new hash join in explain result:
  - tokudb.type_bit
  - tokudb.type_time
  - tokudb.cluster_2968-0
  - tokudb.cluster_2968-1
  - tokudb.cluster_2968-2
  - tokudb.cluster_2968-3
- Re-recorded test(s) due to new deprecation warning
  "Warning  1287  '@@max_length_for_sort_data' is deprecated ...":
  - tokudb.type_bit
  - tokudb.type_bit_innodb
- Re-recorded test(s) due to new deprecation warning
  "Warning  1287  'VALUES function' is deprecated ...":
  - tokudb.fast_upsert_values
- Re-recorded test(s) with either ORDER BY or --sorted_result due to
  non-deterministic query order results:
  - tokudb.type_bit
  - tokudb.type_year
- Re-recorded test(s) due to new behavior of debug function WEIGHT_STRING:
  - tokudb.type_temporal_fractional
- Re-recorded test(s) due to new column 'Require_table_primary_key_check'
  appearing in 'SHOW CREATE TABLE mysql.slave_relay_log_info;':
  - tokudb_rpl.rpl_tokudb_row_crash_safe
  - tokudb_rpl.rpl_tokudb_stm_mixed_crash_safe
- Re-recorded test(s) due to changes to common test in 'extra/rpl_tests':
  - tokudb_rpl.rpl_tokudb_row_crash_safe
  - tokudb_rpl.rpl_tokudb_stm_mixed_crash_safe
- Re-recorded test(s) due to change in SHOW CREATE TABLE that no longer shows
  INT display width:
  - tokudb_parts.partition_max_parts_hash_tokudb
  - tokudb_parts.partition_max_parts_key_tokudb
- Re-recorded test(s) due to error code changes:
  - tokudb_backup.tokudb_backup_exclude
percona-ysorokin pushed a commit that referenced this pull request Sep 17, 2020
…o: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded

Problem
=======
Running mtr with an ASAN build on Gentoo fails because the path to
libtirpc is not /lib64/libtirpc.so, which is the path mtr uses for
preloading the library.

Furthermore, the libasan path on Gentoo may also contain underscores and
minus signs, which mtr safe_process does not recognize.

Fails on Gentoo since /lib64/libtirpc.so does not exist
+ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

Fails on Gentoo since /usr/lib64/libtirpc.so is a GNU LD script
+ERROR: ld.so: object '/usr/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (invalid ELF header): ignored.

Need to preload /lib64/libtirpc.so.3 on Gentoo.

When compiling with GNU C++, the libasan path also includes minus signs and underscores:

$ less mysql-test/lib/My/SafeProcess/ldd_asan_test_result
        linux-vdso.so.1 (0x00007ffeba962000)
        libasan.so.4 => /usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4 (0x00007f3c2e827000)

Tests that have been affected in different ways include, for example:

$ ./mtr group_replication.gr_clone_integration_clone_not_installed
[100%] group_replication.gr_clone_integration_clone_not_installed w3  [ fail ]
...
ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.
ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.
mysqltest: At line 21: Query 'START GROUP_REPLICATION' failed.
ERROR 2013 (HY000): Lost connection to MySQL server during query
...
ASAN:DEADLYSIGNAL
=================================================================
==11970==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc
0x7f0e5cecfb8c bp 0x7f0e340f1650 sp 0x7f0e340f0dc8 T44)
==11970==The signal is caused by a READ memory access.
==11970==Hint: address points to the zero page.
    #0 0x7f0e5cecfb8b in xdr_uint32_t (/lib64/libc.so.6+0x13cb8b)
    #1 0x7f0e5fbe6d43
(/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4+0x87d43)
    #2 0x7f0e3c675e59 in xdr_node_no
plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:88
    #3 0x7f0e3c67744d in xdr_pax_msg_1_6
plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:852
...

$ ./mtr ndb.ndb_config
[100%] ndb.ndb_config                             [ fail ]
...
 --- /.../src/mysql-test/suite/ndb/r/ndb_config.result 2019-06-25
21:19:08.308997942 +0300
 +++ /.../bld/mysql-test/var/log/ndb_config.reject     2019-06-26
11:58:11.718512944 +0300
@@ -30,16 +30,22 @@
 == 16 == bug44689
 192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.1 192.168.0.1
 == 17 == bug49400
+ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.
+ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be
preloaded (cannot open shared object file): ignored.
  ERROR    -- at line 25: TCP connection is a duplicate of the existing TCP
link from line 14
  ERROR    -- at line 25: Could not store section of configuration file.

$ ./mtr ndb.ndb_basic
[100%] ndb.ndb_basic                             [ pass ]  34706
ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.
ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded
(cannot open shared object file): ignored.

Solution
========
In safe_process, use the same trick for libtirpc as for libasan to determine
the path to the library for preloading.

Also allow underscores and minus in paths.
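
To illustrate why the accepted path characters matter, here is a minimal C++
sketch. It is not the mtr/safe_process code, which is not shown in this
message; the helper name find_preload_path and both character classes are
assumptions made for the example. With '_' and '-' left out of the class, the
match on the ldd line quoted in the Problem section stops at
'/usr/lib/gcc/x86', which is the truncated path seen in the LD_PRELOAD errors.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of mtr: pull the resolved path of a shared
// library out of one line of `ldd` output. Returns "" when the line does not
// mention the library.
std::string find_preload_path(const std::string &ldd_line,
                              const std::string &lib_prefix,
                              bool allow_dash_underscore = true) {
  assert(!lib_prefix.empty());
  // Characters accepted inside a path. The narrow class mimics a pattern
  // that forgets '_' and '-'.
  const std::string path_chars =
      allow_dash_underscore ? "[A-Za-z0-9/._-]+" : "[A-Za-z0-9/.]+";
  // Matches e.g. "libasan.so.4 => /some/path/libasan.so.4". lib_prefix is
  // used as-is, so it must not contain regex metacharacters.
  const std::regex pattern(lib_prefix + "\\.so[0-9.]*\\s+=>\\s+(" +
                           path_chars + ")");
  std::smatch match;
  if (std::regex_search(ldd_line, match, pattern)) return match[1].str();
  return "";
}
```

Calling find_preload_path(line, "libasan") on the libasan line from the ldd
output returns the full path, while passing false as the third argument
returns only '/usr/lib/gcc/x86'.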

In addition, add some memory leak suppressions for Perl.

Change-Id: Ia02e354a20cf8b279eb2573f3f8c2c39776343dc
(cherry picked from commit e88706d)
percona-ysorokin pushed a commit that referenced this pull request Feb 15, 2021
              HENCE ABORTING THE SERVER.

  Description:
  ------------
  When ‘gtid_purged’ is set to its max value, the server stops
  after executing the next transaction with the error
  'ERROR 1598 (HY000): Binary logging not possible.
    Message: An error occurred during flush stage of the
    commit. ‘binlog_error_action’ is set to ‘ABORT_SERVER’.
    Hence aborting the server.'
 

  Analysis:
  ---------
  The server is stopped because the GTID's integer
  component (GNO) maxes out while assigning a new
  automatic GTID.
  - When gtid_purged is set to
    CONCAT(@@GLOBAL.server_uuid,':1-9223372036854775805'),
    server updates gtid_executed with the same value.
  - During the second transaction, when assigning new
    automatic GTID, GTID(GNO) hits the
    max_limit(9223372036854775807).
  - Server returns error from get_automatic_gno().
    Then sets binlog_error_action=ABORT_SERVER.
  - Server then prints out the error message and triggers
    abort signal.
  - It is documented that the server shuts down immediately
    if the binary log cannot be written:
    'https://dev.mysql.com/doc/refman/8.0/en/
    replication-options-binary-log.html
    #sysvar_binlog_error_action'
  Hence, Server shutdown is intentional, and default
  behavior.

  Error message text "An error occurred during flush stage
  of the commit" is imprecise and a bit internal. It would
  be better to mention that the limit for generated GTIDs
  has been reached, and suggest how to fix the problem.
  There is also no warning message when the system is getting
  close to the GTID max limit.
 

  Fix:
  ----
  1. Give a better error message when exhausting the range
     and acting according to
     binlog_error_action=ABORT_SERVER.
  2. Set GTID Threshold as 99% of the max GTID limit.
     Generate a warning message in the error log when,
      - auto generated GTID is above threshold.
      - setting gtid above threshold using SET gtid_purged.
  Point #2 is only implemented for mysql-8.0 onwards.
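
  The threshold arithmetic from the Fix section can be sketched in C++ as
  follows. This is an illustration only, not the server code: the names
  kGnoMax, kGnoWarningThreshold and classify_gno are made up for the example,
  and treating a GNO equal to the maximum as "exhausted" is an assumption
  based on the Analysis above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value of the GTID integer component (GNO): 9223372036854775807.
constexpr std::int64_t kGnoMax = std::numeric_limits<std::int64_t>::max();

// 99% of the maximum. Dividing before multiplying avoids signed overflow.
constexpr std::int64_t kGnoWarningThreshold = kGnoMax / 100 * 99;

enum class GnoStatus { kOk, kWarnNearLimit, kExhausted };

// Hypothetical classifier: decide whether a GNO is fine, deserves a warning
// in the error log, or means the range is used up.
GnoStatus classify_gno(std::int64_t gno) {
  assert(gno > 0);  // GNOs start at 1
  if (gno >= kGnoMax) return GnoStatus::kExhausted;
  if (gno > kGnoWarningThreshold) return GnoStatus::kWarnNearLimit;
  return GnoStatus::kOk;
}
```

  With these definitions, the value 9223372036854775805 used in the Analysis
  falls into the warning range, and 9223372036854775807 is reported as
  exhausted.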

  RB#25130
percona-ysorokin pushed a commit that referenced this pull request Feb 18, 2021
To call a service implementation one needs to:
1. query the registry to get a reference to the service needed
2. call the service via the reference
3. call the registry to release the reference

While #2 is very fast (just a function pointer call) #1 and #3 can be
expensive since they'd need to interact with the registry's global
structure in a read/write fashion.

Hence if the above sequence is to be repeated in quick succession it'd
be beneficial to do steps #1 and #3 just once and aggregate as many #2
steps as possible in a single sequence.

This will usually mean caching the service reference received in #1 and
delaying #3 for as long as possible.

But since there's an active reference held to the service implementation
until #3 is taken, special handling is needed to make sure that:

- The references are released at regular intervals so changes in the
  registry can become effective.
- There is a way to mark a service implementation as "inactive" ("dying")
  so that until all of the active references to it are released no new
  ones are possible.

All of the above is part of the current audit API machinery, but needs
to be isolated into a separate service suite and made generally
available to all services.

This is what this worklog aims to implement.
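
The acquire-once, call-many, release-once pattern described above can be
sketched in C++ as follows. ToyRegistry, CachedServiceRef and
sum_with_cached_ref are hypothetical names used only for this illustration;
they are not the component registry API, and the periodic release and "dying"
marking mentioned above are left out to keep the sketch short.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy registry, purely illustrative. It counts how often the expensive
// acquire (step #1) and release (step #3) paths are taken.
struct ToyRegistry {
  using Service = std::function<int(int)>;
  std::map<std::string, Service> services;
  int acquire_calls = 0;
  int release_calls = 0;

  const Service *acquire(const std::string &name) {
    ++acquire_calls;
    auto it = services.find(name);
    return it == services.end() ? nullptr : &it->second;
  }
  void release(const Service *) { ++release_calls; }
};

// Holds one reference for its whole lifetime: acquired in the constructor,
// released in the destructor, with any number of cheap calls in between.
class CachedServiceRef {
 public:
  CachedServiceRef(ToyRegistry &registry, const std::string &name)
      : registry_(registry), service_(registry.acquire(name)) {}
  ~CachedServiceRef() {
    if (service_ != nullptr) registry_.release(service_);
  }
  CachedServiceRef(const CachedServiceRef &) = delete;
  CachedServiceRef &operator=(const CachedServiceRef &) = delete;

  bool valid() const { return service_ != nullptr; }
  int call(int arg) const {  // step #2: just a call through the reference
    assert(service_ != nullptr);
    return (*service_)(arg);
  }

 private:
  ToyRegistry &registry_;
  const ToyRegistry::Service *service_;
};

// Calls the service n times while touching the registry only twice.
int sum_with_cached_ref(ToyRegistry &registry, const std::string &name, int n) {
  CachedServiceRef ref(registry, name);
  if (!ref.valid()) return 0;
  int total = 0;
  for (int i = 0; i < n; ++i) total += ref.call(i);
  return total;
}
```

However large n is, acquire_calls and release_calls each grow by one per call
to sum_with_cached_ref, which is the saving the text describes.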

RB#24806
percona-ysorokin pushed a commit that referenced this pull request Feb 18, 2021
A heap-buffer-overflow in libmysqlxclient when
- auth-method is MYSQL41
- the "server" sends a nonce that is shorter than 20 bytes.

==2466857==ERROR: AddressSanitizer: heap-buffer-overflow on address
#0 0x4a7b76 in memcpy (routertest_component_routing_splicer+0x4a7b76)
#1 0x7fd3a1d89052 in SHA1_Update (/libcrypto.so.1.1+0x1c2052)
#2 0x63409c in compute_mysql41_hash_multi(unsigned char*, char const*,
   unsigned int, char const*, unsigned int)
   ...
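
The kind of guard that avoids such an overflow can be sketched in C++ as
follows. This is an assumption about the general shape of a fix, not the
actual patch: extract_scramble and kScrambleLength are names invented for the
example. The only fact relied on is that MYSQL41 hashing reads a 20-byte
scramble.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// MYSQL41 hashing works on a 20-byte scramble (the size of a SHA-1 digest).
constexpr std::size_t kScrambleLength = 20;

// Hypothetical guard: copy the scramble out of the server-supplied nonce only
// when enough bytes are present. On a short nonce nothing is read and false
// is returned, so the caller can fail the authentication instead of hashing
// past the end of the buffer.
bool extract_scramble(const std::string &nonce,
                      std::array<unsigned char, kScrambleLength> *out) {
  assert(out != nullptr);
  if (nonce.size() < kScrambleLength) return false;
  std::copy_n(nonce.begin(), kScrambleLength, out->begin());
  return true;
}
```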

RB: 25305
Reviewed-by: Lukasz Kotula <lukasz.kotula@oracle.com>
percona-ysorokin pushed a commit that referenced this pull request Feb 18, 2021
TABLESPACE STATE DOES NOT CHANGE THE SPACE TO EMPTY

After the commit for Bug#31991688, it was found that an idle system may
not ever get around to truncating an undo tablespace when it is SET INACTIVE.
Actually, it takes about 128 seconds before the undo tablespace is finally
truncated.

There are three main tasks for the function trx_purge().
1) Process the undo logs and apply changes to the data files.
   (May be multiple threads)
2) Clean up the history list by freeing old undo logs and rollback
   segments.
3) Truncate undo tablespaces that have grown too big or are SET INACTIVE
   explicitly.

Bug#31991688 made sure that steps 2 & 3 are not done too often.
Concentrating this effort keeps the purge lag from growing too large.
By default, trx_purge() does step#1 128 times before attempting steps
#2 & #3 which are called 'truncate' steps.  This is set by the setting
innodb_purge_rseg_truncate_frequency.

On an idle system, trx_purge() is called once per second if it has nothing
to do in step 1.  After 128 seconds, it will finally do step 2 (truncating
the undo logs and rollback segments, which reduces the history list to zero)
and step 3 (truncating any undo tablespaces that need it).

The function that the purge coordinator thread uses to make these repeated
calls to trx_purge() is called srv_do_purge(). When trx_purge() returns
having done nothing, srv_do_purge() returns to srv_purge_coordinator_thread()
which will put the purge thread to sleep.  It is woken up again once per
second by the master thread in srv_master_do_idle_tasks() if not sooner
by any of several other threads and activities.

This is how an idle system can wait 128 seconds before the truncate steps
are done and an undo tablespace that was SET INACTIVE can finally become
'empty'.

The solution in this patch is to modify srv_do_purge() so that if trx_purge()
did nothing and there is an undo space that was explicitly set to inactive,
it will immediately call trx_purge again with do_truncate=true so that steps
#2 and #3 will be done.

This does not affect the effort by Bug#31991688 to keep the purge lag from
growing too big on sysbench UPDATE NO_KEY. With this change, the purge lag
has to be zero and there must be a pending explicit undo space truncate
before this extra call to trx_purge is done.
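
The control flow described above can be modelled with a small C++ sketch.
It is a toy model, not the InnoDB code: PurgeState, toy_trx_purge and
toy_srv_do_purge are hypothetical stand-ins, and the real functions take
different arguments. The model only shows when the extra truncate call
happens.

```cpp
#include <cassert>

// Toy model of the purge state relevant to this patch.
struct PurgeState {
  unsigned long pending_undo_records = 0;  // work available for step 1
  bool undo_space_set_inactive = false;    // explicit SET INACTIVE pending
  unsigned long truncate_runs = 0;         // how often steps 2 and 3 ran
};

// Stand-in for trx_purge(): does step 1 and, when do_truncate is set, the
// 'truncate' steps 2 and 3. Returns the number of records handled in step 1.
unsigned long toy_trx_purge(PurgeState &state, bool do_truncate) {
  const unsigned long processed = state.pending_undo_records;
  state.pending_undo_records = 0;
  if (do_truncate) {
    ++state.truncate_runs;
    state.undo_space_set_inactive = false;  // the space is now 'empty'
  }
  return processed;
}

// Stand-in for one pass of srv_do_purge(). call_count counts trx_purge()
// calls so that every truncate_frequency-th call also truncates.
unsigned long toy_srv_do_purge(PurgeState &state, unsigned long &call_count,
                               unsigned long truncate_frequency = 128) {
  assert(truncate_frequency > 0);
  ++call_count;
  const bool periodic_truncate = (call_count % truncate_frequency == 0);
  const unsigned long processed = toy_trx_purge(state, periodic_truncate);
  // The idea of the patch: nothing was purged and an undo space was set
  // inactive explicitly, so run the truncate steps now instead of waiting.
  if (processed == 0 && !periodic_truncate && state.undo_space_set_inactive) {
    toy_trx_purge(state, /*do_truncate=*/true);
  }
  return processed;
}
```

In this model an idle system with a pending SET INACTIVE truncates on the
first pass, while a busy system keeps the usual frequency of truncate steps.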

Approved by Sunny in RB#25311
percona-ysorokin pushed a commit that referenced this pull request Feb 18, 2021
…TH VS 2019 [#2] [noclose]

storage\ndb\src\kernel\blocks\backup\Backup.cpp(2807,37): warning C4805: '==': unsafe mix of type 'Uint32' and type 'bool' in operation

Change-Id: I0582c4e40bcfc69cdf3288ed84ad3ac62c9e4b80
percona-ysorokin pushed a commit that referenced this pull request Sep 8, 2021
…ING TABLESPACES

The occurrence of this message is a minor issue fixed by change #1 below.
But during testing, I found that if mysqld is restarted while remote and
local tablespaces are discarded, especially if the tablespaces to be imported
are already in place at startup, then many things can go wrong.  There were
various asserts that occurred depending on timing. During all the testing
and debugging, the following changes were made.

1. Prevent the stats thread from complaining about a missing tablespace.
   See dict_stats_update().
2. Prevent a discarded tablespace from being opened at startup, even if the
   table to be imported is already in place. See Validate_files::check().
3. dd_tablespace_get_state_enum() was refactored to separate the normal
   way to do it in v8.0, which is to use "state" key in
   dd::Tablespace::se_private_data, from the non-standard way which is
   to check undo::spaces or look for the old key value pair of
   "discarded=true". This allowed the new call to this routine by the
   change in fix #2 above.
4. Change thd_tablespace_op() in sql/sql_thd_api.cc such that instead of
   returning 1 if the DDL requires an implicit tablespace, it returns the
   DDL operation flag.  This can still be interpreted as a boolean, but it
   can also be used to determine if the op is an IMPORT or a DISCARD.
5. With that change, the annoying message that a space is discarded can be
   avoided during an import when it needs to be discarded.
6. Several test cases were corrected now that the useless "is discarded"
   warning is no longer being written.
7. Two places where dd_tablespace_set_state() was called to set the state
   to either "discard" or "normal" were consolidated to a new version of
   dd_tablespace_set_state(thd, dd_space_id, space_name, dd_state).
8. This new version of dd_tablespace_set_state() was used in
   dd_commit_inplace_alter_table() to make sure that in all three places
   the dd is changed to identify a discarded tablespace, it is identified
   in dd::Tablespace::se_private_data as well as dd::Table::se_private_data
   or dd::Partition::se_private_data.  The reason it is necessary to
   record this in dd::Tablespace is that during startup, boot_tablespaces()
   and Validate_files::check() are only traversing dd::Tablespace.
   And that is where fix #2 is done!
9. One of the asserts that occurred was during IMPORT TABLESPACE after a
   restart that found a discarded 5.7 tablespace in the v8.0 discarded
   location. This assert occurred in Fil_shard::get_file_size() just after
   ER_IB_MSG_272.  The 5.7 file did not have the SDI flag, but the v8.0
   space that was discarded did have that flag.  So the flags did not match.
   That crash was fixed by setting the fil_space_t::flags to what it is in
   the tablespace header page.  A descriptive comment was added.
10. There was a section in fil_ibd_open() that checked
   `if (space != nullptr) {` and if true, it would close and free stuff
   then immediately crash.  I think I remember many years ago adding that
   assert because I did not think it actually occurred. Well it did occur
   during my testing before I added fix #2 above.  This made fil_ibd_open()
   assume that the file was NOT already open.
   So fil_ibd_open() is now changed to allow for that possibility by adding
   `if (space != nullptr) {return DB_SUCCESS}` further down.
   Since fil_ibd_open() can be called with a `validate` boolean, the routine
   now attempts to do all the validation whether or not the tablespace is
   already open.
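
Changes 4 and 5 in the list above can be pictured with a short C++ sketch.
The flag values, ToyThd and the toy_ functions are hypothetical and only
illustrate the idea of returning the operation flag instead of a bare 1; the
real thd_tablespace_op() works on the server's THD and its own flag
definitions, which are not quoted here.

```cpp
#include <cassert>

// Hypothetical flag values for illustration only.
constexpr unsigned kOpDiscardTablespace = 1u << 0;
constexpr unsigned kOpImportTablespace = 1u << 1;
constexpr unsigned kOpOther = 1u << 2;

// Stand-in for the session object; only the DDL flags matter here.
struct ToyThd {
  unsigned ddl_flags = 0;
};

// Sketch of the new contract: return the tablespace operation flag (zero when
// the DDL is neither DISCARD nor IMPORT) rather than a bare 1.
unsigned toy_tablespace_op(const ToyThd &thd) {
  const unsigned op =
      thd.ddl_flags & (kOpDiscardTablespace | kOpImportTablespace);
  // One statement is either a DISCARD or an IMPORT, never both.
  assert(op != (kOpDiscardTablespace | kOpImportTablespace));
  return op;
}

// Existing callers can keep treating the result as a boolean.
bool is_tablespace_op(const ToyThd &thd) { return toy_tablespace_op(thd) != 0; }

// New callers can tell IMPORT apart and skip the "is discarded" message.
bool should_report_discarded(const ToyThd &thd) {
  return (toy_tablespace_op(thd) & kOpImportTablespace) == 0;
}
```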

The following are non-functional changes:
- Many code documentation lines were added or improved.
- dict_sys_t::s_space_id renamed to dict_sys_t::s_dict_space_id in order
  to clarify better which space_id it referred to.
- For the same reason, change s_dd_space_id to s_dd_dict_space_id.
- Replaced `table->flags2 & DICT_TF2_DISCARDED`
  with `dict_table_is_discarded(table)` in dict0load.cc
- A redundant call to ibuf_delete_for_discarded_space(space_id) was deleted
  from fil_discard_tablespace() because it is also called higher up in
  the call stack in row_import_for_mysql().
- Deleted the declaration to `row_import_update_discarded_flag()` since
  the definition no longer exists.  It was deleted when we switched from
  `discarded=true` to 'state=discarded' in dd::Tablespace::se_private_data
  early in v8.0 development.

Approved by Mateusz in RB#26077
percona-ysorokin pushed a commit that referenced this pull request Mar 11, 2022
This error happens for queries such as:

SELECT ( SELECT 1 FROM t1 ) AS a,
  ( SELECT a FROM ( SELECT x FROM t1 ORDER BY a ) AS d1 );

Query_block::prepare() for query block #4 (corresponding to the 4th
SELECT in the query above) calls setup_order() which again calls
find_order_in_list(). That function replaces an Item_ident for 'a' in
Query_block.order_list with an Item_ref pointing to query block #2.
Then Query_block::merge_derived() merges query block #4 into query
block #3. The Item_ref mentioned above is then moved to the order_list
of query block #3.

In the next step, find_order_in_list() is called for query block #3.
At this point, 'a' in the select list has been resolved to another
Item_ref, also pointing to query block #2. find_order_in_list()
detects that the Item_ref in the order_list is equivalent to the
Item_ref in the select list, and therefore decides to replace the
former with the latter. Then find_order_in_list() calls
Item::clean_up_after_removal() recursively (via Item::walk()) for the
order_list Item_ref (since that is no longer needed).

When calling clean_up_after_removal(), no
Cleanup_after_removal_context object is passed. This is the actual
error, as there should be a context pointing to query block #3 that
ensures that clean_up_after_removal() only purges Item_subselect.unit
if both of the following conditions hold:

1) The Item_subselect should not be in any of the Item trees in the
   select list of query block #3.

2) Item_subselect.unit should be a descendant of query block #3.

These conditions ensure that we only purge Item_subselect.unit if we
are sure that it is not needed elsewhere. But without the right
context, query block #2 gets purged even if it is used in the select
lists of query blocks #1 and #3.

The fix is to pass a context (for query block #3) to clean_up_after_removal().
Both of the above conditions then become false, and Item_subselect.unit is
not purged. As an additional shortcut, find_order_in_list() will not call
clean_up_after_removal() if real_item() of the order item and the select
list item are identical.

In addition, this commit changes clean_up_after_removal() so that it
requires the context to be non-null, to prevent similar errors. It
also simplifies Item_sum::clean_up_after_removal() by removing window
functions unconditionally (and adds a corresponding test case).

Change-Id: I449be15d369dba97b23900d1a9742e9f6bad4355
percona-ysorokin pushed a commit that referenced this pull request Mar 11, 2022
#2]

If the schema distribution client detects a timeout, but the coordinator
receives the schema event before the client frees the schema object, then
the coordinator, instead of returning from the function, will process the
stale schema event.

The coordinator does not know whether the schema distribution timeout has
been detected by the client. It starts processing the schema event whenever
the schema object is valid. So, introduce a new variable to indicate
the state of the schema object and change the state when the client detects
the schema distribution timeout or when the schema event is received by
the coordinator, so that both coordinator and client stay in sync.
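
One way to picture such a state variable is the C++ sketch below. It is an
assumption about the general idea, not the NDB patch: SchemaOpState and
ToySchemaObject are invented names, and the real code may protect the state
with a mutex rather than an atomic. The point is that whichever side changes
the state first wins, and the other side can see that and back off.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical states of a schema object shared by client and coordinator.
enum class SchemaOpState { kInProgress, kTimedOut, kEventReceived };

class ToySchemaObject {
 public:
  // Client side: returns true if the timeout took effect, false if the
  // coordinator had already claimed the schema event.
  bool mark_timed_out() { return transition(SchemaOpState::kTimedOut); }

  // Coordinator side: returns true if the event may be processed, false if
  // the client had already timed out and the event is stale.
  bool mark_event_received() {
    return transition(SchemaOpState::kEventReceived);
  }

  SchemaOpState state() const { return state_.load(); }

 private:
  // Only the first transition away from kInProgress succeeds.
  bool transition(SchemaOpState next) {
    assert(next != SchemaOpState::kInProgress);
    SchemaOpState expected = SchemaOpState::kInProgress;
    return state_.compare_exchange_strong(expected, next);
  }

  std::atomic<SchemaOpState> state_{SchemaOpState::kInProgress};
};
```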

Change-Id: Ic0149aa9a1ae787c7799a675f2cd085f0ac0c4bb
percona-ysorokin pushed a commit that referenced this pull request May 18, 2022
*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in an orderly way by using a signalling variable.
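
A minimal sketch of the signalling-variable approach, using std::thread and
std::atomic rather than the server's my_thread API. The Heartbeat class and
its members are hypothetical stand-ins for the daemon example plugin.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the daemon plugin's heartbeat thread.
class Heartbeat {
 public:
  ~Heartbeat() { stop(); }

  void start() {
    stop_requested_ = false;
    worker_ = std::thread([this] { run(); });
  }

  // Ask the loop to exit and wait for it, instead of cancelling the thread.
  void stop() {
    stop_requested_ = true;
    if (worker_.joinable()) worker_.join();
  }

  int beats() const { return beats_.load(); }

 private:
  void run() {
    // The loop checks the signalling variable on every iteration, so the
    // thread function returns normally and its stack frame is unwound.
    while (!stop_requested_.load()) {
      ++beats_;  // the real plugin would write a heartbeat line here
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  std::atomic<bool> stop_requested_{false};
  std::atomic<int> beats_{0};
  std::thread worker_;
};
```

Because stop() joins the thread, no heartbeat work can run after it returns.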
percona-ysorokin pushed a commit that referenced this pull request Jul 5, 2022
…ILER WARNINGS

Remove some stringop-truncation warnings using cstrbuf.

Change-Id: I3ab43f6dd8c8b0b784d919211b041ac3ad4fad40
percona-ysorokin pushed a commit that referenced this pull request Sep 5, 2022
-- Patch #1: Persist secondary load information --

Problem:
We need a way of knowing which tables were loaded to HeatWave after
MySQL restarts due to a crash or a planned shutdown.

Solution:
Add a new "secondary_load" flag to the `options` column of mysql.tables.
This flag is toggled after a successful secondary load or unload. The
information about this flag is also reflected in
INFORMATION_SCHEMA.TABLES.CREATE_OPTIONS.

-- Patch #2 --

The second patch in this worklog triggers the table reload from InnoDB
after MySQL restart.

The recovery framework recognizes that the system restarted by checking
whether tables are present in the Global State. If there are no tables
present, the framework will access the Data Dictionary and find which
tables were loaded before the restart.

This patch introduces the "Data Dictionary Worker" - a MySQL service
recovery worker whose task is to query the INFORMATION_SCHEMA.TABLES
table from a separate thread and find all tables whose secondary_load
flag is set to 1.

All tables that were found in the Data Dictionary will be appended to
the list of tables that have to be reloaded by the framework from
InnoDB.

If an error occurs during restart recovery we will not mark the recovery
as failed. This is done because the types of failures that can occur
when the tables are reloaded after a restart are less critical compared
to previously existing recovery situations. Additionally, this code will
soon have to be adapted for the next worklog in this area so we are
proceeding with the simplest solution that makes sense.

A Global Context variable m_globalStateEmpty is added which indicates
whether the Global State should be recovered from an external source.

-- Patch #3 --

This patch adds the "rapid_reload_on_restart" system variable. This
variable is used to control whether tables should be reloaded after a
restart of mysqld or the HeatWave plugin. This variable is persistable
(i.e., SET PERSIST RAPID_RELOAD_ON_RESTART = TRUE/FALSE).

The default value of this variable is set to false.

The variable can be modified in OFF, IDLE, and SUSPENDED states.

-- Patch #4 --

This patch refactors the recovery code by removing all recovery-related
code from ha_rpd.cc and moving it to separate files:

  - ha_rpd_session_factory.h/cc:
  These files contain the MySQLAdminSessionFactory class, which is used
to create admin sessions in separate threads that can be used to issue
SQL queries.

  - ha_rpd_recovery.h/cc:
  These files contain the MySQLServiceRecoveryWorker,
MySQLServiceRecoveryJob and ObjectStoreRecoveryJob classes which were
previously defined in ha_rpd.cc. This file also contains a function that
creates the RecoveryWorkerFactory object. This object is passed to the
constructor of the Recovery Framework and is used to communicate with
the other section of the code located in rpdrecoveryfwk.h/cc.

This patch also renames rpdrecvryfwk to rpdrecoveryfwk for better
readability.

The include relationship between the files is shown on the following
diagram:

        rpdrecoveryfwk.h◄──────────────rpdrecoveryfwk.cc
            ▲    ▲
            │    │
            │    │
            │    └──────────────────────────┐
            │                               │
        ha_rpd_recovery.h◄─────────────ha_rpd_recovery.cc──┐
            ▲                               │           │
            │                               │           │
            │                               │           │
            │                               ▼           │
        ha_rpd.cc───────────────────────►ha_rpd.h       │
                                            ▲           │
                                            │           │
            ┌───────────────────────────────┘           │
            │                                           ▼
    ha_rpd_session_factory.cc──────►ha_rpd_session_factory.h

Other changes:
  - In agreement with Control Plane, the external Global State is now
  invalidated during recovery framework startup if:
    1) Recovery framework recognizes that it should load the Global
    State from an external source AND,
    2) rapid_reload_on_restart is set to OFF.

  - Addressed review comments for Patch #3, rapid_reload_on_restart is
  now also settable while plugin is ON.

  - Provide a single entry point for processing external Global State
  before starting the recovery framework loop.

  - Change when the Data Dictionary is read. Now we will no longer wait
  for the HeatWave nodes to connect before querying the Data Dictionary.
  We will query it when the recovery framework starts, before accepting
  any actions in the recovery loop.

  - Change the reload flow by inserting fake global state entries for
  tables that need to be reloaded instead of manually adding them to a
  list of tables scheduled for reload. This method will be used for the
  next phase where we will recover from Object Storage so both recovery
  methods will now follow the same flow.

  - Update secondary_load_dd_flag added in Patch #1.

  - Increase timeout in wait_for_server_bootup to 300s to account for
  long MySQL version upgrades.

  - Add reload_on_restart and reload_on_restart_dbg tests to the rapid
  suite.

  - Add PLUGIN_VAR_PERSIST_AS_READ_ONLY flag to "rapid_net_orma_port"
  and "rapid_reload_on_restart" definitions, enabling their
  initialization from persisted values along with "rapid_bootstrap" when
  it is persisted as ON.

  - Fix numerous clang-tidy warnings in recovery code.

  - Prevent suspended_basic and secondary_load_dd_flag tests from running
  on ASAN builds due to an existing issue when reinstalling the RAPID
  plugin.

-- Bug#33752387 --

Problem:
A shutdown of MySQL causes a crash in queries fired by DD worker.

Solution:
Prevent MySQL from killing DD worker's queries by instantiating a
DD_kill_immunizer before the queries are fired.

-- Patch #5 --

Problem:
A table can be loaded before the DD Worker queries the Data Dictionary.
This means that table will be wrongly processed as part of the external
global state.

Solution:
If the table is present in the current in-memory global state we will
not consider it as part of the external global state and we will not
process it by the recovery framework.
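
The filtering rule above can be sketched as follows. This is an illustration
under assumptions, not the patch's code: the function name and the use of
plain table-name strings are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: keep only the tables that the Data Dictionary marks
// as loaded and that are not already in the in-memory global state.
std::vector<std::string> external_tables_to_recover(
    const std::vector<std::string> &flagged_in_dd,
    const std::set<std::string> &in_memory_global_state) {
  std::vector<std::string> result;
  for (const std::string &table : flagged_in_dd) {
    // A table loaded before the DD Worker ran is already tracked in
    // memory, so it must not be treated as external state.
    if (in_memory_global_state.count(table) == 0) result.push_back(table);
  }
  return result;
}
```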

-- Bug#34197659 --

Problem:
If a table reload after restart causes OOM the cluster will go into
RECOVERYFAILED state.

Solution:
Recognize when the tables are being reloaded after restart and do not
move the cluster into RECOVERYFAILED. In that case only the current
reload will fail and the reload for other tables will be attempted.

Change-Id: Ic0c2a763bc338ea1ae6a7121ff3d55b456271bf0
percona-ysorokin pushed a commit that referenced this pull request Dec 6, 2022
Add various JSON fields in the new JSON format. Scans that use some form
of index have the JSON field "access_type" with value "index". Plans
with "access_type=index" have additional fields such as
index_access_type, covering, lookup_condition, index_name, etc. The
value of index_access_type further tells us what specific type of index
scan it is, such as Index range scan, Index lookup scan, etc.

Join plan nodes have access_type=join. Such plans will, again, have
additional json fields that tell us whether it's a hash join, merge
join, and whether it is an antijoin, semijoin, etc.

If a plan node is a root of a subquery subtree, it additionally
has the field 'subquery' with value "true". Such plan nodes will also
have fields like "location=projection", "dependent=true" corresponding
to the TREE format synopsis :
Select #2 (subquery in projection; dependent)

If a json field is absent, its value should be interpreted as either
0, empty, or false, depending on its type.
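
A consumer of this output could apply the "absent means zero value" rule
roughly as below. This is a sketch that assumes a C++17 compiler and models a
plan node as a flat map; PlanNode, JsonValue, and field_or_default are
hypothetical names, not part of the server.

```cpp
#include <map>
#include <string>
#include <variant>

// Hypothetical, simplified plan node: field name -> value.
using JsonValue = std::variant<long long, std::string, bool>;
using PlanNode = std::map<std::string, JsonValue>;

// Return the field's value, or the type's zero value (0, "", false)
// when the field is absent or holds a different type.
template <typename T>
T field_or_default(const PlanNode &node, const std::string &name) {
  auto it = node.find(name);
  if (it == node.end()) return T{};
  const T *value = std::get_if<T>(&it->second);
  return value != nullptr ? *value : T{};
}
```

For example, a node without a "dependent" field reads as dependent=false,
and a node without "index_name" reads as an empty string.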

A side effect of this commit is that for AccessPath::REF, the phrase
"iterate backwards" is changed to "reverse".

New test file added to test format=JSON with hypergraph optimizer.

Change-Id: I816af3ec546c893d4fc0c77298ef17d49cff7427
percona-ysorokin pushed a commit that referenced this pull request Dec 6, 2022
Enh#34350907 - [Nvidia] Allow DDLs when tables are loaded to HeatWave
Bug#34433145 - WL#15129: mysqld crash Assertion `column_count == static_cast<int64_t>(cp_table-
Bug#34446287 - WL#15129: mysqld crash at rapid::data::RapidNetChunkCtx::consolidateEncodingsDic
Bug#34520634 - MYSQLD CRASH : Sql_cmd_secondary_load_unload::mysql_secondary_load_or_unload
Bug#34520630 - Failed Condition: "table_id != InvalidTableId"

Currently, DDL statements such as ALTER TABLE*, RENAME TABLE, and
TRUNCATE TABLE are not allowed if a table has a secondary engine
defined. The statements fail with the following error: "DDLs on a table
with a secondary engine defined are not allowed."

This worklog lifts this restriction for tables whose secondary engine is
RAPID.

A secondary engine hook is called in the beginning (pre-hook) and in the
end (post-hook) of a DDL statement execution. If the DDL statement
succeeds, the post-hook will direct the recovery framework to reload the
table in order to reflect that change in HeatWave.

Currently, all DDL statements that were previously disallowed will
trigger a reload. This can be improved in the future by checking whether
the DDL operation has an impact on HeatWave or not. However, detecting
all edge cases in this behavior is not straightforward, so this has been
left as a future improvement.

Additionally, if a DDL modifies the table schema in a way that makes it
incompatible with HeatWave (e.g., dropping a primary key column) the
reload will fail silently. There is no easy way to recognize whether the
table schema will become incompatible with HeatWave in a pre-hook.

List of changes:
  1) [MySQL] Add new HTON_SECONDARY_ENGINE_SUPPORTS_DDL flag to indicate
whether a secondary engine supports DDLs.
  2) [MySQL] Add RAII hooks for RENAME TABLE and TRUNCATE TABLE, modeled
on the ALTER TABLE hook.
  3) Define HeatWave hooks for ALTER TABLE, RENAME TABLE, and TRUNCATE
TABLE statements.
  4) If a table reload is necessary, trigger it by marking the table as
stale (WL#14914).
  5) Move all change propagation & DDL hooks to ha_rpd_hooks.cc.
  6) Adjust existing tests to support table reload upon DDL execution.
  7) Extract code related to RapidOpSyncCtx in ha_rpd_sync_ctx.cc, and
the PluginState enum to ha_rpd_fsm.h.

* Note: ALTER TABLE statements related to secondary engine setting and
loading were allowed before:
    - ALTER TABLE <TABLE> SECONDARY_UNLOAD,
    - ALTER TABLE SECONDARY_ENGINE = NULL.

-- Bug#34433145 --
-- Bug#34446287 --

--Problem #1--
Crashes in Change Propagation when the CP thread tries to apply DMLs of
tables with new schema to the not-yet-reloaded table in HeatWave.

--Solution #1--
Remove table from Change Propagation before marking it as stale and
revert the original change from rpd_binlog_parser.cc where we were
checking if the table was stale before continuing with binlog parsing.
The original change is no longer necessary since the table is removed
from CP before being marked as stale.

--Problem #2--
In case of a failed reload, tables are not removed from Global State.

--Solution #2--
Keep track of whether the table was reloaded because it was marked as
STALE. In that case we do not want the Recovery Framework to retry the
reload and therefore we can remove the table from the Global State.

-- Bug#34520634 --

Problem:
Allowing the change of primary engine for tables with a defined
secondary engine hits an assertion in mysql_secondary_load_or_unload().

Example:
    CREATE TABLE t1 (col1 INT PRIMARY KEY) SECONDARY_ENGINE = RAPID;
    ALTER TABLE t1 ENGINE = BLACKHOLE;
    ALTER TABLE t1 SECONDARY_LOAD; <- assertion hit here

Solution:
Disallow changing the primary engine for tables with a defined secondary
engine.

-- Bug#34520630 --

Problem:
A debug assert is being hit in rapid_gs_is_table_reloading_from_stale
because the table was dropped in the meantime.

Solution:
Instead of asserting, just return false if table is not present in the
Global State.

This patch also changes rapid_gs_is_table_reloading_from_stale to a more
specific check (inlined the logic in load_table()). This check now also
covers the case when a table was dropped/unloaded before the Recovery
Framework marked it as INRECOVERY. In that case, if the reload fails we
should not have an entry for that table in the Global State.

The patch also adjusts the dict_types MTR test, where we no longer expect
tables to be in UNAVAIL state after a failed reload. Additionally,
recovery2_ddls.test is adjusted to not try to offload queries running on
Performance Schema.

Change-Id: I6ee390b1f418120925f5359d5e9365f0a6a415ee
percona-ysorokin pushed a commit that referenced this pull request Oct 26, 2023
https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but it poses a serious threat when another member tries to
join the cluster by initiating a connection to a member whose group
communication port is being held by external processes for longer durations.

On such activities by external processes, the SSL-enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stack trace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prevented other members
from joining the cluster, resulting in the following messages in the joiner
server's logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configured with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept() with the timeout defined
   by group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configured with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.
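
How the two variables interact can be sketched as a bounded retry loop. This
is an illustration only: the Attempt enum and accept_with_retries are
hypothetical, the handshake is passed in as a callable instead of calling
OpenSSL, and treating the retry count as the total number of attempts is an
assumption.

```cpp
#include <functional>

// Hypothetical outcome of one handshake attempt that is bounded by the
// socket timeout (group_replication_xcom_ssl_socket_timeout in the text).
enum class Attempt { kDone, kTimedOut, kFailed };

// Try the handshake up to `max_retries` times
// (group_replication_xcom_ssl_accept_retries in the text). Returns true on
// success; on false the caller is expected to close the socket.
bool accept_with_retries(const std::function<Attempt()> &try_handshake,
                         unsigned max_retries) {
  for (unsigned i = 0; i < max_retries; ++i) {
    switch (try_handshake()) {
      case Attempt::kDone:
        return true;
      case Attempt::kFailed:
        return false;
      case Attempt::kTimedOut:
        break;  // the peer sent nothing within the timeout; try again
    }
  }
  return false;  // e.g. a port scanner that never sends handshake data
}
```

With a 5 second timeout and 10 attempts, a silent peer would occupy the
accepting thread for roughly 50 seconds at most instead of forever.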
percona-ysorokin pushed a commit that referenced this pull request Dec 4, 2023
Post push fix.

NdbSocket::copy method duplicated the mutex pointer, leaving two objects
referring to one mutex. Typically the source will destroy its mutex,
making it unusable for target object.

Remove copy method.

Change-Id: I2cc36128c343c7bab08d96651b12946ecd87210c
percona-ysorokin pushed a commit that referenced this pull request Jan 23, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How does this deadlock happen?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidnos[2] = true                       set commit_group_sidnos[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way, i.e., the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update, with the following stack traces.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command
    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But this
was not taken into account when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
by the client thread (binlog flush leader) when it tries to perform the GTID
update on behalf of threads waiting in the "Commit Order" queue, thus providing a
guarantee that the `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
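
The effect of the fix can be illustrated with a simplified model. This is a
sketch, not the server's code: GroupSidnoTracker and its members are
hypothetical, and the commit lock is taken inside the function only to keep
the example short.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical, simplified model of the sidno bookkeeping.
class GroupSidnoTracker {
 public:
  explicit GroupSidnoTracker(std::size_t sidno_count)
      : flags_(sidno_count, false), sidno_locks_(sidno_count) {}

  // `commit_lock` stands in for MYSQL_BIN_LOG::LOCK_commit. It is held for
  // the whole mark/lock/unlock sequence, so no other thread can clear
  // flags_[sidno] between the lock and the unlock of the sidno.
  void update_commit_group(std::mutex &commit_lock, std::size_t sidno) {
    std::lock_guard<std::mutex> guard(commit_lock);
    flags_[sidno] = true;
    sidno_locks_[sidno].lock();
    ++updates_;  // stands in for adding the GTID to the executed set
    if (flags_[sidno]) {
      sidno_locks_[sidno].unlock();
      flags_[sidno] = false;
    }
  }

  // True if nobody holds the sidno lock, i.e. no lock was leaked.
  bool sidno_lock_is_free(std::size_t sidno) {
    if (!sidno_locks_[sidno].try_lock()) return false;
    sidno_locks_[sidno].unlock();
    return true;
  }

  int updates() const { return updates_; }

 private:
  std::vector<bool> flags_;               // models commit_group_sidnos
  std::vector<std::mutex> sidno_locks_;   // models the per-sidno locks
  int updates_ = 0;
};
```

Without the outer lock, a second thread could reset the flag after the first
thread locked the sidno, which is the lock-leak shown in the table above.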
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
Part of WL#15135 Certificate Architecture

This patch introduces class TlsKeyManager, containing all TLS
authentication and key management logic. A single instance of
TlsKeyManager in each node owns the local NodeCertificate, an
SSL_CTX, and a table holding the serial numbers and expiration
dates of all peer certificates.

A large set of TLS-related error codes is introduced in the file
TlsKeyErrors.h.

The unit test testTlsKeyManager-t tests TLS authentication over
client/server connections on localhost.

Change-Id: I2ee42efc268219639691f73a1d7638a336844d88
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
Implement ndb$certificates base table and certificates view.
Update results for tests ndbinfo and ndbinfo plans.

Change-Id: Iab1b89f5eb82ac1b3e0c049dd55eb7d07394070a
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
Move client_authenticate() out of SocketClient::connect() (which
returns void) into a separate SocketClient::authenticate() method
which can return a value.

In SocketAuthenticator, change the signature of the authentication
routines to return an int (which can represent a result code) rather
than a bool. Results less than AuthOk represent failure, and results
greater than or equal to AuthOk represent success.
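
The result-code convention can be sketched as below. Only the name AuthOk
comes from the text; its numeric value and the other enumerators are
assumptions made for illustration.

```cpp
// Hypothetical result codes; only the name AuthOk comes from the text.
enum AuthResult : int {
  kAuthBadCredentials = -2,  // assumed failure code
  kAuthTimeout = -1,         // assumed failure code
  AuthOk = 0,                // assumed value
  kAuthOkUpgraded = 1        // assumed success variant
};

// Results below AuthOk are failures; AuthOk and above are successes.
constexpr bool auth_succeeded(int result) { return result >= AuthOk; }
```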

Remove the username and password variables from SocketAuthSimple;
make them constant strings in the implementation.

There are no functional changes.

Change-Id: I4c25e99f1b9b692db39213dfa63352da8993a8fb
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
This changes TransporterRegistry::connect_ndb_mgmd() to return
NdbSocket rather than ndb_socket_t.

It extends the StartTls test in testMgmd to test upgrading the
TLS MGM protocol socket to a transporter.

Change-Id: Ic3b9ccf39ec78ed25705a4bbbdc5ac2953a35611
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
Post push fix.

NdbSocket::copy method duplicated the mutex pointer, leaving two objects
referring to one mutex. Typically the source will destroy its mutex,
making it unusable for target object.

Fix by using the transfer method instead.
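
The difference between copying and transferring can be sketched with a
simplified wrapper. SocketHandle is hypothetical and is not the real
NdbSocket; it only shows ownership of the mutex moving from source to target.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical, simplified socket wrapper; not the real NdbSocket.
class SocketHandle {
 public:
  explicit SocketHandle(int fd)
      : fd_(fd), mutex_(std::make_unique<std::mutex>()) {}

  // Copying would leave two objects referring to one mutex, so forbid it.
  SocketHandle(const SocketHandle &) = delete;
  SocketHandle &operator=(const SocketHandle &) = delete;

  // Hand the descriptor and the mutex over to `target`; the source is
  // left empty and no longer owns anything it could destroy.
  static void transfer(SocketHandle &target, SocketHandle &source) {
    target.fd_ = std::exchange(source.fd_, -1);
    target.mutex_ = std::move(source.mutex_);
  }

  bool is_valid() const { return fd_ >= 0 && mutex_ != nullptr; }
  int fd() const { return fd_; }

 private:
  int fd_;
  std::unique_ptr<std::mutex> mutex_;
};
```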

Change-Id: I199c04b870049498463903f6358f79a38649f543
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
If the argument to a window function contains a subquery, the access path
of that subquery would be printed twice when doing 'EXPLAIN FORMAT=TREE'.
When using the Hypergraph optimizer, the subquery path was not printed at
all, whether using FORMAT=TREE or FORMAT=JSON.

This commit fixes this by ensuring that we ignore duplicate paths,
and (for Hypergraph) by traversing the structures needed to find the
relevant Item_subselect objects.

Change-Id: I2abedcf690294f98ce169b74e53f042f46c47a45
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
Post-push fix: Cherry-picking the fix onto mysql-trunk introduced an
unintended duplication of a code block, causing a shadowing-warning
when building with g++. This commit corrects that.

Change-Id: I1b279818ca0d30e32fc8dabb76c647120b531e8f
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
Problem
================================

Group Replication ASAN run failing without any symptom of a
leak, but with shutdown issues:

worker[6] Shutdown report from
/dev/shm/mtr-3771884/var-gr-debug/6/log/mysqld.1.err after tests:
 group_replication.gr_flush_logs
group_replication.gr_delayed_initialization_thread_handler_error
group_replication.gr_sbr_verifications
group_replication.gr_server_uuid_matches_group_name_bootstrap
group_replication.gr_stop_async_on_stop_gr
group_replication.gr_certifier_message_same_member
group_replication.gr_ssl_mode_verify_identity_error_xcom

Analysis and Fix
================================

It ended up being a leak on gr_ssl_mode_verify_identity_error_xcom test:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f1709fbe1c7 in operator new(unsigned long)
      ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7f16ea0df799 in xcom_tcp_server_startup(Xcom_network_provider*)
      (/export/home/tmp/BUG35594709/mysql-trunk/BIN-ASAN/plugin_output_directory
        /group_replication.so+0x65d799)
    #2 0x7f170751e2b2  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc2b2)

This happens because we delegated incoming connections
cleanup to the external consumer in incoming_connection_task.
Since it calls incoming_connection() from
Network_provider_manager, in case of a concurrent stop,
a connection could be left orphaned in the shared atomic
due to the lack of an Active Provider, thus creating a
memory leak.

The solution is to perform this cleanup in
Network_provider_manager, in both the stop_provider() and
stop_all_providers() methods, thus ensuring that no
incoming connection leaks.
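
A minimal sketch of cleaning up a pending connection on stop, assuming a
C++17 compiler. ProviderManager and Connection are hypothetical and much
simpler than the real Network_provider_manager; the live counter exists only
to make the leak observable.

```cpp
#include <atomic>

// Hypothetical connection type that counts live instances for illustration.
struct Connection {
  static inline int live = 0;
  Connection() { ++live; }
  ~Connection() { --live; }
};

// Hypothetical, simplified manager; not the real Network_provider_manager.
class ProviderManager {
 public:
  ~ProviderManager() { stop_provider(); }

  // Provider side: publish a newly accepted connection in the shared slot.
  void set_incoming(Connection *connection) {
    delete incoming_.exchange(connection);  // drop any unconsumed one
  }

  // Consumer side: take ownership of the pending connection, if any.
  Connection *take_incoming() { return incoming_.exchange(nullptr); }

  // On stop, free whatever was never consumed so that it cannot leak.
  void stop_provider() { delete incoming_.exchange(nullptr); }

  bool has_incoming() const { return incoming_.load() != nullptr; }

 private:
  std::atomic<Connection *> incoming_{nullptr};
};
```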

Change-Id: I2367c37608ad075dee63785e9f908af5e81374ca
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
Post push fix.

In test program testTlsKeyManager-t a struct sockaddr pointer was passed
to inet_ntop instead of struct in_addr for AF_INET and struct in6_addr
for AF_INET6.

That caused wrong addresses to be printed on error:

  not ok 26 - Client cert for test hostname is OK
   >>> Test of address 2.0.0.0 for msdn.microsoft.com returned error authorization failure: bad hostname
  not ok 27 - Client cert for test hostname is OK
   >>> Test of address a00::2620:1ec:46:0 for msdn.microsoft.com returned error authorization failure: bad hostname
  not ok 28 - Client cert for test hostname is OK
   >>> Test of address a00::2620:1ec:bdf:0 for msdn.microsoft.com returned error authorization failure: bad hostname

The addresses should have been 13.107.x.53 or 2620:1ec:x::53.

Changed to use ndb_sockaddr and Ndb_inet_ntop instead.
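
For reference, a minimal sketch of the correct plain-POSIX call pattern is shown below. It does not use ndb_sockaddr or Ndb_inet_ntop (which the patch uses); `format_address` is a hypothetical helper. On a typical little-endian Linux layout, the bogus output above is consistent with inet_ntop reading the leading family and port bytes of the sockaddr as if they were the address.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <string>

// Hypothetical helper: formats the address stored in a generic sockaddr.
// inet_ntop() expects a pointer to in_addr / in6_addr, not to the sockaddr.
std::string format_address(const sockaddr *sa) {
  assert(sa != nullptr);
  char buf[INET6_ADDRSTRLEN] = {};
  const void *addr = nullptr;
  if (sa->sa_family == AF_INET) {
    addr = &reinterpret_cast<const sockaddr_in *>(sa)->sin_addr;
  } else if (sa->sa_family == AF_INET6) {
    addr = &reinterpret_cast<const sockaddr_in6 *>(sa)->sin6_addr;
  } else {
    return std::string();  // unsupported address family
  }
  if (inet_ntop(sa->sa_family, addr, buf, sizeof(buf)) == nullptr) {
    return std::string();
  }
  return std::string(buf);
}
```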

Change-Id: Iae4bebca26462f9b65c3232e9768c574e767b380
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
Move client_authenticate() out of SocketClient::connect() (which
returns void) into a separate SocketClient::authenticate() method
which can return a value.

In SocketAuthenticator, change the signature of the authentication
routines to return an int (which can represent a result code) rather
than a bool. Results less than AuthOk represent failure, and results
greater than or equal to AuthOk represent success.
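
A minimal sketch of this result-code convention follows. Only the name AuthOk and the ordering rule come from the description above; the numeric values, the other enumerator names and both helper functions are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical result codes: failures sort below AuthOk, successes at or above.
enum AuthResultSketch : int {
  AuthNegotiationFailed = -2,  // assumed name and value
  AuthBadCredentials = -1,     // assumed name and value
  AuthOk = 0,                  // assumed value
  AuthOkWithTls = 1            // assumed name and value
};

// One comparison replaces the old bool: anything >= AuthOk is a success.
inline bool auth_succeeded(int result) { return result >= AuthOk; }

// Hypothetical authentication routine that can now report why it failed.
inline int authenticate_sketch(bool transport_ok, bool credentials_ok) {
  int result = AuthOk;
  if (!transport_ok) {
    result = AuthNegotiationFailed;
  } else if (!credentials_ok) {
    result = AuthBadCredentials;
  }
  assert(auth_succeeded(result) == (transport_ok && credentials_ok));
  return result;
}
```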

Remove the username and password variables from SocketAuthSimple;
make them constant strings in the implementation.

There are no functional changes.

Change-Id: I4c25e99f1b9b692db39213dfa63352da8993a8fb
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
This changes TransporterRegistry::connect_ndb_mgmd() to return
NdbSocket rather than ndb_socket_t.

Back-ported from mysql-trunk.

Change-Id: Ic3b9ccf39ec78ed25705a4bbbdc5ac2953a35611
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
Post-push fix.

ASan reported memory leaks from some EXPLAIN tests, such as
main.explain_tree.

The reason was that the Json_dom objects that were discarded to avoid
describing a subquery twice, were not properly destroyed.

The EXPLAIN code uses unique_ptr to make sure the Json_dom objects are
destroyed, but there are windows in which the objects only exist as
unmanaged raw pointers. This patch closes the window which caused this
memory leak by changing ExplainChild::obj from a raw pointer to a
unique_ptr, so that it gets destroyed even if it doesn't make it into
the final tree that describes the full plan.
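
The effect of the change can be illustrated with a small sketch. `DomSketch` and `ExplainChildSketch` are hypothetical stand-ins for Json_dom and ExplainChild; the live-object counter exists only to make a leak observable.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for Json_dom that counts live objects.
struct DomSketch {
  static int live;
  DomSketch() { ++live; }
  ~DomSketch() { --live; }
};
int DomSketch::live = 0;

// Stand-in for ExplainChild: with unique_ptr as the member type, a child that
// is discarded destroys its DOM object automatically.
struct ExplainChildSketch {
  std::unique_ptr<DomSketch> obj;
};

// Builds three children, moves only the first into the final tree, and
// returns how many DOM objects are alive once the rest have been discarded.
int live_after_discarding_duplicates() {
  std::vector<std::unique_ptr<DomSketch>> final_tree;
  {
    std::vector<ExplainChildSketch> children;
    for (int i = 0; i < 3; ++i) {
      children.push_back({std::make_unique<DomSketch>()});
    }
    final_tree.push_back(std::move(children.front().obj));
    assert(children.front().obj == nullptr);
  }  // the two children that were not moved are destroyed here
  return DomSketch::live;
}
```

With a raw pointer member, the two discarded objects would stay allocated unless every exit path deleted them by hand.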

Change-Id: I0f0885da867e8a34335ff11f3ae9da883a878ba4
percona-ysorokin pushed a commit that referenced this pull request Feb 16, 2024
BUG#35949017 Schema dist setup lockup
Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [#2]
Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [#1]
Bug#32550019 Missing check for ndb_schema_result leads to schema dist timeout

Change-Id: I4a32197992bf8b6899892f21587580788f828f34
percona-ysorokin pushed a commit that referenced this pull request Mar 4, 2024
…ocal DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high-concurrency scenarios, a MySQL replica can enter a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
At least three threads are needed for this deadlock to happen:

1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
        (will be done in the
        Commit_stage_manager::process_final_stage_for_ordered_commit_group()
        called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos`, which is used to track the lock status of
    sidnos. This concurrent access from `update_commit_group()` can cause a
    lock leak, where one thread acquires the sidno lock and never releases
    it.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidnos[2] = true                       set commit_group_sidnos[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock leak can also happen the other way, i.e., the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update, with the following stack traces.

    Client thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.
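
The interleaving shown in the table in step 11 can be replayed deterministically in a single thread. The sketch below is a model, not server code: `SidnoStateSketch` and the helper functions are hypothetical, the sidno lock is modelled as an owner field rather than a real mutex, and only one slot of `commit_group_sidnos` is represented.

```cpp
#include <cassert>

enum ThreadId { kNone, kClient, kApplier };

// Model of the shared state for a single sidno.
struct SidnoStateSketch {
  bool commit_group_flag = false;  // one slot of commit_group_sidnos
  ThreadId lock_owner = kNone;     // who currently holds the sidno lock
};

void mark_sidno(SidnoStateSketch &s) { s.commit_group_flag = true; }

void lock_sidno(SidnoStateSketch &s, ThreadId t) {
  assert(s.lock_owner == kNone);  // the replay only locks once the lock is free
  s.lock_owner = t;
}

// Mirrors "if (commit_group_sidnos[n]) { unlock_sidno(n); flag = false; }".
void unlock_if_marked(SidnoStateSketch &s, ThreadId t) {
  if (s.commit_group_flag) {
    assert(s.lock_owner == t);
    s.lock_owner = kNone;
    s.commit_group_flag = false;
  }
}

// Replays the interleaving and returns the thread left holding the lock.
ThreadId replay_race() {
  SidnoStateSketch s;
  mark_sidno(s);                  // client sets the flag
  mark_sidno(s);                  // applier sets the same flag
  lock_sidno(s, kApplier);        // applier wins the lock, client waits
  unlock_if_marked(s, kApplier);  // applier unlocks and clears the flag
  lock_sidno(s, kClient);         // client finally gets the lock
  unlock_if_marked(s, kClient);   // flag is already false: unlock is skipped
  return s.lock_owner;
}
```

In this model `replay_race()` ends with the client still recorded as the lock owner, which corresponds to the leaked sidno lock described in steps 11 to 13.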

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But this
was not taken into account when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
by the client thread (binlog flush leader) when it tries to perform the GTID
update on behalf of threads waiting in the "Commit Order" queue, thus providing a
guarantee that the `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
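
The locking rule can be sketched as follows, assuming a much simplified model: `GtidStateSketch` is hypothetical, `lock_commit` stands in for `MYSQL_BIN_LOG::LOCK_commit`, and a single flag stands in for the `commit_group_sidnos` array. It is not the actual patch.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct GtidStateSketch {
  std::mutex lock_commit;          // stands in for MYSQL_BIN_LOG::LOCK_commit
  std::mutex sidno_lock;           // lock of a single sidno
  bool commit_group_flag = false;  // one slot of commit_group_sidnos
  int sidno_locks_held = 0;        // bookkeeping, protected by lock_commit

  // Every caller, applier or binlog flush leader, takes lock_commit first,
  // so the flag is never touched by two threads at once.
  void update_commit_group() {
    std::lock_guard<std::mutex> commit_guard(lock_commit);
    commit_group_flag = true;
    sidno_lock.lock();
    ++sidno_locks_held;
    // ... the GTID bookkeeping would happen here ...
    if (commit_group_flag) {
      --sidno_locks_held;
      sidno_lock.unlock();
      commit_group_flag = false;
    }
    assert(sidno_locks_held == 0);
  }
};

// Runs n_threads concurrent updates and reports how many sidno locks leaked.
int locks_leaked_after(int n_threads) {
  GtidStateSketch state;
  std::vector<std::thread> threads;
  for (int i = 0; i < n_threads; ++i) {
    threads.emplace_back([&state] { state.update_commit_group(); });
  }
  for (std::thread &t : threads) t.join();
  return state.sidno_locks_held;
}
```

Because the whole mark, lock, unlock sequence runs under one outer mutex, the flag checked before unlocking is always the one the same thread set.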
percona-ysorokin pushed a commit that referenced this pull request Mar 4, 2024
…ocal DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Merge remote-tracking branch 'venki/PS-9018-8.0-gca' into HEAD

percona-ysorokin pushed a commit that referenced this pull request Apr 17, 2024
cache [#2]

This is the second patch, solving the problem of inefficient cache
invalidation when invalidating a table which is known to be invalid but
for which it is unknown whether it is in the cache or not.

Problem:
Currently the only way to invalidate a table in the NdbApi dictionary
cache is to open the table and then mark it as invalid. In case the
table does not exist in the cache, it will still have to be opened and
thus fetched from NDB.

This means that in order to get the latest table definition it has to be
fetched two times, even though the table definition was not in the cache
to begin with. This is inefficient.

Analysis:
In order to avoid the double roundtrip there needs to be a function which
marks the table as invalid only if it exists in the cache.

Fix:
Implement an NdbApi function that invalidates a table by name if it exists
in the cache.
Replace the old pattern of opening a table in order to invalidate it with
the new function.

The old pattern is still a valid use case for invalidating a table after
having worked with it.
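
The difference between the two patterns can be sketched with a toy cache. `DictCacheSketch` and its methods are hypothetical and are not the NdbApi interface; `fetch_count` simply counts simulated round trips to NDB.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct CachedTableSketch {
  int version;
  bool valid;
};

// Toy dictionary cache; not the real NdbApi classes.
class DictCacheSketch {
 public:
  int fetch_count = 0;  // number of simulated round trips to NDB

  // Returns the cached definition, fetching it if missing or invalid.
  const CachedTableSketch &open(const std::string &name) {
    auto it = m_tables.find(name);
    if (it == m_tables.end() || !it->second.valid) {
      ++fetch_count;
      m_tables[name] = CachedTableSketch{fetch_count, true};
      it = m_tables.find(name);
    }
    return it->second;
  }

  // Old pattern: the table must be opened before it can be marked invalid.
  void invalidate_by_open(const std::string &name) {
    open(name);
    m_tables.at(name).valid = false;
  }

  // New pattern: mark invalid only if present; never triggers a fetch.
  bool invalidate_if_cached(const std::string &name) {
    auto it = m_tables.find(name);
    if (it == m_tables.end()) return false;
    it->second.valid = false;
    return true;
  }

 private:
  std::unordered_map<std::string, CachedTableSketch> m_tables;
};
```

In this model, invalidating and then opening an uncached table costs two fetches with the old pattern and one with the new one.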

Change-Id: I20f275f1fed76d991330348bea4ae72548366467
percona-ysorokin pushed a commit that referenced this pull request Jun 24, 2024
…nt on Windows and posix [#2]

The POSIX version of NdbProcess::start_process assumed the arguments
were quoted using " and \ in a way that resembles POSIX sh quoting, and
unquoted spaces were treated as argument separators, splitting the
argument into several.

But the Windows version of NdbProcess::start_process did not treat
options in the same way, and the Windows C runtime (CRT) parses arguments
differently from POSIX sh. Note that if a program does not use the CRT,
it may treat the command line in its own way, and the quoting done for
the CRT will mess up the command line.

On Windows, NdbProcess::start_process should only be used for programs
that are CRT compatible with respect to argument quoting on the
command line, or one should make sure the given arguments will not trigger
unwanted quoting. This may be relevant for ndb_sign_keys and
--CA-tool=<batch-file>.

Instead, this patch changes the intention of start_process to pass
arguments without modification from the caller to the called C program's
argument vector in its main entry function.

In the POSIX path that is easy: just pass the incoming C strings to execvp.

On Windows one needs to quote for the Windows CRT when composing the command
line. Note that the command part of the command line has different quoting
than the following arguments have.
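
As an illustration of the Windows side, the sketch below quotes a single argument according to the commonly documented Microsoft C runtime parsing rules (backslashes are literal unless they precede a double quote). `quote_for_windows_crt` is a hypothetical helper, not the function added by the patch, and it does not cover the separate rules for the command part of the command line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Quotes one argument so that a CRT-based program sees it unchanged in argv.
std::string quote_for_windows_crt(const std::string &arg) {
  // Arguments without whitespace or quotes can be passed as they are.
  if (!arg.empty() && arg.find_first_of(" \t\n\v\"") == std::string::npos) {
    return arg;
  }
  std::string out = "\"";
  std::size_t backslashes = 0;
  for (char c : arg) {
    if (c == '\\') {
      ++backslashes;  // defer: the meaning depends on what follows
      continue;
    }
    if (c == '"') {
      // Double the pending backslashes and escape the quote itself.
      out.append(backslashes * 2 + 1, '\\');
    } else {
      // Backslashes not followed by a quote are taken literally.
      out.append(backslashes, '\\');
    }
    out.push_back(c);
    backslashes = 0;
  }
  // Trailing backslashes precede the closing quote, so they are doubled.
  out.append(backslashes * 2, '\\');
  out.push_back('"');
  assert(out.size() >= 2);
  return out;
}
```

The quoted arguments would then be joined with single spaces after the separately quoted command to form the command line.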

Change-Id: I763530c634d3ea460b24e6e01061bbb5f3321ad4
percona-ysorokin pushed a commit that referenced this pull request Jun 24, 2024
Problem:
Starting `ndb_mgmd --bind-address` may potentially cause abnormal
program termination in the MgmtSrvr destructor when ndb_mgmd restarts itself.

  Core was generated by `ndb_mgmd --defa'.
  Program terminated with signal SIGABRT,   Aborted.
  #0  0x00007f8ce4066b8f in raise () from /lib64/libc.so.6
  #1  0x00007f8ce4039ea5 in abort () from /lib64/libc.so.6
  #2  0x00007f8ce40a7d97 in __libc_message () from /lib64/libc.so.6
  #3  0x00007f8ce40af08c in malloc_printerr () from /lib64/libc.so.6
  #4  0x00007f8ce40b132d in _int_free () from /lib64/libc.so.6
  #5  0x00000000006e9ffe in MgmtSrvr::~MgmtSrvr (this=0x28de4b0) at mysql/8.0/storage/ndb/src/mgmsrv/MgmtSrvr.cpp:890
  #6  0x00000000006ea09e in MgmtSrvr::~MgmtSrvr (this=0x2) at mysql/8.0/storage/ndb/src/mgmsrv/MgmtSrvr.cpp:849
  #7  0x0000000000700d94 in mgmd_run () at mysql/8.0/storage/ndb/src/mgmsrv/main.cpp:260
  #8  0x0000000000700775 in mgmd_main (argc=<optimized out>, argv=0x28041d0) at mysql/8.0/storage/ndb/src/mgmsrv/main.cpp:479

Analysis:
While starting up, ndb_mgmd allocates memory for bind_address in
order to potentially rewrite the parameter. When ndb_mgmd restarts itself,
the memory is released but the pointer is left dangling, which causes a
double free.

Fix:
Drop support for bind_address=[::]. It is not documented anywhere, is
not useful and doesn't work.
This means the need to rewrite bind_address is gone, and the bind_address
argument needs neither allocation nor free.

Change-Id: I7797109b9d8391394587188d64d4b1f398887e94
percona-ysorokin pushed a commit that referenced this pull request Jun 28, 2024
https://perconadev.atlassian.net/browse/PS-9222

Problem
=======
When writing to the redo log, an issue of column order change not
being recorded with INSTANT DDL was fixed by checking if the fields
are also reordered, then adding the columns into the list.
However, when calculating the size of the buffer, this fix doesn't take
into account the extra fields that may be logged, which eventually causes
the assertion on the buffer size to fail.

Solution
========
To calculate the buffer size correctly, we move the logic of finding
reordered fields before the buffer size calculation, then count the number
of fields with the same logic used when deciding if a field needs to be logged.
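
The idea of sharing one predicate between the size calculation and the logging loop can be sketched as below. All names, the two flags and the fixed entry size are assumptions made for illustration; the real InnoDB structures and log format are different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical field descriptor.
struct FieldSketch {
  bool changed_by_instant_ddl;  // assumed flag
  bool reordered;               // assumed flag
};

constexpr std::size_t kBytesPerLoggedField = 4;  // assumed fixed entry size

// Single source of truth for whether a field is written to the log.
bool needs_logging(const FieldSketch &f) {
  return f.changed_by_instant_ddl || f.reordered;
}

// Sizes the buffer with the same predicate the writer uses.
std::size_t required_buffer_size(const std::vector<FieldSketch> &fields) {
  std::size_t count = 0;
  for (const FieldSketch &f : fields) {
    if (needs_logging(f)) ++count;
  }
  return count * kBytesPerLoggedField;
}

// Writes one placeholder entry per logged field and checks the size bound.
std::vector<unsigned char> log_fields(const std::vector<FieldSketch> &fields) {
  const std::size_t capacity = required_buffer_size(fields);
  std::vector<unsigned char> buf;
  for (const FieldSketch &f : fields) {
    if (!needs_logging(f)) continue;
    buf.insert(buf.end(), kBytesPerLoggedField, 0);
    assert(buf.size() <= capacity);  // would fail if sizing ignored reordering
  }
  return buf;
}
```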
percona-ysorokin pushed a commit that referenced this pull request Jul 26, 2024
percona-ysorokin pushed a commit that referenced this pull request Nov 6, 2024
…EXCEPT SELECT 4)

A work-around is to set the optimizer flag to not use hash map
de-duplication for INTERSECT, EXCEPT, like so:

SET optimizer_switch="hash_set_operations=off";

With hash_set_operations enabled, however, we get too many result rows.
For the IN predicate, the set operation is computed repeatedly, with
filters pushed down to the set operation operands:

-> Filter: <in_optimizer>(c.pk,<exists>(select #2))  (cost=2.25 rows=20)
    -> Covering index scan on c using idx_c_col_datetime_key  (cost=2.25 rows=20)
    -> Select #2 (subquery in condition; dependent)
        -> Limit: 1 row(s)  (cost=2.61..2.61 rows=1)
            -> Table scan on <except temporary>  (cost=2.61..2.61 rows=1)
                -> Except materialize with deduplication  (cost=0.1..0.1 rows=1)
                    -> Filter: (<cache>(c.pk) = <ref_null_helper>(2))  (cost=0..0 rows=1)
                        -> Rows fetched before execution  (cost=0..0 rows=1)
                    -> Filter: (<cache>(c.pk) = <ref_null_helper>(4))  (cost=0..0 rows=1)
                        -> Rows fetched before execution  (cost=0..0 rows=1)

Only the row with pk==2 should pass the filters under the except node, and that's
what happens. However, on repeated execution, the hash map used to implement
the Except materialize is not re-initialized to being empty.

The patch adds reinitialization of the hash map for such cases.
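
A toy model of the problem is sketched below. `ExceptMaterializeSketch` is hypothetical and uses a plain `std::unordered_set` in place of the server's hash map; the `reset_on_init` switch exists only to contrast the behaviour before and after the fix.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy EXCEPT (distinct) materialization that is re-executed once per outer row.
class ExceptMaterializeSketch {
 public:
  explicit ExceptMaterializeSketch(bool reset_on_init)
      : m_reset_on_init(reset_on_init) {}

  // One execution of the dependent subquery: left EXCEPT right.
  std::vector<int> execute(const std::vector<int> &left,
                           const std::vector<int> &right) {
    if (m_reset_on_init) m_rows.clear();  // the reinitialization step
    for (int v : left) m_rows.insert(v);
    for (int v : right) m_rows.erase(v);
    return std::vector<int>(m_rows.begin(), m_rows.end());
  }

 private:
  bool m_reset_on_init;
  std::unordered_set<int> m_rows;  // survives between executions
};
```

In this model, an execution for an outer row whose pushed-down filters reject every operand row still returns the stale row from the previous execution unless the set is cleared first.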

Change-Id: Idf2e36f9085e36748900017a0aad420e4e476f78
percona-ysorokin pushed a commit that referenced this pull request Nov 6, 2024
…rong result

This error happens with hash_set_operations both on and off. If we look
at the explain, we can see why it happens:

    -> Filter: <in_optimizer>(c.pk,<exists>(select #2))
        -> Covering index scan on c using idx_c_col_datetime_key
        -> Select #2 (subquery in condition; dependent)
            -> Limit: 1 row(s)  <--------------------- OK optimization
                -> Table scan on <except temporary>
                    -> Except all materialize
                        -> Limit: 1 row(s)  <--------------------------- problem
                            -> Table scan on <union temporary>
                                -> Union all materialize
                                    -> Filter: (<cache>(c.pk) = <ref_null_helper>(2))
                                        -> Rows fetched before execution
                                    -> Filter: (<cache>(c.pk) = <ref_null_helper>(2))
                                        -> Rows fetched before execution
                        -> Filter: (<cache>(c.pk) = <ref_null_helper>(2))
                            -> Rows fetched before execution

There is a limit node on top of the left EXCEPT operand right over the
UNION ALL node which shouldn't be there. It used to be a clever
optimization when MySQL only had UNION set operations, but it clearly
is wrong for EXCEPT ALL. For EXCEPT DISTINCT, UNION and INTERSECT, it
is fine, though.

The solution is to skip pushed down limits in the presence of EXCEPT
ALL inside subqueries, unless it is the top query block in the
subquery's query_expression, i.e. we retain the top-most limit 1 above.
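
Why the pushed-down limit is wrong for EXCEPT ALL can be seen with a small multiset sketch. `except_all` and `limit` are hypothetical helpers operating on plain integer vectors, not the server's iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Multiset difference: each right row cancels at most one equal left row.
std::vector<int> except_all(const std::vector<int> &left,
                            const std::vector<int> &right) {
  std::map<int, int> to_cancel;
  for (int v : right) ++to_cancel[v];
  std::vector<int> result;
  for (int v : left) {
    auto it = to_cancel.find(v);
    if (it != to_cancel.end() && it->second > 0) {
      --it->second;  // cancelled by one matching right row
    } else {
      result.push_back(v);
    }
  }
  return result;
}

// Keeps at most n rows, like a LIMIT node.
std::vector<int> limit(std::vector<int> rows, std::size_t n) {
  if (rows.size() > n) rows.resize(n);
  return rows;
}
```

With a left operand of two rows with value 2 (the UNION ALL) and a right operand of one such row, EXCEPT ALL leaves one row, so the EXISTS check should succeed. Applying `limit(..., 1)` to the left operand first leaves nothing, which corresponds to the wrong result described above.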

Change-Id: Idf784cbfbe8efbaca03ad17c8c42d73ab7acaa1a
percona-ysorokin pushed a commit that referenced this pull request Nov 6, 2024
… for connection xxx'.

The new iterator based explains are not impacted.

The issue here is a race condition. More than one thread is using the
query term iterator at the same time (which is neither thread safe nor
reentrant), and part of its state is in the query terms being visited,
which leads to interference/race conditions.

a) the explain thread

uses an iterator here:

   Sql_cmd_explain_other_thread::execute

is inspecting the Query_expression of the running query
calling master_query_expression()->find_blocks_query_term which uses
an iterator over the query terms in the query expression:

   for (auto qt : query_terms<>()) {
       if (qt->query_block() == qb) {
           return qt;
       }
   }

the above search fails to find qb due to the interference of
thread b), see below, and then tries to dereference a null pointer:

    * thread #36, name = 'connection', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  frame #0: 0x000000010bb3cf0d mysqld`Query_block::type(this=0x00007f8f82719088) const at sql_lex.cc:4441:11
  frame #1: 0x000000010b83763e mysqld`(anonymous namespace)::Explain::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:792:50
  frame #2: 0x000000010b83cc4d mysqld`(anonymous namespace)::Explain_join::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:1487:21
  frame #3: 0x000000010b837c34 mysqld`(anonymous namespace)::Explain::prepare_columns(this=0x00007000020611b8) at opt_explain.cc:744:26
  frame #4: 0x000000010b83ea0e mysqld`(anonymous namespace)::Explain_join::explain_qep_tab(this=0x00007000020611b8, tabnum=0) at opt_explain.cc:1415:32
  frame #5: 0x000000010b83ca0a mysqld`(anonymous namespace)::Explain_join::shallow_explain(this=0x00007000020611b8) at opt_explain.cc:1364:9
  frame #6: 0x000000010b83379b mysqld`(anonymous namespace)::Explain::send(this=0x00007000020611b8) at opt_explain.cc:770:14
  frame #7: 0x000000010b834147 mysqld`explain_query_specification(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, query_term=0x00007f8f82719088, ctx=CTX_JOIN) at opt_explain.cc:2088:20
  frame #8: 0x000000010bd36b91 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f82719088) at sql_union.cc:1519:11
  frame #9: 0x000000010bd36c68 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f8271d748) at sql_union.cc:1526:13
  frame #10: 0x000000010bd373f7 mysqld`Query_expression::explain(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00) at sql_union.cc:1591:7
  frame #11: 0x000000010b835820 mysqld`mysql_explain_query_expression(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2392:17
  frame #12: 0x000000010b835400 mysqld`explain_query(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2353:13
 * frame #13: 0x000000010b8363e4 mysqld`Sql_cmd_explain_other_thread::execute(this=0x00007f8fba585b68, thd=0x00007f8fbb111e00) at opt_explain.cc:2531:11
  frame #14: 0x000000010bba7d8b mysqld`mysql_execute_command(thd=0x00007f8fbb111e00, first_level=true) at sql_parse.cc:4648:29
  frame #15: 0x000000010bb9e230 mysqld`dispatch_sql_command(thd=0x00007f8fbb111e00, parser_state=0x0000700002065de8) at sql_parse.cc:5303:19
  frame #16: 0x000000010bb9a4cb mysqld`dispatch_command(thd=0x00007f8fbb111e00, com_data=0x0000700002066e38, command=COM_QUERY) at sql_parse.cc:2135:7
  frame #17: 0x000000010bb9c846 mysqld`do_command(thd=0x00007f8fbb111e00) at sql_parse.cc:1464:18
  frame #18: 0x000000010b2f2574 mysqld`handle_connection(arg=0x0000600000e34200) at connection_handler_per_thread.cc:304:13
  frame #19: 0x000000010e072fc4 mysqld`pfs_spawn_thread(arg=0x00007f8fba8160b0) at pfs.cc:3051:3
  frame #20: 0x00007ff806c2b202 libsystem_pthread.dylib`_pthread_start + 99
  frame #21: 0x00007ff806c26bab libsystem_pthread.dylib`thread_start + 15

b) the query thread being explained is itself performing LEX::cleanup
and, as part of that, iterates over the query terms, but still allows
EXPLAIN of the query plan since

   thd->query_plan.set_query_plan(SQLCOM_END, ...)

hasn't been called yet.

     20:frame: Query_terms<(Visit_order)1, (Visit_leaves)0>::Query_term_iterator::operator++() (in mysqld) (query_term.h:613)
     21:frame: Query_expression::cleanup(bool) (in mysqld) (sql_union.cc:1861)
     22:frame: LEX::cleanup(bool) (in mysqld) (sql_lex.h:4286)
     30:frame: Sql_cmd_dml::execute(THD*) (in mysqld) (sql_select.cc:799)
     31:frame: mysql_execute_command(THD*, bool) (in mysqld) (sql_parse.cc:4648)
     32:frame: dispatch_sql_command(THD*, Parser_state*) (in mysqld) (sql_parse.cc:5303)
     33:frame: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in mysqld) (sql_parse.cc:2135)
     34:frame: do_command(THD*) (in mysqld) (sql_parse.cc:1464)
     57:frame: handle_connection(void*) (in mysqld) (connection_handler_per_thread.cc:304)
     58:frame: pfs_spawn_thread(void*) (in mysqld) (pfs.cc:3053)
     65:frame: _pthread_start (in libsystem_pthread.dylib) + 99
     66:frame: thread_start (in libsystem_pthread.dylib) + 15

Solution:

This patch solves the issue by removing iterator state from
Query_term, making the query_term iterators thread safe. This solution
labels every child query_term with its index in its parent's
m_children vector.  The iterator can therefore easily compute the next
child to visit based on Query_term::m_sibling_idx.
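
A minimal sketch of the idea, not the actual MySQL code: the
hypothetical Term type below stands in for Query_term, sibling_idx
for Query_term::m_sibling_idx, and post-order is assumed as the
visiting order. The successor is computed only from the current node
and immutable tree data, so concurrent traversals share no state.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a query term tree node.
struct Term {
  std::string name;
  Term *parent = nullptr;
  std::size_t sibling_idx = 0;  // this node's index in parent->children
  std::vector<std::unique_ptr<Term>> children;

  // Appends a child and labels it with its index among its siblings.
  Term *add_child(std::string child_name) {
    auto child = std::make_unique<Term>();
    child->name = std::move(child_name);
    child->parent = this;
    child->sibling_idx = children.size();
    children.push_back(std::move(child));
    return children.back().get();
  }
};

// Leftmost leaf of the subtree rooted at t: the first node in post-order.
inline const Term *first_post_order(const Term *t) {
  while (!t->children.empty()) t = t->children.front().get();
  return t;
}

// Post-order successor of cur within the tree rooted at root, or nullptr
// when the traversal is complete. No iterator state is stored in the tree.
inline const Term *next_post_order(const Term *cur, const Term *root) {
  if (cur == root) return nullptr;
  const Term *p = cur->parent;
  const std::size_t next = cur->sibling_idx + 1;
  if (next < p->children.size()) {
    return first_post_order(p->children[next].get());
  }
  return p;  // all siblings visited: the parent comes next
}

// Collects node names in post-order using only the two helpers above.
inline std::vector<std::string> post_order_names(const Term *root) {
  std::vector<std::string> out;
  for (const Term *t = first_post_order(root); t != nullptr;
       t = next_post_order(t, root)) {
    out.push_back(t->name);
  }
  return out;
}
```

Because each cursor is just a pointer held by its own caller, two
traversals of the same tree can be interleaved freely without one
disturbing the other.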

A unit test case is added to check reentrancy.

One can also manually verify that no race condition remains by
sourcing a file in each of two client connections (with \. <file>):
a large number of copies of the repro query in one connection, and a
large number of EXPLAIN FORMAT=JSON FOR CONNECTION <id> statements, e.g.

    EXPLAIN FORMAT=json FOR CONNECTION 8\G

in the other. The actual connection number would need to be verified
in connection one, of course.

Change-Id: Ie7d56610914738ccbbecf399ccc4f465f7d26ea7
percona-ysorokin pushed a commit that referenced this pull request Nov 6, 2024
This is a combination of 5 commits.

This is the 1st commit message:

WL#15746: TLS Enhancements for HeatWave-AutoML & Dask Comm. Upgrade

Problem:
--------
- HeatWave-AutoML communication was unauthenticated, unauthorized,
  and unencrypted.
- Dask communication used plain TCP, which does not align with FedRAMP
  guidelines.

Solution:
---------
- Introduced TLS and mTLS in HeatWave-AutoML's plugin and driver for
  authentication, authorization, and encryption.
- Applied TLS to Dask to ensure authentication, encryption, and
  authorization.

Dask Authorization (OCID-based):
--------------------------------
1. For each DBsystem:
    - MySQL node sends OCIDs of authorized nodes to the head driver
      via:
        a. rapid_net_nodes
        b. rapid_net_allowed_ocids (older API, mainly for MTR tests)
    - Scenarios:
        a. All OCIDs provided: Dask authorizes.
        b. Any OCID absent: ML call fails with message.
2. During Dask worker registration to the Dask scheduler, a script is
    dispatched to the Dask worker for execution, retrieving the worker's
    OCID for authorization purposes.
    - If the OCID isn't approved, the connection is denied, terminating
      the worker and causing the ML query to fail.
3. For every Dask worker (both as listener and connector), an OCID-
    based authorization is performed post SSL/TLS connection handshake.
    The process compares the OCID from the peer's certificate against
    the allowed_ocids received from the HeatWave-AutoML MySQL plugin.
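
The checks above can be sketched as follows. This is an illustration
only: the function names are hypothetical, an absent OCID is modeled
as an empty string, and an empty node list is assumed to fail. The
real implementation is not shown in this commit message.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Scenario check (assumption: a missing OCID is represented by an empty
// string). Returns false if any node's OCID is absent, in which case the
// ML call would be rejected.
inline bool all_ocids_provided(const std::vector<std::string> &node_ocids) {
  if (node_ocids.empty()) return false;
  for (const auto &ocid : node_ocids) {
    if (ocid.empty()) return false;
  }
  return true;
}

// Post-handshake check: the OCID taken from the peer's certificate must be
// a member of the allowed set received from the plugin.
inline bool is_peer_authorized(
    const std::string &peer_ocid,
    const std::unordered_set<std::string> &allowed_ocids) {
  return !peer_ocid.empty() && allowed_ocids.count(peer_ocid) > 0;
}
```

A set lookup keeps the per-connection check cheap regardless of how
many nodes are authorized.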

HWAML Plugin Changes:
---------------------
- Sourced certificate data and SSL setup from disk, incorporating
  SSL/TLS for HWAML.
- Reused "keystore" variable to specify disk location for
  certificate retrieval.
- Certificates and keys expected in PKCS12 format.
- Introduced "have_ml_encryption" variable (default=0).
    > Acts as a switch to explicitly deactivate HWAML network
      encryption, akin to "disable_net_encryption" affecting
      network encryption for HeatWave. Set to 1 to enable.
- Introduced a custom verify_callback, registered through
  SSL_CTX_set_verify, that performs instance id (OCID) based
  authorization on the plugin side during the standard SSL/TLS
  handshake.
- CRL (Certificate Revocation List) checks are also conducted if CRL
  Distribution Points are present and accessible in the provided
  certificate.

HWAML Driver Changes & OCID-based Authorization:
------------------------------------------------
- Introduced "enable_encryption" (default=0).
    > Set to 1 to enable encryption.
- When a new connection request arrives and encryption is on, the
  driver performs an OCID-based self-check, comparing the OCID
  retrieved from its own instance principal with the OCID in the
  provided certificate on disk.
- During connection, the driver compares the OCID from
  "mysql_compute_id" with the OCID extracted from the mTLS
  certificate.
- Introduced "cert_dir" argument for certificate directory
  specification.
- Expected files: cert_chain.pem, certificate.pem, private_key.pem.
    > OCID should be in the userID (UID) or CN field of the
      certificate.pem subject.
- CRL (Certificate Revocation List) checks are also conducted post
  handshake, if CRL Distribution Points are present and accessible in
  the provided certificate, alongside OCID authorization.
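
As a rough illustration of reading the OCID from the UID or CN field
of a certificate subject, the sketch below parses a comma-separated
subject string such as "CN=node-1, UID=<ocid>". The helper names, the
string format, and the choice to prefer UID over CN are assumptions;
escaped commas are not handled, and the driver's real code may work
on the parsed certificate instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Strips leading and trailing spaces and tabs.
inline std::string trim(const std::string &s) {
  const char *ws = " \t";
  const std::size_t b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  const std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Returns the value of the first attribute named `key` in a
// comma-separated subject, or an empty string if it is not present.
inline std::string subject_attribute(const std::string &subject,
                                     const std::string &key) {
  std::size_t pos = 0;
  while (pos <= subject.size()) {
    std::size_t end = subject.find(',', pos);
    if (end == std::string::npos) end = subject.size();
    const std::string part = subject.substr(pos, end - pos);
    const std::size_t eq = part.find('=');
    if (eq != std::string::npos && trim(part.substr(0, eq)) == key) {
      return trim(part.substr(eq + 1));
    }
    pos = end + 1;
  }
  return "";
}

// Assumed precedence: use UID when present, otherwise fall back to CN.
inline std::string ocid_from_subject(const std::string &subject) {
  const std::string uid = subject_attribute(subject, "UID");
  return uid.empty() ? subject_attribute(subject, "CN") : uid;
}
```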

Encryption Behavior:
--------------------
- If encryption is deactivated on both plugin and driver side, HWAML
  will work without encryption as it was before this commit.

Enabling Encryption:
--------------------
- By default, "have_ml_encryption" and "enable_encryption" are set to 0.
    > Encryption is disabled by default.
- For the HWAML plugin:
    > Set "have_ml_encryption" to 1 (default is 0).
    > Specify the .pfx file's path using the "keystore" variable.
- For the HWAML Driver:
    > Set "enable_encryption" to 1 (default is 0).
    > Specify "mysql_instance_id" and "cert_dir".

Testing:
--------
- MTR has been modified for the encryption setup.
    > Runs with encryption if "OCI_INSTANCE_ID" is set to a valid
      value.
- On OCI (when "OLRAPID_KEYSTORE" is not set):
    > Certificates and keys are generated; PEMs for driver and PKCS12
      for plugin.
- On AWS (when "OLRAPID_KEYSTORE" is set as the path to PKCS12
  keystore files):
    > PEM files are extracted from the provided PKCS12 and used for
      the driver. The plugin uses the provided PKCS12 keystore file.

Change-Id: I553ca135241e03484db6debbe186e6d34d582bf4

This is the commit message #2:

WL#15746 - Adding ML encryption support to BM

Enabling ML encryption on Baumeister:
- Certificates are generated on MySQLd during initialization
- The certificates needed by workers are packaged and sent to worker nodes
- Workers use the packaged files to generate their certificates
- Arguments are added to the driver.py invocation
- Keystore path is added to mysql config

Change-Id: I11a5cc5926488ff4fbf91bb6c10a091358db7dc9

This is the commit message #3:

WL#15746: Enhanced CRL Daemon Checker

Issue
=====
The previous design assumed a plain HTTPS link for the CRL distribution
point, accessible to all. This assumption no longer holds, as public
accessibility for CRL distribution contradicts OCI guidelines. Now, the
CRL distribution point in certificates provided by the control plane is
expected to be protected by OCI Instance Principal Authentication.
However, using this authentication method introduces a delay of several
seconds, which is impractical for HeatWave-AutoML.

Solution
========
The CRL fetching code now uses OCI Instance Principal Authentication.
To mitigate performance issues, the CRL checking process has been
redesigned. Instead of enforcing CRL checks per connection in MySQL
Plugin and HeatWave-AutoML Driver communications, a daemon thread in
HeatWave-AutoML Driver, Dask scheduler, and Dask Worker now periodically
fetches and verifies the CRL against all active connections. This
separation minimizes performance impacts. Consequently, MySQL Plugin's
CRL checks have been removed, as checks in the Driver, Scheduler, and
Worker sufficiently cover all cluster nodes.

Changes
=======
- Implemented CRL checker as a daemon thread in Driver, Scheduler, and
  Worker.
- Each connection/socket has an associated CRL checker.
- CRL checks occur periodically at set intervals.
- Skips CRL check if the CRL is temporarily unavailable.
- Failing a CRL check results in the associated connection/socket being
  closed. On the Driver, a stop event is triggered (akin to CTRL-C).
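
The behavior listed above can be sketched as a background thread
that wakes at a fixed interval. Everything here is hypothetical
(CrlStatus, crl_check_once, CrlChecker, and the two callbacks); the
actual checker lives in the Driver, Scheduler, and Worker code and
is not shown in this commit message.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

enum class CrlStatus { kUnavailable, kNotRevoked, kRevoked };

// One check round. Returns true if the connection was closed.
inline bool crl_check_once(const std::function<CrlStatus()> &fetch_and_verify,
                           const std::function<void()> &close_connection) {
  switch (fetch_and_verify()) {
    case CrlStatus::kUnavailable:
      return false;  // CRL temporarily unavailable: skip this round
    case CrlStatus::kNotRevoked:
      return false;
    case CrlStatus::kRevoked:
      close_connection();
      return true;
  }
  return false;
}

// Runs crl_check_once periodically for one connection until stopped or
// until the certificate is found to be revoked. The close callback must
// not call stop(), since it runs on the checker thread itself.
class CrlChecker {
 public:
  CrlChecker(std::function<CrlStatus()> fetch, std::function<void()> close,
             std::chrono::milliseconds interval)
      : fetch_(std::move(fetch)),
        close_(std::move(close)),
        interval_(interval),
        worker_([this] { run(); }) {}

  ~CrlChecker() { stop(); }

  void stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mu_);
    // wait_for returns false on timeout, i.e. when a check is due.
    while (!cv_.wait_for(lock, interval_, [this] { return stopping_; })) {
      lock.unlock();
      const bool closed = crl_check_once(fetch_, close_);
      lock.lock();
      if (closed) return;
    }
  }

  std::function<CrlStatus()> fetch_;
  std::function<void()> close_;
  std::chrono::milliseconds interval_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};
```

Keeping the slow CRL fetch on this thread is what takes it off the
per-connection handshake path.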

Change-Id: Id998cfe9e15d9236291b0ae420d65c2197837966

This is the commit message #4:

WL#15746: Fix Dask workers being shut down without releasing address

Issue
=====
Dask workers are sometimes shut down without properly releasing the
address they were using.

Solution
========
Reverted some changes to the Dask worker shutdown function in
heatwave_cluster.py. This should fix the address issue.

Change-Id: I5a6749b5a25b0ccb73ba7369e545bc010da1b84f

This is the commit message #5:

WL#15746: Implement Dask Worker Join Timeout for Head Node

Issue:
======
In the cluster_shutdown method, the join operation on the head node's
worker process lacked a timeout. This led to potential indefinite
waiting and eventual hanging of the head node.

Solution:
=========
A timeout has been introduced for the worker process join on the head
node. Unlike non-head nodes, which rely on worker join to complete Dask
tasks and cannot have a timeout, the head node can safely implement
this. Now, if the worker process on the head node fails to shut down
post-join, indicating a timeout, it will be manually terminated. This
ensures proper release of associated resources and prevents hanging of
the head node.
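
A POSIX analogue of "join with a timeout, then terminate" is
sketched below. It is not the code from cluster_shutdown: the
function name, the polling interval, and the use of SIGKILL are
assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

enum class JoinResult { kExited, kTerminated, kError };

// Waits up to `timeout` for child process `pid` to exit. If it is still
// running at the deadline, kills it and reaps it so that its resources
// are released.
inline JoinResult join_or_terminate(pid_t pid,
                                    std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  int status = 0;
  for (;;) {
    const pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid) return JoinResult::kExited;
    if (r == -1) return JoinResult::kError;
    if (std::chrono::steady_clock::now() >= deadline) break;
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  // Timed out: terminate manually, then reap the child.
  if (kill(pid, SIGKILL) == -1) return JoinResult::kError;
  if (waitpid(pid, &status, 0) == -1) return JoinResult::kError;
  return JoinResult::kTerminated;
}
```

SIGKILL is used here because it cannot be ignored, so the final
blocking wait always returns. A gentler variant would send SIGTERM
first.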

Additional Change:
==================
Added Cert Rotation Guard for Dask clusters. This feature starts on
the first plugin-driver connection when the Dask cluster is off,
recording the certificate's expiry date. During driver idle times,
it checks the current cert's expiry against this date. If it detects a
change, indicating a certificate rotation, it shuts down the Dask
cluster. The cluster restarts on the next connection request, ensuring
the use of the latest certificate.
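
The guard's logic can be sketched as a small state machine. The
class and callback names are hypothetical, and reading the expiry
date is abstracted behind a function so the example stays
self-contained.

```cpp
#include <cassert>
#include <ctime>
#include <functional>
#include <utility>

class CertRotationGuard {
 public:
  CertRotationGuard(std::function<std::time_t()> read_expiry,
                    std::function<void()> shutdown_cluster)
      : read_expiry_(std::move(read_expiry)),
        shutdown_cluster_(std::move(shutdown_cluster)) {}

  // Call on each plugin-driver connection. If the cluster is off, record
  // the current certificate's expiry and treat the cluster as started.
  void on_connection() {
    if (!cluster_running_) {
      recorded_expiry_ = read_expiry_();
      cluster_running_ = true;
    }
  }

  // Call while the driver is idle. Returns true if a rotation was
  // detected and the cluster was shut down.
  bool on_idle() {
    if (!cluster_running_) return false;
    if (read_expiry_() == recorded_expiry_) return false;
    shutdown_cluster_();
    cluster_running_ = false;  // the next connection restarts the cluster
    return true;
  }

 private:
  std::function<std::time_t()> read_expiry_;
  std::function<void()> shutdown_cluster_;
  std::time_t recorded_expiry_ = 0;
  bool cluster_running_ = false;
};
```

Comparing expiry dates avoids parsing or hashing the whole
certificate; a rotation that left the expiry date unchanged would go
unnoticed by this sketch.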

Change-Id: Ie63a2e2b7664e05e1622d8bd6503663e13fa73cb