forked from percona/percona-server
Enable system tablespace encryption again in 8.0.20 #1
Open
satya-bodapati wants to merge 1 commit into percona-ysorokin:ps-8.0.20-merge from satya-bodapati:ps-8.0.20-merge
Conversation
percona-ysorokin force-pushed the ps-8.0.20-merge branch from eedf3a9 to c0138f0 on June 2, 2020 at 14:29
satya-bodapati force-pushed the ps-8.0.20-merge branch from 656a4c8 to 5e8458c on June 4, 2020 at 04:19
percona-ysorokin force-pushed the ps-8.0.20-merge branch from 3708733 to f6f0876 on June 17, 2020 at 14:54
percona-ysorokin force-pushed the ps-8.0.20-merge branch from 81e3869 to 6c5dc40 on June 26, 2020 at 00:23
satya-bodapati force-pushed the ps-8.0.20-merge branch from 5e8458c to 33ea29b on June 30, 2020 at 07:24
…r_pool_size > 1G

Problem:
-------
According to the documentation, the default value of innodb_doublewrite_files is innodb_buffer_pool_instances * 2. But when InnoDB is started with a buffer pool size > 1G (so the number of instances is 8), it still creates only 2 files. The expected number of files is 16 (8 * 2).

Fix:
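As an aside, a minimal C++ sketch of the sizing rule quoted from the documentation; the constants and names are illustrative only, not server code. With a buffer pool larger than 1G the default is 8 instances, so the documented default works out to 16 doublewrite files.

```
// Illustrative sketch only: the documented default, not server code.
#include <cstdint>
#include <iostream>

int main() {
  const std::uint64_t kGiB = 1ULL << 30;
  std::uint64_t buffer_pool_size = 8 * kGiB;   // anything > 1G defaults to 8 instances
  std::uint64_t instances = (buffer_pool_size > kGiB) ? 8 : 1;
  std::uint64_t doublewrite_files = instances * 2;  // documented default
  std::cout << "expected innodb_doublewrite_files = " << doublewrite_files << "\n";  // 16
  return 0;
}
```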
satya-bodapati force-pushed the ps-8.0.20-merge branch from 33ea29b to c435065 on June 30, 2020 at 07:37
percona-ysorokin pushed a commit that referenced this pull request on Sep 17, 2020
…o: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded Problem ======= Running mtr with ASAN build on Gentoo tests fails since the path to libtirpc is not /lib64/libtirpc.so which is the path mtr uses for preloading the library. Further more the libasan path in Gentoo may contain also underscores and minus which mtr safe_process does not recognize. Fails on Gentoo since /lib64/libtirpc.so do not exist +ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. Fails on Gentoo since /usr/lib64/libtirpc.so is a GNU LD script +ERROR: ld.so: object '/usr/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (invalid ELF header): ignored. Need to preload /lib64/libtirpc.so.3 on gentoo. When compiling with GNU C++ libasan path also include minus and underscores: $ less mysql-test/lib/My/SafeProcess/ldd_asan_test_result linux-vdso.so.1 (0x00007ffeba962000) libasan.so.4 => /usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4 (0x00007f3c2e827000) Tests that been affected in different ways are for example: $ ./mtr group_replication.gr_clone_integration_clone_not_installed [100%] group_replication.gr_clone_integration_clone_not_installed w3 [ fail ] ... ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. mysqltest: At line 21: Query 'START GROUP_REPLICATION' failed. ERROR 2013 (HY000): Lost connection to MySQL server during query ... ASAN:DEADLYSIGNAL ================================================================= ==11970==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f0e5cecfb8c bp 0x7f0e340f1650 sp 0x7f0e340f0dc8 T44) ==11970==The signal is caused by a READ memory access. ==11970==Hint: address points to the zero page. #0 0x7f0e5cecfb8b in xdr_uint32_t (/lib64/libc.so.6+0x13cb8b) #1 0x7f0e5fbe6d43 (/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4+0x87d43) #2 0x7f0e3c675e59 in xdr_node_no plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:88 #3 0x7f0e3c67744d in xdr_pax_msg_1_6 plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:852 ... $ ./mtr ndb.ndb_config [100%] ndb.ndb_config [ fail ] ... --- /.../src/mysql-test/suite/ndb/r/ndb_config.result 2019-06-25 21:19:08.308997942 +0300 +++ /.../bld/mysql-test/var/log/ndb_config.reject 2019-06-26 11:58:11.718512944 +0300 @@ -30,16 +30,22 @@ == 16 == bug44689 192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.1 192.168.0.1 == 17 == bug49400 +ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. +ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ERROR -- at line 25: TCP connection is a duplicate of the existing TCP link from line 14 ERROR -- at line 25: Could not store section of configuration file. $ ./mtr ndb.ndb_basic [100%] ndb.ndb_basic [ pass ] 34706 ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. Solution ======== In safe_process use same trick for libtirpc as for libasan to determine path to library for pre loading. Also allow underscores and minus in paths. In addition also add some memory leak suppressions for perl. 
Change-Id: Ia02e354a20cf8b279eb2573f3f8c2c39776343dc (cherry picked from commit e88706d)
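One part of the fix is allowing underscores and minus signs in library paths. A hypothetical sketch of such a path check is shown below; it is not the actual mysql-test safe_process code, just an illustration of a pattern that accepts toolchain directories such as /usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4.

```
// Hypothetical sketch: a library-path pattern that tolerates '_' and '-'
// in directory names, similar in spirit to the safe_process fix.
#include <iostream>
#include <regex>
#include <string>

int main() {
  // Character class includes '_' and '-' in addition to alphanumerics, '.' and '/'.
  const std::regex lib_path(R"(^[A-Za-z0-9_\-./]+/lib(asan|tirpc)\.so(\.\d+)*$)");
  for (const std::string p :
       {"/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4", "/lib64/libtirpc.so.3"}) {
    std::cout << p << " -> " << (std::regex_match(p, lib_path) ? "ok" : "rejected") << "\n";
  }
  return 0;
}
```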
percona-ysorokin pushed a commit that referenced this pull request on Feb 18, 2021
To call a service implementation one needs to:

1. query the registry to get a reference to the service needed
2. call the service via the reference
3. call the registry to release the reference

While #2 is very fast (just a function pointer call), #1 and #3 can be expensive since they need to interact with the registry's global structure in a read/write fashion. Hence, if the above sequence is to be repeated in quick succession, it is beneficial to do steps #1 and #3 just once and aggregate as many #2 steps as possible in a single sequence. This will usually mean caching the service reference received in #1 and delaying #3 for as long as possible.

But since an active reference is held to the service implementation until #3 is taken, special handling is needed to make sure that:
- The references are released at regular intervals so changes in the registry can become effective.
- There is a way to mark a service implementation as "inactive" ("dying") so that, until all of the active references to it are released, no new ones are possible.

All of the above is part of the current audit API machinery, but needs to be isolated into a separate service suite and made generally available to all services. This is what this worklog aims to implement.

RB#24806
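A hedged sketch of the acquire/call/release pattern and the caching idea described above; the registry type and method names are hypothetical stand-ins rather than the actual MySQL component services API.

```
// Hypothetical sketch of caching a service reference between registry calls.
#include <functional>
#include <string>
#include <unordered_map>

struct Registry {
  // Step 1: look up a service reference (assumed to be relatively expensive).
  std::function<void()> *acquire(const std::string &name) { return &services_[name]; }
  // Step 3: release the reference so registry changes can take effect.
  void release(std::function<void()> * /*ref*/) {}
  std::unordered_map<std::string, std::function<void()>> services_;
};

void do_batch(Registry &reg, int n) {
  auto *svc = reg.acquire("example.service");  // #1 once
  for (int i = 0; i < n; ++i) {
    if (*svc) (*svc)();                        // #2 many times: cheap indirect call
  }
  reg.release(svc);                            // #3 once, at a regular interval
}
```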
percona-ysorokin pushed a commit that referenced this pull request on Feb 18, 2021
A heap-buffer-overflow in libmysqlxclient when:
- the auth-method is MYSQL41
- the "server" sends a nonce that is shorter than 20 bytes.

==2466857==ERROR: AddressSanitizer: heap-buffer-overflow on address
#0 0x4a7b76 in memcpy (routertest_component_routing_splicer+0x4a7b76)
#1 0x7fd3a1d89052 in SHA1_Update (/libcrypto.so.1.1+0x1c2052)
#2 0x63409c in compute_mysql41_hash_multi(unsigned char*, char const*, unsigned int, char const*, unsigned int)
...

RB: 25305
Reviewed-by: Lukasz Kotula <lukasz.kotula@oracle.com>
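The overflow described above is the classic case of copying a fixed 20 bytes out of a shorter buffer. A minimal, hypothetical sketch of the defensive length check follows; it is not the libmysqlxclient code.

```
// Hypothetical sketch: never hash/copy more nonce bytes than the server sent.
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kMysql41NonceLen = 20;  // length MYSQL41 expects

std::vector<unsigned char> safe_nonce(const unsigned char *nonce, std::size_t nonce_len) {
  std::vector<unsigned char> buf(kMysql41NonceLen, 0);
  // Copy at most what was actually received; a short nonce is padded with zeros
  // (or could be rejected outright) instead of reading past the end of the buffer.
  std::memcpy(buf.data(), nonce, std::min(nonce_len, kMysql41NonceLen));
  return buf;
}
```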
percona-ysorokin pushed a commit that referenced this pull request on Feb 18, 2021
…TH VS 2019 [#1] [noclose]

storage\ndb\include\util\Bitmask.hpp(388,23): warning C4146: unary minus operator applied to unsigned type, result still unsigned

Change-Id: I657966b790b96356d987767302cfceb446e29a98
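For context, MSVC's C4146 fires on code of roughly this shape; the sketch shows the general idiom and one conventional warning-free rewrite, and is not the exact Bitmask.hpp change.

```
#include <cstdint>

std::uint32_t lowest_set_bit(std::uint32_t x) {
  // return x & -x;            // warning C4146: unary minus applied to unsigned type
  return x & (~x + 1U);        // equivalent two's-complement form, no warning
}
```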
percona-ysorokin pushed a commit that referenced this pull request on Jul 15, 2021
Fixes to ZenFS packaging for Deb
percona-ysorokin pushed a commit that referenced this pull request on Sep 8, 2021
…lize subqueries] For IN subqueries that have not been converted to semijoin, consider materializing them. To do this, we need to add two interrelated steps: 1. Every subquery that has gone through IN-to-EXISTS needs to be planned twice; once as is (for direct execution), and once without the added IN-to-EXISTS conditions (for materialization, since the extra conditions are dependent on the outer query block). There is some overlap between these two plans, in particular if the subquery involves many tables. At one point, we had a prototype that did both at the same time, tagging access paths with whether they included such conditions or not (and not combining incompatible alternatives), but it ended up becoming very intrusive, so instead, we simply plan from scratch twice. Most subqueries are cheap to plan anyway. 2. Whenever we see a filter with a subquery, we propose two alternatives, one as-is and one where the subquery is materialized. (These are the two plans generated by #1.) The latter will have high init_once_cost but usually much lower cost, so the planner will make a cost-based decision largely depending on the number of joined rows. This is fairly similar to what the old optimizer does, except that the old one seems to ignore some of the costs (it doesn't plan both alternatives before making the decisions; it only replans if it actually chooses materialization). Change-Id: I6947419e1f9d4ec0f03f7ba214f46daf2f690c4c
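As a toy illustration of the cost-based choice in step 2 (all names and numbers are invented for the example), the sketch compares an as-is plan against a materialized plan with a high one-time cost but a low per-row cost.

```
// Illustrative sketch of choosing between "evaluate subquery per outer row" and
// "materialize once, then probe", based on expected outer row count.
#include <iostream>

struct Alternative {
  double init_once_cost;  // paid a single time
  double per_row_cost;    // paid for every outer row
};

double total_cost(const Alternative &a, double outer_rows) {
  return a.init_once_cost + a.per_row_cost * outer_rows;
}

int main() {
  Alternative as_is{0.0, 50.0};           // IN-to-EXISTS style re-execution
  Alternative materialized{1000.0, 0.5};  // build a hash table once, cheap probes
  for (double rows : {5.0, 10000.0}) {
    const bool pick_mat = total_cost(materialized, rows) < total_cost(as_is, rows);
    std::cout << rows << " rows -> " << (pick_mat ? "materialize" : "as-is") << "\n";
  }
  return 0;
}
```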
percona-ysorokin pushed a commit that referenced this pull request on Sep 8, 2021
…ING TABLESPACES The occurrence of this message is a minor issue fixed by change #1 below. But during testing, I found that if mysqld is restarted while remote and local tablespaces are discarded, especially if the tablespaces to be imported are already in place at startup, then many things can go wrong. There were various asserts that occurred depending on timing. During all the testing and debugging, the following changes were made. 1. Prevent the stats thread from complaining about a missing tablespace. See dict_stats_update(). 2. Prevent a discarded tablespace from being opened at startup, even if the table to be imported is already in place. See Validate_files::check(). 3. dd_tablespace_get_state_enum() was refactored to separate the normal way to do it in v8.0, which is to use "state" key in dd::tablespaces::se_private_date, from the non-standard way which is to check undo::spaces or look for the old key value pair of "discarded=true". This allowed the new call to this routine by the change in fix #2 above. 4. Change thd_tablespace_op() in sql/sql_thd_api.cc such that instead of returning 1 if the DDL requires an implicit tablespace, it returns the DDL operation flag. This can still be interpreted as a boolean, but it can also be used to determine if the op is an IMPORT or a DISCARD. 5. With that change, the annoying message that a space is discarded can be avoided during an import when it needs to be discarded. 6. Several test cases were corrected now that the useless "is discarded" warning is no longer being written. 7. Two places where dd_tablespace_set_state() was called to set the state to either "discard" or "normal" were consolidated to a new version of dd_tablespace_set_state(thd, dd_space_id, space_name, dd_state). 8. This new version of dd_tablespace_set_state() was used in dd_commit_inplace_alter_table() to make sure that in all three places the dd is changed to identify a discarded tablesapace, it is identified in dd:Tablespace::se_private_data as well as dd:Table::se_private_data or dd::Partition::se_private_data. The reason it is necessary to record this in dd::Tablespace is that during startup, boot_tablespaces() and Validate::files::check() are only traversing dd::Tablespace. And that is where fix #2 is done! 9. One of the asserts that occurred was during IMPORT TABLESPACE after a restart that found a discarded 5.7 tablespace in the v8.0 discarded location. This assert occurred in Fil_shard::get_file_size() just after ER_IB_MSG_272. The 5.7 file did not have the SDI flag, but the v8.0 space that was discarded did have that flag. So the flags did not match. That crash was fixed by setting the fil_space_t::flags to what it is in the tablespace header page. A descriptive comment was added. 10. There was a section in fil_ibd_open() that checked `if (space != nullptr) {` and if true, it would close and free stuff then immediately crash. I think I remember many years ago adding that assert because I did not think it actually occurred. Well it did occur during my testing before I added fix #2 above. This made fil_ibd_open() assume that the file was NOT already open. So fil_ibd_open() is now changed to allow for that possibility by adding `if (space != nullptr) {return DB_SUCCESS}` further down. Since fil_ibd_open() can be called with a `validate` boolean, the routine now attempts to do all the validation whether or not the tablespace is already open. The following are non-functional changes; - Many code documentation lines were added or improved. 
- dict_sys_t::s_space_id renamed to dict_sys_t::s_dict_space_id to clarify which space_id it refers to.
- For the same reason, changed s_dd_space_id to s_dd_dict_space_id.
- Replaced `table->flags2 & DICT_TF2_DISCARDED` with `dict_table_is_discarded(table)` in dict0load.cc.
- A redundant call to ibuf_delete_for_discarded_space(space_id) was deleted from fil_discard_tablespace() because it is also called higher up in the call stack in row_import_for_mysql().
- Deleted the declaration of `row_import_update_discarded_flag()` since the definition no longer exists; it was deleted when we switched from `discarded=true` to 'state=discarded' in dd::Tablespace::se_private_data early in v8.0 development.

Approved by Mateusz in RB#26077
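Change #3 in the list above separates the v8.0 "state" key lookup from the legacy "discarded=true" fallback. The sketch below captures that idea with a plain key/value map and hypothetical names, not the actual dd:: API.

```
// Hypothetical sketch: read tablespace state from se_private_data-style
// key/value pairs, falling back to the pre-8.0 "discarded=true" flag.
#include <string>
#include <unordered_map>

enum class SpaceState { kNormal, kDiscarded, kUnknown };

SpaceState space_state(const std::unordered_map<std::string, std::string> &priv) {
  if (auto it = priv.find("state"); it != priv.end())       // normal v8.0 path
    return it->second == "discarded" ? SpaceState::kDiscarded : SpaceState::kNormal;
  if (auto it = priv.find("discarded"); it != priv.end())    // legacy key/value pair
    return it->second == "true" ? SpaceState::kDiscarded : SpaceState::kNormal;
  return SpaceState::kUnknown;
}
```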
percona-ysorokin pushed a commit that referenced this pull request on Mar 11, 2022
This error happens for queries such as: SELECT ( SELECT 1 FROM t1 ) AS a, ( SELECT a FROM ( SELECT x FROM t1 ORDER BY a ) AS d1 ); Query_block::prepare() for query block #4 (corresponding to the 4th SELECT in the query above) calls setup_order() which again calls find_order_in_list(). That function replaces an Item_ident for 'a' in Query_block.order_list with an Item_ref pointing to query block #2. Then Query_block::merge_derived() merges query block #4 into query block #3. The Item_ref mentioned above is then moved to the order_list of query block #3. In the next step, find_order_in_list() is called for query block #3. At this point, 'a' in the select list has been resolved to another Item_ref, also pointing to query block #2. find_order_in_list() detects that the Item_ref in the order_list is equivalent to the Item_ref in the select list, and therefore decides to replace the former with the latter. Then find_order_in_list() calls Item::clean_up_after_removal() recursively (via Item::walk()) for the order_list Item_ref (since that is no longer needed). When calling clean_up_after_removal(), no Cleanup_after_removal_context object is passed. This is the actual error, as there should be a context pointing to query block #3 that ensures that clean_up_after_removal() only purge Item_subselect.unit if both of the following conditions hold: 1) The Item_subselect should not be in any of the Item trees in the select list of query block #3. 2) Item_subselect.unit should be a descendant of query block #3. These conditions ensure that we only purge Item_subselect.unit if we are sure that it is not needed elsewhere. But without the right context, query block #2 gets purged even if it is used in the select lists of query blocks #1 and #3. The fix is to pass a context (for query block #3) to clean_up_after_removal(). Both of the above conditions then become false, and Item_subselect.unit is not purged. As an additional shortcut, find_order_in_list() will not call clean_up_after_removal() if real_item() of the order item and the select list item are identical. In addition, this commit changes clean_up_after_removal() so that it requires the context to be non-null, to prevent similar errors. It also simplifies Item_sum::clean_up_after_removal() by removing window functions unconditionally (and adds a corresponding test case). Change-Id: I449be15d369dba97b23900d1a9742e9f6bad4355
percona-ysorokin pushed a commit that referenced this pull request on Mar 11, 2022
…nt [#1]

Problem
=======
When the coordinator receives a stale schema event, it crashes due to an assert failure.

Description
===========
After the fix for bug#32593352, the client/user thread can detect a schema distribution timeout by itself and can free the schema object. So, if a stale schema event reaches the coordinator after the client/user thread has freed the schema object, the coordinator will try to get the schema object and will hit the assert failure.

Prior to bug#32593352, the schema distribution timeout could be detected only by the coordinator, so it was assumed that the schema object is always valid inside the coordinator. Since there is now a valid scenario where the schema object can be invalid, the assert check is no longer useful and can be removed.

Fix
===
Fixed by removing the assert check.

Change-Id: I0482ccc940505e83d66cbf2258528fbac6951599
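A minimal sketch of the spirit of the fix, with hypothetical names rather than the NDB coordinator code: treat a missing schema object as a valid stale-event case instead of asserting.

```
#include <memory>

struct SchemaObject { /* ... */ };

// Before the fix, an assert on a non-null schema object crashed on a stale
// event that arrived after the client/user thread had already freed it.
void handle_schema_event(const std::shared_ptr<SchemaObject> &obj) {
  if (obj == nullptr) {
    // Stale event: the distribution already timed out on the client side.
    // Log and ignore instead of asserting.
    return;
  }
  // ... process the event ...
}
```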
percona-ysorokin pushed a commit that referenced this pull request on Mar 11, 2022
…NSHIP WITH THE BUFFER SIZE

Bug #33501541: Unmanageable Sort Buffer Behavior in 8.0.20+

Implement direct disk-to-disk copies of large packed addons during the filesort merge phase; if a single row is so large that its addons do not fit into its slice of the sort buffer during merging (even after emptying that slice of all other rows), but the sort key _does_ fit, simply sort the truncated row as usual, and then copy the rest of the addon incrementally from the input to the output, 4 kB at a time, when the row is to be written to the merge output. This is possible because the addon itself doesn't need to be in RAM for the row to be compared against other rows; only the sort key must.

This greatly relaxes the sort buffer requirements for successful merging, especially when it comes to JSON rows or small blobs (which are typically used as packed addons, not sort keys). The rules used to be:

1. During initial chunk generation: The sort buffer must be at least as large as the largest row to be sorted.
2. During merging: Merging is guaranteed to pass if the sort buffer is at least 15 times as large as the largest row (sort key + addons), but one may be lucky and pass with only the demands from #1.

Now, for sorts implemented using packed addons (which is the common case for small blobs and JSON), the new rules are:

1. Unchanged from #1 above.
2. During merging: Merging is guaranteed to pass if the sort buffer is at least 15 times as large as the largest _sort key_ (plus 4-byte length marker), but one may be lucky and pass with only the demands from #1.

In practice, this means that filesort merging will almost never fail due to insufficient buffer space anymore; the query will either fail because a single row is too large in the sort step, or it will pass nearly all of the time. However, do note that while such merges will work, they will not always be very performant, as having lots of 1-row merge chunks will mean many merge passes and little work being done during the initial in-memory sort. Thus, the main use of this functionality is to be able to do sorts where there are a few rows with large JSON values or similar, but where most fit comfortably into the buffer.

Also note that since requirement #1 is unchanged, one still cannot sort e.g. 500 kB JSON values using the default 256 kB sort buffer. Older recommendations to keep sort buffers small at nearly any cost are no longer valid, and have not been for a while. Sort buffers should be sized to as much RAM as one can afford without interfering with other tasks (such as the buffer pool, join buffers, or other concurrent sorts), and small sorts are not affected by the maximum sort buffer size being set to a larger value, as the sort buffer is incrementally allocated.

Change-Id: I85745cd513402a42ed5fc4f5b7ddcf13c5793100
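A small sketch of the two merge-phase guarantees described above, using the 15x factors and the 4-byte length marker from the text; everything else is illustrative.

```
#include <cstddef>
#include <iostream>

// Guaranteed-to-merge buffer sizes per the rules in the commit message.
std::size_t old_rule(std::size_t largest_row) { return 15 * largest_row; }
std::size_t new_rule(std::size_t largest_sort_key) { return 15 * (largest_sort_key + 4); }

int main() {
  // e.g. a 100 kB JSON addon with a 32-byte sort key
  std::size_t row = 100 * 1024 + 32, key = 32;
  std::cout << "old: " << old_rule(row) << " bytes, new: " << new_rule(key) << " bytes\n";
  return 0;
}
```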
percona-ysorokin pushed a commit that referenced this pull request on May 18, 2022
*Problem:* ASAN complains about a stack-buffer-overflow in the function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215
Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not an orderly way of finishing the thread. ASAN does not register that the stack variables are no longer in use, which generates the error above. This is a benign error, as all the variables are on the stack.

*Solution*: Finish the thread in an orderly way by using a signalling variable.
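A generic sketch of the signalling-variable shutdown pattern mentioned in the solution, using plain standard C++ threads rather than the daemon_example plugin code: the worker checks a flag and exits on its own instead of being cancelled.

```
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool stop_requested = false;

void heartbeat_worker() {
  std::unique_lock<std::mutex> lk(m);
  while (!stop_requested) {
    // Do one unit of work here, then sleep until the next beat or shutdown.
    cv.wait_for(lk, std::chrono::seconds(1), [] { return stop_requested; });
  }
  // Normal return: destructors run, so sanitizers see the stack unwind correctly.
}

int main() {
  std::thread t(heartbeat_worker);
  { std::lock_guard<std::mutex> lk(m); stop_requested = true; }  // signal shutdown
  cv.notify_one();
  t.join();  // orderly finish instead of cancelling the thread
  return 0;
}
```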
percona-ysorokin pushed a commit that referenced this pull request on Jul 5, 2022
…NSHIP WITH THE BUFFER SIZE Bug #33501541: Unmanageable Sort Buffer Behavior in 8.0.20+ (Change-Id: I85745cd513402a42ed5fc4f5b7ddcf13c5793100)
percona-ysorokin pushed a commit that referenced this pull request on Jul 5, 2022
…NSHIP WITH THE BUFFER SIZE Bug #33501541: Unmanageable Sort Buffer Behavior in 8.0.20+ (Change-Id: I85745cd513402a42ed5fc4f5b7ddcf13c5793100)
percona-ysorokin pushed a commit that referenced this pull request on Jul 5, 2022
This reduces the number of issues of type "Single - argument constructor may inadvertently be used as a type conversion constructor" as reported by Flint++ tool. Some of the reported issues were not addressed: - lines explicitly marked with NOLINT (like plugin/x/src/prepare_param_handler.h) - false positives (tool reports alignas() calls as constructors) - issues where objects were intended to be used with type conversion - constructors with "const std::string &" param, to accept literal strings as well Change-Id: I7eb6fabea7137fc7e81143f06ec7636b22f6ea97
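The class of issue being reported ("single-argument constructor may inadvertently be used as a type conversion constructor") looks like this in miniature; marking the constructor explicit is the usual fix.

```
#include <cstddef>

struct Buffer {
  explicit Buffer(std::size_t size) : size_(size) {}  // 'explicit' blocks silent conversions
  std::size_t size_;
};

void consume(const Buffer &) {}

int main() {
  consume(Buffer{42});  // fine: intentional construction
  // consume(42);       // would compile without 'explicit', silently creating a Buffer
  return 0;
}
```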
percona-ysorokin pushed a commit that referenced this pull request on Jul 5, 2022
Includes a partial implementation of span from C++20. Change-Id: Ibae9a4aeed95135f4ef35a7ce7b095e6930c1d66
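For readers unfamiliar with the type, this is the kind of usage a C++20-style span enables; the example uses the standard std::span, while the commit's own partial implementation may differ in detail.

```
#include <iostream>
#include <span>
#include <vector>

// A span is a non-owning view over contiguous data: no copy, no allocation.
int sum(std::span<const int> values) {
  int total = 0;
  for (int v : values) total += v;
  return total;
}

int main() {
  std::vector<int> v{1, 2, 3, 4};
  int raw[] = {5, 6};
  std::cout << sum(v) << " " << sum(raw) << "\n";  // works for both containers
  return 0;
}
```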
percona-ysorokin pushed a commit that referenced this pull request on Jul 5, 2022
Post push fix. Remove include of unused header files "ndb_global.h" and <cassert>. Use std::size_t instead of size_t. Change-Id: I2c718d0889965ce5967d575172da8df4aa55b1d7
percona-ysorokin pushed a commit that referenced this pull request on Jul 5, 2022
Patch #1 caused several problems in mysql-trunk related to ndbinfo initialization and upgrade, including the failure of the test ndb_76_inplace_upgrade and the failure of all NDB MTR tests in Pushbuild on Windows. This patch fixes these issues, including fixes for bug#33726826 and bug#33730799. In ndbinfo, revert the removal of ndb$blocks and ndb$index_stats and the change of blocks and index_stats from views to tables. Improve the ndbinfo schema upgrade & initialization logic to better handle such a change in the future. This logic now runs in two passes: first it drops the known tables and views from current and previous versions, then it creates the tables and views for the current version. Add a new class method NdbDictionary::printColumnTypeDescription(). This is needed for the ndbinfo.columns table in patch #2 but was missing from patch #1. Add boilerplate index lookup initialization code that was also missing. Fix ndbinfo prefix determination on Windows. Change-Id: I422856bcad4baf5ae9b14c1e3a1f2871bd6c5f59
percona-ysorokin pushed a commit that referenced this pull request on Aug 2, 2022
**Problem:** The tests fail under ASAN: ``` ==470513==ERROR: AddressSanitizer: heap-use-after-free on address 0x632000054e20 at pc 0x556599b68016 bp 0x7ffc630afb30 sp 0x7ffc630afb20 READ of size 8 at 0x632000054e20 thread T0 #0 0x556599b68015 in destroy_rwlock(PFS_rwlock*) /tmp/ps/storage/perfschema/pfs_instr.cc:430 #1 0x556599b30b82 in pfs_destroy_rwlock_v2(PSI_rwlock*) /tmp/ps/storage/perfschema/pfs.cc:2596 #2 0x7fa44336d62e in inline_mysql_rwlock_destroy /tmp/ps/include/mysql/psi/mysql_rwlock.h:289 #3 0x7fa44336da39 in vtoken_lock_cleanup::~vtoken_lock_cleanup() /tmp/ps/plugin/version_token/version_token.cc:517 #4 0x7fa46a7188a6 in __run_exit_handlers /build/glibc-SzIz7B/glibc-2.31/stdlib/exit.c:108 #5 0x7fa46a718a5f in __GI_exit /build/glibc-SzIz7B/glibc-2.31/stdlib/exit.c:139 #6 0x556596531da2 in mysqld_exit /tmp/ps/sql/mysqld.cc:2512 #7 0x55659655d579 in mysqld_main(int, char**) /tmp/ps/sql/mysqld.cc:8505 percona#8 0x55659609c5b5 in main /tmp/ps/sql/main.cc:25 percona#9 0x7fa46a6f6082 in __libc_start_main ../csu/libc-start.c:308 percona#10 0x55659609c4ed in _start (/tmp/results/PS/runtime_output_directory/mysqld+0x3c1b4ed) 0x632000054e20 is located 50720 bytes inside of 90112-byte region [0x632000048800,0x63200005e800) freed by thread T0 here: #0 0x7fa46b5f940f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122 #1 0x556599b617eb in pfs_free(PFS_builtin_memory_class*, unsigned long, void*) /tmp/ps/storage/perfschema/pfs_global.cc:113 #2 0x556599b61a15 in pfs_free_array(PFS_builtin_memory_class*, unsigned long, unsigned long, void*) /tmp/ps/storage/perfschema/pfs_global.cc:177 #3 0x556599b6f28b in PFS_buffer_default_allocator<PFS_rwlock>::free_array(PFS_buffer_default_array<PFS_rwlock>*) /tmp/ps/storage/perfschema/pfs_buffer_container.h:172 #4 0x556599b75628 in PFS_buffer_scalable_container<PFS_rwlock, 1024, 1024, PFS_buffer_default_array<PFS_rwlock>, PFS_buffer_default_allocator<PFS_rwlock> >::cleanup() /tmp/ps/storage/perfschema/pfs_buffer_container.h:452 #5 0x556599b6d591 in cleanup_instruments() /tmp/ps/storage/perfschema/pfs_instr.cc:231 #6 0x556599b8c3f1 in cleanup_performance_schema /tmp/ps/storage/perfschema/pfs_server.cc:343 #7 0x556599b8dcfc in shutdown_performance_schema() /tmp/ps/storage/perfschema/pfs_server.cc:374 percona#8 0x556596531d96 in mysqld_exit /tmp/ps/sql/mysqld.cc:2500 percona#9 0x55659655d579 in mysqld_main(int, char**) /tmp/ps/sql/mysqld.cc:8505 percona#10 0x55659609c5b5 in main /tmp/ps/sql/main.cc:25 percona#11 0x7fa46a6f6082 in __libc_start_main ../csu/libc-start.c:308 previously allocated by thread T0 here: #0 0x7fa46b5fa6e5 in __interceptor_posix_memalign ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:217 #1 0x556599b6167e in pfs_malloc(PFS_builtin_memory_class*, unsigned long, int) /tmp/ps/storage/perfschema/pfs_global.cc:68 #2 0x556599b6187a in pfs_malloc_array(PFS_builtin_memory_class*, unsigned long, unsigned long, int) /tmp/ps/storage/perfschema/pfs_global.cc:155 #3 0x556599b6fa9e in PFS_buffer_default_allocator<PFS_rwlock>::alloc_array(PFS_buffer_default_array<PFS_rwlock>*) /tmp/ps/storage/perfschema/pfs_buffer_container.h:159 #4 0x556599b6ff12 in PFS_buffer_scalable_container<PFS_rwlock, 1024, 1024, PFS_buffer_default_array<PFS_rwlock>, PFS_buffer_default_allocator<PFS_rwlock> >::allocate(pfs_dirty_state*) /tmp/ps/storage/perfschema/pfs_buffer_container.h:602 #5 0x556599b69abc in create_rwlock(PFS_rwlock_class*, void const*) /tmp/ps/storage/perfschema/pfs_instr.cc:402 #6 0x556599b341f5 in 
pfs_init_rwlock_v2(unsigned int, void const*) /tmp/ps/storage/perfschema/pfs.cc:2578 #7 0x556599b9487b in inline_mysql_rwlock_init /tmp/ps/include/mysql/psi/mysql_rwlock.h:261 percona#8 0x556599b94ba7 in init_pfs_tls_channels_instrumentation() /tmp/ps/storage/perfschema/pfs_tls_channel.cc:209 percona#9 0x556599b8ca44 in initialize_performance_schema(PFS_global_param*, PSI_thread_bootstrap**, PSI_mutex_bootstrap**, PSI_rwlock_bootstrap**, PSI_cond_bootstrap**, PSI_file_bootstrap**, PSI_socket_bootstrap**, PSI_table_bootstrap**, PSI_mdl_bootstrap**, PSI_idle_bootstrap**, PSI_stage_bootstrap**, PSI_statement_bootstrap**, PSI_transaction_bootstrap**, PSI_memory_bootstrap**, PSI_error_bootstrap**, PSI_data_lock_bootstrap**, PSI_system_bootstrap**, PSI_tls_channel_bootstrap**) /tmp/ps/storage/perfschema/pfs_server.cc:266 percona#10 0x55659655a585 in mysqld_main(int, char**) /tmp/ps/sql/mysqld.cc:7497 percona#11 0x55659609c5b5 in main /tmp/ps/sql/main.cc:25 percona#12 0x7fa46a6f6082 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: heap-use-after-free /tmp/ps/storage/perfschema/pfs_instr.cc:430 in destroy_rwlock(PFS_rwlock*) Shadow bytes around the buggy address: 0x0c6480002970: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c6480002980: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c6480002990: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c64800029a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c64800029b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd =>0x0c64800029c0: fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd 0x0c64800029d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c64800029e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c64800029f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c6480002a00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c6480002a10: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==470513==ABORTING ``` The reason of the error is Percona's change on 5ae4d27 which causes the static variables of the plugin not to be deallocated. This causes `void cleanup_instruments()` to be called before `vtoken_lock_cleanup::~vtoken_lock_cleanup()`, which finds the memory of the object to have been deallocated. **Solution:** Do not run the tests under ASAN or Valgrind.
percona-ysorokin pushed a commit that referenced this pull request on Aug 2, 2022
**Problem:** The following leak is detected when running the test `encryption.upgrade_crypt_data_57_v1`: ``` ==388399==ERROR: LeakSanitizer: detected memory leaks Direct leak of 70 byte(s) in 1 object(s) allocated from: #0 0x7f5f87812808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144 #1 0x55f098875d2c in ut::detail::malloc(unsigned long) /home/ldonoso/src/release-8.0.29-20/storage/innobase/include/detail/ut/allocator_traits.h:71 #2 0x55f098875db5 in ut::detail::Alloc_fn::malloc(unsigned long) /home/ldonoso/src/release-8.0.29-20/storage/innobase/include/detail/ut/allocator_traits.h:88 #3 0x55f0988aa4b9 in void* ut::detail::Alloc_fn::alloc<false>(unsigned long) /home/ldonoso/src/release-8.0.29-20/storage/innobase/include/detail/ut/allocator_traits.h:97 #4 0x55f09889b7a3 in void* ut::detail::Alloc_pfs::alloc<false>(unsigned long, unsigned int) /home/ldonoso/src/release-8.0.29-20/storage/innobase/include/detail/ut/alloc.h:275 #5 0x55f09889bb9a in std::enable_if<ut::detail::Alloc_pfs::is_pfs_instrumented_v, void*>::type ut::detail::Alloc_<ut::detail::Alloc_pfs>::alloc<false, ut::detail::Alloc_pfs>(unsigned long, unsigned int) /home/ldonoso/src/release-8.0.29-20/storage/innobase/include/detail/ut/alloc.h:438 #6 0x55f0988767dd in ut::malloc_withkey(ut::PSI_memory_key_t, unsigned long) /home/ldonoso/src/release-8.0.29-20/storage/innobase/include/ut0new.h:604 #7 0x55f09937dd3c in rec_copy_prefix_to_buf_old /home/ldonoso/src/release-8.0.29-20/storage/innobase/rem/rem0rec.cc:1206 percona#8 0x55f09937dfd3 in rec_copy_prefix_to_buf(unsigned char const*, dict_index_t const*, unsigned long, unsigned char**, unsigned long*) /home/ldonoso/src/release-8.0.29-20/storage/innobase/rem/rem0rec.cc:1233 percona#9 0x55f098ae0ae3 in dict_index_copy_rec_order_prefix(dict_index_t const*, unsigned char const*, unsigned long*, unsigned char**, unsigned long*) /home/ldonoso/src/release-8.0.29-20/storage/innobase/dict/dict0dict.cc:3764 percona#10 0x55f098c3d0ba in btr_pcur_t::store_position(mtr_t*) /home/ldonoso/src/release-8.0.29-20/storage/innobase/btr/btr0pcur.cc:141 percona#11 0x55f098c027b6 in dict_getnext_system_low /home/ldonoso/src/release-8.0.29-20/storage/innobase/dict/dict0load.cc:256 percona#12 0x55f098c02933 in dict_getnext_system(btr_pcur_t*, mtr_t*) /home/ldonoso/src/release-8.0.29-20/storage/innobase/dict/dict0load.cc:298 percona#13 0x55f098c0c05b in dict_check_sys_tables /home/ldonoso/src/release-8.0.29-20/storage/innobase/dict/dict0load.cc:1573 percona#14 0x55f098c1770d in dict_load_tablespaces_for_upgrade() /home/ldonoso/src/release-8.0.29-20/storage/innobase/dict/dict0load.cc:3233 percona#15 0x55f0987e9ed1 in innobase_init_files /home/ldonoso/src/release-8.0.29-20/storage/innobase/handler/ha_innodb.cc:6072 percona#16 0x55f098819ed3 in innobase_ddse_dict_init /home/ldonoso/src/release-8.0.29-20/storage/innobase/handler/ha_innodb.cc:13985 percona#17 0x55f097fa5c10 in dd::bootstrap::DDSE_dict_init(THD*, dict_init_mode_t, unsigned int) /home/ldonoso/src/release-8.0.29-20/sql/dd/impl/bootstrap/bootstrapper.cc:742 percona#18 0x55f0986696a6 in dd::upgrade_57::do_pre_checks_and_initialize_dd(THD*) /home/ldonoso/src/release-8.0.29-20/sql/dd/upgrade_57/upgrade.cc:922 percona#19 0x55f09550e082 in handle_bootstrap /home/ldonoso/src/release-8.0.29-20/sql/bootstrap.cc:327 percona#20 0x55f0997416e7 in pfs_spawn_thread /home/ldonoso/src/release-8.0.29-20/storage/perfschema/pfs.cc:2943 percona#21 0x7f5f876a1608 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477 
SUMMARY: AddressSanitizer: 70 byte(s) leaked in 1 allocation(s). ```

**Solution:**

The cause of the leak arises from the traversal of `pcur`. When the traversal is exhausted, `pcur.close()` is called automatically and all `pcur` resources are deallocated. Percona adds some early returns to the traversal, so sometimes the traversal is not exhausted and `pcur.close()` is not called.

The solution is to call `pcur.close()` explicitly. `close()` is an idempotent function, so it is not a bug if it is called several times as a result of this change.
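A minimal sketch of the shape of the fix, with a hypothetical cursor type: early-return paths must still close the cursor, and an idempotent close() makes a second call harmless.

```
// Hypothetical cursor with an idempotent close(), mirroring the fix described above.
struct Cursor {
  bool open = true;
  void close() {             // safe to call more than once
    if (open) { /* release resources */ open = false; }
  }
};

bool traverse(Cursor &pcur, bool stop_early) {
  if (stop_early) {
    pcur.close();            // the added explicit close on the early-return path
    return false;
  }
  // ... exhaust the traversal; normal completion also closes the cursor ...
  pcur.close();
  return true;
}
```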
percona-ysorokin pushed a commit that referenced this pull request on Sep 5, 2022
* PROBLEM

The test "ndb.ndb_bug17624736" was constantly failing in [daily|weekly]-8.0-cluster branches in PB2, whether on `ndb-ps` or `ndb-default-big` profile test runs. The high-level reason for the failure was the installation of a duplicate entry in the Data Dictionary with respect to the `engine`-`se_private_id` pair, even when the previous table definition should have been dropped.

* LOW-LEVEL EXPLANATION

NDB reuses the least available ID for the dictionary table ID. The ID is then used by the NDB plugin as the SE private ID field of the MySQL Data Dictionary table definition. If a problem occurs during the synchronization of NDB table definitions in the Data Dictionary (i.e., a previous definition was not successfully removed), then an attempt to install a table using an already installed SE private ID can occur. If that ID was inadvertently cached as `missing`, then the function `acquire_uncached_table_by_se_private_id` will return fast without retrieving the table definition. Therefore, the old table definition on that ID can never be retrieved for that Data Dictionary client instance, the new one won't be installed, and errors will be raised.

* SOLUTION

For the NDB plugin to query a table definition using the SE private ID (without causing a missing entry to be cached forever for that client instance), this patch adds a flag argument to the function that allows the caller to request skipping the fast cache.

Change-Id: I45eef594ee544000fe6b30b86977e5e91155dc80
percona-ysorokin pushed a commit that referenced this pull request on Sep 5, 2022
-- Patch #1: Persist secondary load information -- Problem: We need a way of knowing which tables were loaded to HeatWave after MySQL restarts due to a crash or a planned shutdown. Solution: Add a new "secondary_load" flag to the `options` column of mysql.tables. This flag is toggled after a successful secondary load or unload. The information about this flag is also reflected in INFORMATION_SCHEMA.TABLES.CREATE_OPTIONS. -- Patch #2 -- The second patch in this worklog triggers the table reload from InnoDB after MySQL restart. The recovery framework recognizes that the system restarted by checking whether tables are present in the Global State. If there are no tables present, the framework will access the Data Dictionary and find which tables were loaded before the restart. This patch introduces the "Data Dictionary Worker" - a MySQL service recovery worker whose task is to query the INFORMATION_SCHEMA.TABLES table from a separate thread and find all tables whose secondary_load flag is set to 1. All tables that were found in the Data Dictionary will be appended to the list of tables that have to be reloaded by the framework from InnoDB. If an error occurs during restart recovery we will not mark the recovery as failed. This is done because the types of failures that can occur when the tables are reloaded after a restart are less critical compared to previously existing recovery situations. Additionally, this code will soon have to be adapted for the next worklog in this area so we are proceeding with the simplest solution that makes sense. A Global Context variable m_globalStateEmpty is added which indicates whether the Global State should be recovered from an external source. -- Patch #3 -- This patch adds the "rapid_reload_on_restart" system variable. This variable is used to control whether tables should be reloaded after a restart of mysqld or the HeatWave plugin. This variable is persistable (i.e., SET PERSIST RAPID_RELOAD_ON_RESTART = TRUE/FALSE). The default value of this variable is set to false. The variable can be modified in OFF, IDLE, and SUSPENDED states. -- Patch #4 -- This patch refactors the recovery code by removing all recovery-related code from ha_rpd.cc and moving it to separate files: - ha_rpd_session_factory.h/cc: These files contain the MySQLAdminSessionFactory class, which is used to create admin sessions in separate threads that can be used to issue SQL queries. - ha_rpd_recovery.h/cc: These files contain the MySQLServiceRecoveryWorker, MySQLServiceRecoveryJob and ObjectStoreRecoveryJob classes which were previously defined in ha_rpd.cc. This file also contains a function that creates the RecoveryWorkerFactory object. This object is passed to the constructor of the Recovery Framework and is used to communicate with the other section of the code located in rpdrecoveryfwk.h/cc. This patch also renames rpdrecvryfwk to rpdrecoveryfwk for better readability. 
The include relationship between the files is shown on the following diagram: rpdrecoveryfwk.h◄──────────────rpdrecoveryfwk.cc ▲ ▲ │ │ │ │ │ └──────────────────────────┐ │ │ ha_rpd_recovery.h◄─────────────ha_rpd_recovery.cc──┐ ▲ │ │ │ │ │ │ │ │ │ ▼ │ ha_rpd.cc───────────────────────►ha_rpd.h │ ▲ │ │ │ ┌───────────────────────────────┘ │ │ ▼ ha_rpd_session_factory.cc──────►ha_rpd_session_factory.h Other changes: - In agreement with Control Plane, the external Global State is now invalidated during recovery framework startup if: 1) Recovery framework recognizes that it should load the Global State from an external source AND, 2) rapid_reload_on_restart is set to OFF. - Addressed review comments for Patch #3, rapid_reload_on_restart is now also settable while plugin is ON. - Provide a single entry point for processing external Global State before starting the recovery framework loop. - Change when the Data Dictionary is read. Now we will no longer wait for the HeatWave nodes to connect before querying the Data Dictionary. We will query it when the recovery framework starts, before accepting any actions in the recovery loop. - Change the reload flow by inserting fake global state entries for tables that need to be reloaded instead of manually adding them to a list of tables scheduled for reload. This method will be used for the next phase where we will recover from Object Storage so both recovery methods will now follow the same flow. - Update secondary_load_dd_flag added in Patch #1. - Increase timeout in wait_for_server_bootup to 300s to account for long MySQL version upgrades. - Add reload_on_restart and reload_on_restart_dbg tests to the rapid suite. - Add PLUGIN_VAR_PERSIST_AS_READ_ONLY flag to "rapid_net_orma_port" and "rapid_reload_on_restart" definitions, enabling their initialization from persisted values along with "rapid_bootstrap" when it is persisted as ON. - Fix numerous clang-tidy warnings in recovery code. - Prevent suspended_basic and secondary_load_dd_flag tests to run on ASAN builds due to an existing issue when reinstalling the RAPID plugin. -- Bug#33752387 -- Problem: A shutdown of MySQL causes a crash in queries fired by DD worker. Solution: Prevent MySQL from killing DD worker's queries by instantiating a DD_kill_immunizer before the queries are fired. -- Patch #5 -- Problem: A table can be loaded before the DD Worker queries the Data Dictionary. This means that table will be wrongly processed as part of the external global state. Solution: If the table is present in the current in-memory global state we will not consider it as part of the external global state and we will not process it by the recovery framework. -- Bug#34197659 -- Problem: If a table reload after restart causes OOM the cluster will go into RECOVERYFAILED state. Solution: Recognize when the tables are being reloaded after restart and do not move the cluster into RECOVERYFAILED. In that case only the current reload will fail and the reload for other tables will be attempted. Change-Id: Ic0c2a763bc338ea1ae6a7121ff3d55b456271bf0
percona-ysorokin pushed a commit that referenced this pull request on Dec 6, 2022
Enh#34350907 - [Nvidia] Allow DDLs when tables are loaded to HeatWave Bug#34433145 - WL#15129: mysqld crash Assertion `column_count == static_cast<int64_t>(cp_table- Bug#34446287 - WL#15129: mysqld crash at rapid::data::RapidNetChunkCtx::consolidateEncodingsDic Bug#34520634 - MYSQLD CRASH : Sql_cmd_secondary_load_unload::mysql_secondary_load_or_unload Bug#34520630 - Failed Condition: "table_id != InvalidTableId" Currently, DDL statements such as ALTER TABLE*, RENAME TABLE, and TRUNCATE TABLE are not allowed if a table has a secondary engine defined. The statements fail with the following error: "DDLs on a table with a secondary engine defined are not allowed." This worklog lifts this restriction for tables whose secondary engine is RAPID. A secondary engine hook is called in the beginning (pre-hook) and in the end (post-hook) of a DDL statement execution. If the DDL statement succeeds, the post-hook will direct the recovery framework to reload the table in order to reflect that change in HeatWave. Currently all DDL statements that were previously disallowed will trigger a reload. This can be improved in the future by checking whether the DDL operation has an impact on HeatWave or not. However detecting all edge-cases in this behavior is not straightforward so this improvement has been left as a future improvement. Additionally, if a DDL modifies the table schema in a way that makes it incompatible with HeatWave (e.g., dropping a primary key column) the reload will fail silently. There is no easy way to recognize whether the table schema will become incompatible with HeatWave in a pre-hook. List of changes: 1) [MySQL] Add new HTON_SECONDARY_ENGINE_SUPPORTS_DDL flag to indicate whether a secondary engine supports DDLs. 2) [MySQL] Add RAII hooks for RENAME TABLE and TRUNCATE TABLE, modeled on the ALTER TABLE hook. 3) Define HeatWave hooks for ALTER TABLE, RENAME TABLE, and TRUNCATE TABLE statements. 4) If a table reload is necessary, trigger it by marking the table as stale (WL#14914). 4) Move all change propagation & DDL hooks to ha_rpd_hooks.cc. 5) Adjust existing tests to support table reload upon DDL execution. 6) Extract code related to RapidOpSyncCtx in ha_rpd_sync_ctx.cc, and the PluginState enum to ha_rpd_fsm.h. * Note: ALTER TABLE statements related to secondary engine setting and loading were allowed before: - ALTER TABLE <TABLE> SECONDARY_UNLOAD, - ALTER TABLE SECONDARY_ENGINE = NULL. -- Bug#34433145 -- -- Bug#34446287 -- --Problem #1-- Crashes in Change Propagation when the CP thread tries to apply DMLs of tables with new schema to the not-yet-reloaded table in HeatWave. --Solution #1-- Remove table from Change Propagation before marking it as stale and revert the original change from rpd_binlog_parser.cc where we were checking if the table was stale before continuing with binlog parsing. The original change is no longer necessary since the table is removed from CP before being marked as stale. --Problem #2-- In case of a failed reload, tables are not removed from Global State. --Solution #2-- Keep track of whether the table was reloaded because it was marked as STALE. In that case we do not want the Recovery Framework to retry the reload and therefore we can remove the table from the Global State. -- Bug#34520634 -- Problem: Allowing the change of primary engine for tables with a defined secondary engine hits an assertion in mysql_secondary_load_or_unload(). 
Example: CREATE TABLE t1 (col1 INT PRIMARY KEY) SECONDARY_ENGINE = RAPID; ALTER TABLE t1 ENGINE = BLACKHOLE; ALTER TABLE t1 SECONDARY_LOAD; <- assertion hit here Solution: Disallow changing the primary engine for tables with a defined secondary engine. -- Bug#34520630 -- Problem: A debug assert is being hit in rapid_gs_is_table_reloading_from_stale because the table was dropped in the meantime. Solution: Instead of asserting, just return false if table is not present in the Global State. This patch also changes rapid_gs_is_table_reloading_from_stale to a more specific check (inlined the logic in load_table()). This check now also covers the case when a table was dropped/unloaded before the Recovery Framework marked it as INRECOVERY. In that case, if the reload fails we should not have an entry for that table in the Global State. The patch also adjusts dict_types MTR test, where we no longer expect for tables to be in UNAVAIL state after a failed reload. Additionally, recovery2_ddls.test is adjusted to not try to offload queries running on Performance Schema. Change-Id: I6ee390b1f418120925f5359d5e9365f0a6a415ee
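Item 2 in the list of changes above describes RAII hooks modeled on the ALTER TABLE hook. A generic sketch of that pattern with hypothetical names follows: the pre-hook runs on construction and the post-hook runs when the guard goes out of scope.

```
#include <functional>
#include <utility>

// Hypothetical RAII guard: pre-hook in the constructor, post-hook in the
// destructor, so the post-hook runs when the guarded statement scope ends.
class Ddl_hook_guard {
 public:
  Ddl_hook_guard(std::function<void()> pre, std::function<void()> post)
      : post_(std::move(post)) {
    if (pre) pre();
  }
  ~Ddl_hook_guard() {
    if (post_) post_();
  }

 private:
  std::function<void()> post_;
};

void run_truncate_table() {
  Ddl_hook_guard guard([] { /* notify the secondary engine that a DDL is starting */ },
                       [] { /* inspect the result; schedule a table reload on success */ });
  // ... execute the TRUNCATE TABLE statement ...
}
```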
percona-ysorokin pushed a commit that referenced this pull request on Feb 20, 2023
… Signal (get_store_key at sql/sql_select.cc:2383) These are two related but distinct problems manifested in the shrinkage of key definitions for derived tables or common table expressions, implemented in JOIN::finalize_derived_keys(). The problem in Bug#34572040 is that we have two references to one CTE, each with a valid key definition. The function will first loop over the first reference (cte_a) and move its used key from position 0 to position 1. Next, it will attempt to move the key for the second reference (cte_b) from position 4 to position 2. However, for each iteration, the function will calculate used key information. On the first iteration, the values are correct, but since key value #1 has been moved into position #0, the old information is invalid and provides wrong information. The problem is thus that for subsequent iterations we read data that has been invalidated by earlier key moves. The best solution to the problem is to move the keys for all references to the CTE in one operation. This way, we can calculate used keys information safely, before any move operation has been performed. The problem in Bug#34634469 is also related to having more than one reference to a CTE, but in this case the first reference (ref_3) has a key in position 5 which is moved to position 0, and the second reference (ref_4) has a key in position 3 that is moved to position 1. However, the key parts of the first key will overlap with the key parts of the second key after the first move, thus invalidating the key structure during the copy. The actual problem is that we move a higher-numbered key (5) before a lower-numbered key (3), which in this case makes it impossible to find an empty space for the moved key. The solution to this problem is to ensure that keys are moved in increasing key order. The patch changes the algorithm as follows: - When identifying a derived table/common table expression, ensure to move all its keys in one operation (at least those references from the same query block). - First, collect information about all key uses: hash key, unique index keys and actual key references. For the key references, also populate a mapping array that enumerates table references with key references in order of increasing key number. Also clear used key information for references that do not use keys. - For each table reference with a key reference in increasing key order, move the used key into the lowest available position. This will ensure that used entries are never overwritten. - When all table references have been processed, remove unused key definitions. Change-Id: I938099284e34a81886621f6a389f34abc51e78ba
percona-ysorokin pushed a commit that referenced this pull request on Oct 26, 2023
https://jira.percona.com/browse/PS-8592 Description ----------- GR suffered from problems caused by the security probes and network scanner processes connecting to the group replication communication port. This usually is not a problem, but poses a serious threat when another member tries to join the cluster by initiating a connection to the member which is affected by external processes using the port dedicated for group communication for longer durations. During such activity by external processes, the SSL-enabled server stalled forever on the SSL_accept() call waiting for handshake data. Below is the stacktrace: Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)): #0 in read () #1 in sock_read () #2 in BIO_read () #3 in ssl23_read_bytes () #4 in ssl23_get_client_hello () #5 in ssl23_accept () #6 in xcom_tcp_server_startup(Xcom_network_provider*) () When the server stalled in the above path forever, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's logs. [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group' [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.' Solution -------- This patch adds two new variables: 1. group_replication_xcom_ssl_socket_timeout It is a file-descriptor level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, say for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configured with SSL. 2. group_replication_xcom_ssl_accept_retries It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with the timeout defined by group_replication_xcom_ssl_socket_timeout for the SSL handshake process once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configured with SSL. Note: - Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION.
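A sketch of the retry-with-timeout idea around the SSL handshake, assuming the semantics described above; accept_tls_with_timeout and its parameters are illustrative stand-ins, not the actual XCom code.

// Sketch only: apply a receive/send timeout to the accepted socket, then
// retry the SSL handshake a bounded number of times instead of blocking
// forever. 'timeout_s' plays the role of
// group_replication_xcom_ssl_socket_timeout and 'retries' the role of
// group_replication_xcom_ssl_accept_retries.
#include <openssl/ssl.h>
#include <sys/socket.h>
#include <sys/time.h>

bool accept_tls_with_timeout(SSL *ssl, int fd, int timeout_s, int retries) {
  if (timeout_s > 0) {
    timeval tv{timeout_s, 0};
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
  }
  for (int attempt = 0; attempt < retries; ++attempt) {
    int ret = SSL_accept(ssl);  // returns 1 when the handshake completed
    if (ret == 1) return true;
    int err = SSL_get_error(ssl, ret);
    if (err != SSL_ERROR_WANT_READ && err != SSL_ERROR_WANT_WRITE)
      break;  // fatal handshake error, give up immediately
    // Timed out waiting for handshake data; retry up to 'retries' times.
  }
  return false;  // caller closes the socket
}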
percona-ysorokin
pushed a commit
that referenced
this pull request
Jan 23, 2024
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads(). 9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log. 9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader. 9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()). 10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2. 10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log. 10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues. 10.3. Fetches the "Commit Order" and "Binlog Flush" queues. 10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue(). 10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group(). 11. 
At this point, we will have - Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and - Applier thread 1 performing GTID update for itself (from step 8). Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock-leak resulting in one thread acquiring the sidno lock and never releasing it. ----------------------------------------------------------------------------------------------------------- Client thread Applier Thread 1 ----------------------------------------------------------------------------------------------------------- update_commit_group() => global_sid_lock->rdlock(); update_commit_group() => global_sid_lock->rdlock(); calls update_gtids_impl_lock_sidnos() calls update_gtids_impl_lock_sidnos() set commit_group_sidno[2] = true set commit_group_sidno[2] = true lock_sidno(2) -> successful lock_sidno(2) -> waits update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { unlock_sidno(2); commit_group_sidnos[2] = false; } Applier thread continues.. lock_sidno(2) -> successful update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { <=== this check fails and lock is not released. unlock_sidno(2); commit_group_sidnos[2] = false; } Client thread continues without releasing the lock ----------------------------------------------------------------------------------------------------------- 12. As the above lock-leak can also happen the other way, i.e., the applier thread fails to unlock, there can be different consequences hereafter. 13. If the client thread continues without releasing the lock, then at a later stage, it can enter into a deadlock with the applier thread performing a GTID update, as shown in the stack traces below. Client_thread ------------- #1 __GI___lll_lock_wait #2 ___pthread_mutex_lock #3 native_mutex_lock <= waits for commit lock while holding sidno lock #4 Commit_stage_manager::enroll_for #5 MYSQL_BIN_LOG::change_stage #6 MYSQL_BIN_LOG::ordered_commit #7 MYSQL_BIN_LOG::commit #8 ha_commit_trans #9 trans_commit_implicit #10 mysql_create_like_table #11 Sql_cmd_create_table::execute #12 mysql_execute_command #13 dispatch_sql_command Applier thread -------------- #1 ___pthread_mutex_lock #2 native_mutex_lock #3 safe_mutex_lock #4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock #5 Gtid_state::update_commit_group #6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here #7 Commit_order_manager::finish #8 Commit_order_manager::wait_and_finish #9 ha_commit_low #10 trx_coordinator::commit_in_engines #11 MYSQL_BIN_LOG::commit #12 ha_commit_trans #13 trans_commit #14 Xid_log_event::do_commit #15 Xid_apply_log_event::do_apply_event_worker #16 Slave_worker::slave_worker_exec_event #17 slave_worker_exec_job_group #18 handle_slave_worker 14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()).
In debug builds, the above case hits the assertion `safe_mutex_assert_not_owner()`, meaning that the lock is already held by the replica applier thread when it tries to re-acquire it. Solution -------- In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order. However, the root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, it was expected that threads should hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents, but this was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired by the client thread (binlog flush leader) when it tries to perform the GTID update on behalf of threads waiting in the "Commit Order" queue, thus guaranteeing that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
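A minimal sketch of the locking change, with simplified stand-ins for the real objects; the names below are illustrative, not the server code.

// Illustrative sketch: the binlog flush leader takes LOCK_commit before
// updating GTIDs on behalf of the "Commit Order" queue, so the
// commit_group_sidnos array is never touched by two threads at once.
#include <mutex>
#include <vector>

std::mutex LOCK_commit_sketch;                 // stands in for MYSQL_BIN_LOG::LOCK_commit
std::vector<bool> commit_group_sidnos_sketch;  // stands in for Gtid_state::commit_group_sidnos

// Hypothetical helper called by the flush leader (step 10.5 above).
void update_commit_group_as_leader(/* queue of waiting commit-order threads */) {
  // Applier threads already hold LOCK_commit when they reach
  // Commit_order_manager::flush_engine_and_signal_threads(); taking the same
  // lock here serializes the two update_commit_group() callers.
  std::lock_guard<std::mutex> guard(LOCK_commit_sketch);
  // ... the gtid_state->update_commit_group(...) equivalent runs here,
  //     reading and clearing entries of commit_group_sidnos_sketch ...
}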
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Part of WL#15135 Certificate Architecture This patch introduces a set of C++ classes to implement the creation of private keys, PKCS#10 signing requests, and X.509 certificates for NDB cluster. The TlsSearchPath class provides searching for files over a delimited list of directories. The PrivateKey and Certificate classes provide simple wrappers over the OpenSSL routines to create, free, save, and open keys and certificates. Classes PendingPrivateKey and PendingCertificate implement file naming conventions and life cycle for pending key pairs; ActivePrivateKey and ActiveCertificate implement them for active key pairs. Class SigningRequest provides the naming conventions and life cycle for PKCS#10 CSRs. Class NodeCertificate is the primary in-memory representation of a node's TLS credentials. A unit test, NodeCertificate-t, is intended to thoroughly test the whole suite of classes. It should be possible to run this test under valgrind with no reported leaks. Change-Id: I76bf719375ab2a9b6a97245e326158a49dde28c2
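Not the actual NDB implementation, but a minimal sketch of the TlsSearchPath idea described above: resolving a file name against a delimiter-separated list of directories.

// Illustrative sketch only: return the first directory in the search path
// that contains the requested file.
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>

std::optional<std::filesystem::path> find_in_search_path(
    const std::string &search_path, const std::string &file_name,
    char delimiter = ':') {
  std::stringstream dirs(search_path);
  std::string dir;
  while (std::getline(dirs, dir, delimiter)) {
    if (dir.empty()) continue;
    std::filesystem::path candidate = std::filesystem::path(dir) / file_name;
    if (std::filesystem::exists(candidate)) return candidate;  // first hit wins
  }
  return std::nullopt;  // not found in any listed directory
}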
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
This is a complete implementation of ndb_sign_keys. It searches --ndb-tls-search-path for node certificate and key files, and additionally searches in --CA-search-path for CA-related key and certificate files. It includes three methods for remote key signing: With --remote-CA-host, run ndb_sign_keys remotely, using ssh. With --remote-openssl, run openssl on the remote host, using ssh. With --CA-tool, run a local signing helper tool. Change-Id: I5d93b702a667fa98d820ed150631a91e8444b8d7
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Post-push fix for : WL#15166 patch #1 ndb_sign_keys DWORD is 'unsigned long' not int Remove an unused local variable. C-style cast (LPSTR) drops const qualifier [-Wcast-qual] Change-Id: I059ad8a5a5f6b1cc644456576a8acff9a78331e3
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Add boolean parameter "RequireCertificate" to the [DB] section. Default is false. If true, the node will fail at startup time unless it finds a TLS key and a current valid certificate. Add boolean parameter "RequireTls" to the [DB] section. Default is false. If true, every transporter link involving the data node must use TLS. Add boolean parameter "RequireTls" to [TCP] sections. This is computed, and not user-settable. If either endpoint of a link has RequireTls set to true, RequireTls for the link will be set to true. Add some clarifying comments to the ndbinfo_plans test. Change-Id: I889d9b7563022e2ebb2eaae92c3b26b557180d40
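A tiny sketch of the computed [TCP] RequireTls rule, using hypothetical types (the real code lives in NDB configuration handling).

// Sketch only: a link requires TLS if either endpoint's [DB] section
// requires it; the per-[TCP] value is derived, not user-settable.
struct NodeTlsConfig {
  bool require_tls;          // [DB] RequireTls
  bool require_certificate;  // [DB] RequireCertificate
};

bool link_requires_tls(const NodeTlsConfig &a, const NodeTlsConfig &b) {
  return a.require_tls || b.require_tls;
}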
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Add an MGM protocol command to turn a plaintext mgm api session into a TLS session. Add three new MGM API functions: ndb_mgm_set_ssl_ctx() ndb_mgm_start_tls() ndb_mgm_connect_tls() Define two client TLS requirement levels: CLIENT_TLS_RELAXED, CLIENT_TLS_STRICT This adds a new test: testMgmd -n StartTls Change-Id: Ib46faacd9198c474558e46c3fa0538c7e759f3fb
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Post push fix. Remove added C++ dependencies in C header mgmapi.h. - forward declare SSL_CTX. - add missing struct keyword with ndb_mgm_cert_table and ndb_mgm_tls_stats - make ndb_mgm_set_ssl_ctx return int instead of bool as other mgmapi functions do. Change-Id: I493b4c4fb1272974e1bb72e35abb08c8cef1a534
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Post push fix. Do not allow ndb_mgm_listen_event to return a socket that uses TLS, since the user cannot access the corresponding SSL object through the public MgmAPI. Change-Id: I2a741efe4f80db750419101ecabb03fb5e025346
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Post push fix. Make NdbSocket::ssl_readln return 0 on timeout. Change-Id: I4cad95abd319883c16f2c28eff5cf2b6761731d6
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Post push fix. Add missing socket close in testMgmd -n StartTls. Change-Id: Ia446b522ad2698f63d588d3c52122df8735765c7
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Problem ================================ A Group Replication ASAN run was failing without any symptom of a leak, but with shutdown issues: worker[6] Shutdown report from /dev/shm/mtr-3771884/var-gr-debug/6/log/mysqld.1.err after tests: group_replication.gr_flush_logs group_replication.gr_delayed_initialization_thread_handler_error group_replication.gr_sbr_verifications group_replication.gr_server_uuid_matches_group_name_bootstrap group_replication.gr_stop_async_on_stop_gr group_replication.gr_certifier_message_same_member group_replication.gr_ssl_mode_verify_identity_error_xcom Analysis and Fix ================================ It ended up being a leak in the gr_ssl_mode_verify_identity_error_xcom test: Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f1709fbe1c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99 #1 0x7f16ea0df799 in xcom_tcp_server_startup(Xcom_network_provider*) (/export/home/tmp/BUG35594709/mysql-trunk/BIN-ASAN/plugin_output_directory/group_replication.so+0x65d799) #2 0x7f170751e2b2 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc2b2) This happens because we delegated the cleanup of incoming connections to the external consumer in incoming_connection_task. Since it calls incoming_connection() from Network_provider_manager, in case of a concurrent stop, a connection could be left orphaned in the shared atomic due to the lack of an Active Provider, thus creating a memory leak. The solution is to perform this cleanup in Network_provider_manager, in both the stop_provider() and stop_all_providers() methods, thus ensuring that no incoming connection leaks. Change-Id: I2367c37608ad075dee63785e9f908af5e81374ca
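A sketch of the cleanup added by the fix, with hypothetical member names standing in for the real Network_provider_manager internals.

// Sketch only: drop any connection still parked in the shared slot when the
// providers are stopped, since no active provider remains to consume it.
#include <optional>

struct IncomingConnection { /* fd, SSL*, ... */ };

class NetworkProviderManagerSketch {
 public:
  void stop_all_providers() {
    // ... stop every active provider ...
    cleanup_incoming_connection();  // added by the fix
  }
  void stop_provider() {
    // ... stop the selected provider ...
    cleanup_incoming_connection();  // added by the fix
  }

 private:
  void cleanup_incoming_connection() {
    // With no active provider left, nobody else would ever free this
    // connection, so release it here instead of leaking it.
    m_pending_connection.reset();
  }
  std::optional<IncomingConnection> m_pending_connection;
};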
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
Post push fix. Make NdbSocket::ssl_readln return 0 on timeout. Change-Id: I4cad95abd319883c16f2c28eff5cf2b6761731d6
percona-ysorokin
pushed a commit
that referenced
this pull request
Feb 16, 2024
BUG#35949017 Schema dist setup lockup Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [#2] Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [#1] Bug#32550019 Missing check for ndb_schema_result leads to schema dist timeout Change-Id: I4a32197992bf8b6899892f21587580788f828f34
percona-ysorokin
pushed a commit
that referenced
this pull request
Mar 4, 2024
percona-ysorokin
pushed a commit
that referenced
this pull request
Mar 4, 2024
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018 Merge remote-tracking branch 'venki/PS-9018-8.0-gca' into HEAD
percona-ysorokin
pushed a commit
that referenced
this pull request
Apr 17, 2024
… cache [#1] Problem: A MySQL Server which has been disconnected from schema distribution fails to set up event operations since the columns of the table can't be found in the event. Analysis: The ndbcluster plugin uses NDB table definitions which are cached by the NdbApi. These cached objects are reference counted and there can be multiple versions of the same table in the cache; the intention is that it should be possible to continue using the table even though it changes in NDB. When changing a table in NDB this cache needs to be invalidated, both on the local MySQL Server and on all other MySQL Servers connected to the same cluster. Such invalidation is especially important before installing in DD and setting up event subscriptions. The local MySQL Server cache is invalidated directly when releasing the reference from the NdbApi after having modified the table. The other MySQL Servers are primarily invalidated by using schema distribution. Since schema distribution is event driven, the invalidation will happen promptly, but as with all things in a distributed system there is a possibility that these events are not handled for some reason. This means there must be a fallback mechanism which invalidates stale cache objects. The reported problem occurs since there is a stale NDB table definition in the NdbApi; it has the same name but different columns than the current table in NDB. In most cases the NdbApi continues to operate on a cached NDB table definition, but when setting up events the "mismatch on version" will be detected inside the NdbApi (due to the relation between the event and the table); this causes the cache to be invalidated and the current version to be loaded from NDB. However, the caller is still using the "old" cached table definition and thus the columns cannot be found when trying to subscribe them. Solution: 1) Invalidate the NDB table definition in the schema event handler that handles newly created tables. This covers the case where a table is dropped directly in NDB using for example ndb_drop_table or ndb_restore and then subsequently created using SQL. This scenario is covered by the existing metadata_sync test cases, which will be detected by 4) before this part of the fix. 2) Invalidate the NDB table definition before table schema synchronization installs tables in DD and sets up event subscriptions. This function handles the case when schema distribution is reconnecting to the cluster and a table it knew about earlier has changed while schema distribution event handlers have not been active. This scenario is tested by the drop_util_table test case. 3) Invalidate the NDB table definition in the schema distribution event handlers used for drop table and cluster failure. At this time it is well known that the table does not exist or its status is unknown. Earlier this invalidation was only performed if there was a version mismatch in the event vs. table relation. 4) Detect when the problem occurs by checking that the NDB table definition has not been invalidated (by NdbApi event functions) in the function that sets up the event subscription. It's currently not possible to handle the problem this low down, but at least it can be detected and a fix added to the callers. This detection is only done in debug builds. Change-Id: I4ed6efb9308be0022e99c51eb23ecf583805b1f4
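The invalidation in points 1) to 3) boils down to something like the sketch below; it assumes the public NdbApi dictionary call invalidateTable(), and the exact header and signature should be treated as an assumption rather than a quote of this patch.

// Sketch only: drop a possibly stale cached definition so that the next
// getTable() call fetches the current version from the data nodes.
#include <NdbApi.hpp>

void invalidate_cached_ndb_table(NdbDictionary::Dictionary *dict,
                                 const char *table_name) {
  dict->invalidateTable(table_name);
}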
percona-ysorokin
pushed a commit
that referenced
this pull request
Jun 7, 2024
UUID vx review by yura (partial)
percona-ysorokin
pushed a commit
that referenced
this pull request
Jun 24, 2024
When built with ASAN, a use-after-free is reported for the TcpPortPool. AddressSanitizer: heap-use-after-free on address 0x60200019f190 at pc 0x00000076a18d bp 0x7fff51e7d1d0 sp 0x7fff51e7d1c0 #4 0x770b73 in UniqueId::ProcessUniqueIds::erase(unsigned int) ../router/tests/helpers/tcp_port_pool.h:112 #5 0x770c48 in UniqueId::~UniqueId() ../router/tests/helpers/tcp_port_pool.cc:234 ... #12 0x82faa3 in testing::UnitTest::~UnitTest() ../extra/googletest/googletest-release-1.12.0/googletest/src/gtest.cc:5496 #13 0x7f5fe085ace8 in __run_exit_handlers (/lib64/libc.so.6+0x39ce8) 0x60200019f190 is located 0 bytes inside of 16-byte region [0x60200019f190,0x60200019f1a0) freed by thread T0 here: #0 0x7f5fe3cbd10f in operator delete(void*, unsigned long) (/lib64/libasan.so.6+0xb710f) #1 0x7f5fe085ace8 in __run_exit_handlers (/lib64/libc.so.6+0x39ce8) Background ========== __run_exit_handlers destroys "static" and "global" variables in reverse order of their creation. googletest's unit-tests are a static, and the TcpPortPool also has ProcessUniqueId's, which contain the process-wide unique-ids. At construct: unittest -> tcp-port-pool -> process-unique-ids At destruct : process-unique-ids -> tcp-port-pool -> 💥 The use-after-free happens as the process-unique-ids static is destructed before the tcp-port-pool, which tries to erase its Ids from the process-unique-ids. Change ====== - extend the lifetime of the process-unique-ids to after the last use of the tcp-port-pool via a std::shared_ptr<> Change-Id: I75b8b781e1d240f18ca72f2c86182639a7699f06
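A self-contained sketch of the lifetime fix; the class names mirror the description above but are illustrative, not the actual router test helpers.

// Sketch only: instead of relying on destruction order of statics, every pool
// co-owns the id registry through a std::shared_ptr, so the registry outlives
// its last user.
#include <memory>
#include <set>

class ProcessUniqueIds {
 public:
  static std::shared_ptr<ProcessUniqueIds> instance() {
    static std::shared_ptr<ProcessUniqueIds> reg = std::make_shared<ProcessUniqueIds>();
    return reg;  // each caller co-owns the registry
  }
  void insert(unsigned id) { ids_.insert(id); }
  void erase(unsigned id) { ids_.erase(id); }

 private:
  std::set<unsigned> ids_;
};

class TcpPortPool {
 public:
  TcpPortPool() : registry_(ProcessUniqueIds::instance()) { registry_->insert(4242); }
  ~TcpPortPool() { registry_->erase(4242); }  // registry is guaranteed alive here

 private:
  std::shared_ptr<ProcessUniqueIds> registry_;  // extends the registry's lifetime
};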
percona-ysorokin
pushed a commit
that referenced
this pull request
Jun 24, 2024
…nt on Windows and posix [#1] When passing arguments to NdbProcess::create, it will become important when introducing quoting to distinguish spaces that are part of the argument value from spaces acting as an argument separator. This patch removes current uses of space as a separator in arguments to NdbProcess::create. Change-Id: I1d1bab27e183fc33632bfd9974010129a8970365
percona-ysorokin
pushed a commit
that referenced
this pull request
Jun 24, 2024
Problem: Starting 'ndb_mgmd --bind-address' may potentially cause abnormal program termination in the MgmtSrvr destructor when ndb_mgmd restarts itself. Core was generated by `ndb_mgmd --defa'. Program terminated with signal SIGABRT, Aborted. #0 0x00007f8ce4066b8f in raise () from /lib64/libc.so.6 #1 0x00007f8ce4039ea5 in abort () from /lib64/libc.so.6 #2 0x00007f8ce40a7d97 in __libc_message () from /lib64/libc.so.6 #3 0x00007f8ce40af08c in malloc_printerr () from /lib64/libc.so.6 #4 0x00007f8ce40b132d in _int_free () from /lib64/libc.so.6 #5 0x00000000006e9ffe in MgmtSrvr::~MgmtSrvr (this=0x28de4b0) at mysql/8.0/storage/ndb/src/mgmsrv/MgmtSrvr.cpp:890 #6 0x00000000006ea09e in MgmtSrvr::~MgmtSrvr (this=0x2) at mysql/8.0/storage/ndb/src/mgmsrv/MgmtSrvr.cpp:849 #7 0x0000000000700d94 in mgmd_run () at mysql/8.0/storage/ndb/src/mgmsrv/main.cpp:260 #8 0x0000000000700775 in mgmd_main (argc=<optimized out>, argv=0x28041d0) at mysql/8.0/storage/ndb/src/mgmsrv/main.cpp:479 Analysis: While starting up, the ndb_mgmd will allocate memory for bind_address in order to potentially rewrite the parameter. When ndb_mgmd restarts itself the memory will be released and the dangling pointer causes a double free. Fix: Drop support for bind_address=[::]; it is not documented anywhere, is not useful and doesn't work. This means the need to rewrite bind_address is gone and the bind_address argument needs neither alloc nor free. Change-Id: I7797109b9d8391394587188d64d4b1f398887e94
percona-ysorokin
pushed a commit
that referenced
this pull request
Jun 24, 2024
This worklog introduces dynamic offload of Queries to RAPID in the following ways: When the system variable rapid_use_dynamic_offload is 0/false, then we fall back to the normal cost threshold classifier, which also implies that when use secondary engine is set to forced, eligible queries will go to the secondary engine, regardless of cost threshold or this classifier. When rapid_use_dynamic_offload is 1/true, then we proceed with looking for the optimal execution engine for the query; if the secondary engine is found more optimal, then the query is offloaded, otherwise it is sent back to mysql. This is handled in the following scenarios: 1. Static Scenario: When there's no Change propagation or Queue on the RAPID side, this introduces a decision tree which has > 85 % precision in training at predicting which queries will be faster on mysql and which will be faster on RAPID, and accepts or rejects queries. The decision tree takes around 20-100 microseconds for fast queries, hence minimal overhead; for bigger queries this introduces an overhead of up to a maximum observed 700 microseconds, but these end up with long execution times, hence not a problem. For very fast queries, defined here by having cost < 10 and of the form point select, dynamic offload is not applied, since 100 % of these queries (out of 16667 samples) are faster on MySQL. Additionally, routing these "very fast queries" through dynamic offload leads to performance regressions due to 3 phase optimisation. 2. Dynamic Scenario: When there's CP or queuing on RAPID, this worklog introduces dynamic feature normalization to factor in the extra catch-up time RAPID needs, and factoring in that, attempts to verify if RAPID is still the best engine for execution. If the queue is too long or CP is too long, this mechanism wants to progressively start shifting queries to mysql, moving gradually towards the heavier queries. The steps in this worklog with respect to the query lifecycle in a server with secondary_engine = ON are described below: query | Primary Tentatively optimisation -> mysql optimises for Innodb | secondary_engine_pre_prepare_hook -> following Rapid function called: | RapidCachePrimaryInfoAtPrimaryTentativelyStep | If dynamic offload is enabled and query is not "very fast": | This caches features from mysql plan in rapid_statement_context | to be used for dynamic offload. | If dynamic offload is disabled or the query is "very fast": | This function invokes the standard mysql cost threshold classifier, | which decides if query needs further RAPID optimisation. | | |-> if returns False, then query proceeds to Innodb for execution |-> if returns true, step below is called | Secondary optimisation -> mysql optimises for RAPID | prepare_secondary_engine -> following Rapid function is called: | RapidPrepareEstimateQueryCosts | In this function, Dynamic offload combines mysql plan features | retrieved from rapid_statement_context | and RAPID info such as rapid base table cardinality, | dict encoding projection, varlen projection size, rapid queue | size to decide if query should be offloaded to RAPID.
| |->if returns True, then query proceeds to Innodb for execution |->if returns False, step below is called | optimize_secondary_engine -> following Rapid function is called | RapidOptimize | In this function, Dynamic offload retrieves info from | rapid_statement_context and additionally looks at Change | propagation lag to decide if query should be offloaded to rapid | |->if returns True, then query proceeds to Innodb for execution |->if returns False, then query goes to Rapid Execution. Following new MYSQL ERR log messages are printed with this WL, when dynamic offload is enabled, and query is not a "very fast query". 1. SelOffload allow decision 1 : as secondary not forced 1 and enable var value 1 and transactional enabled 1 and( big shape detected 0 or small shape detected 1 ) inno: 10737418240 , rpd: 4294967296 , no lh table: 1 Message such as this shows if dynamic offload is used to classify this query or not. If not, why not, using each of the conditions. 1 = pass, 0 = not pass. 2. myqid=65 Selective offload classifier #1#1#1 f_mysql_total_ts_nrows <= 2105.5 : 0.173916, f_MySQLCost <= 68.3899040222168 : 0.028218, f_count_all_base_tables = 0 , f_count_ref_index_ts = 0 ,f_BaseTableSumNrows <= 278177.5 : 0.173916 are_all_ts_index_ref = true outcome=0 Line such as this serialises what leg of decision tree decided outcome of this query 0 -> back to mysql 1 -> keep on rapid. each leg is uniquely searchable via identifier such as #1#1#1 here. This worklog additionally introduces python scripts to run queries on mysql client with multiple queries and multiple dmls at once, in various modes such as simulator mode and standard benchmark modes. By Default this WL is enabled, but before release it will be disabled. This is tracked via BUG#36343189 #no-close. Perf mode unittests will be enabled on jenkins after this wl. Further cleanup will be done via BUG#36368437 #no-close. Bugs tackled via this WL: BUG#35738194, Enh#34132523, Bug#36343208 Unrelated bugs fixed: BUG#35987975 Old gerrit review : 25567 (abandoned due to 1000 update limit reached) Change-Id: Ie5f9fdcd8b55a669d04b389d3aec5f6b33f0fe2e
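A condensed, purely illustrative sketch of the classifier described above; the feature names and thresholds are assumptions loosely based on the log lines quoted in this message, not the shipped implementation.

// Sketch only: combine cached MySQL plan features with RAPID-side state and
// answer a single question: should this query stay on MySQL?
struct OffloadFeatures {
  double mysql_cost = 0.0;       // from the primary (InnoDB) plan
  bool   is_point_select = false;
  double base_table_rows = 0.0;  // RAPID base table cardinality
  double rapid_queue_len = 0.0;  // queued work on the RAPID side
  double change_prop_lag = 0.0;  // change propagation backlog
};

bool keep_on_mysql(const OffloadFeatures &f) {
  // "Very fast" queries never go through dynamic offload: point selects with
  // cost < 10 were always faster on MySQL in the training data.
  if (f.is_point_select && f.mysql_cost < 10.0) return true;
  // Dynamic scenario: a long RAPID queue or change-propagation lag shifts the
  // lighter queries back to MySQL first (illustrative thresholds only).
  if (f.rapid_queue_len > 1e6 || f.change_prop_lag > 1e6)
    return f.mysql_cost < 1e4;
  // Static scenario: decision-tree-style split on plan features (cutoffs
  // echo the example log line above, but are not the real model).
  return f.base_table_rows <= 2105.5 && f.mysql_cost <= 68.39;
}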
percona-ysorokin
pushed a commit
that referenced
this pull request
Jun 28, 2024
https://perconadev.atlassian.net/browse/PS-9222 Problem ======= When writing to the redo log, an issue of the column order change not being recorded with INSTANT DDL was fixed by creating an array, with size equal to the number of fields in the index, which kept track of whether the original position of the field was changed or not. Later, that array would be used to make a decision on logging the field. However, this solution did not take into account the fact that there could be column prefixes because of the primary key. This resulted in inaccurate entries being filled in the fields_with_changed_order[] array. Solution ======== It is fixed by using the method get_col_phy_pos(), which takes the existence of a column prefix into account, instead of get_phy_pos() while generating the fields_with_changed_order[] array.
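A small illustrative sketch of the array generation, using hypothetical structures; only the choice of position accessor matters here, mirroring the get_col_phy_pos() versus get_phy_pos() distinction above.

// Sketch only: build the "field position changed" map from the physical
// position that accounts for column prefixes, not the raw physical position.
#include <vector>

struct FieldSketch {
  unsigned saved_phy_pos;  // position before the INSTANT DDL
  unsigned phy_pos;        // raw physical position (cf. get_phy_pos())
  unsigned col_phy_pos;    // prefix-aware physical position (cf. get_col_phy_pos())
};

std::vector<bool> build_changed_order_map(const std::vector<FieldSketch> &fields) {
  std::vector<bool> changed(fields.size(), false);
  for (size_t i = 0; i < fields.size(); ++i) {
    // Using col_phy_pos keeps the map accurate when the primary key contains
    // column prefixes; phy_pos would mark the wrong fields as moved.
    changed[i] = fields[i].col_phy_pos != fields[i].saved_phy_pos;
  }
  return changed;
}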
percona-ysorokin
pushed a commit
that referenced
this pull request
Sep 5, 2024
percona-ysorokin
pushed a commit
that referenced
this pull request
Nov 6, 2024
percona-ysorokin pushed a commit that referenced this pull request Nov 6, 2024
… for connection xxx'. The new iterator based explains are not impacted.

The issue here is a race condition: more than one thread is using the query term iterator at the same time (which is neither thread safe nor reentrant), and part of its state lives in the query terms being visited, which leads to interference/race conditions.

a) The explain thread uses an iterator here: Sql_cmd_explain_other_thread::execute inspects the Query_expression of the running query, calling master_query_expression()->find_blocks_query_term, which uses an iterator over the query terms in the query expression:

   for (auto qt : query_terms<>()) {
     if (qt->query_block() == qb) {
       return qt;
     }
   }

The above search fails to find qb due to the interference of thread b), see below, and then tries to access a null pointer:

* thread percona#36, name = ‘connection’, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010bb3cf0d mysqld`Query_block::type(this=0x00007f8f82719088) const at sql_lex.cc:4441:11
    frame #1: 0x000000010b83763e mysqld`(anonymous namespace)::Explain::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:792:50
    frame #2: 0x000000010b83cc4d mysqld`(anonymous namespace)::Explain_join::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:1487:21
    frame #3: 0x000000010b837c34 mysqld`(anonymous namespace)::Explain::prepare_columns(this=0x00007000020611b8) at opt_explain.cc:744:26
    frame #4: 0x000000010b83ea0e mysqld`(anonymous namespace)::Explain_join::explain_qep_tab(this=0x00007000020611b8, tabnum=0) at opt_explain.cc:1415:32
    frame #5: 0x000000010b83ca0a mysqld`(anonymous namespace)::Explain_join::shallow_explain(this=0x00007000020611b8) at opt_explain.cc:1364:9
    frame #6: 0x000000010b83379b mysqld`(anonymous namespace)::Explain::send(this=0x00007000020611b8) at opt_explain.cc:770:14
    frame #7: 0x000000010b834147 mysqld`explain_query_specification(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, query_term=0x00007f8f82719088, ctx=CTX_JOIN) at opt_explain.cc:2088:20
    frame percona#8: 0x000000010bd36b91 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f82719088) at sql_union.cc:1519:11
    frame percona#9: 0x000000010bd36c68 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f8271d748) at sql_union.cc:1526:13
    frame percona#10: 0x000000010bd373f7 mysqld`Query_expression::explain(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00) at sql_union.cc:1591:7
    frame percona#11: 0x000000010b835820 mysqld`mysql_explain_query_expression(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2392:17
    frame percona#12: 0x000000010b835400 mysqld`explain_query(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2353:13
  * frame percona#13: 0x000000010b8363e4 mysqld`Sql_cmd_explain_other_thread::execute(this=0x00007f8fba585b68, thd=0x00007f8fbb111e00) at opt_explain.cc:2531:11
    frame percona#14: 0x000000010bba7d8b mysqld`mysql_execute_command(thd=0x00007f8fbb111e00, first_level=true) at sql_parse.cc:4648:29
    frame percona#15: 0x000000010bb9e230 mysqld`dispatch_sql_command(thd=0x00007f8fbb111e00, parser_state=0x0000700002065de8) at sql_parse.cc:5303:19
    frame percona#16: 0x000000010bb9a4cb mysqld`dispatch_command(thd=0x00007f8fbb111e00, com_data=0x0000700002066e38, command=COM_QUERY) at sql_parse.cc:2135:7
    frame percona#17: 0x000000010bb9c846 mysqld`do_command(thd=0x00007f8fbb111e00) at sql_parse.cc:1464:18
    frame percona#18: 0x000000010b2f2574 mysqld`handle_connection(arg=0x0000600000e34200) at connection_handler_per_thread.cc:304:13
    frame percona#19: 0x000000010e072fc4 mysqld`pfs_spawn_thread(arg=0x00007f8fba8160b0) at pfs.cc:3051:3
    frame percona#20: 0x00007ff806c2b202 libsystem_pthread.dylib`_pthread_start + 99
    frame percona#21: 0x00007ff806c26bab libsystem_pthread.dylib`thread_start + 15

b) The query thread being explained is itself performing LEX::cleanup and, as part of that, iterates over the query terms, but it still allows EXPLAIN of the query plan since thd->query_plan.set_query_plan(SQLCOM_END, ...) hasn't been called yet.

   20:frame: Query_terms<(Visit_order)1, (Visit_leaves)0>::Query_term_iterator::operator++() (in mysqld) (query_term.h:613)
   21:frame: Query_expression::cleanup(bool) (in mysqld) (sql_union.cc:1861)
   22:frame: LEX::cleanup(bool) (in mysqld) (sql_lex.h:4286)
   30:frame: Sql_cmd_dml::execute(THD*) (in mysqld) (sql_select.cc:799)
   31:frame: mysql_execute_command(THD*, bool) (in mysqld) (sql_parse.cc:4648)
   32:frame: dispatch_sql_command(THD*, Parser_state*) (in mysqld) (sql_parse.cc:5303)
   33:frame: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in mysqld) (sql_parse.cc:2135)
   34:frame: do_command(THD*) (in mysqld) (sql_parse.cc:1464)
   57:frame: handle_connection(void*) (in mysqld) (connection_handler_per_thread.cc:304)
   58:frame: pfs_spawn_thread(void*) (in mysqld) (pfs.cc:3053)
   65:frame: _pthread_start (in libsystem_pthread.dylib) + 99
   66:frame: thread_start (in libsystem_pthread.dylib) + 15

Solution:

This patch solves the issue by removing iterator state from Query_term, making the query_term iterators thread safe. The solution labels every child query_term with its index in its parent's m_children vector, so the iterator can easily compute the next child to visit based on Query_term::m_sibling_idx. A unit test case is added to check reentrancy. (A minimal sketch of the same idea follows this commit message.)

One can also manually verify that no race condition remains by running two client connection files (with \. <file>): a large number of copies of the repro query in one connection, and a large number of EXPLAIN format=json FOR <connection> statements, e.g. EXPLAIN FORMAT=json FOR CONNECTION 8\G, in the other. The actual connection number would need to be verified in connection one, of course.

Change-Id: Ie7d56610914738ccbbecf399ccc4f465f7d26ea7
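As an illustration of the approach (all traversal state kept in the iterator, the next node found via an index stored on each child), here is a small, self-contained C++ sketch. Node, PreOrderIterator, sibling_idx and add_child are invented names for this example; this is not the actual Query_term / Query_term_iterator code.

```cpp
// Hypothetical sketch: a re-entrant pre-order tree iterator that stores no
// state in the nodes and finds the next sibling via a per-node index
// (analogous in spirit to Query_term::m_sibling_idx).
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
  Node *parent = nullptr;
  std::size_t sibling_idx = 0;  // index of this node in parent->children
  std::vector<std::unique_ptr<Node>> children;

  Node *add_child() {
    auto child = std::make_unique<Node>();
    child->parent = this;
    child->sibling_idx = children.size();  // label the child with its position
    children.push_back(std::move(child));
    return children.back().get();
  }
};

// Pre-order iterator: the only mutable state is m_current, owned by the
// iterator itself, so concurrent iterations cannot interfere with each other.
class PreOrderIterator {
 public:
  explicit PreOrderIterator(Node *root) : m_current(root) {}
  Node *operator*() const { return m_current; }
  bool done() const { return m_current == nullptr; }

  PreOrderIterator &operator++() {
    if (!m_current->children.empty()) {  // descend first
      m_current = m_current->children.front().get();
      return *this;
    }
    // Otherwise climb until some ancestor has an unvisited right sibling,
    // located purely via sibling_idx - no cursor stored in the node.
    Node *n = m_current;
    while (n->parent != nullptr) {
      const std::size_t next = n->sibling_idx + 1;
      if (next < n->parent->children.size()) {
        m_current = n->parent->children[next].get();
        return *this;
      }
      n = n->parent;
    }
    m_current = nullptr;  // traversal finished
    return *this;
  }

 private:
  Node *m_current;
};

int main() {
  Node root;
  Node *a = root.add_child();
  root.add_child();
  a->add_child();

  // Two independent traversals of the same tree; the nodes are only read,
  // never mutated, during iteration.
  std::size_t count = 0;
  for (PreOrderIterator it(&root); !it.done(); ++it) ++count;
  for (PreOrderIterator it(&root); !it.done(); ++it) ++count;
  assert(count == 8);  // 4 nodes visited twice
  return 0;
}
```

Because iteration never writes to the tree, any number of threads or nested loops can traverse it simultaneously, which mirrors the reentrancy property the commit message says the new unit test checks.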
Enable system tablespace encryption again in 8.0.20
Remove the innodb.percona_sys_tablespace_encrypt_dblwr test, as there is no doublewrite buffer in the system tablespace anymore.