PS-3410 : LP #1570114: Long running ALTER TABLE ADD INDEX causes sema… #3143

satya-bodapati · 2019-04-16T11:59:32Z

…phore wait > 600 assertion

Problem:

A long running ALTER TABLE ADD INDEX with concurrent inserts causes sempahore waits and
eventually crashes the server.

To see this problem you need to have

A table with lots of data. Add index should take significant time to create many pages
Compressed table. This is becuase CPU is spent on compress() with mtr already latching index->lock
More time spent by mtr, more waits by the INSERT. Helps in crash.
Concurrent inserts when ALTER is running. The inserts should happen specifically after the read phase
of ALTER and after Bulk load index build (bottump build) started.

The entire bulkload process latches the index->lock X mode for the whole duration of bottom up build of index. The index->lock is held across mtrs (because many pages are created during index build).

An example is this: Page1 mtr latches index->lock X mode, when page is full, a sibling page is created.
The sibling Page 2 (mtr) also acquires index->lock X mode.

Recursive X latching is allowed by same thread. Now Page 1 mtr commits but index->lock is still held by Page 2. Now when page 2 is full, another sibling page is created. Sibling Page 3 now acquires index->lock X mode. Page 2 mtr commits.. This goes on and on. Also happens with Pages at non-root levels.

Essentially the time index->lock is held is equally proportional to number of pages/mtrs created. And compress tables helps in making mtr take a bit more time in doing compress() and duration of each mtr is higher with compressed tables.

At this stage, a concurrent INSERT comes and since there is concurrent DDL and the index is uncommited, this insert should go to online ALTER log. It tries to acquire index->lock in S mode.

Bulk load index already took index->lock X mode and is not going to release it until is over.

INSERT thread keeps on waiting, and when the wait crosses 600 seconds to acquire index->lock, it will crash.

Fix:

INSERT thread acquires index->lock to check the index online status. During the bulk load index build, there is no concurrent insert or read. So there is no need to acquire index->lock at all.

Bulk load index build is also used to create indexes in table rebuild cases. For example DROP COLUMN, ADD COLUMN. The indexes on intermediate table (#sql-ib..) are built using bulk load insert. A concurrent DMLs at this stage do not acquire index->lock. So acquiring index->lock on the intermediate table, which is not visible to anyone else doesn't block concurrent DMLs.

Ideally we can try to remove all index->lock X acquisitions in bulk load index build path. We play safe and remove acquisitions only incase of uncommited indexes. The other path (bulk load used during rebuild) is not affected anyway.

satya-bodapati · 2019-04-16T14:28:42Z

Submitted jenkins job with --big-test on 5.7 pipeline

https://ps57.cd.percona.com/job/percona-server-5.7-pipeline/923/

laurynas-biveinis

Can you show the point where INSERT thread tried to acquire index latch in S?

mysql-test/suite/innodb/t/percona_bulk_load_insert.test

laurynas-biveinis · 2019-04-17T12:35:18Z

Also, should 5.6 be fixed too?

satya-bodapati · 2019-04-17T12:36:52Z

Also, should 5.6 be fixed too?

no, bulk load (WL#7277) is added only in 5.7

…phore wait > 600 assertion Problem: -------- A long running ALTER TABLE ADD INDEX with concurrent inserts causes sempahore waits and eventually crashes the server. To see this problem you need to have 1. A table with lots of data. Add index should take significant time to create many pages 2. Compressed table. This is becuase CPU is spent on compress() with mtr already latching index->lock More time spent by mtr, more waits by the INSERT. Helps in crash. 3. Concurrent inserts when ALTER is running. The inserts should happen specifically after the read phase of ALTER and after Bulk load index build (bottump build) started. The entire bulkload process latches the index->lock X mode for the whole duration of bottom up build of index. The index->lock is held across mtrs (because many pages are created during index build). An example is this: Page1 mtr latches index->lock X mode, when page is full, a sibling page is created. The sibling Page 2 (mtr) also acquires index->lock X mode. Recursive X latching is allowed by same thread. Now Page 1 mtr commits but index->lock is still held by Page 2. Now when page 2 is full, another sibling page is created. Sibling Page 3 now acquires index->lock X mode. Page 2 mtr commits.. This goes on and on. Also happens with Pages at non-root levels. Essentially the time index->lock is held is equally proportional to number of pages/mtrs created. And compress tables helps in making mtr take a bit more time in doing compress() and duration of each mtr is higher with compressed tables. At this stage, a concurrent INSERT comes and since there is concurrent DDL and the index is uncommited, this insert should go to online ALTER log. It tries to acquire index->lock in S mode. Bulk load index already took index->lock X mode and is not going to release it until is over. INSERT thread keeps on waiting, and when the wait crosses 600 seconds to acquire index->lock, it will crash. Fix: ---- INSERT thread acquires index->lock to check the index online status. During the bulk load index build, there is no concurrent insert or read. So there is no need to acquire index->lock at all. Bulk load index build is also used to create indexes in table rebuild cases. For example DROP COLUMN, ADD COLUMN. The indexes on intermediate table (#sql-ib..) are built using bulk load insert. A concurrent DMLs at this stage do not acquire index->lock. So acquiring index->lock on the intermediate table, which is not visible to anyone else doesn't block concurrent DMLs. Ideally we can try to remove all index->lock X acquisitions in bulk load index build path. We play *safe* and remove acquisitions only incase of uncommited indexes. The other path (bulk load used during rebuild) is not affected anyway.

satya-bodapati · 2019-04-17T13:36:32Z

Can you show the point where INSERT thread tried to acquire index latch in S?

Thread 26 (Thread 0x7ffff0bf9700 (LWP 15407)):
#0 0x00007ffff7bc39f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7fff8c038120) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7fff8c0380d0, cond=0x7fff8c0380f8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7fff8c0380f8, mutex=0x7fff8c0380d0) at pthread_cond_wait.c:655
#3 0x0000555556f99b30 in os_event::wait (this=0x7fff8c0380c0) at /home/satya/WORK/PS-5.7-3410/storage/innobase/include/os0event.h:156
#4 0x0000555556f99482 in os_event::wait_low (this=0x7fff8c0380c0, reset_sig_count=1) at /home/satya/WORK/PS-5.7-3410/storage/innobase/os/os0event.cc:131
#5 0x0000555556f99897 in os_event_wait_low (event=0x7fff8c0380c0, reset_sig_count=1) at /home/satya/WORK/PS-5.7-3410/storage/innobase/os/os0event.cc:328
#6 0x00005555570b5ab1 in sync_array_wait_event (arr=0x55555853d788, cell=@0x7ffff0bf5240: 0x7ffff135e0a8)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/sync/sync0arr.cc:475
#7 0x00005555570b9198 in rw_lock_s_lock_spin (lock=0x7fff8c038080, pass=0, file_name=0x5555578296e8 "/home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc",
line=2953) at /home/satya/WORK/PS-5.7-3410/storage/innobase/sync/sync0rw.cc:419
---Type to continue, or q to quit---
#8 0x0000555556ff9439 in rw_lock_s_lock_func (lock=0x7fff8c038080, pass=0, file_name=0x5555578296e8 "/home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc",
line=2953) at /home/satya/WORK/PS-5.7-3410/storage/innobase/include/sync0rw.ic:433
#9 0x0000555556ff94c6 in pfs_rw_lock_s_lock_func (lock=0x7fff8c038080, pass=0,
file_name=0x5555578296e8 "/home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc", line=2953)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/include/sync0rw.ic:796
#10 0x0000555557007b9d in mtr_t::s_lock (this=0x7ffff0bf5950, lock=0x7fff8c038080,
file=0x5555578296e8 "/home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc", line=2953)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/include/mtr0mtr.ic:244
#11 0x0000555557005196 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x7fff8c037e90, offsets_heap=0x7fff881f1248, heap=0x7fff88d59d48, entry=0x7fff88124300,
trx_id=0, thr=0x7fff881f0aa0, dup_chk_only=false) at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:2953
#12 0x000055555700699f in row_ins_sec_index_entry (index=0x7fff8c037e90, entry=0x7fff88124300, thr=0x7fff881f0aa0, dup_chk_only=false)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3444
#13 0x0000555557006b4c in row_ins_index_entry (index=0x7fff8c037e90, entry=0x7fff88124300, thr=0x7fff881f0aa0)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3495
#14 0x00005555570070bc in row_ins_index_entry_step (node=0x7fff881f0730, thr=0x7fff881f0aa0) at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3643
#15 0x0000555557007478 in row_ins (node=0x7fff881f0730, thr=0x7fff881f0aa0) at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3785
#16 0x0000555557007ab5 in row_ins_step (thr=0x7fff881f0aa0) at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3970
#17 0x00005555570298de in row_insert_for_mysql_using_ins_graph (mysql_rec=0x7fff88ce4dd0 "\360", prebuilt=0x7fff881f0130)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0mysql.cc:2324
#18 0x0000555557029ed0 in row_insert_for_mysql (mysql_rec=0x7fff88ce4dd0 "\360", prebuilt=0x7fff881f0130)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0mysql.cc:2447
#19 0x0000555556ea57ab in ha_innobase::write_row (this=0x7fff88df5460, record=0x7fff88ce4dd0 "\360")
at /home/satya/WORK/PS-5.7-3410/storage/innobase/handler/ha_innodb.cc:8704
#20 0x0000555556351f57 in handler::ha_write_row (this=0x7fff88df5460, buf=0x7fff88ce4dd0 "\360") at /home/satya/WORK/PS-5.7-3410/sql/handler.cc:8529
#21 0x0000555556c88c5c in write_record (thd=0x7fff88000dd0, table=0x7fff881ef700, info=0x7ffff0bf6fd0, update=0x7ffff0bf7050)
at /home/satya/WORK/PS-5.7-3410/sql/sql_insert.cc:1881
#22 0x0000555556c85a8e in Sql_cmd_insert::mysql_insert (this=0x7fff88d83e38, thd=0x7fff88000dd0, table_list=0x7fff88d836f8)
at /home/satya/WORK/PS-5.7-3410/sql/sql_insert.cc:773
#23 0x0000555556c8cb97 in Sql_cmd_insert::execute (this=0x7fff88d83e38, thd=0x7fff88000dd0) at /home/satya/WORK/PS-5.7-3410/sql/sql_insert.cc:3120
#24 0x00005555569b9d6d in mysql_execute_command (thd=0x7fff88000dd0, first_level=true) at /home/satya/WORK/PS-5.7-3410/sql/sql_parse.cc:3810
#25 0x00005555569c0255 in mysql_parse (thd=0x7fff88000dd0, parser_state=0x7ffff0bf84a0, update_userstat=false) at /home/satya/WORK/PS-5.7-3410/sql/sql_parse.cc:5901
#26 0x00005555569b4379 in dispatch_command (thd=0x7fff88000dd0, com_data=0x7ffff0bf8dc0, command=COM_QUERY) at /home/satya/WORK/PS-5.7-3410/sql/sql_parse.cc:1528
#27 0x00005555569b3104 in do_command (thd=0x7fff88000dd0) at /home/satya/WORK/PS-5.7-3410/sql/sql_parse.cc:1053
#28 0x0000555556af7d8f in handle_connection (arg=0x555558d5ec60) at /home/satya/WORK/PS-5.7-3410/sql/conn_handler/connection_handler_per_thread.cc:312
#29 0x0000555556df0a44 in pfs_spawn_thread (arg=0x555558d57ef0) at /home/satya/WORK/PS-5.7-3410/storage/perfschema/pfs.cc:2190
#30 0x00007ffff7bbd6db in start_thread (arg=0x7ffff0bf9700) at pthread_create.c:463
#31 0x00007ffff5e6288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

====
row_ins_sec_index_entry_low:

948 const bool check = !index->is_committed(); │
│2949 if (check) { │
│2950 DEBUG_SYNC_C("row_ins_sec_index_enter"); │
│2951 if (mode == BTR_MODIFY_LEAF) { │
│2952 search_mode |= BTR_ALREADY_S_LATCHED; │

│2953 mtr_s_lock(dict_index_get_lock(index), &mtr); │
│2954 } else { │
│2955 mtr_sx_lock(dict_index_get_lock(index), &mtr); │

laurynas-biveinis

I am still having trouble wrapping around my head on why is it safe not to X-latch index lock here, given the comment at row_ins_sec_index_entry_low:
/* Ensure that we acquire index->lock when inserting into an
index with index->online_status == ONLINE_INDEX_COMPLETE, but
could still be subject to rollback_inplace_alter_table().
This prevents a concurrent change of index->online_status.
and upstream fix at f2f7d43

satya-bodapati · 2019-04-18T07:24:56Z

I am still having trouble wrapping around my head on why is it safe not to X-latch index lock here, given the comment at row_ins_sec_index_entry_low:
/* Ensure that we acquire index->lock when inserting into an
index with index->online_status == ONLINE_INDEX_COMPLETE, but
could still be subject to rollback_inplace_alter_table().
This prevents a concurrent change of index->online_status.
and upstream fix at f2f7d43

Am not removing the index->lock here (in row_ins_sec_index_entry_low()). The acquisition of index->lock is still there. As commented here, the index->lock is necessary when index->onine_status is read/changed.

And bulk load insert build doesn't change index->online_status at all (from BtrBulk::init() to Btrbulk::finish()). Please check

laurynas-biveinis · 2019-04-18T09:27:57Z

OK, looks good, only the testcase question remains

mysql-test/suite/innodb/t/percona_bulk_load_insert.test

satya-bodapati requested a review from laurynas-biveinis April 16, 2019 11:59

satya-bodapati self-assigned this Apr 16, 2019

satya-bodapati force-pushed the PS-5.7-3410 branch from b11e341 to 088dd77 Compare April 16, 2019 12:00

laurynas-biveinis suggested changes Apr 17, 2019

View reviewed changes

satya-bodapati force-pushed the PS-5.7-3410 branch from 088dd77 to 7ebaef8 Compare April 17, 2019 13:37

laurynas-biveinis reviewed Apr 18, 2019

View reviewed changes

laurynas-biveinis approved these changes Apr 18, 2019

View reviewed changes

mysql-test/suite/innodb/t/percona_bulk_load_insert.test Show resolved Hide resolved

satya-bodapati merged commit 7855bd5 into percona:5.7 Apr 19, 2019

satya-bodapati deleted the PS-5.7-3410 branch June 14, 2019 12:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PS-3410 : LP #1570114: Long running ALTER TABLE ADD INDEX causes sema… #3143

PS-3410 : LP #1570114: Long running ALTER TABLE ADD INDEX causes sema… #3143

satya-bodapati commented Apr 16, 2019

satya-bodapati commented Apr 16, 2019

laurynas-biveinis left a comment

laurynas-biveinis commented Apr 17, 2019

satya-bodapati commented Apr 17, 2019

satya-bodapati commented Apr 17, 2019

laurynas-biveinis left a comment

satya-bodapati commented Apr 18, 2019

laurynas-biveinis commented Apr 18, 2019

PS-3410 : LP #1570114: Long running ALTER TABLE ADD INDEX causes sema… #3143

PS-3410 : LP #1570114: Long running ALTER TABLE ADD INDEX causes sema… #3143

Conversation

satya-bodapati commented Apr 16, 2019

Problem:

Fix:

satya-bodapati commented Apr 16, 2019

laurynas-biveinis left a comment

Choose a reason for hiding this comment

laurynas-biveinis commented Apr 17, 2019

satya-bodapati commented Apr 17, 2019

satya-bodapati commented Apr 17, 2019

laurynas-biveinis left a comment

Choose a reason for hiding this comment

satya-bodapati commented Apr 18, 2019

laurynas-biveinis commented Apr 18, 2019