Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PS-9144 : Missing rows after ALGORITHM=INPLACE ALTER under same workl… #5354

Closed
wants to merge 1 commit into from

Commits on Jul 16, 2024

  1. PS-9144 : Missing rows after ALGORITHM=INPLACE ALTER under same workl…

    …oad as PS-9092
    
    https://perconadev.atlassian.net/browse/PS-9144
    
    Problem:
    --------
    ALTER TABLE which rebuilds InnoDB table using INPLACE algorithm might sometimes
    lead to row loss if concurrent purge happens on the table being ALTERed.
    
    Analysis:
    ---------
    New implementation of parallel ALTER TABLE INPLACE in InnoDB was introduced in
    MySQL 8.0.27. Its code is used for online table rebuild even in a single-thread
    case.
    This implementation iterates over all the rows in the table, in general case,
    handling different subtrees of a B-tree in different threads. This iteration
    over table rows needs to be paused, from time to time, to commit InnoDB MTR/
    release page latches it holds. This is necessary to give a way to concurrent
    actions on the B-tree scanned or before flushing rows of new version of table
    from in-memory buffer to the B-tree. In order to resume iteration after such
    pause persistent cursor position saved before pause is restored.
    
    The cause of the problem described above lies in PCursor::restore_position()
    method. This method used for two purposes in the code:
    
    1)  To resume iteration after it was paused in the scenario described
        above.
    2)  To initialize cursor when a thread starts iteration through subtree
        it was assigned to process.
    
    In scenario 2) we restore cursor to the record which has not been
    processed yet. If the record, which position was saved originally when
    subtrees/ranges to process were assigned to threads, has been purged
    meanwhile, the cursor will be restored to preceding record. And the
    cursor needs to be moved to the next record, so we don't start processing
    our subtree from the record belonging to a different thread.
    PCursor::restore_position() handles this by detecting situation when saved
    record was purged and moving to the next record in this case.
    
    However, in scenario 1) we actually restore cursor to the record which
    has been processed already and from which will be doing step to the next
    record right after restore. So iterating to the next record if saved record
    was purged like it is done in PCursor::restore_position(), and which is
    necessary in case 2), leads to double step forward, resulting in our scan
    missing record!
    
    Fix:
    ----
    This patch solves the problem by using different logic for restore of
    cursor position in these two cases.
    
    For case 1) we simply restore position which was saved using
    btr_pcur_t::restore_position(). If the record to which cursor is
    supposed to point has been purged meanwhile, this method will point
    the cursor to preceding record. Then the calling code will iterate to
    next record (i.e. successor of purged record) after restore.
    
    For case 2) we keep pre-fix behavior and correct cursor position
    after restoring if the record which position has been saved originally
    has been purged by moving to the next record in subtree to be processed.
    PCursor::restore_position() method which implements handling of this
    case has been renamed to PCursor::restore_position_for_range() and
    greately simplified.
    dlenev committed Jul 16, 2024
    Configuration menu
    Copy the full SHA
    3b76373 View commit details
    Browse the repository at this point in the history