PS-9107 : [ERROR] [MY-013183] [InnoDB] Assertion failure: ibuf0ibuf.c… #5257

Merged on Mar 20, 2024 (1 commit)

satya-bodapati commented Mar 13, 2024

…c:3833:ib::fatal triggered thread

https://perconadev.atlassian.net/browse/PS-9107

Problem:

Two threads try to delete the same ibuf record. The first one succeeds with an optimistic delete, which means the record is completely gone (its space is added to the page garbage).

The second thread cannot do an optimistic delete, so it attempts a pessimistic one and tries to reposition its cursor on the record that the first thread already deleted. It cannot, because the record is gone, and ibuf asserts.
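
The failure mode above can be illustrated with a toy model. This is only a sketch under loud assumptions: `IbufKey`, `ToyIbuf`, `optimistic_delete()`, and `restore_position()` are hypothetical names, an ordered set stands in for the ibuf B-tree, and the two "threads" run one after the other so the outcome is deterministic. It is not InnoDB code.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for an ibuf record key (not the real record layout).
struct IbufKey {
  unsigned space_id;
  unsigned page_no;
  unsigned counter;
  bool operator<(const IbufKey &o) const {
    if (space_id != o.space_id) return space_id < o.space_id;
    if (page_no != o.page_no) return page_no < o.page_no;
    return counter < o.counter;
  }
};

// Toy change buffer: an ordered set standing in for the ibuf B-tree.
using ToyIbuf = std::set<IbufKey>;

// Models the first thread's optimistic delete: the record is removed outright.
bool optimistic_delete(ToyIbuf &ibuf, const IbufKey &key) {
  return ibuf.erase(key) == 1;
}

// Models repositioning a stored cursor before a pessimistic delete.
// Returns false when the record can no longer be found.
bool restore_position(const ToyIbuf &ibuf, const IbufKey &key) {
  return ibuf.find(key) != ibuf.end();
}

// Replays the interleaving described above and reports whether the second
// thread is still able to find the record it wants to delete.
bool second_thread_can_reposition() {
  ToyIbuf ibuf{{978, 34, 0}, {978, 34, 1}};
  const IbufKey rec{978, 34, 0};

  // Thread A deletes the record optimistically and succeeds.
  const bool deleted_by_a = optimistic_delete(ibuf, rec);
  assert(deleted_by_a);

  // Thread B falls back to the pessimistic path and must reposition first.
  return restore_position(ibuf, rec);
}
```

Calling `second_thread_can_reposition()` from a `main()` returns `false` in this model: the second thread has nothing left to position on. In the server, that situation is what ends in the assertion unless the caller can tell that the tablespace is being dropped.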

One of the InnoDB master thread's active tasks (srv_master_do_active_tasks) merges/contracts the ibuf in the background. It does this by reading the actual pages from disk, and it leaves that job to the IO threads: on IO read completion, the pending change buffer entries for the page are applied. Depending on conditions, the read can be synchronous or asynchronous; in our case it is an async read. The master thread submits a request to read the page, let's say space_id:page_no (978:34).

At around the same time, an insert/delete on a secondary index wants to buffer a change, but the ibuf has reached its maximum size, so it initiates ibuf_contract() (see ibuf_insert() -> ibuf_contract() -> ibuf_merge_pages()). This opens a cursor at a random record, and it can happen to see the same space_id:page_no that the IO thread is processing.

Just as this tablespace's records are about to be processed, the tablespace is dropped. ibuf_merge_pages() therefore decides that it is no longer necessary to actually read the secondary index pages: the tablespace is gone, so there is no point in bringing those pages into the buffer pool. Instead it decides to delete all ibuf records belonging to the space_id (978 in our example).

This leads to the case where two threads can simultaneously process the ibuf records of the same page: (1) the IO thread, which reads a secondary index page and contracts the ibuf entries belonging to this space_id:page_no (this read is initiated by the InnoDB master thread's ibuf merge task), and (2) user threads doing ibuf_contract(), which may process entries belonging to a different tablespace and, when they see that the tablespace has been dropped, try to delete its ibuf entries.

Fix:

ibuf_restore_pos() is already designed to handle the “dropped tablespace” case, but in 8.0 the way tablespaces are dropped is a bit different.

The existing check, fil_space_get_flags(space) == ULINT32_UNDEFINED, is true only once the space object has been freed: fil_space_get_flags() returns ULINT32_UNDEFINED when the space lookup yields nullptr.

Technically the problem can happen in 5.7 too, but it is more visible in 8.0 because the window between the user's drop tablespace command and the actual deletion of the tablespace object is bigger. In 8.0 the space object is still available during that window, so the flags are NOT ULINT32_UNDEFINED. At the time of the crash, only stop_new_ops was set to true. When fil_space_t->stop_new_ops is true, fil_space_acquire() returns nullptr, so we use this call to determine whether the tablespace is dropped.

The ibuf code already saw the tablespace as deleted by using fil_space_acquire() in ibuf_merge_or_delete_for_page(). We use the same call in ibuf_restore_pos() to determine whether the tablespace is being deleted. The additional 'being deleted' state is handled as well.
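
The difference between the two checks can be sketched with a small model. Everything here is an illustrative assumption rather than the actual patch: `ToySpace`, `toy_get_flags()`, `toy_acquire()`, and `FLAGS_UNDEFINED` are hypothetical stand-ins for fil_space_t, fil_space_get_flags(), fil_space_acquire(), and ULINT32_UNDEFINED. The model takes a pointer instead of a space id and leaves out the reference counting and locking that the real acquire path involves.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for ULINT32_UNDEFINED (assumption: modelled as the max 32-bit value).
constexpr std::uint32_t FLAGS_UNDEFINED = UINT32_MAX;

// Hypothetical, heavily simplified tablespace object.
struct ToySpace {
  std::uint32_t flags = 0;
  bool stop_new_ops = false;  // set once a drop has started
};

// Models the flags lookup: "undefined" is reported only when the space
// object itself is gone (nullptr).
std::uint32_t toy_get_flags(const ToySpace *space) {
  return space == nullptr ? FLAGS_UNDEFINED : space->flags;
}

// Models the acquire call: a space that is being dropped is refused even
// though its object still exists.
const ToySpace *toy_acquire(const ToySpace *space) {
  if (space == nullptr || space->stop_new_ops) return nullptr;
  return space;
}

// Old-style check: misses the "being dropped" window.
bool dropped_by_flags(const ToySpace *space) {
  return toy_get_flags(space) == FLAGS_UNDEFINED;
}

// New-style check: also covers the "being dropped" window.
bool dropped_by_acquire(const ToySpace *space) {
  return toy_acquire(space) == nullptr;
}
```

For a space with `stop_new_ops == true` that has not been freed yet, `dropped_by_flags()` returns `false` while `dropped_by_acquire()` returns `true`. That gap is the window described above. Both checks agree once the object is gone (`nullptr`), and both report a live space as not dropped.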

@satya-bodapati satya-bodapati self-assigned this Mar 13, 2024
dlenev left a comment

Hello Satya!

Here are a couple of comments about your patch.

Also perhaps you can mention what testing was done for this patch in the patch comment? AFAIU we don't have a test case which is reasonably small to be included in the test suite, am I right?

(Two inline review comments on storage/innobase/ibuf/ibuf0ibuf.cc, both outdated and resolved.)

satya-bodapati commented Mar 19, 2024

Testing was done using the script provided in PS-9107. An MTR test case is not feasible because of the randomness in ibuf: to contract, ibuf opens a cursor at a 'random' ibuf index page. The IO threads and async reads, combined with a concurrent drop at exactly the right moment, make it even harder, and the IO thread is a background thread without any debug sync control.

dlenev left a comment

LGTM!

satya-bodapati commented

Thanks for the review @dlenev !

@satya-bodapati satya-bodapati merged commit cff6f3d into percona:8.0 Mar 20, 2024
25 checks passed
satya-bodapati added a commit to satya-bodapati/percona-server that referenced this pull request Mar 21, 2024
(cherry picked from commit cff6f3d)
satya-bodapati added a commit to satya-bodapati/percona-server that referenced this pull request Mar 21, 2024
(cherry picked from commit cff6f3d)
oleksandr-kachan pushed a commit to oleksandr-kachan/percona-server that referenced this pull request Apr 29, 2024
(cherry picked from commit cff6f3d)
oleksandr-kachan pushed a commit to oleksandr-kachan/percona-server that referenced this pull request Apr 29, 2024
(cherry picked from commit cff6f3d)
VarunNagaraju pushed a commit to VarunNagaraju/percona-server that referenced this pull request May 29, 2024
oleksandr-kachan added a commit to oleksandr-kachan/percona-server that referenced this pull request May 29, 2024
…ailure: ibuf0ibuf.cc:3833:ib::fatal triggered thread (percona#5257)"

This reverts commit ccd2e08.
VarunNagaraju pushed a commit to VarunNagaraju/percona-server that referenced this pull request May 31, 2024
VarunNagaraju pushed a commit to VarunNagaraju/percona-server that referenced this pull request Jun 4, 2024
VarunNagaraju pushed a commit to VarunNagaraju/percona-server that referenced this pull request Jun 5, 2024
…ailure: ibuf0ibuf.cc:3833:ib::fatal triggered thread (percona#5257)

https://perconadev.atlassian.net/browse/PS-9107

Problem:
--------
Two threads are trying to delete the same ibuf rec. The first one succeeds with
optimistic delete and this means the record is completely gone (added to page garbage).

Now second thread, cannot do optimistic. So it tries to do pessmistic and wants
to position the cursor on the record deleted by the first thread. It cannot.
The record is gone. And ibuf asserts.

One of the InnoDB master thread's active tasks (srv_master_do_active_tasks) merges or
contracts the ibuf in the background. It does this by reading the actual pages from
disk and leaves the job to the IO threads: on IO read, we apply the pending change
buffer entries. Depending on certain conditions, the master thread can choose a sync
read or an async read. In our case it is an async read. The master thread submits a
request to read the page, let's say space_id:page_no (978:34).

At around the same time, an insert/delete on secondary indexes wants to buffer a
change, but the ibuf has reached its maximum size, so it initiates an ibuf_contract().
See ibuf_insert() -> ibuf_contract() -> ibuf_merge_pages(). This opens a cursor at a
random record, and it can happen that it sees the same space_id:page_no that the IO
thread is processing.

Just when this tablespace's records are about to be processed, the tablespace is
dropped. So ibuf_merge_pages() decides it is no longer necessary to actually read the
secondary index pages: the tablespace is gone, so there is no point in bringing those
pages into the buffer pool. Hence it decides to delete all ibuf records belonging to
the space_id (978 in our example).

This leads to the case where two threads can simultaneously process the ibuf records
of the same page. The IO thread reads a secondary index page and contracts the ibuf
entries belonging to this space_id:page_no (this read is initiated by the InnoDB master
ibuf merge task). User threads doing ibuf_contract process entries belonging to a
different tablespace, and when they see that the tablespace has been dropped, they try
to delete the ibuf entries.

Fix:
----
ibuf_restore_pos() is already designed to handle the “dropped tablespace” case, but in
8.0 tablespaces are dropped a bit differently.

fil_space_get_flags(space) == ULINT32_UNDEFINED holds only once the space object has
been freed: fil_space_get_flags() returns ULINT32_UNDEFINED when the space lookup
yields nullptr.

Technically the problem can happen in 5.7 too, but it is more obvious in 8.0 because
the window between the user's drop tablespace command and the actual deletion of the
tablespace object is bigger. During that window in 8.0 the space object is still
available, so flags is NOT ULINT32_UNDEFINED. At the time of the crash, only
stop_new_ops is set to true. If fil_space_t->stop_new_ops is true, fil_space_acquire()
returns nullptr. We will use this call to determine whether the tablespace is dropped.

The ibuf code already detects a deleted tablespace by using fil_space_acquire() in
ibuf_merge_or_delete_for_page().

We will use the same call in ibuf_restore_pos() to determine if the tablespace is being
deleted.
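
The decision this implies can be sketched as follows. This is not the actual patch:
RestoreOutcome and toy_restore_outcome are hypothetical names, and the two booleans
stand in for "the cursor could be repositioned on the record" and "fil_space_acquire()
returned a non-null space".

```cpp
#include <cassert>

// Possible results of trying to reposition on a previously stored ibuf record.
enum class RestoreOutcome {
  kRestored,        // record found again, caller may continue
  kTablespaceGone,  // record missing because the tablespace is being dropped
  kFatal            // record missing for no known reason: real corruption
};

// Assumption: a missing record is only fatal when the tablespace is still usable.
RestoreOutcome toy_restore_outcome(bool record_found, bool space_acquirable) {
  if (record_found) return RestoreOutcome::kRestored;
  if (!space_acquirable) return RestoreOutcome::kTablespaceGone;
  return RestoreOutcome::kFatal;
}

// The scenario from the Problem section: the record was deleted by another
// thread while the tablespace drop was in progress.
void toy_restore_demo() {
  assert(toy_restore_outcome(false, false) == RestoreOutcome::kTablespaceGone);
}
```

Only the combination "record missing, tablespace still acquirable" is treated as fatal
in this sketch; a missing record for a tablespace that is being dropped is expected.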
VarunNagaraju pushed a commit to VarunNagaraju/percona-server that referenced this pull request Jun 10, 2024
…ailure: ibuf0ibuf.cc:3833:ib::fatal triggered thread (percona#5257)

VarunNagaraju pushed a commit to VarunNagaraju/percona-server that referenced this pull request Jun 12, 2024
…ailure: ibuf0ibuf.cc:3833:ib::fatal triggered thread (percona#5257)

VarunNagaraju pushed a commit to VarunNagaraju/percona-server that referenced this pull request Jun 12, 2024
…ailure: ibuf0ibuf.cc:3833:ib::fatal triggered thread (percona#5257)

inikep pushed a commit that referenced this pull request Sep 25, 2024
…ailure: ibuf0ibuf.cc:3833:ib::fatal triggered thread (#5257)

inikep pushed a commit to inikep/percona-server that referenced this pull request Sep 25, 2024
…ailure: ibuf0ibuf.cc:3833:ib::fatal triggered thread (percona#5257)

inikep pushed a commit to inikep/percona-server that referenced this pull request Oct 28, 2024
…ailure: ibuf0ibuf.cc:3833:ib::fatal triggered thread (percona#5257)

inikep pushed a commit that referenced this pull request Oct 30, 2024
…ailure: ibuf0ibuf.cc:3833:ib::fatal triggered thread (#5257)
