-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
engineccl: filesystem mutex held during I/O #98051
Labels
A-storage
Relating to our storage engine (Pebble) on-disk storage.
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
T-storage
Storage Team
Comments
jbowens
added
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
A-storage
Relating to our storage engine (Pebble) on-disk storage.
T-storage
Storage Team
labels
Mar 6, 2023
sumeerbhola
added a commit
to sumeerbhola/cockroach
that referenced
this issue
Apr 25, 2024
Both fs.FileRegistry and engineccl.DataKeyManager held an internal mutex when updating their state, that included write IO to to update persistent state. This would block readers of the state, specifically file reads that need a file registry entry and data key for the file to successfully open and read a file. Blocking these reads due to slow or stalled write IO is not desirable, since the read could succeed if the relevant data is in the page cache. Specifically, with the new WAL failover feature, we expect the store to keep functioning when disk writes are temporarily stalled, since the WAL can failover. This expectation is not met if essential reads block on non-essential writes that are stalled. This PR changes the locking in the FileRegistry and DataKeyManager to prevent writes from interfering with concurrent reads. Epic: none Fixes: cockroachdb#98051 Fixes: cockroachdb#122364 Release note: None
craig bot
pushed a commit
that referenced
this issue
Apr 25, 2024
123057: fs,engineccl: allow reads to continue when writes are stalled r=raduberinde,jbowens a=sumeerbhola Both fs.FileRegistry and engineccl.DataKeyManager held an internal mutex when updating their state, that included write IO to to update persistent state. This would block readers of the state, specifically file reads that need a file registry entry and data key for the file to successfully open and read a file. Blocking these reads due to slow or stalled write IO is not desirable, since the read could succeed if the relevant data is in the page cache. Specifically, with the new WAL failover feature, we expect the store to keep functioning when disk writes are temporarily stalled, since the WAL can failover. This expectation is not met if essential reads block on non-essential writes that are stalled. This PR changes the locking in the FileRegistry and DataKeyManager to prevent writes from interfering with concurrent reads. Epic: none Fixes: #98051 Fixes: #122364 Release note: None Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
sumeerbhola
added a commit
to sumeerbhola/cockroach
that referenced
this issue
Apr 26, 2024
Both fs.FileRegistry and engineccl.DataKeyManager held an internal mutex when updating their state, that included write IO to to update persistent state. This would block readers of the state, specifically file reads that need a file registry entry and data key for the file to successfully open and read a file. Blocking these reads due to slow or stalled write IO is not desirable, since the read could succeed if the relevant data is in the page cache. Specifically, with the new WAL failover feature, we expect the store to keep functioning when disk writes are temporarily stalled, since the WAL can failover. This expectation is not met if essential reads block on non-essential writes that are stalled. This PR changes the locking in the FileRegistry and DataKeyManager to prevent writes from interfering with concurrent reads. Epic: none Fixes: cockroachdb#98051 Fixes: cockroachdb#122364 Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-storage
Relating to our storage engine (Pebble) on-disk storage.
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
T-storage
Storage Team
The encryption-at-rest file registry holds a mutex when making changes to the registry. It holds this mutex while writing to and fsyncing the registry. This prevents concurrent file opens from proceeding (eg, from table cache misses or readahead).
Related to #80946.
Jira issue: CRDB-25051
Epic: CRDB-26603
The text was updated successfully, but these errors were encountered: