fs,engineccl: allow reads to continue when writes are stalled #123057
Reviewed 6 of 9 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/storage/fs/file_registry.go
line 87 at r1 (raw file):
writeMu struct {
	syncutil.Mutex
	mu struct {
nit: I find the naming of the mutexes confusing, but I don't really have any suggestions. Is there some way to name these to give more of a sense that writeMu and writeMu.mu are distinct and convey the critical sections they guard? maybe s/writeMu/metaMu/ and s/mu/entriesMu/?
Both fs.FileRegistry and engineccl.DataKeyManager held an internal mutex when updating their state, and that update included write IO to update persistent state. This would block readers of the state, specifically file reads that need a file registry entry and data key for the file to successfully open and read a file.
Blocking these reads due to slow or stalled write IO is not desirable, since the read could succeed if the relevant data is in the page cache. Specifically, with the new WAL failover feature, we expect the store to keep functioning when disk writes are temporarily stalled, since the WAL can fail over. This expectation is not met if essential reads block on non-essential writes that are stalled.
This PR changes the locking in the FileRegistry and DataKeyManager to prevent writes from interfering with concurrent reads.
Epic: none
Fixes: cockroachdb#98051
Fixes: cockroachdb#122364
Release note: None
47954b1 to 6518185
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jbowens)
pkg/storage/fs/file_registry.go
line 87 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: I find the naming of the mutexes confusing, but I don't really have any suggestions. Is there some way to name these to give more of a sense that writeMu and writeMu.mu are distinct and convey the critical sections they guard? maybe s/writeMu/metaMu/ and s/mu/entriesMu/?
I am not super happy with the names either. I prefer to keep writeMu since it is the one definitely needed for writes, but not necessary for reads. I initially had mu called readMu, but that was slightly misleading since writers also need it, and one can read without it (by holding writeMu). So I ended up with this.
I tweaked the comment a bit to say writeMu.mu.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jbowens and @sumeerbhola)
pkg/storage/fs/file_registry.go
line 87 at r1 (raw file):
Previously, sumeerbhola wrote…
I am not super happy with the names either. I prefer to keep writeMu since it is the one definitely needed for writes, but not necessary for reads. I initially had mu called readMu, but that was slightly misleading since writers also need it, and one can read without it (by holding writeMu). So I ended up with this.
I tweaked the comment a bit to say writeMu.mu.
Do we need getters to be able to read from entries while holding just writeMu? If we require readers to just hold the RWMutex, that would make things a lot simpler.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jbowens and @RaduBerinde)
pkg/storage/fs/file_registry.go
line 87 at r1 (raw file):
Do we need getters to be able to read from entries while holding just writeMu?
Hmm, I don't quite understand. I'll write out the line of thinking that motivated this:
We want:
- a single writer: the writer reads the in-memory state to compute the mutation. Then does IO. Then updates the in-memory state.
- readers to be able to read the in-memory state while the write is in-progress.
So we need a mutex that will be held by the writer throughout its in-progress write. That is writeMu.
We need a separate mutex for reads to continue while the write is in-progress. That is mu. Naturally, mu also needs to be acquired when mutating the in-memory state.
So one can have the pattern:
- writeMu to serialize writers
- mu to read/write the in-memory state.
With writeMu > mu.
But acquiring mu to read during a write is (a) not really needed, and (b) does not provide any needed guarantee, since the writer makes decisions about the mutation based on the read without holding mu after the read. The guarantee of no staleness in the mutation (and no lost mutations) comes from writeMu. So there isn't any need for the writer to hold mu when reading. This also improves code readability, since we don't have to hide every read of the in-memory state during the write inside a closure with a mu acquisition and deferred release.
Previously, sumeerbhola wrote…
That makes sense, but then I'd just separate out the entries from under
The RWMutex just protects the entries map, the other data is protected by
But then it creates confusion if the code accesses
An example of this readability issue are the two implementations below (I picked
Thanks, I understand now.
What is the hot path for reads here? It's the GetKey function? We could use a sync.Map (which based on the documentation would probably be faster than a map + RWMutex in our case) for the entries. activeKey could also be an atomic pointer.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jbowens and @sumeerbhola)
There isn't really a hot path, which is why the existing mutex code was good enough until we realized we can't block these reads because of stalled writes, since WAL failover depends on the reads continuing to work. We will do a read when creating a
TFTRs!
bors r=raduberinde,jbowens
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from 6518185 to blathers/backport-release-24.1-123057: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 24.1.x failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.