Lock mutex in more client methods. #2567
@@ -161,7 +161,7 @@ impl ChainState {
         self.pending_blobs.clear();
     }
 
-    pub fn preparing_block(&self) -> Arc<Mutex<()>> {
-        self.preparing_block.clone()
+    pub fn client_mutex(&self) -> Arc<Mutex<()>> {
Why is this `pub`?
Because the `ChainClient` has to use it. I can make it `pub(super)`.
This whole struct is private to `linera_core::client`. I'm usually suspicious of `pub(super)`, `pub(crate)` and friends because, even though they're sometimes necessary (especially for use in macros), they imply non-local knowledge of the structure the code is embedded into. The upshot is that types look different (expose different behaviour) depending on where they're imported, which usually just results in a lot of spurious changes when refactoring, but coupled with conditional compilation could lead to some hard-to-spot compilation breakages.
Yes, I also prefer just using `pub` in these cases. Happy to revert this in a later PR.
As a rule, public APIs need to be minimized.
I'd argue that they are, since `ChainState` itself is only visible in the `client` module, and not exported further. So effectively all its methods are `pub(super)` anyway.
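The visibility argument above can be sketched in a few lines. This is a simplified stand-in, not the actual `linera_core` layout: the `mutex_handle_count` helper and the single-module structure are assumptions made for illustration. It shows that a `pub` method on a type that is private to a module cannot be reached from outside that module.

```rust
mod client {
    use std::sync::{Arc, Mutex};

    // Private to `client`: code outside this module cannot name the type,
    // so its `pub` methods are unreachable from there as well.
    struct ChainState {
        client_mutex: Arc<Mutex<()>>,
    }

    impl ChainState {
        // Declared `pub`, but effectively limited by the struct's visibility.
        pub fn client_mutex(&self) -> Arc<Mutex<()>> {
            self.client_mutex.clone()
        }
    }

    // The only type exported from the module.
    pub struct ChainClient {
        state: ChainState,
    }

    impl ChainClient {
        pub fn new() -> Self {
            ChainClient {
                state: ChainState {
                    client_mutex: Arc::new(Mutex::new(())),
                },
            }
        }

        // Hypothetical helper: uses the state's method from inside `client`
        // and reports how many handles to the mutex exist right now.
        pub fn mutex_handle_count(&self) -> usize {
            let mutex = self.state.client_mutex();
            Arc::strong_count(&mutex)
        }
    }
}
```

From outside `client`, only `client::ChainClient` and its methods are usable; writing `client::ChainState` there is a compile error regardless of the `pub` on `client_mutex`.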
let mutex = self.state().client_mutex();
let _guard = mutex.lock_owned().await;
Do we need two separate lines when `lock_owned` is used? (and otherwise why `lock_owned`?)
We actually do! `self.state()` returns a `ChainGuard` that can't be held across an await point… specifically so that we never keep an entry in the chain states map locked.
And it looks like Rust would drop that guard only after the whole expression, i.e. after the `await`.
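The drop-order point can be checked without any async code. The sketch below is an analogy using only the standard library: `Guard`, `Handle`, `make_guard` and `wait` are hypothetical stand-ins for the `ChainGuard`, the owned mutex handle, `self.state()` and the `.await` step. A temporary created inside a statement is dropped at the end of that statement, so chaining everything keeps the guard alive across the slow step, while splitting into two statements releases it first.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared event log so the drop order can be observed.
type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical stand-in for a guard that must be released quickly.
struct Guard {
    log: Log,
}

// Hypothetical stand-in for the owned handle obtained through the guard.
struct Handle {
    log: Log,
}

impl Guard {
    fn handle(&self) -> Handle {
        self.log.borrow_mut().push("handle taken");
        Handle { log: Rc::clone(&self.log) }
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push("guard dropped");
    }
}

impl Handle {
    // Plays the role of the slow `.await` step.
    fn wait(self) {
        self.log.borrow_mut().push("waited");
    }
}

fn make_guard(log: &Log) -> Guard {
    Guard { log: Rc::clone(log) }
}

pub fn one_statement() -> Vec<&'static str> {
    let log = Log::default();
    // The temporary `Guard` lives until the end of this whole statement.
    make_guard(&log).handle().wait();
    let events = log.borrow().clone();
    events
}

pub fn two_statements() -> Vec<&'static str> {
    let log = Log::default();
    // The temporary `Guard` is dropped at the end of this `let` statement.
    let handle = make_guard(&log).handle();
    handle.wait();
    let events = log.borrow().clone();
    events
}
```

`one_statement()` records the guard being dropped after the wait, and `two_statements()` records it being dropped before, which is the behaviour the two-line form in the diff relies on.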
This mutex is a hack, but I'm happy enough with the hack being extended to cover more cases in anticipation of a larger `linera_core::client` refactor. I think it's good that at least the locking logic remains within the client this time.
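As a rough illustration of what holding one client-level mutex for the whole of an operation buys, here is a minimal sketch using standard-library threads in place of the async tasks and the async mutex used in the PR. The function name `max_overlap` and the counters are assumptions made for the example. Each task takes the shared mutex first, so at most one task is ever inside the guarded section.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Runs `tasks` threads that all take the same mutex for their whole body
// and returns the largest number of threads seen inside it at once.
pub fn max_overlap(tasks: usize) -> usize {
    let client_mutex = Arc::new(Mutex::new(()));
    let active = Arc::new(AtomicUsize::new(0));
    let max_seen = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let client_mutex = Arc::clone(&client_mutex);
            let active = Arc::clone(&active);
            let max_seen = Arc::clone(&max_seen);
            thread::spawn(move || {
                // Held until the end of the closure, like `_guard` in the diff.
                let _guard = client_mutex.lock().unwrap();
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                max_seen.fetch_max(now, Ordering::SeqCst);
                // Give other threads a chance to run while we hold the lock.
                thread::yield_now();
                active.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    max_seen.load(Ordering::SeqCst)
}
```

With the lock in place the function returns 1 for any positive number of tasks. Removing the `_guard` line would allow the tasks to interleave, which is the kind of overlap the PR sets out to prevent.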
Motivation
The tests in #2538 fail. A (possibly the) scenario that can cause a `processInbox` mutation to unexpectedly not produce a block is the following:
`processInbox` is called: It sees that B has already been handled. On the other hand, the inbox is still empty, so it doesn't create a block.
Proposal
Use the mutex in the `ChainState` in more places, to ensure that certain tasks don't overlap.
Test Plan
I ran the `test_wasm_end_to_end_fungible::storage_service_grpc` test locally 50 times successfully, together with the optimization in #2538 and the fix in #2562.
Release Plan
`devnet` branch, then
`testnet` branch, then
Links