Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock mutex in more client methods. #2567

Merged
merged 2 commits into from
Oct 4, 2024
Merged

Lock mutex in more client methods. #2567

merged 2 commits into from
Oct 4, 2024

Conversation

afck
Copy link
Contributor

@afck afck commented Oct 3, 2024

Motivation

The tests in #2538 fail. A (possibly the) scenario that can cause a processInbox mutation to unexpectedly not produce a block is the following:

  • Chain 1 creates a block B that sends a message to chain 2.
  • The running node service of the owner of chain 2 receives the notification from the validators.
  • The node service's client listener downloads B and starts processing it. It updates the chain state and puts the message in the outbox…
  • Now processInbox is called: It sees that B has already been handled. On the other hand, the inbox is still empty, so it doesn't create a block.
  • …finally the client listener task puts the messages in chain 2's inbox.

Proposal

Use the mutex in the ChainState in more places, to ensure that certain tasks don't overlap.

Test Plan

I ran the test_wasm_end_to_end_fungible::storage_service_grpc test locally 50 times successfully, together with the optimization in #2538 and the fix in #2562.

Release Plan

  • These changes should be backported to the latest devnet branch, then
    • be released in a new SDK.
  • These changes should be backported to the latest testnet branch, then
    • be released in a new SDK.

Links

@afck afck requested review from Twey and ma2bd October 3, 2024 16:51
@@ -161,7 +161,7 @@ impl ChainState {
self.pending_blobs.clear();
}

pub fn preparing_block(&self) -> Arc<Mutex<()>> {
self.preparing_block.clone()
pub fn client_mutex(&self) -> Arc<Mutex<()>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this pub ?

Copy link
Contributor Author

@afck afck Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the ChainClient has to use it. I can make it pub(super).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole struct is private to linera_core::client. I'm usually suspicious of pub(super), pub(crate) and friends because, even though they're sometimes necessary (especially for use in macros), they imply non-local knowledge of the structure the code is embedded into. The upshot is that types look different (expose different behaviour) depending on where they're imported, which usually just results in a lot of spurious changes when refactoring, but coupled with conditional compilation could lead to some hard-to-spot compilation breakages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also prefer just using pub in these cases. Happy to revert this in a later PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a rule, public APIs need to be minimized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that they are, since ChainState itself is only visible in the client module, and not exported further. So effectively all it's methods are pub(super) anyway.

Comment on lines +874 to +875
let mutex = self.state().client_mutex();
let _guard = mutex.lock_owned().await;
Copy link
Contributor

@ma2bd ma2bd Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need two separate lines when lock_owned is used? (and otherwise why lock_owned?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually do! self.state() returns a ChainGuard that can't be held across an await point… specifically so we wouldn't ever lock an entry in the chain states map locked.

And it looks like Rust would drop that guard only after the whole expression, i.e. after the await.

Copy link
Contributor

@Twey Twey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mutex is a hack, but I'm happy enough with the hack being extended to cover more cases in anticipation of a larger linera_core::client refactor. I think it's good that at least the locking logic remains within the client this time.

@afck afck marked this pull request as ready for review October 4, 2024 09:38
@afck afck merged commit 27c53b0 into linera-io:main Oct 4, 2024
5 checks passed
@afck afck deleted the client-mutex branch October 4, 2024 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants