Ensure election offchain workers don't overlap #8828
Conversation
Code seems good to me, but I want to take a second fresh look
Co-authored-by: Gavin Wood <gavin@parity.io>
Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
A bunch of nits, but I think the fundamental logic is sound.
As pointed out elsewhere, a Drop implementation on a lock guard might be a good enhancement, but IMO it's not necessary within the scope of this PR.
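The Drop-based guard suggested here would release the lock automatically when it goes out of scope, even on an early return or panic. A minimal self-contained sketch of the idea using std atomics (the names `LockGuard` and `try_acquire` are hypothetical stand-ins, not the actual offchain `StorageLock` API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// A guard that releases the lock when dropped, even on early return or panic.
struct LockGuard {
    locked: Arc<AtomicBool>,
}

impl LockGuard {
    /// Try to acquire: returns `None` if the lock is already held.
    fn try_acquire(locked: Arc<AtomicBool>) -> Option<Self> {
        if locked
            .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
        {
            Some(LockGuard { locked })
        } else {
            None
        }
    }
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        // Release automatically when the guard goes out of scope.
        self.locked.store(false, Ordering::SeqCst);
    }
}

fn main() {
    let lock = Arc::new(AtomicBool::new(false));
    {
        let _guard = LockGuard::try_acquire(lock.clone()).expect("lock should be free");
        // While the guard is alive, a second acquisition fails.
        assert!(LockGuard::try_acquire(lock.clone()).is_none());
    }
    // Guard dropped at end of scope: the lock is free again.
    assert!(LockGuard::try_acquire(lock).is_some());
}
```

The benefit over an explicit release call is that the worker cannot forget to unlock on any code path.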
Rewrote the whole thing with the fancier storage lock. Tested well locally, and deploying it now to dry-run on Kusama's next election. A new pair of reviews would be good @tomusdrw @coriolinus @thiolliere. What's missing is some unit tests; I will add them again.
The fundamental logic hasn't changed, just the implementation. I'd just like to see at least one test demonstrating that the OCW won't ever attempt to mine two solutions simultaneously, even if Self::ensure_offchain_repeat_frequency is satisfied.
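The invariant requested here is that passing the repeat-frequency check must not be sufficient on its own; the lock must still gate concurrent mining. A sketch of that invariant outside the Substrate test harness (`try_start_mining`, `finish_mining`, and the stubbed frequency check are hypothetical stand-ins for the pallet's OCW entry points):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for the offchain storage lock discussed in this PR.
static LOCK_HELD: AtomicBool = AtomicBool::new(false);

/// Stub: assume the repeat-frequency check passes. The point of the test is
/// that the lock must still reject a concurrent attempt.
fn ensure_offchain_repeat_frequency() -> bool {
    true
}

/// Returns true if this worker may start mining a solution.
fn try_start_mining() -> bool {
    if !ensure_offchain_repeat_frequency() {
        return false;
    }
    // Only one worker may hold the lock at a time.
    LOCK_HELD
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}

fn finish_mining() {
    LOCK_HELD.store(false, Ordering::SeqCst);
}

fn main() {
    // First worker acquires the lock and starts mining.
    assert!(try_start_mining());
    // A second worker passes the frequency check but is still rejected.
    assert!(!try_start_mining());
    finish_mining();
    // After release, mining may start again.
    assert!(try_start_mining());
    finish_mining();
}
```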
// after election finalization, clear OCW solution storage.
if <frame_system::Pallet<T>>::events()
	.into_iter()
	.filter_map(|event_record| {
		let local_event = <T as Config>::Event::from(event_record.event);
		local_event.try_into().ok()
	})
	.find(|event| {
		matches!(event, Event::ElectionFinalized(_))
	})
	.is_some()
{
	unsigned::kill_ocw_solution::<T>();
}
What if this function is not run on this specific block due to the newly introduced lock? It shouldn't harm much, as the stale solution will be removed anyway for failing feasibility.
Do you suggest anywhere else where we would kill the solution?
Thinking about it again, if we simply don't kill this, it will be replaced with a fresh one on the next round. In the past it would have been tricky since we could race around it. But now, the initial OCW will attempt to mine a new solution and replace this, and no other OCW will work in parallel, so it should be safe.
I don't have a very convincing suggestion; maybe we could match on the phase and remove the solution when the phase closes, but removing it on every block while the phase is closed is a bit useless too.
Anyway, I agree with what you say: the current implementation is fine, and even if the solution is not removed, it is not an issue.
Looks good to me.
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
@@ -284,7 +284,7 @@ pub fn testnet_genesis(
	}).collect::<Vec<_>>(),
},
pallet_staking: StakingConfig {
-	validator_count: initial_authorities.len() as u32 * 2,
+	validator_count: initial_authorities.len() as u32,
Why?
Offchain elections are only acceptable if they provide exactly enough validators. Since we have double the initial_authorities, offchain elections always fail on a dev chain and produce some warn/error logs.
Can revert if it offends anything else.
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Looks good! Only a couple of nits in the tests.
bot merge
Waiting for commit status.
* Initial version, well tested, should work fine.
* Add one last log line
* Update frame/election-provider-multi-phase/src/unsigned.rs
  Co-authored-by: Gavin Wood <gavin@parity.io>
* Update frame/election-provider-multi-phase/src/unsigned.rs
  Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
* Update frame/election-provider-multi-phase/src/unsigned.rs
  Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
* Fix a few more things
* fix build
* rewrite the whole thing with a proper lock
* clean
* clean some nits
* Add unit tests.
* Update primitives/runtime/src/offchain/storage_lock.rs
  Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* Fix test
* Fix tests

Co-authored-by: Gavin Wood <gavin@parity.io>
Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
The most common scenario is that we have a cached solution, and it has a worse score. In this case, we do nothing from now on.