feat: support a different timeout for the last replica (backport #1176) #1198
Which issue(s) this PR fixes:
longhorn/longhorn#8711
What this PR does / why we need it:
The implementation in this PR makes it possible to configure the engine with two `engineReplicaTimeout` values (short and long). The backends lightly coordinate via a new `SharedTimeouts` struct to ensure that most of them can time out in the normal way after `engineReplicaTimeoutShort`, but exactly one of them (the last replica) must wait `engineReplicaTimeoutLong` to do the same.

Note that this PR does NOT actually configure a different `engineReplicaTimeoutLong`. My plan is to do that in a follow-up after this one is approved and we decide exactly how we want to expose the new capability.

Special notes for your reviewer:
Additional documentation or context
Per #1176 (comment), I experimented with a different approach in https://github.com/ejweber/longhorn-engine/tree/8711-last-replica-timeout-previous-attempt. That one didn't work well due to lock contention between I/O operations, replica error handling, and the new logic.
This is an automatic backport of pull request #1176 done by Mergify.