feat: support a different timeout for the last replica (backport #1176) #1198

mergify · 2024-08-23T15:27:58Z

Which issue(s) this PR fixes:

What this PR does / why we need it:

The implementation in this PR makes it possible to configure the engine so that there are two engineReplicaTimeouts (short and long). The backends lightly coordinate via a new SharedTimeouts struct to ensure that most of them can time out in the normal way after engineReplicaTimeoutShort, but exactly one of them must wait engineReplicaTimeoutLong to do the same.

Note that this PR does NOT actually configure a different engineReplicaTimeoutLong. My plan is to do that in a followup after this one is approved and we decide exactly how we want to expose the new capability.

Special notes for your reviewer:

Additional documentation or context

Per a comment on #1176, I experimented with a different approach in https://github.com/ejweber/longhorn-engine/tree/8711-last-replica-timeout-previous-attempt. That approach didn't work well because of lock contention among I/O operations, replica error handling, and the new logic.

This is an automatic backport of pull request #1176 done by Mergify.

derekbit

LGTM

mergify bot mentioned this pull request Aug 23, 2024

feat: support a different timeout for the last replica #1176

Merged

derekbit approved these changes Aug 26, 2024

ejweber added 3 commits August 26, 2024 13:24

feat(timeout): enable different timeout for last replica

bd96c1c

Longhorn 8711 Signed-off-by: Eric Weber <eric.weber@suse.com> (cherry picked from commit acd296d)

fix(datconn): don't orphan client goroutine when remote is closed

1b0ec8a

Longhorn 8711 Signed-off-by: Eric Weber <eric.weber@suse.com> (cherry picked from commit bbf32e0)

feat(timeout): make engineReplicaTimeoutLong double engineReplicaTime…

f783f88

…outShort Longhorn 8711 Signed-off-by: Eric Weber <eric.weber@suse.com> (cherry picked from commit 405e96f)

derekbit force-pushed the mergify/bp/v1.7.x/pr-1176 branch from bee6e77 to f783f88 August 26, 2024 05:24

derekbit approved these changes Aug 26, 2024

derekbit merged commit d00ec45 into v1.7.x Aug 26, 2024
9 checks passed

derekbit deleted the mergify/bp/v1.7.x/pr-1176 branch August 26, 2024 07:49

ejweber mentioned this pull request Aug 26, 2024

[BACKPORT][v1.7.1][IMPROVEMENT] Resilience handling for the last replica timeout longhorn/longhorn#9275

Closed
