Read index retry #12780

Merged Mar 23, 2021 (2 commits)
Conversation

@wpedrak (Contributor) commented Mar 16, 2021

This is the second approach (the first being #12762) to solving #12680.

This PR is composed of 2 commits: the first is a refactor of the linearizable-read (l-read) loop, and the second is the implementation of the retry mechanism itself.

Drawbacks of this change (I would like to seek your opinion on them):

  • the retry time is hardcoded as 500ms
  • the current 7s timeout before returning an error can stretch up to 7.5s, depending on execution flow (a sketch of this shape follows below)
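
A rough, self-contained Go sketch of the shape described above (not the actual diff; `waitForReadIndex`, `resend`, and `done` are illustrative stand-ins rather than identifiers from this PR):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Illustrative values only: the PR hardcodes the retry interval as 500ms and
// reuses the existing overall request timeout (ReqTimeout, 7s by default).
const (
	readIndexRetryTime = 500 * time.Millisecond
	reqTimeout         = 7 * time.Second
)

var errTimedOut = errors.New("timed out waiting for read index response")

// waitForReadIndex sketches the retry shape: resend re-issues the ReadIndex
// request and done is closed when a usable read state arrives. errorTimer
// bounds the whole wait, while each pass through the loop re-sends the request
// after readIndexRetryTime without a response. Because a retry window can
// already be in flight when the 7s deadline expires, the observed latency of a
// failing request can grow to roughly 7.5s.
func waitForReadIndex(resend func(), done <-chan struct{}) error {
	errorTimer := time.NewTimer(reqTimeout)
	defer errorTimer.Stop()

	resend() // first attempt
	for {
		select {
		case <-done:
			return nil
		case <-errorTimer.C:
			return errTimedOut
		case <-time.After(readIndexRetryTime): // retry window elapsed
			resend()
		}
	}
}

func main() {
	done := make(chan struct{})
	go func() {
		time.Sleep(1200 * time.Millisecond) // simulate a slow response
		close(done)
	}()
	err := waitForReadIndex(func() { fmt.Println("sending ReadIndex request") }, done)
	fmt.Println("result:", err)
}
```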

server/etcdserver/v3_server.go
lg := s.Logger()
errorTimer := time.NewTimer(s.Cfg.ReqTimeout())
Contributor
I don't understand it...

Would the following approach work?

  • retryTimer instead of errorTimer.
  • both selects merged into a single select.

Contributor Author

First, I decided to use 2 selects instead of one because of this article. In short: it guards against picking a case at random when multiple channels are unblocked. It seemed reasonable when writing the code, however now I can't give any good reason to keep it this way, so I'll move it to a single select.

The second thing is "retryTimer instead of errorTimer". I'm not sure if I understand you correctly, but if you are suggesting having retryTimer (the one that measures 500ms) outside of the for loop and putting errorTimer (the one that measures 7s) in <-time.After(...), then it would not work, as we need a single errorTimer across potentially multiple retryTimers.

Contributor

Ad 1: I see. But I think we don't really care which branch is taken if multiple channels happen to become ready at exactly the same time, so I would go for merging.

Ad 2: You are right. Potentially you could use a top-level 'ticker' (time.NewTicker) to get a periodic notification to refresh the request. The benefit is that it will automatically cancel the goroutine.
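
A hypothetical sketch of the Ticker variant being suggested, reusing the constants and stand-ins from the earlier sketch (not code from this PR):

```go
// waitForReadIndexTicker replaces the per-iteration time.After call with one
// time.Ticker created outside the loop; ticker.Stop() releases it when the
// function returns, so no per-iteration timer is left behind.
func waitForReadIndexTicker(resend func(), done <-chan struct{}) error {
	errorTimer := time.NewTimer(reqTimeout)
	defer errorTimer.Stop()
	retryTicker := time.NewTicker(readIndexRetryTime)
	defer retryTicker.Stop()

	resend() // first attempt
	for {
		select {
		case <-done:
			return nil
		case <-errorTimer.C:
			return errTimedOut
		case <-retryTicker.C: // periodic "time to re-send" signal
			resend()
		}
	}
}
```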

Contributor Author

Ad 2:
I find a Timer more intuitive than a Ticker here, as "ping me after 500ms" describes the concept of a timeout better than "ping me every 500ms". However, I don't understand the part below:

The benefit is that it will automatically cancel the goroutine.

Could you elaborate?

Contributor

<-time.After(readIndexRetryTime)

under the covers starts a goroutine that sleeps for some time and then populates the channel.

If the select exits for another reason, that goroutine still exists for up to 500ms to populate a channel that no one is waiting for.

Contributor Author

Good point. I've changed it to a single timer initialised outside of the loop, and I use retryTimer.Reset(readIndexRetryTime) to refresh it.
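
A hypothetical sketch of that revision, again built on the stand-ins from the earlier sketch rather than the literal diff:

```go
// waitForReadIndexSingleTimer uses one retryTimer created outside the loop and
// re-armed with Reset, instead of a fresh timer from time.After on every
// iteration.
func waitForReadIndexSingleTimer(resend func(), done <-chan struct{}) error {
	errorTimer := time.NewTimer(reqTimeout)
	defer errorTimer.Stop()
	retryTimer := time.NewTimer(readIndexRetryTime)
	defer retryTimer.Stop()

	resend() // first attempt
	for {
		select {
		case <-done:
			return nil
		case <-errorTimer.C:
			return errTimedOut
		case <-retryTimer.C:
			// The timer has fired and its channel has been drained by this
			// case, so Reset safely starts the next retry window.
			resend()
			retryTimer.Reset(readIndexRetryTime)
		}
	}
}
```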

Contributor

@gyuho left a comment

Nice improvements for read index reliability!

Any chance to try this out for #12680?

@ptabor (Contributor) commented Mar 20, 2021

> Nice improvements for read index reliability!
>
> Any chance to try this out for #12680?

@Cjen1 evaluated the solution in #12680 (comment). The bottom 3 charts show a reduction of the delay to ~1s with the etcd-retry solution.

The etcd-postpone solution was ~700ms in 2 cases and ~1300ms in one.
