From 94634fc258b808071bdc172353375da2513dad45 Mon Sep 17 00:00:00 2001
From: Chris Wedgwood
Date: Mon, 22 Feb 2021 13:30:45 -0800
Subject: [PATCH] etcdserver: when using --unsafe-no-fsync write data

There are situations where we don't wish to fsync but we do want to
write the data.

Typically this occurs in clusters where fsync latency (often the result
of firmware) transiently spikes. For Kubernetes clusters this causes
(many) elections, which have knock-on effects such that the API server
will transiently fail, causing other components to fail in turn.

By writing the data (buffered and asynchronously flushed, so in most
situations the write is fast) and avoiding the fsync, we no longer
trigger this situation and opportunistically write out the data.

Anecdotally: because the fsync is missing, there is an argument that
certain types of failure events will cause data corruption or loss; in
testing this wasn't seen. If this were to occur, the expectation is that
the member can be re-added to a cluster or, in the worst case, restored
from a robust persisted snapshot.

The etcd members are deployed across isolated racks with different power
feeds. An instantaneous failure of all of them simultaneously is
unlikely.

Testing was usually of the form:

 * create (Kubernetes) etcd write-churn by creating replicasets of some
   1000s of pods
 * break/fail the leader

Failure testing included:

 * hard node power-off events
 * disk removal
 * orderly reboots/shutdown

In all cases, when the node recovered it was able to rejoin the cluster
and synchronize.
---
 wal/wal.go | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/wal/wal.go b/wal/wal.go
index 1aced50bdd1..2ce3dc4118f 100644
--- a/wal/wal.go
+++ b/wal/wal.go
@@ -789,14 +789,16 @@ func (w *WAL) cut() error {
 }
 
 func (w *WAL) sync() error {
-	if w.unsafeNoSync {
-		return nil
-	}
 	if w.encoder != nil {
 		if err := w.encoder.flush(); err != nil {
 			return err
 		}
 	}
+
+	if w.unsafeNoSync {
+		return nil
+	}
+
 	start := time.Now()
 	err := fileutil.Fdatasync(w.tail().File)