<h1>6.824 2015 Lecture 8: Harp</h1>
<p><strong>Note:</strong> These lecture notes were slightly modified from the ones posted on the 6.824
<a href="http://nil.csail.mit.edu/6.824/2015/schedule.html">course website</a> from Spring 2015.</p>
<p>Paper: <a href="papers/bliskov-harp.pdf">Replication in the Harp File System</a></p>
<ul>
<li>Liskov, Ghemawat, Gruber, Johnson, Shrira, Williams</li>
<li>SOSP 1991</li>
</ul>
<h4>Why are we reading this paper?</h4>
<ul>
<li>Harp was the first complete primary/backup system that dealt w/ partition</li>
<li>It's a complete case study of a replicated service (file server)</li>
<li>It uses Raft-like replication techniques</li>
</ul>
<h4>How could a 1991 paper still be worth reading?</h4>
<ul>
<li>Harp introduced techniques that are still widely used</li>
<li>There are few papers describing complete replicated systems</li>
</ul>
<h4>The paper is a mix of fundamentals and incidentals</h4>
<ul>
<li>We care a lot about replication</li>
<li>We may not care much about NFS specifically
<ul>
<li>But we care a lot about the challenges faced when integrating
a real application with a replication protocol.</li>
</ul></li>
<li>And we care about where optimization is possible.</li>
</ul>
<h4>I'm going to focus on parts of Harp that aren't already present in Raft.</h4>
<ul>
<li>But note that Harp pre-dates Raft by 20+ years.</li>
<li>Raft is, to a great extent, a tutorial on ideas pioneered by Harp.
<ul>
<li>Though they differ in many details.</li>
</ul></li>
</ul>
<h4>What does Harp paper explain that Raft paper does not?</h4>
<ul>
<li>Adapting a complex service to state machine abstraction
<ul>
<li>e.g. possibility of applying an operation twice</li>
</ul></li>
<li>Lots of optimizations
<ul>
<li>pipelining of requests to backup</li>
<li>witness, to reduce the # of replicas</li>
<li>primary-only execution of read-only operations using leases</li>
</ul></li>
<li>Efficient re-integration of re-started server with large state
<ul>
<li>Don't want to do things like copying entire disks</li>
<li>"Catch-up" for re-joining replicas</li>
</ul></li>
<li>Power failure, including simultaneous failure of all servers</li>
<li>Efficient persistence on disk</li>
<li>Freeing of log</li>
</ul>
<h4>Harp authors had not implemented recovery yet</h4>
<ul>
<li>Earlier paper (1988) describes <a href="http://www.pmg.csail.mit.edu/papers/vr.pdf">View-stamped Replication</a></li>
<li><a href="http://pmg.csail.mit.edu/papers/vr-revisited.pdf">Later (2006) paper</a> has clearer description, though a bit different:</li>
</ul>
<h4>Basic setup is familiar</h4>
<ul>
<li>Clients, primary, backup(s), witness(es).</li>
<li>Client -> Primary</li>
<li>Primary -> Backups</li>
<li>Backups -> Primary
<ul>
<li>Primary waits for all backups / promoted witnesses in current view</li>
<li>Commit point</li>
</ul></li>
<li>Primary -> execute and reply to Client</li>
<li>Primary -> tell Backups to Commit</li>
<li><code>2n+1</code> servers, <code>n</code> backups, <code>n</code> witnesses, 1 primary
<ul>
<li>need a majority of <code>n+1</code> servers <code>=></code></li>
<li>tolerate up to <code>n</code> failures</li>
</ul></li>
<li>clients send NFS requests to primary
<ul>
<li>primary forwards each request to all the backups</li>
<li>after all the backups have replied, the primary can
execute the op and apply it to its FS</li>
<li>in a later operation, the primary piggybacks an ACK to tell the
backups that the op committed (this flow is sketched below)</li>
</ul></li>
</ul>
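<p>Here is a minimal sketch of the primary's side of this flow, in Go. All the names
(<code>Op</code>, <code>Replica</code>, <code>Primary</code>, <code>HandleUpdate</code>) are invented
for illustration and are not taken from Harp:</p>
<pre><code>// Sketch only: one client update, following the flow above.
package harp

type Op struct {
	Seq int    // position in the log
	NFS string // the client's NFS request (write, mkdir, chmod, etc.)
}

// Replica is a backup or promoted witness in the current view.
type Replica interface {
	Append(op Op) error // log the op; return once it is safely logged
}

type Primary struct {
	view    []Replica // all backups / promoted witnesses in the current view
	applyFS func(Op)  // applies a committed op to the local file system
	commit  int       // commit point (CP)
}

// HandleUpdate forwards the op to every replica in the view; only after all of
// them have acknowledged does it commit, apply, and reply to the client. The
// commit itself is told to the backups by piggybacking on later operations.
func (p *Primary) HandleUpdate(op Op) (reply string, err error) {
	for _, r := range p.view {
		if err := r.Append(op); err != nil {
			return "", err // would trigger a view change; not shown
		}
	}
	p.commit = op.Seq // commit point reached: the whole view logged the op
	p.applyFS(op)     // execute/apply on the primary's own file system
	return "ok", nil  // now the client can be answered
}
</code></pre>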
<h4>Why are <code>2b+1</code> servers necessary to tolerate <code>b</code> failures?</h4>
<ul>
<li>(This is review...)</li>
<li>Suppose we have <code>N</code> servers, and execute a write.</li>
<li>Can't wait for more than <code>N-b</code>, since <code>b</code> might be dead.</li>
<li>So let's require waiting for <code>N-b</code> for each operation.</li>
<li>The <code>b</code> we didn't wait for might be live and in another partition.</li>
<li>We can prevent them from proceeding if <code>N-b > b</code>.</li>
<li>I.e. <code>N > 2b => N = 2b + 1</code> is enough (see the check below).</li>
</ul>
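<p>To make the arithmetic concrete, a tiny check (a hypothetical helper, not part of Harp) that with
<code>N = 2b+1</code> servers any two groups of <code>N-b</code> servers must overlap, so a partition of
only <code>b</code> servers can never proceed on its own:</p>
<pre><code>package harp

// QuorumsIntersect reports whether any two sets of N-b servers (the most we
// can wait for, since b may be dead) are guaranteed to share a server.
// With N = 2b+1: (b+1) + (b+1) = 2b+2 > 2b+1, so they always intersect.
func QuorumsIntersect(n, b int) bool {
	waitFor := n - b
	return waitFor+waitFor > n
}
</code></pre>
<p>For example, <code>QuorumsIntersect(2*b+1, b)</code> holds for every <code>b</code>, while
<code>QuorumsIntersect(2*b, b)</code> does not -- with only <code>2b</code> servers, two disjoint
halves of size <code>b</code> could each believe they may proceed.</p>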
<h4>What are Harp's witnesses?</h4>
<ul>
<li>The primary and backups have FSs</li>
<li>The witnesses don't receive anything from primary and don't have FSs</li>
<li>Suppose we have a <code>P, B</code> and a <code>W</code></li>
<li>If there's a partition <code>P | B, W</code>, the witness acts as a tie breaker
<ul>
<li>whichever one (P or B) can talk to the witness wins: it gets to act as
primary and continue executing client operations</li>
</ul></li>
<li>a second use of the witness is to record operations</li>
<li>once a witness is part of the partition <code>B, W</code>, it records operations so
that a majority of nodes have the latest operations</li>
<li>a final function of the witness: when the primary comes back to life,
the witness has a log of every operation issued since the primary
disappeared, so it can replay those ops to the primary and bring it up to
date w.r.t. all the operations executed (sketched below)
<ul>
<li>efficiently bring primary up to speed</li>
<li>the backup could do that too, but Harp is designed so that
backup dumps the op logs to disk and witnesses keep the logs themselves
so they can quickly send them to primary for reapplying
<ul>
<li>assumption that reapplying witness logs is faster than copying backup disk</li>
<li>assumption that witness logs won't get too big</li>
</ul></li>
</ul></li>
<li>The witnesses are one significant difference from Raft.</li>
<li>The <code>b</code> witnesses do not ordinarily hear about operations or keep state.</li>
<li>Why is that OK?
<ul>
<li><code>b+1</code> of <code>2b+1</code> do have state</li>
<li>So any <code>b</code> failures leaves at least one live copy of state.</li>
</ul></li>
<li>Why are the <code>b</code> witnesses needed at all?
<ul>
<li>If <code>b</code> replicas with state do fail, witnesses give the required <code>b+1</code> majority.</li>
<li>To ensure that only one partition operates -- no split brain.</li>
</ul></li>
<li>So, for a 3-server system, the witness is there to break ties about
which partition is allowed to operate when primary and backup are
in different partitions.
<ul>
<li>The partition with the witness wins.</li>
</ul></li>
</ul>
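<p>A sketch of that catch-up role (the types and names are invented): while promoted, a witness
records every op; when the primary returns, the witness replays everything past the primary's
last known op, which is assumed to be cheaper than copying a backup's disk:</p>
<pre><code>package harp

// loggedOp is what a promoted witness keeps per operation (sketch).
type loggedOp struct {
	seq int    // position in the log
	req string // the NFS request
}

// PromotedWitness records ops while some backup (or the primary) is missing,
// so that a majority of nodes still holds the latest operations.
type PromotedWitness struct {
	log []loggedOp // kept in RAM, spilling to disk/tape if it grows too large
}

// Record logs an op on behalf of the missing replica.
func (w *PromotedWitness) Record(seq int, req string) {
	w.log = append(w.log, loggedOp{seq: seq, req: req})
}

// CatchUp replays every op after lastSeq to a re-joining replica.
func (w *PromotedWitness) CatchUp(lastSeq int, apply func(seq int, req string)) {
	for _, op := range w.log {
		if op.seq > lastSeq {
			apply(op.seq, op.req)
		}
	}
}
</code></pre>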
<h4>Does primary need to send operations to witnesses?</h4>
<ul>
<li>The primary must collect ACKs from a majority of the <code>2b+1</code> for every r/w operation.
<ul>
<li>To ensure that it is still the primary -- still in the majority partition.</li>
<li>To ensure that the operation is on enough servers to intersect with any
future majority that forms a new view.</li>
</ul></li>
<li>If all backups are up, primary+backups are enough for that majority.</li>
<li>If <code>m</code> backups are down:
<ul>
<li>Primary must talk to <code>m</code> "promoted" witnesses to get its majority for each op.</li>
<li>Those witnesses must record the op, to ensure overlap with any
future majority.</li>
<li>Thus each "promoted" witness keeps a log.</li>
</ul></li>
<li>So in a <code>2b+1</code> system, a view always has <code>b+1</code> servers that the primary
must contact for each op, and that store each op.</li>
</ul>
<p>Note: somewhat different from Raft</p>
<ul>
<li>Raft keeps sending each op to all servers, proceeds when majority answer
<ul>
<li>So leader must keep full log until failed server re-joins</li>
</ul></li>
<li>Harp eliminates failed server from view, doesn't send to it
<ul>
<li>Only witness has to keep a big log; has special plan (ram, disk, tape).</li>
</ul></li>
<li>The bigger issue is that it can take a lot of work to bring
a re-joining replica up to date; careful design is required.</li>
</ul>
<h4>What's the story about the UPS?</h4>
<ul>
<li>This is one of the most interesting aspects of Harp's design</li>
<li>Each server's power cord is plugged into a UPS</li>
<li>UPS has enough battery to run server for a few minutes</li>
<li>UPS tells server (via serial port) when main A/C power fails</li>
<li>Server writes dirty FS blocks and the Harp log to disk, then shuts down (sketched below)</li>
</ul>
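<p>A sketch of that shutdown path (the function and parameters are invented; a real server would
hook this to the UPS's serial-port signal):</p>
<pre><code>package harp

import (
	"log"
	"os"
)

// OnPowerFail waits for the UPS to report loss of A/C power, then makes the
// volatile state durable and shuts the server down before the battery dies.
func OnPowerFail(powerFail <-chan struct{}, harpLog *os.File, flushDirtyFSBlocks func() error) {
	<-powerFail // UPS signals (via serial port) that main power is gone
	if err := flushDirtyFSBlocks(); err != nil {
		log.Fatalf("flushing dirty FS blocks: %v", err)
	}
	if err := harpLog.Sync(); err != nil { // force the in-memory Harp log to disk
		log.Fatalf("syncing Harp log: %v", err)
	}
	os.Exit(0)
}
</code></pre>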
<h4>What does the UPS buy for Harp?</h4>
<ul>
<li>Efficient protection against A/C power failure of ALL servers</li>
<li>For failures of up to b servers, replication is enough</li>
<li>If <em>all</em> servers failed and lost state, that's more than b failures,
<ul>
<li>so Harp has no guarantee (and indeed no state!)</li>
</ul></li>
<li>With UPS:
<ul>
<li>Each server can reply <em>without</em> writing disk!</li>
<li>But still guarantees to retain latest state despite simultaneous power fail</li>
</ul></li>
<li>But note:
<ul>
<li>UPS does not protect against other causes of simultaneous failure</li>
<li>e.g. bugs, earthquake</li>
<li>Harp treats servers that re-start after a UPS-protected crash differently
from those that re-start after a crash that lost in-memory state</li>
<li>Because the latter may have forgotten <em>committed</em> operations</li>
</ul></li>
<li>For independent failures Harp has strong guarantees; for correlated failures such as a software
bug that crashes every replica, it doesn't really have a solution</li>
</ul>
<p><strong>Larger point</strong>, faced by every fault-tolerant system</p>
<ul>
<li>Every replicated system tends to need to have a commit point</li>
<li>Replicas <em>must</em> keep persistent state to deal w/ failure of all servers
<ul>
<li>Committed operations</li>
<li>Latest view number, proposal number, &c</li>
</ul></li>
<li>Must persist this state before replying</li>
<li>Writing every commit to disk is very slow!
<ul>
<li>10 ms per disk write, so only 100 ops/second</li>
</ul></li>
<li>So there are a few common patterns:
<ol>
<li>Low throughput</li>
<li>Batching, high delay
<ul>
<li>batch a lot of writes and flush them all at once to amortize the cost of each write
(see the sketch after this list)</li>
<li>but now clients wait longer for their own writes to finish
<ul>
<li>because they are also waiting for other clients' writes in the batch</li>
</ul></li>
</ul></li>
<li>Lossy or inconsistent recovery from simultaneous failure
<ul>
<li>no guarantees after crashes</li>
</ul></li>
<li>Batteries, flash, SSD w/ capacitor, &c</li>
</ol></li>
</ul>
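<p>A sketch of pattern 2, group commit (invented names): records are appended as they arrive,
but the expensive sync happens once per batch, and every client in the batch learns of durability
together:</p>
<pre><code>package harp

import (
	"os"
	"time"
)

// pendingWrite couples a log record with a channel that is closed once the
// record is safely on disk, so the caller knows when it may reply.
type pendingWrite struct {
	data []byte
	done chan struct{}
}

// groupCommit batches writes: one disk flush (the ~10 ms cost) is amortized
// over everything that arrived during the interval, at the price of delay.
func groupCommit(logFile *os.File, writes <-chan pendingWrite, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	var batch []pendingWrite
	for {
		select {
		case w := <-writes:
			logFile.Write(w.data) // handed to the OS, but not yet durable
			batch = append(batch, w)
		case <-ticker.C:
			if len(batch) == 0 {
				continue
			}
			logFile.Sync() // one flush for the whole batch
			for _, w := range batch {
				close(w.done) // safe to reply to this client now
			}
			batch = nil
		}
	}
}
</code></pre>
<p>This is exactly the trade-off in the notes: higher throughput, but each client's reply now
waits for the whole batch's flush.</p>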
<h4>Let's talk about Harp's log management and operation execution</h4>
<ul>
<li>Primary and backup must apply client operations to their state</li>
<li>State here is a file system -- directories, files, owners, permissions, &c</li>
<li>Harp must mimic an ordinary NFS server to the client
<ul>
<li>i.e. not forget about ops for which it has sent a reply</li>
</ul></li>
</ul>
<h4>What is in a typical log record?</h4>
<ul>
<li>Not just the client-issued op., like <code>chmod</code></li>
</ul>
<p>Log record stores (sketched after the list):</p>
<ul>
<li>Client's NFS operation (write, mkdir, chmod, &c)</li>
<li>Shadow state: modified i-nodes and directory content <em>after</em> execution
<ul>
<li>(i.e. the results after executing the operation)</li>
</ul></li>
<li>Client RPC request ID, for duplicate detection
<ul>
<li>Primary might repeat an RPC if it thinks the backup has failed</li>
</ul></li>
<li>Reply to send to client, for duplicate detection</li>
</ul>
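<p>The list above translates roughly into a record like this (a sketch; the field names are
invented and the shadow state is shown only schematically):</p>
<pre><code>package harp

// ShadowState holds the *results* of executing the operation: the modified
// i-nodes and directory contents, ready to be copied into the file system.
type ShadowState struct {
	Inodes      map[uint64][]byte // i-node number -> new on-disk bytes
	Directories map[string][]byte // directory -> new contents
}

// LogRecord is a sketch of what one Harp log entry carries.
type LogRecord struct {
	Seq         int         // position in the log
	NFSOp       string      // the client's NFS operation (write, mkdir, chmod, etc.)
	Shadow      ShadowState // state after execution, not just the request
	ClientRPCID uint64      // client RPC request ID, for duplicate detection
	Reply       []byte      // saved reply, re-sent if the same RPC ID shows up again
}
</code></pre>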
<h4>Why does Harp have so many log pointers?</h4>
<ul>
<li>FP -- most recent client request appended to the log</li>
<li>CP -- commit point (real in primary, latest heard-of in backup)</li>
<li>AP -- highest log record whose update has been sent to disk</li>
<li>LB -- the disk has finished writing up to here</li>
<li>GLB -- all nodes have finished their disk writes up to here (see the sketch below)</li>
</ul>
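<p>A sketch of the ordering these pointers maintain (the ordering is what the gaps discussed
in the next few sections imply; the struct itself is invented):</p>
<pre><code>package harp

// LogPointers: per-replica log pointers, ordered GLB <= LB <= AP <= CP <= FP.
type LogPointers struct {
	FP  int // most recent client request appended to the log
	CP  int // commit point (real on the primary, latest heard-of on a backup)
	AP  int // highest record whose update has been handed to the disk
	LB  int // the disk has finished writing everything up to here
	GLB int // every node's disk is done up to here; older records can be freed
}

// Invariant checks the assumed ordering; records at or below GLB are the ones
// that are safe to drop from the log.
func (p LogPointers) Invariant() bool {
	return p.GLB <= p.LB && p.LB <= p.AP && p.AP <= p.CP && p.CP <= p.FP
}
</code></pre>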
<h4>Why the FP-CP gap?</h4>
<ul>
<li>So primary doesn't need to wait for ACKs from each backup
<ul>
<li>before sending next operation to backups</li>
</ul></li>
<li>Primary pipelines ops CP..FP to the backups.</li>
<li>Higher throughput if concurrent client requests.</li>
</ul>
<h4>Why the AP-LB gap?</h4>
<ul>
<li>Allows Harp to issue many ops as disk writes before waiting for disk</li>
<li>The disk is more efficient if it has lots of writes (e.g. arm scheduling)</li>
</ul>
<h4>What is the LB?</h4>
<ul>
<li>This replica has everything <code><= LB</code> on disk.</li>
<li>So it won't need those log records again.</li>
</ul>
<h4>Why the LB-GLB gap?</h4>
<ul>
<li>GLB is min(all servers' LBs).</li>
<li>GLB is earliest record that <em>some</em> server might need if it loses memory.</li>
</ul>
<h4>When does Harp execute a client operation?</h4>
<p>There are two answers! (Both are sketched after the list.)</p>
<ol>
<li>When operation arrives, primary figures out exactly what should happen.
<ul>
<li>Produces resulting on-disk bytes for modified i-nodes, directories, &c.</li>
<li>This is the shadow state.</li>
<li>This happens before the CP, so the primary must consult recent
operations in the log to find the latest file system state.</li>
</ul></li>
<li>After the operation commits, primary and backup can apply it to
their file systems.
<ul>
<li>They copy the log entry's shadow state to the file system;</li>
<li>they do not really execute the operation.</li>
<li>And now the primary can reply to the client's RPC.</li>
</ul></li>
</ol>
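<p>A sketch of the two phases (invented types; the NFS-specific logic is elided to show only
<em>where</em> the inputs and outputs live):</p>
<pre><code>package harp

// Shadow is a record's shadow state: the resulting bytes, keyed by the name of
// the i-node / directory block, computed in phase 1 when the op arrives.
type Shadow map[string][]byte

// Prepare is phase 1 (before the CP): build a view of the file system that
// includes the shadows of not-yet-applied log records, and compute the new
// op's resulting bytes against that view.
func Prepare(fs map[string][]byte, pending []Shadow, compute func(view map[string][]byte) Shadow) Shadow {
	view := make(map[string][]byte, len(fs))
	for k, v := range fs {
		view[k] = v
	}
	for _, s := range pending { // overlay effects of records between CP and FP
		for k, v := range s {
			view[k] = v
		}
	}
	return compute(view)
}

// Apply is phase 2 (after commit): primary and backups just copy the shadow
// state into the file system. Nothing is re-executed, so replaying a record
// that was already applied merely rewrites the same bytes.
func Apply(fs map[string][]byte, shadow Shadow) {
	for name, bytes := range shadow {
		fs[name] = bytes
	}
}
</code></pre>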
<h4>Why does Harp split execution in this way?</h4>
<ul>
<li>If a server crashes and reboots, it is brought up to date by replaying
log entries it might have missed. Harp can't know exactly what the
last pre-crash operation was, so Harp may repeat some. It's not
correct to fully execute some operations twice, e.g. file append.</li>
<li>So Harp log entries contain the <em>resulting</em> state, which is what's applied.</li>
</ul>
<p>Append example:</p>
<pre><code>                  /---> Append2 picks up the modified inode
                 /      (from Append1's log record)
    --------------------------------
... |  Append1   |  Append2   | ...
    ----*---------------------------
        |    \
        |     \-> new inode stored here
        CP
</code></pre>
<p>If the backup crashes after it writes A1 to disk but before replying to the primary, then when
the backup reboots there's no obvious way of telling whether it applied A1. As
a result, it has to re-apply it. Thus, these log records have to be "repeatable" (idempotent).</p>
<p><em>Actually,</em> a lot of replication systems have to cope with this, and this is one
way to deal with it. It also illustrates how non-straightforward replication can be.</p>
<ul>
<li>Harp needs to be aware of FS-level inodes for instance</li>
</ul>
<h4>The point: multiple replay means replication isn't transparent to the service.</h4>
<ul>
<li>Service must be modified to generate and accept the state
modifications that result from client operations.</li>
<li>In general, when applying replication to existing services,
the service must be modified to cope with multiple replay.</li>
</ul>
<h4>Can Harp primary execute read-only operations w/o replicating to backups?</h4>
<ul>
<li>e.g. reading a file.</li>
<li>Would be faster -- after all, there's no new data to replicate.</li>
<li>What's the danger?</li>
<li>The reason we normally forward even read-only operations to the backups is to make sure
we find out if we were partitioned and 1000 ops were executed that
we don't know about: we must not reply with a stale copy of
the data we are reading</li>
<li>Harp's idea: leases</li>
<li>Backups promise not to form a new view for some time.
<ul>
<li>(i.e. not to process any ops as a primary)</li>
</ul></li>
<li>Primary can execute read-only ops locally for that time minus some slop (sketched below).
<ul>
<li>because it knows the backup won't execute ops as primary during that
time (the backup promised this!)</li>
</ul></li>
<li>Depends on reasonably <em>synchronized clocks</em>:
<ul>
<li>Robert Morris: "Not really happy about this."</li>
<li>Primary and backup must have bounded disagreement
on how fast time passes.</li>
<li>This really requires a bounded frequency skew
<ul>
<li>Apparently hardware is really bad at providing this</li>
</ul></li>
</ul></li>
</ul>
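<p>A sketch of the lease check (invented names): the primary answers a read from its own state
only while every lease it holds, minus some slop for clock disagreement, is still in force:</p>
<pre><code>package harp

import "time"

// Lease is a backup's promise not to act as primary in a new view until
// Expires (measured on the backup's own clock).
type Lease struct {
	Expires time.Time
}

// CanReadLocally reports whether the primary may execute a read-only op
// locally: every view member's lease must be valid for at least slop longer,
// to cover bounded disagreement between the clocks.
func CanReadLocally(leases []Lease, now time.Time, slop time.Duration) bool {
	if len(leases) == 0 {
		return false
	}
	for _, l := range leases {
		if !now.Add(slop).Before(l.Expires) {
			return false // too close to expiry; fall back to the normal path
		}
	}
	return true
}
</code></pre>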
<h4>What state should primary use when executing a read-only operation?</h4>
<ul>
<li>Does it have to wait for all previously arrived operations to commit?
<ul>
<li>No! That would be almost as slow as committing the read-only op.</li>
</ul></li>
<li>Should it look at state as of operation at FP, i.e. latest r/w operation?
<ul>
<li>No! That operation has not committed; not allowed to reveal its effects.</li>
</ul></li>
<li>Thus Harp executes read-only ops with state as of CP.</li>
<li>What if client sends a WRITE and (before WRITE finishes) a READ of same data?
<ul>
<li>READ may see data <em>before</em> the WRITE!</li>
<li>Why is that OK?</li>
<li>The client sent the READ and the WRITE concurrently. It has no right to expect one order or the other. So if the READ doesn't see the WRITE's effects, that's acceptable -- it's the same answer you'd get if the READ had moved through the network faster than the WRITE, which could happen. You can only expect a READ to see a WRITE's effect if you issue the WRITE, wait for a reply to the WRITE, and then issue the READ.</li>
</ul></li>
</ul>
<h4>How does failure recovery work?</h4>
<ul>
<li>I.e. how does Harp recover replicated state during view change?</li>
</ul>
<p>Setup for the following scenarios:</p>
<p>5 servers: S1 is usually primary, S2+S3 are backups, S4+S5 witnesses</p>
<p>Scenario:</p>
<pre><code> S1+S2+S3; then S1 crashes
S2 is primary in new view (and S4 is promoted)
</code></pre>
<ul>
<li>Will S2 have every committed operation?
<ul>
<li>Yes.</li>
</ul></li>
<li>Will S2 have every operation S1 received?
<ul>
<li>No -- maybe an op didn't reach S2 from S1 before S1 crashed.</li>
</ul></li>
<li>Will S2's log tail be the same as S3's log tail?
<ul>
<li>Not necessarily.
<ul>
<li>Maybe op reached S2 but not S3 and then S1 crashed.</li>
<li>Maybe op reached S2, and S3 crashed, so S4 was promoted. Then S3
came back up?</li>
</ul></li>
</ul></li>
<li>How far back can S2 and S3 log tail differ?
<ul>
<li>Not just back to the CP: ops could have been committed w/ the help
of a promoted witness (while one backup was out of the view) <code>=></code> the backups' logs
can differ even for committed entries</li>
</ul></li>
<li>How to cause S2 and S3's log to be the same?
<ul>
<li>Must commit ops that appeared in both S2+S3 logs</li>
<li>What about ops that appear in only one log?
<ul>
<li>In this scenario, can discard since could not have committed</li>
<li>But in general committed op might be visible in just one log</li>
</ul></li>
</ul></li>
<li>From what point does promoted witness have to start keeping a log?</li>
</ul>
<h4>What if S1 crashed just before replying to a client?</h4>
<ul>
<li>Will the client ever get a reply?</li>
</ul>
<p>After S1 recovers, with intact disk, but lost memory.</p>
<ul>
<li>It will be primary, but Harp can't immediately use its state or log.
<ul>
<li>Unlike Raft, where leader only elected if it has the best log.</li>
<li>Harp must replay log from promoted witness (S4)</li>
</ul></li>
<li>Could S1 have executed an op just before crashing
that the replicas didn't execute after taking over?
<ul>
<li>No, execution up to CP only, and CP is safe on S2+S3.</li>
</ul></li>
</ul>
<p>New scenario: S2 and S3 are partitioned (but still alive)</p>
<ul>
<li>Can S1+S4+S5 continue to process operations?
<ul>
<li>Yes, promoted witnesses S4+S5</li>
</ul></li>
<li>S4 moves to S2/S3 partition</li>
<li>Can S1+S5 continue?
<ul>
<li>No, primary S1 doesn't get enough backup ACKs</li>
</ul></li>
<li>Can S2+S3+S4 continue?
<ul>
<li>Yes, new view copies log entries S4->S2, S4->S3, now S2 is primary</li>
</ul></li>
<li>Note:
<ul>
<li>New primary was missing many committed operations</li>
<li>In general some <em>committed</em> operations may be on only one server</li>
</ul></li>
</ul>
<p>New scenario: S2 and S3 are partitioned (but still alive)</p>
<ul>
<li>S4 crashes, loses memory contents, reboots in S2/S3 partition</li>
<li>Can they continue?
<ul>
<li>Only if there wasn't another view that formed and committed more ops</li>
</ul></li>
<li>How to detect?
<ul>
<li>Depends on what S4's on-disk view # says.</li>
<li>OK if S4's disk view # is same as S2+S3's.
<ul>
<li>No new views formed.</li>
<li>S2+S3 must have heard about all committed ops in old view.</li>
</ul></li>
</ul></li>
</ul>
<h4>Everybody (S1-5) suffers a power failure.</h4>
<ul>
<li>S4 disk and memory are lost, but it does re-start after repair.</li>
<li>S1 and S5 never recover.</li>
<li>S2 and S3 save everything on disk, re-start just fine.</li>
<li>Can S2+S3+S4 continue?</li>
<li>(harder than it looks)
<ul>
<li>Believe the answer is "No": cannot be sure what state S4 had before failure.</li>
<li>Might have formed a new view with S1+S5, and committed some ops.
<ul>
<li>Can S2 and S3 know about the previous view? Not always.</li>
</ul></li>
</ul></li>
</ul>
<h4>When can Harp form a new view?</h4>
<ol>
<li>No other view possible.</li>
<li>Know view # of most recent view.</li>
<li>Know all ops from most recent view.</li>
</ol>
<p>Details (a check is sketched after this list):</p>
<ul>
<li>(1) is true if you have n+1 nodes in new view.</li>
<li>(2) true if you have n+1 nodes that did not lose view # since last view.
<ul>
<li>View # stored on disk, so they just have to know disk is OK.</li>
<li>One of them <em>must</em> have been in the previous view.</li>
<li>So just take the highest view number.</li>
</ul></li>
<li>And #3?
<ul>
<li>Need a disk image, and a log, that together reflect all operations
through the end of the previous view.</li>
<li>Perhaps from different servers, e.g. log from promoted witness,
disk from backup that failed multiple views ago.</li>
</ul></li>
</ul>
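<p>A sketch of checking the three conditions for a <code>2n+1</code>-server system (invented fields;
condition 3 is collapsed into a per-node flag, whereas Harp may combine a disk image and a log
from different servers):</p>
<pre><code>package harp

// NodeState is what each prospective member of the new view reports (sketch).
type NodeState struct {
	ViewNum   int  // highest view number recorded on its intact disk
	DiskOK    bool // it did not lose its on-disk state (view number, etc.)
	HasAllOps bool // its disk image + log cover all ops through its ViewNum
}

// CanFormView checks the three conditions: (1) n+1 members, so no other view
// is possible; (2) n+1 members that still know their view number, so the most
// recent view number is known; (3) someone can reconstruct all ops from that
// most recent view.
func CanFormView(members []NodeState, n int) bool {
	if len(members) < n+1 {
		return false // another partition could form a view
	}
	intact, latest := 0, -1
	for _, m := range members {
		if m.DiskOK {
			intact++
			if m.ViewNum > latest {
				latest = m.ViewNum
			}
		}
	}
	if intact < n+1 {
		return false // can't be sure we know the most recent view number
	}
	for _, m := range members {
		if m.DiskOK && m.ViewNum == latest && m.HasAllOps {
			return true // all ops through the end of that view are recoverable
		}
	}
	return false
}
</code></pre>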
<h4>Does Harp have performance benefits?</h4>
<ul>
<li>In Fig 5-1, why is Harp <em>faster</em> than non-replicated server?</li>
<li>How much win would we expect by substituting RPC for disk operations?</li>
</ul>
<h4>Why graph x=load y=response-time?</h4>
<ul>
<li>Why does this graph make sense?</li>
<li>Why not just graph total time to perform X operations?</li>
<li>One reason is that systems sometimes get more/less efficient w/ high load.</li>
<li>And we care a lot how they perform w/ overload.</li>
</ul>
<h4>Why does response time go up with load?</h4>
<ul>
<li>Why first gradual...
<ul>
<li>Queuing and random bursts?</li>
<li>And some ops more expensive than others, cause temp delays.</li>
</ul></li>
<li>Then almost straight up?
<ul>
<li>Probably has hard limits, like disk I/Os per second.</li>
<li>Queue length diverges once offered load > capacity</li>
</ul></li>
</ul>