Lecture notes from MIT 6.824 (Distributed Systems), taught by Prof. Robert T. Morris. These lecture notes are slightly modified from the ones posted on the 6.824 course website.
- Lecture 1: Introduction: distributed system definition, motivations, architecture, implementation, performance, fault-tolerance, consistency, MapReduce
- Lecture 2: Remote Procedure Calls (RPCs): RPC overview, marshalling, binding, threads, "at-least-once", "at-most-once", "exactly-once", Go's RPC, thread synchronization
- Lecture 3: Fault tolerance: primary-backup replication, state transfer, "split-brain", Remus (NSDI 2008)
- Lecture 4: Flat datacenter storage: bisection bandwidth, striping
- Lecture 5: Paxos: Paxos, consensus algorithms
- Lecture 6: Raft: Raft, a more understandable consensus algorithm
- Lecture 7: Google Go guest lecture by Russ Cox
- Lecture 8: Harp: distributed file system, "the UPS trick", witnesses
- Lecture 9: IVY: distributed shared memory, sequential consistency
- Lecture 10: TreadMarks: userspace distributed shared memory system, vector timestamps, release consistency (lazy/eager), false sharing, write amplification
- Lecture 11: Ficus: optimistic concurrency control, vector timestamps, conflict resolution
- Lecture 12: Bayou: disconnected operation, eventual consistency, Bayou
- Lecture 13: MapReduce: scalability, performance
- Lecture 14: Spark guest lecture by Matei Zaharia: Resilient Distributed Datasets, Spark
- Lecture 15: Spanner guest lecture by Wilson Hsieh, Google: Spanner, distributed database, clock skew
- Lecture 16: Memcache at Facebook: web app scalability, look-aside caches, Memcache
- Lecture 17: PNUTS (Yahoo!): distributed key-value store, atomic writes
- Lecture 18: Dynamo: distributed key-value store, eventual consistency
- Lecture 19: HubSpot guest lecture
- Lecture 20: Two-phase commit (2PC): Argus
- Lecture 21: Optimistic concurrency control
- Lecture 22: Peer-to-peer, trackerless BitTorrent and DHTs: Chord, routing
- Lecture 23: Bitcoin: verifiable public ledgers, proof-of-work, double spending
- Lab 1: MapReduce, [assign]
- Lab 2: A fault-tolerant key/value service, [assign], [notes]
- Lab 3: Paxos-based Key/Value Service, [assign], [notes]
- Lab 4: Sharded Key/Value Service, [assign], [notes]
- Lab 5: Persistent Key/Value Service, [assign]
Papers we read in 6.824 (directory here):
- MapReduce
- Remus
- Flat datacenter storage
- Paxos
- Raft
- Harp
- Shared virtual memory
- TreadMarks
- Ficus
- Bayou
- Spark
- Spanner
- Memcached at Facebook
- PNUTS
- Dynamo
- Akamai
- Argus (Guardians and Actions)
- Kademlia
- Bitcoin
- AnalogicFS
Other papers:
- Impossibility of Distributed Consensus with One Faulty Process
  - See page 5, slide 10 here to understand Lemma 1 (commutativity) faster
  - See this article here for an alternative explanation
- Practical Byzantine Fault Tolerance (PBFT)
- A brief history of consensus, 2PC and transaction commit
- Distributed systems theory for the distributed systems engineer
- Distributed Systems for Fun and Profit
- You can't choose CA out of CAP, or "You can't sacrifice partition tolerance"
- Notes on distributed systems for young bloods
- Paxos Explained From Scratch
Prep for quiz 1 here