Replication

Jens Alfke edited this page Dec 2, 2015 · 2 revisions

Introduction

This is an overview of the current Couchbase Mobile replication protocol, as implemented by Couchbase Lite and Couchbase Sync Gateway. It's aimed at developers working on mobile apps and the servers that support them. While it isn't necessary to know how the replication protocol works in order to use it, we've found that some knowledge can be useful for troubleshooting and performance testing.

History

The replication protocol originated with Apache CouchDB, and our products are still compatible with it, as well as other CouchDB-compatible databases like PouchDB and Cloudant. We've added some extensions, however, and our replicator implementations won't make exactly the same sequences of calls as other implementations.

Disclaimer

The replication protocol will evolve over time as we add new functionality and optimizations. We intend to retain compatibility, in part through runtime version checking, but when current Couchbase products sync with each other, they may use new methods or parameters. We'll try to keep this document up to date, but be aware that if you use the information here to build things -- like traffic simulators or filters -- you may have to update those periodically.

Overview & Terminology

Documents, Revisions, Sequences

A document is identified by an ID string that's unique in the database. Creating, changing or deleting a document produces a new revision, which has a unique revision ID. Revisions form a history of the document, which is actually a tree if you consider the possibility of conflicts.

Note: A deletion is a revision whose JSON body contains a "_deleted":true property. This is often called a "tombstone".

A current revision is one that has not yet been replaced; it's a leaf node in the revision tree. Most of the time a document has only one current revision, but multiple current revisions can exist and that's called a conflict.

Whenever a document is updated, it's assigned a new sequence ID, a sort of timestamp or serial number. In Couchbase Lite, sequence IDs are simply consecutive integers (similar to an auto-incrementing primary key in a relational table), but Sync Gateway has a more complex system for reasons of scalability. The important thing is that a database (local or remote) can be queried to find all documents whose sequence IDs are newer than a specific ID.
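The sequence mechanism can be illustrated with a toy sketch (this is not Couchbase code; the names and structure are invented for illustration). It shows how consecutive integer sequence IDs let a database answer "what changed since sequence N?":

```python
# A toy model of sequence tracking: every update bumps a global counter
# and re-stamps the document, so a single ordered index answers
# "all documents changed since sequence N".

class ToyDatabase:
    def __init__(self):
        self.last_sequence = 0
        self.doc_sequences = {}   # doc ID -> latest sequence
        self.changes = {}         # sequence -> doc ID

    def update(self, doc_id):
        old = self.doc_sequences.get(doc_id)
        if old is not None:
            del self.changes[old]          # only the latest change matters
        self.last_sequence += 1
        self.doc_sequences[doc_id] = self.last_sequence
        self.changes[self.last_sequence] = doc_id

    def changes_since(self, since):
        # Documents whose latest change is newer than `since`, in order.
        return [(seq, doc) for seq, doc in sorted(self.changes.items())
                if seq > since]

db = ToyDatabase()
db.update("a"); db.update("b"); db.update("a")
print(db.changes_since(1))   # → [(2, 'b'), (3, 'a')]
```

Note that re-updating "a" moves it to the end of the feed: a changes query reports only the latest change per document, which is exactly what a replicator needs.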

Replication

The primary goal of replication is, given a source and a target database, to identify all current document revisions (including deletions) in the source that do not exist in the target, and copy them -- with their contents, attachments and revision histories -- to the target. Afterwards, all current revisions in the source exist at the target and have the same revision histories there.

A secondary, but important, goal is to do this without redundantly transferring the contents of any revisions that already exist at the target.

In practice, the replication algorithm runs on the computer (and in the process) that owns one of the databases, so one database is local and the other is remote.

A replication with a local source database is called a push, and one with a local target database is called a pull.

The active process is usually a client app running Couchbase Lite, and the passive process is usually Sync Gateway. However, it's possible for other databases to sync with Sync Gateway, and there are unofficial extensions that can allow instances of Sync Gateway to actively sync to one another.

Algorithm

This algorithm runs in the active process. It comes in different flavors for push and pull replications, although some steps are the same.

Most, but not all, of these steps manifest as HTTP requests sent to the remote database. These are described in italics below.

These steps are described as sequential, but in actual implementation they run in parallel, with different revisions being in different steps at the same time. You can imagine them as a data flow through which revisions pass; revisions may travel at different rates depending on how long asynchronous tasks like HTTP requests and database queries take.

Because of this, and also because multiple HTTP requests are sent at once over multiple sockets (usually 4-8), the actual HTTP traffic (or resulting log messages) isn't nearly as simple or clear as the description here.

Note: There's also a ladder diagram illustrating this, in an appendix below.

Push

  1. If this isn't the first replication between these databases, fetch checkpoints from the local and the remote database. These record where replication left off before, i.e. the latest source sequence ID that's been successfully replicated. If the checkpoints are missing on one side or don't match, ignore them.
    (This sends a GET request to /db/_local/CHECKPOINTID, where CHECKPOINTID is a hex string that uniquely identifies the source/target databases and other replication parameters.
    Note: this will return a 404 Not Found status if the remote database has no record of this checkpoint. That's not an error; it just means this is a first sync, or perhaps the remote database has been erased or restored from a backup since the last sync.)

  2. Query the local database for all document revisions with a sequence ID newer than the checkpoint. Or if there's no valid checkpoint, get all current revisions in order of sequence.

  3. Ask the remote database whether it already has these revisions.
    (This sends one or more POSTs to /db/_revs_diff, each containing up to 100 or so revisions.)

  4. Load the bodies of the revisions not in the remote database, and add them to it (together with any attachments that have changed.)
    (This sends one or more POSTs to /db/_bulk_docs, each containing up to 100 or so revisions. The request body is formatted as MIME multipart (q.v.), with each part being one document. A document with attachments is itself formatted as multipart, with the first part being the JSON body and the rest being attachments.)

  5. Periodically after revisions are successfully uploaded, and when complete, update the local and remote checkpoints.
    (This sends a PUT to /db/_local/CHECKPOINTID.)

  6. Continue until all new local revisions have been uploaded.

  7. If this is a non-continuous replication, stop (saving checkpoints if necessary).

  8. If this is a continuous replication, wait for a new revision to be added to the local database, then go back to step 2 to push it.
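The `_revs_diff` exchange in step 3 can be sketched as follows. This is a hedged simulation, not Couchbase code: the request body maps each document ID to the revision IDs the pusher wants to send, and the (here simulated) server replies with the revisions it's missing.

```python
import json

# Hypothetical local revisions the push wants to send, keyed by doc ID.
local_revs = {
    "doc1": ["2-def"],
    "doc2": ["1-abc"],
}
# The POST /db/_revs_diff request body (step 3).
revs_diff_request = json.dumps(local_revs)

# A stand-in for the remote database: it already has doc1's revision.
remote_known = {("doc1", "2-def")}

def revs_diff(request_body, known):
    # Mirrors the shape of the server's reply: for each doc,
    # which of the offered revisions it lacks.
    wanted = json.loads(request_body)
    response = {}
    for doc_id, revs in wanted.items():
        missing = [r for r in revs if (doc_id, r) not in known]
        if missing:
            response[doc_id] = {"missing": missing}
    return response

print(revs_diff(revs_diff_request, remote_known))
# → {'doc2': {'missing': ['1-abc']}}
```

Only doc2's revision would then be uploaded via `_bulk_docs` in step 4, which is how the replicator avoids re-sending content the target already has.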

Pull

  1. Fetch checkpoints, as described in the Push section.

  2. Ask the remote database for all document revisions with a sequence ID newer than the checkpointed one. Or if there's no valid checkpoint, ask for all current revisions in order of sequence.
    (This sends a POST or GET request to /db/_changes, with the since parameter set to the sequence ID stored in the checkpoint. Various feed parameter values may be used: generally the first call uses the normal feed.)

  3. Query the local database to find which of these revisions don't exist locally.

  4. Request the bodies of the not-existing revisions, and any changed attachments.
    (When both local and remote processes are Couchbase software, this sends one or more POSTs to /db/_bulk_get (an API extension), each requesting up to 100 or so revisions; the response is in MIME multipart format (q.v.), with one part per document. In other cases, it sends an individual GET for each revision, to /db/DOCID?rev=REVID. In some situations Couchbase Lite can use a POST to _all_docs when talking to non-Couchbase servers.)

  5. As revisions are downloaded, add them to the local database.

  6. Periodically as revisions are successfully downloaded, and when complete, update the local and remote checkpoints as described in the Push section.

  7. Continue until the end of the _changes feed is reached and all revisions have been downloaded and inserted.

  8. If this is a non-continuous replication, stop (saving checkpoints if necessary).

  9. If this is a continuous replication, go back to step 2, opening a new _changes feed. This time use ?feed=longpoll or ?feed=websocket, either of which allows for server push of new revisions. This type of feed never ends; it may just go idle (leaving the socket open) until there are new revisions to announce.
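Steps 2-3 of the pull can be sketched as follows. The `_changes` response shown is simplified (Sync Gateway's sequence IDs can be strings rather than integers), and the local-knowledge check is an invented stand-in for the real database query:

```python
import json

# A simplified _changes response (step 2).
changes_feed = json.loads("""
{"results": [
   {"seq": 2, "id": "doc1", "changes": [{"rev": "2-def"}]},
   {"seq": 3, "id": "doc2", "changes": [{"rev": "1-abc"}]}
 ],
 "last_seq": 3}
""")

# Step 3: suppose doc1's revision 2-def already exists locally.
local_known = {("doc1", "2-def")}

# Revisions to request in step 4:
to_fetch = [(row["id"], ch["rev"])
            for row in changes_feed["results"]
            for ch in row["changes"]
            if (row["id"], ch["rev"]) not in local_known]
print(to_fetch)                   # → [('doc2', '1-abc')]
print(changes_feed["last_seq"])   # checkpoint this once revisions are inserted
```

The `last_seq` value is what gets written back to the checkpoints in step 6, so an interrupted replication can resume from that point.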

Appendices

Ladder Diagram

REST APIs Used

These are the REST API calls that the replication algorithm (usually running in Couchbase Lite) makes to the remote database (usually Sync Gateway).

  • GET /db/_local/checkpointid — To read the last checkpoint
  • PUT /db/_local/checkpointid — To save a new checkpoint

Push Only:

  • PUT /db — If told to create remote database (not applicable to Sync Gateway)
  • POST /db/_revs_diff — To find which revisions are not known to the remote db
  • POST /db/_bulk_docs — To upload multiple revisions
  • POST /db/docid?new_edits=false — To upload a single revision with attachments (not used with Sync Gateway)

Pull Only:

  • POST /db/_changes?style=all_docs&feed=feed&since=since&limit=limit&heartbeat=heartbeat — To find changes since the last pull. feed will be normal, longpoll, or websocket. (Note: verb may be GET.)
  • GET /db/docid?rev=revid&revs=true&attachments=true&atts_since=lastrev — To download a single doc with attachments
  • POST /db/_bulk_get?revs=true&attachments=true — To download documents in bulk (nonstandard; implemented by Sync Gateway)
  • POST /db/_all_docs?include_docs=true — To download first-generation revisions in bulk (not used with Sync Gateway)
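The query string for the `_changes` call listed above can be assembled with standard URL encoding. The parameter values here are illustrative (the real `since` value comes from the saved checkpoint):

```python
from urllib.parse import urlencode

# Illustrative _changes parameters; "since" comes from the checkpoint.
params = {
    "style": "all_docs",
    "feed": "longpoll",
    "since": 42,
    "limit": 50,
    "heartbeat": 30000,   # milliseconds; keeps an idle longpoll socket alive
}
url = "/db/_changes?" + urlencode(params)
print(url)
# → /db/_changes?style=all_docs&feed=longpoll&since=42&limit=50&heartbeat=30000
```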

Multipart Encoding

The _bulk_docs and _bulk_get REST API calls use MIME multipart format to upload or download multiple docs, where each doc can have multiple attachments. This is more efficient than a JSON-encoded body, especially since binary attachments can be sent in raw form without base64 encoding. Since most developers aren't familiar with multipart, here's an overview.

Multipart data begins with two hyphens and a "boundary string", then a CRLF. The boundary is just an arbitrary string chosen by the code that generates the data; the only requirement is that it can't appear anywhere inside any part, since it's used to denote where a part ends. It's usually a UUID or other random data.

After the boundary string can come zero or more headers (just like HTTP or email headers) that apply to the next part. The headers end with an empty line (two CRLFs in a row), then comes the raw data of the part.

This can repeat: the next appearance of two hyphens and the same boundary string ends the part, and the next part's headers begin.

The end of the multipart data is signaled by a boundary string that ends not with a newline but with two more hyphens.

Here's a simple example of a document:

--8345697C-3CBD-4A4A-B59B-759FC8B576AA
Content-Type: application/json

{
	"_id": "THX-1138",
	"_rev": "1-foobar",
	"_attachments": {
		"mary.txt": {"type": "text/doggerel", "length": 52, "follows": true}
	}
}
--8345697C-3CBD-4A4A-B59B-759FC8B576AA  
Content-Type: text/doggerel
Content-Disposition: attachment; filename=mary.txt

Mary had a little lamb
Its fleece was white as snow
--8345697C-3CBD-4A4A-B59B-759FC8B576AA--

The first part is the document body; after that come the attachments. Each attachment uses a Content-Disposition header to identify itself; the filename parameter of the header matches the key in the _attachments dictionary.

The attachment shown here happens to be ASCII, but attachments can be binary, with no special encoding. If this document had a JPEG attachment, there would literally be raw JPEG data in between the boundary strings and headers. Be careful when dumping multipart data to a terminal!

An actual _bulk_get or _bulk_docs body contains multiple documents, each inside its own part (yes, multipart can nest.) That means there's one top-level boundary string delimiting the documents, and in between each pair of those boundaries is a document, beginning with its own different boundary string.

(Actually, if a document has no attachments it isn't formatted as multipart; it's sent as a plain MIME body of type application/json.)
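The example above can be parsed with Python's standard email module, which understands MIME multipart. This sketch rebuilds a trimmed version of the document and wraps it in a minimal MIME message so the parser knows the boundary:

```python
from email import message_from_string

boundary = "8345697C-3CBD-4A4A-B59B-759FC8B576AA"
body = (
    "--" + boundary + "\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"_id": "THX-1138", "_rev": "1-foobar"}\r\n'
    "--" + boundary + "\r\n"
    "Content-Type: text/doggerel\r\n"
    "Content-Disposition: attachment; filename=mary.txt\r\n"
    "\r\n"
    "Mary had a little lamb\r\n"
    "--" + boundary + "--\r\n"
)
# Wrap the raw multipart body in a minimal MIME message so the stdlib
# parser can find the boundary string.
msg = message_from_string(
    'Content-Type: multipart/related; boundary="%s"\r\n\r\n%s'
    % (boundary, body)
)
for part in msg.get_payload():
    print(part.get_content_type(), part.get_filename())
# First part: application/json (the document body), no filename.
# Second part: text/doggerel, filename mary.txt.
```

(The multipart/related content type here is an assumption for illustration; the point is only that any MIME-aware parser can split the parts given the boundary.)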
