Skip to content

Concepts

nestorpersist edited this page Jan 12, 2013 · 25 revisions

This page describes the major OStore concepts.

Database

A database is a set of tables.

Table

A table is an ordered set of items.

Item

An item is a key-value pair. A table can have at most one item for a specific key. The items in a table are ordered by their keys. Keys and values are both represented in Json. Json values in memory are represented using Scala immutable types (see JsonOps.scala). Json can also be mapped to user specified case classes (see JsonMapper.scala).

Key

Keys have three possible Json forms.

  1. String. Standard Java/Scala 2 byte per character strings. The characters \u0000' and \uFFFF` may not be used.
  2. Number. Standard Java/Scala 64 bit signed long values.
  3. Array. Zero or more elements, each of which must have a valid key form.

Keys are ordered. Strings and numbers are ordered in the usual way. For arrays they are ordered first by the first element and last by the last element. For example ["a","b"] < ["b","a"]. Across forms

  23 < "23" < [23]

One key can be a prefix of another key.

  1. "abc" is a prefix of "abcd"
  2. ["ab",23] is a prefix of ["ab",23,"x"]
  3. [23,"abc"] is not a prefix of [23,"abcd"]
  4. [] is a prefix of ["abc"]
  5. A key is not a prefix of itself
  6. ["a",["b"]] is not a prefix of ["a",["b","c"]]

Value

Values can be arbitrary Json except object field names that start with $ are reserved for special uses.

Server

A physical or virtual machine with a processor (1 or more cores), main memory, storage (disk and/or ssd) and an ip address.

Ring

Each database can have one or more rings. Each ring has a complete copy of all the data for all the tables. Each ring contains a circularly linked set of nodes. The key-values for each table are evenly distributed across the nodes in key order. The highest key is followed by the lowest key to complete the cycle.

Node

Each node is assigned to a specific server. For testing, all nodes can be assigned to a single server. In production each node would typically be assigned to its own server. The range of keys on each node will be different for each table.

Name

Databases, rings, and nodes each have a name. A name is a sequence of one or more characters that are either letter (a-z,A-Z) or digits (0-9). The first character must be a letter. Names are case sensitive. Servers are named by their host plus port.

APIs

There are four major APIs available.

  1. Scala synchronous api (see Table.scala)
  2. Scala asynchronous api (see AsyncTable.scala)
  3. REST api
  4. Web console (implemented with Vaadin)

Local Storage Engines

Plugable local storage engines are supported. The system currently support jdbm3 and an in-memory store. Other stores are planned.

Conflict Detection and Resolution

Vector clocks are used for conflict detection. There are currently three conflict resolution strategies.

  1. Last write wins.
  2. A single Json value is produced with both conflicting alternatives preserved.
  3. User supplied resolution code.

In case 2 and 3, where possible, 3-way conflict resolution is used. Here the two conflicting values and a common ancestor of both are used to do resolution.

There is no guarantee that 3 way resolution is always possible. In these cases 2 way resolution is used.

Consistency

Suppose there are N rings. We can then specify that every write must be confirmed to at least W rings and each read must get data from at least R rings.

  1 <= R <= N
  1 <= W <= N

There is a fast mode where writes are acknowledged after main memory is changed but before the disk commit is complete. The following setting ensures against a single point of failure.

{"fast":true,"w":2} 

Optimistic concurrency control is supported.

  1. Get an item value and vector clock
  2. Create a new value
  3. Put the new value and old vector clock
  4. Succeeds only if vector clock has not changed

Background processes

OStore continuously runs a set of background processes.

  1. Anti-entropy. To find and repair inconsistencies.
  2. Garbage collection. To remove tombstones, shorten vector clocks, and remove expired items.
  3. Balancing. Keeps the number of items in each node of a ring the same.
  4. Add ring. When a new ring is added the copy to it runs in the background.
Clone this wiki locally