Concepts
This page describes the major OStore concepts.
A database is a set of tables.
A table is an ordered set of items.
An item is a key-value pair. A table can
have at most one item for a specific key.
The items in a table are ordered by their keys.
Keys and values are both represented in Json.
Json values in memory are represented using Scala immutable types (see JsonOps.scala).
Json can also be mapped to user-specified case classes (see JsonMapper.scala).
Keys have three possible Json forms.
- String. Standard Java/Scala strings, with 2 bytes per character. The characters `\u0000` and `\uFFFF` may not be used.
- Number. Standard Java/Scala 64-bit signed long values.
- Array. Zero or more elements, each of which must itself be a valid key.
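Keys are ordered. Strings and numbers are ordered in the usual way. Arrays are ordered element by element: first by their first elements, then by their second elements, and so on through the last element. For example, ["a","b"] < ["b","a"].
Across forms, numbers sort before strings and strings sort before arrays: 23 < "23" < [23].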
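The ordering rules above can be sketched as a Scala `Ordering`. This is a minimal illustration, not OStore's implementation: the `Key`, `StrKey`, `NumKey`, `ArrKey` types and `KeyOrdering` are hypothetical, strings are compared with `String.compareTo` (UTF-16 code unit order) as an assumed reading of "the usual way", the rank of the three forms is inferred from the example 23 < "23" < [23], and sorting a shorter array before a longer one that starts with it is also an assumption.

```scala
// Hypothetical model of the three key forms; OStore's own Json types
// (see JsonOps.scala) may be represented differently.
sealed trait Key
final case class StrKey(s: String) extends Key
final case class NumKey(n: Long) extends Key
final case class ArrKey(elems: List[Key]) extends Key

object KeyOrdering extends Ordering[Key] {
  // Assumed rank of each form, inferred from 23 < "23" < [23].
  private def rank(k: Key): Int = k match {
    case NumKey(_) => 0
    case StrKey(_) => 1
    case ArrKey(_) => 2
  }

  def compare(a: Key, b: Key): Int = (a, b) match {
    case (NumKey(x), NumKey(y))   => java.lang.Long.compare(x, y)
    case (StrKey(x), StrKey(y))   => x.compareTo(y)
    case (ArrKey(xs), ArrKey(ys)) => compareLists(xs, ys)
    case _                        => Integer.compare(rank(a), rank(b)) // different forms
  }

  // Lexicographic: the first differing element decides. If one array runs
  // out first, the shorter array sorts first (assumed).
  @scala.annotation.tailrec
  private def compareLists(xs: List[Key], ys: List[Key]): Int = (xs, ys) match {
    case (Nil, Nil) => 0
    case (Nil, _)   => -1
    case (_, Nil)   => 1
    case (x :: xt, y :: yt) =>
      val c = compare(x, y)
      if (c != 0) c else compareLists(xt, yt)
  }
}
```

For example, sorting `List[Key](ArrKey(List(NumKey(23))), StrKey("23"), NumKey(23))` with `KeyOrdering` puts `NumKey(23)` first and the array last.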
One key can be a prefix of another key.
- "abc" is a prefix of "abcd"
- ["ab",23] is a prefix of ["ab",23,"x"]
- [23,"abc"] is not a prefix of [23,"abcd"]
- [] is a prefix of ["abc"]
- A key is not a prefix of itself
- ["a",["b"]] is not a prefix of ["a",["b","c"]]
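The prefix examples can be captured by a small predicate. The sketch below uses a hypothetical key model (`Key`, `StrKey`, `NumKey`, `ArrKey` and `KeyPrefix.isPrefix` are illustrative names, not OStore's API). It assumes that a number is never a prefix of anything and that keys of different forms are never prefixes of one another, since the examples do not cover those cases.

```scala
// Hypothetical key model (not OStore's actual Json types).
sealed trait Key
final case class StrKey(s: String) extends Key
final case class NumKey(n: Long) extends Key
final case class ArrKey(elems: List[Key]) extends Key

object KeyPrefix {
  // True when `p` is a proper prefix of `k`; a key is never a prefix of itself.
  def isPrefix(p: Key, k: Key): Boolean = (p, k) match {
    // Strings: character-wise prefix that is strictly shorter.
    case (StrKey(a), StrKey(b)) => a.length < b.length && b.startsWith(a)
    // Arrays: the leading elements must be *equal*, so [23,"abc"] is not a
    // prefix of [23,"abcd"] and ["a",["b"]] is not a prefix of ["a",["b","c"]].
    case (ArrKey(xs), ArrKey(ys)) => xs.length < ys.length && ys.startsWith(xs)
    // Numbers and mixed forms: assumed never to be prefixes.
    case _ => false
  }
}
```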
Values can be arbitrary Json, except that object field names starting with $ are reserved for special uses.
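A check for the reserved $ field names might look like this. The `Json` model and `ValueCheck` are hypothetical stand-ins for the types in JsonOps.scala, and applying the rule to objects at every nesting depth is an assumption.

```scala
// Minimal hypothetical Json model, for illustration only.
sealed trait Json
final case class JStr(s: String) extends Json
final case class JNum(n: BigDecimal) extends Json
final case class JBool(b: Boolean) extends Json
case object JNull extends Json
final case class JArr(items: List[Json]) extends Json
final case class JObj(fields: Map[String, Json]) extends Json

object ValueCheck {
  // Collects every field name, at any depth, that starts with the reserved "$".
  def reservedFields(v: Json): List[String] = v match {
    case JObj(fields) =>
      fields.toList.flatMap { case (name, child) =>
        (if (name.startsWith("$")) List(name) else Nil) ++ reservedFields(child)
      }
    case JArr(items) => items.flatMap(reservedFields)
    case _           => Nil
  }

  // A user-supplied value is acceptable only if it uses no reserved names.
  def isValidUserValue(v: Json): Boolean = reservedFields(v).isEmpty
}
```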
A server is a physical or virtual machine with a processor (one or more cores), main memory, storage (disk and/or SSD), and an IP address.
Each database can have one or more rings. Each ring has a complete copy of all the data for all the tables. Each ring contains a circularly linked set of nodes. The key-values for each table are evenly distributed across the nodes in key order. The highest key is followed by the lowest key to complete the cycle.
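To make the ring layout concrete, here is a sketch of finding which node of one ring holds a given key. It assumes each node is identified by the lowest key of its range (`Node`, `lowKey`, and `RingLookup.owner` are hypothetical names) and uses plain strings in place of full Json keys. The wrap-around case follows from the highest key being followed by the lowest.

```scala
object RingLookup {
  // Hypothetical: a node is described by the lowest key in its range.
  // The ring is given as nodes sorted by lowKey.
  final case class Node(name: String, lowKey: String)

  // A key belongs to the last node whose lowKey <= key. Keys below the first
  // node's lowKey wrap around to the last node, whose range runs past the
  // highest key and continues from the lowest key.
  def owner(ring: Vector[Node], key: String): Node = {
    require(ring.nonEmpty, "a ring needs at least one node")
    ring.takeWhile(_.lowKey.compareTo(key) <= 0).lastOption.getOrElse(ring.last)
  }
}
```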
Each node is assigned to a specific server. For testing, all nodes can be assigned to a single server. In production each node would typically be assigned to its own server. The range of keys on each node will be different for each table.
Databases, rings, and nodes each have a name. A name is a sequence of one or more characters, each of which is either a letter (a-z, A-Z) or a digit (0-9). The first character must be a letter. Names are case sensitive. Servers are named by their host plus port.
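The naming rule for databases, rings, and nodes translates directly to a regular expression. `Names.isValidName` is a hypothetical helper; as stated above, only ASCII letters and digits are accepted and the first character must be a letter.

```scala
object Names {
  // One ASCII letter followed by any number of ASCII letters or digits.
  private val NamePattern = "[A-Za-z][A-Za-z0-9]*".r

  // Full-string match; names are case sensitive, so "Ring1" and "ring1" differ.
  def isValidName(s: String): Boolean = NamePattern.pattern.matcher(s).matches()
}
```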
There are four major APIs available.
- Scala synchronous API (see Table.scala)
- Scala asynchronous API (see AsyncTable.scala)
- REST API
- Web console (implemented with Vaadin)
Pluggable local storage engines are supported. The system currently supports jdbm3 and an in-memory store; other stores are planned.
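Pluggability usually means every engine implements one common interface. The trait below is a hypothetical sketch of such an interface with an in-memory implementation backed by a sorted map; the names `StorageEngine` and `InMemoryEngine` and their methods are assumptions, not OStore's actual storage API, and no jdbm3 engine is shown.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical engine interface; OStore's real storage API may differ.
trait StorageEngine {
  def get(key: String): Option[String]
  def put(key: String, value: String): Unit
  def delete(key: String): Unit
  // Items whose keys fall in [from, until), in key order.
  def range(from: String, until: String): List[(String, String)]
}

// In-memory engine backed by an immutable sorted map (not thread-safe).
final class InMemoryEngine extends StorageEngine {
  private var data = TreeMap.empty[String, String]

  def get(key: String): Option[String] = data.get(key)
  def put(key: String, value: String): Unit = data = data.updated(key, value)
  def delete(key: String): Unit = data = data - key
  def range(from: String, until: String): List[(String, String)] =
    data.range(from, until).toList
}
```

Keeping the map sorted matters because table items are ordered by key, so a range scan comes straight from `TreeMap.range`.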
Vector clocks are used for conflict detection. There are currently three conflict resolution strategies.
- Last write wins.
- A single Json value is produced with both conflicting alternatives preserved.
- User supplied resolution code.
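Conflict detection with vector clocks comes down to comparing two clocks entry by entry. In this sketch (the `VectorClocks` object, the `Clock` alias, and keying entries by a writer name are all assumptions), a clock that is less than or equal to the other in every entry happened before it; when neither clock dominates, the writes were concurrent and one of the strategies listed above has to resolve them.

```scala
object VectorClocks {
  // Hypothetical: a vector clock maps a writer id to a counter; a missing entry counts as 0.
  type Clock = Map[String, Long]

  sealed trait Relation
  case object Before extends Relation     // a happened before b
  case object After extends Relation      // b happened before a
  case object Equal extends Relation
  case object Concurrent extends Relation // neither dominates: a conflict

  def compare(a: Clock, b: Clock): Relation = {
    val ids = a.keySet ++ b.keySet
    val aLeB = ids.forall(id => a.getOrElse(id, 0L) <= b.getOrElse(id, 0L))
    val bLeA = ids.forall(id => b.getOrElse(id, 0L) <= a.getOrElse(id, 0L))
    (aLeB, bLeA) match {
      case (true, true)   => Equal
      case (true, false)  => Before
      case (false, true)  => After
      case (false, false) => Concurrent
    }
  }
}
```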
In cases 2 and 3, 3-way conflict resolution is used where possible: the two conflicting values and a common ancestor of both are used to resolve the conflict.
Three-way resolution is not always possible; when it is not, 2-way resolution is used.
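A field-level 3-way merge shows why the common ancestor helps: a field changed on only one side can be taken from that side, and only fields changed differently on both sides remain in conflict. This is a hypothetical sketch over flat string maps (`ThreeWayMerge.merge` is not OStore's API); it reports the conflicting fields rather than building the combined Json value or calling user code.

```scala
object ThreeWayMerge {
  // Hypothetical: a value is a flat object of field name -> field value.
  type Obj = Map[String, String]

  // Left(conflicting fields) or Right(merged object). An absent field is None,
  // so a deletion on one side is treated like any other change.
  def merge(ancestor: Obj, a: Obj, b: Obj): Either[Set[String], Obj] = {
    val fields = ancestor.keySet ++ a.keySet ++ b.keySet
    val resolved = fields.toList.map { f =>
      val (o, x, y) = (ancestor.get(f), a.get(f), b.get(f))
      val pick: Option[Option[String]] =
        if (x == y) Some(x)      // same result on both sides
        else if (x == o) Some(y) // only b changed f
        else if (y == o) Some(x) // only a changed f
        else None                // both changed f differently: conflict
      f -> pick
    }
    val conflicts = resolved.collect { case (f, None) => f }.toSet
    if (conflicts.nonEmpty) Left(conflicts)
    else Right(resolved.collect { case (f, Some(Some(v))) => f -> v }.toMap)
  }
}
```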
Suppose there are N rings. We can then specify that every write must be confirmed by at least W rings and that every read must get data from at least R rings, where
1 <= R <= N
1 <= W <= N
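The R/W constraints can be checked when settings are created. `Quorum.Settings` is hypothetical. The `readAndWriteSetsOverlap` check is standard quorum reasoning rather than something stated here: when R + W > N, every set of R rings shares at least one ring with every set of W rings.

```scala
object Quorum {
  // Hypothetical settings holder; construction fails if the bounds are violated.
  final case class Settings(n: Int, r: Int, w: Int) {
    require(1 <= r && r <= n, s"need 1 <= R <= N, got R=$r, N=$n")
    require(1 <= w && w <= n, s"need 1 <= W <= N, got W=$w, N=$n")

    // Pigeonhole: R + W > N forces every read set to share a ring with every write set.
    def readAndWriteSetsOverlap: Boolean = r + w > n
  }
}
```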
There is a fast mode in which writes are acknowledged after main memory has been updated but before the disk commit completes. The following setting guards against a single point of failure, because each write must be held in memory by at least two rings before it is acknowledged.
{"fast":true,"w":2}
Optimistic concurrency control is supported.
- Get an item's value and vector clock.
- Create a new value.
- Put the new value along with the old vector clock.
- The put succeeds only if the vector clock has not changed.
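The four steps of optimistic concurrency control form a read-modify-write loop that retries when the put is rejected. In this hypothetical sketch (`CasStore`, `putIfUnchanged`, and `OptimisticUpdate.update` are illustrative names, not the API in Table.scala), a single version counter stands in for the vector clock.

```scala
import scala.annotation.tailrec

// Hypothetical in-memory store with a compare-and-set put.
// A Long version per key stands in for the vector clock.
final class CasStore {
  private var items = Map.empty[String, (String, Long)]

  def get(key: String): Option[(String, Long)] = synchronized(items.get(key))

  // Succeeds only if the stored version still equals `expected`
  // (None means the item must not exist yet).
  def putIfUnchanged(key: String, value: String, expected: Option[Long]): Boolean = synchronized {
    val current = items.get(key).map(_._2)
    if (current == expected) {
      items = items.updated(key, (value, current.getOrElse(0L) + 1))
      true
    } else false
  }
}

object OptimisticUpdate {
  // Get the value and version, compute a new value, put it with the old
  // version, and start over if another writer changed the item meanwhile.
  @tailrec
  def update(store: CasStore, key: String)(f: Option[String] => String): String = {
    val old = store.get(key)
    val next = f(old.map(_._1))
    if (store.putIfUnchanged(key, next, old.map(_._2))) next
    else update(store, key)(f)
  }
}
```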
OStore continuously runs a set of background processes.
- Anti-entropy. Finds and repairs inconsistencies.
- Garbage collection. Removes tombstones, shortens vector clocks, and removes expired items.
- Balancing. Keeps the number of items on each node of a ring the same.
- Add ring. When a new ring is added, data is copied to it in the background.
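As an illustration of what balancing aims for, the sketch below picks each node's lowest key so that a sorted list of keys is split into contiguous ranges whose sizes differ by at most one. `Balancing.boundaries` is hypothetical and computes a target layout from scratch; it does not model how OStore actually moves items between nodes in the background.

```scala
object Balancing {
  // Hypothetical: returns the lowest key of each node's range. Because
  // sortedKeys.size >= nodes, the chosen indices are strictly increasing,
  // and consecutive ranges differ in size by at most one item.
  def boundaries(sortedKeys: Vector[String], nodes: Int): Vector[String] = {
    require(nodes >= 1 && sortedKeys.size >= nodes, "need at least one key per node")
    (0 until nodes).map(i => sortedKeys(i * sortedKeys.size / nodes)).toVector
  }
}
```

With the keys "a" through "g" and three nodes, the boundaries are "a", "c", and "e", giving ranges of 2, 2, and 3 items.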