Skip to content
hbbio edited this page Nov 10, 2014 · 11 revisions

Hello, database

In this chapter we will explore data storage possibilities. For complex, data-intense applications, requiring scalable database with data replication and complex querying capabilities, Opa provides extensive support for the popular MongoDB database. This chapter focuses on MongoDB.

MongoDB: Quickstart

In this section we will write a simple counter example running with MongoDB.

First, unless done already, you need to download, install & run the MongoDB server. Installation essentially means unpacking the downloaded archive. And you can run the server with a command such as:

$ mkdir -p ~/mongodata/opa
$ {INSTALL_DIR}/mongod --noprealloc --dbpath ~/mongodata/opa > ~/mongodata/opa/log.txt 2>&1

which creates directory ~/mongodata/opa for the data (1) and then runs MongoDB server using that location and writing logs to ~/mongodata/opa/log.txt (2). The server will take a moment to start and once done you can verify that it works by running mongo (client) in another terminal. The response should look something like this (of course version number may vary):

$ {INSTALL_DIR}/mongo
MongoDB shell version: 2.0.2
connecting to: test
>

Let us see a small example for counting clicks :

import stdlib.themes.bootstrap

database int /counter = 0;

function action(_) {
    /counter++;
    #msg = <div>Thank you, user number {/counter}!</div>
}

function page() {
    <h1 id="msg">Hello</h1>
    <a class="btn" onclick={action}>Click me</a>
}

Server.start(
    Server.http,
    { ~page, title: "Database Demo" }
)

compile: opa hello-opa.opa

run: ./hello-opa.js

and... voilà, you have your first Opa application running on MongoDB!

The above setup assumes that Mongo is running on local machine on its default port 27017; if that's not the case you can run your application (not the compiler!) with the option --db-remote:db_name host[:port]. For instance if Mongo was running locally but on port 4242, we'd need to run the application with:

./counter.js --db-remote:mydb localhost:4242

Incidentally, if you use authentication with your MongoDB databases, you can provide the authentication parameters on the command line as follows:

./counter.js --db-remote:mydb myusername:mypassword@localhost:4242

Note, however, that we have a slightly simplified view of MongoDB authentication in that we only support one database per connection. MongoDB implements authentication on a per-database basis and can have authentication to multiple databases over the same connection. If you have more than one database at the same server you wil have to issue multiple --db-remote options, one for each target database. A potentially useful feature is that if you authenticate with the admin database then it acts like ''root'' access for databases and you can then access all databases over that connection.

So using Mongo is very easy. It is worth noting that although Mongo itself is "Schema free", the Opa compiler ensures adherence to program database definitions, therefore providing a safety layer and ensuring type safety for all database operations.

What are the gains of using such a database? Apart from running on an industry standard DBMS it opens doors for more complex querying, which we will explore in the following section.

Database declarations

One example is worth a thousand words, so in the remainder of this section we will use a running example of a database for a movie rating service. First we need data for service users, which will just be a set of all users, each of them having her id of type user_id (here explicit int, though in a real-life example it would probably be abstracted away), an int age field and a status which is a simple variant type (enumeration).

type user_status = {regular} or {premium} or {admin}
type user_id = int
type user = { user_id id, string name, int age, user_status status }

database users {
  user /all[{id}]
  /all[_]/status = { regular }
}

First thing to notice is the notation for declaring database sets. Writing user /user declares a single value of type user at path /user, however user /all[{id}] declares a set of values of type user, where the record field id is the primary key.

Second important thing is that all database paths in Opa have associated defaults values. For int values that is 0, for string values that is an empty string. For custom values we either need to provide it explicitly, which is what /all[_]/status = { regular } does, or we need to explicitly state that a given records will never be modified partially, which can be accomplished with the following declaration /all[_] full -- more on partial record updates below.

Now we are ready to declare movies. First we introduce a type for a (movie) person, which in this simple example just consists of a name and the year of birth, birthyear.

type person = { string name, int birthyear }

Then we introduce a type for movie cast, including director and a list of stars playing in the movie (in credits order).

type cast = { person director, list(person) stars }

Now a (single) movie has its title (string), view_counter (int; how many times a given movie was looked at in the system), its cast and a set of ratings which are mappings from int (representing the rating) to list(user_id) (representing the list of users who gave that rating). It's not very realistic to just represent one single movie, but for our illustration purposes this will do.

database movie {
  string /title
  int /view_counter
  cast /cast
  intmap(list(user_id)) /ratings
}

It is worth noticing that we can store complex (non-primitive) values in the database, without any extra effort -- cast is a record, which itself contains a list as one of its fields.

Here we notice that complex types -- records (cast), lists (stars field of cast) and intmap -- can be declared as entries in the database, in the same way as primitive types.

In the rest of this section we will explore how to read, query and update values of different types.

Basic types

We can read the title and the view counter of a single movie simply with:

string title = /movie/title;
int view_no = /movie/view_counter;

The /movie/title and /movie/view_counter are called paths and are composed of the database name (movie) and field names (title/view_counter).

Updates can be performed using the database write operator path <- value:

/movie/title <- "The Godfather";
/movie/view_counter <- 0;

What will happen if one tries to read unitialized value from the database? Say we do:

int view_no = /movie/view_counter;

when /movie/view_counter was never initialized; what's the value of view_no? We already mentioned default values above and the default value for a given path is exactly what will be given, if it was not written before.

This default value can be overwritten in the path declaration as follows:

database movies {
  ...
  int /view_counter = 10 // default value will be 10
}

Another option is to use question mark (?) before the path read, which will return an optional value, {none} indicating that the path has not been written to yet, as in:

match (?/movies/view_counter) {
case {none}: "No views for this movie...";
case {some: count}: "The movie has been viewed {count} times";
}

For integer fields we can also use some extra operators:

/movie/view_counter++;
/movie/view_counter += 5;
/movie/view_counter -= 10;

Records

With records we can do complete reads/updates in the same manner as for basic types:

cast complete_cast = /movie/cast;
/movie/cast <- { director: { name: "Francis Ford Coppola", birthyear: 1939 }
               , stars: [ { name: "Marlon Brando", birthyear: 1924 }, { name: "Al Pacino", birthyear: 1940 } ]
               };

However, unless a given path has been declared with for full modifications only (full modificator explained above), we can also cross record boundaries and access/update only chosen fields, by including them in the path:

person director = /movie/cast/director;
/movie/cast/director/name <- "Francis Ford Coppola"

Also with updates we can only update some of the fields:

 // Notice the stars field below, which is not given and hence will not be updated
/movie/cast <- { director: { name: "Francis Ford Coppola", birthyear: 1939 } }

Lists

List in Opa are just (recursive) records and can be manipulated as such:

/movie/cast/stars <- []
person first_billed = /movie/cast/stars/hd
list(person) non_first_billed = /movie/cast/stars/tl

However, as it's a frequently used data-type, Opa provides a numer of extra operations on them:

// removes first element of the list
/movie/cast/stars pop
// removes last element of the list
/movie/cast/stars shift
// appends one element to the list
/movie/cast/stars <+ { name: "James Caan", birthyear: 1940 }

person actor1 = ...
person actor2 = ...
// appends several elements to the list
/movie/cast/stars <++ [ actor1, actor2 ]

Sets and Maps

Sets and maps are two types of collections that allow to organize multiple instances of data in the database. Sets represent a number of items of a given type, with no order between the items (as opposed to lists, which are ordered). Maps represent associations from keys to values.

We can fetch a single value from a given set by referencing it by its primary key:

user some_user = /users/all[{id: 123}]

Similarly for maps:

list(user_id) gave_one = /movie/ratings[1]

We can also make various queries using the following operators (comma separated).

  • field == expr: value of field is equal to the expression expr.
  • field =~ expr: regexp pattern matching to expression expr.
  • field != expr: value field is not equal to the expression expr.
  • field < expr: value of field is smaller than that of expr (you can also use: <=, > and >= with ).
  • field in list_expr: value of field is one of those of the list list_expr.
  • q1 or q2: satisfies query q1 or q2.
  • q1 and q2: satisfies queries q1 and q2.
  • not q: does not satisfy query q.
  • {f1 q1, f2 q2, ...}: field f1 satisfies q1, field f2 satisfies q2 etc.

and possibly some options (semicolon separated)

  • skip n: skip the first n results (n an expression of type int)
  • limit n: limit the result to the maximum of n results (n an expression of type int)
  • order fld (, fld)+: order the results by given fields. Possible variants include: fld, +fld (sort ascending), -fld (sort descending), fld=expr (where expr needs to be of type {up} or {down}).

Note that if the result of a query is not fully constained by the primary key and hence can contain multiple values then the result of the query is of type dbset (for sets) or of the initial map type (for maps, giving a sub-map). You can manipulate a dbset with DbSet.iterator and Iter.

Examples for sets:

user /users/all[id == 123]  // accessing a single entry by primary key
dbset(user, _) john_does = /users/all[name == "John Doe"]  // return a set of values
it = DbSet.iterator(john_does) // then use Iter module

dbset(user, _) underages = /users/all[age < 18]
dbset(user, _) non_admins = /users/all[status in [{regular}, {premium}]]
dbset(user, _) /users/all[age >= 18 and status == {admin}]
dbset(user, _) /users/all[not status == {admin}]
// showing second 50 results for users that are below 18 or above 62,
// sorted by age (ascending) and then id (descending)
dbset(user, _) users1 = /users/all[age <= 18 or age >= 62; skip 50; limit 50; order +age, -id]

Examples for maps:

// users who rated the movie 10
list(user_id) loved_it = /movie/ratings[10]
// users who rated the movie between 7 and 9 (inclusive)
list(user_id) liked_it = /movie/ratings[>= 7 and <= 9]