Low level MongoDB support

Frederic Ye edited this page Sep 3, 2012 · 4 revisions

WARNING:

This chapter covers advanced uses of MongoDB in Opa and details low-level access to MongoDB from Opa. Most applications do not need this level of detail; for those, you should read the chapter on Opa's higher-level database support instead.

Introduction

In this chapter, we describe the current state of support for MongoDB in the Opa standard library. We assume some familiarity with MongoDB concepts and particularly with the MongoDB shell. This familiarization can be gained by reading the MongoDB tutorial.

MongoDB is a server-based, document-oriented, non-relational database intended to be scalable and fast. Documents are stored in a binary JSON-like format called BSON. Although BSON has a richer set of types than JSON, it is 100% compatible with JSON. For speed, MongoDB does not implement joins; instead it provides a powerful query language of its own, and almost anything that can be done with a relational database can be implemented in MongoDB with a little effort (see MongoDB's page on SQL compatibility).

In addition, MongoDB allows multiple indices into its data although these are not automatic and have to be initiated in client code. MongoDB is intended to be deployed in reliable large-scale web-based applications and thus has features which facilitate scalability such as sharding and master-slave arrangements of servers along with features for reliability such as replicated servers with fail-over.

Backups of MongoDB data are usually done either offline on a slave server in the network using external tools or to redundant nodes in the MongoDB server network.

Setting up MongoDB

If you are not familiar with the MongoDB database, here are some quick instructions to get you going. Firstly, make sure that you have MongoDB installed on your system:

% which mongod

Note that MongoDB is not yet included in major distributions such as Ubuntu, but installation is trivial: download the latest version from the MongoDB downloads site and unpack the files locally. You should then only have to add the bin directory to your path to be up and running.

To run a MongoDB server, you first have to create a directory to store the database files. In fact, you need a directory for each node you wish to run; see the MongoDB documentation for how to create replica sets, sharding and so on. At its simplest, start a mongod server with:

% mkdir -p ~/mongodata/master
% mongod --rest --oplogSize 500 --noprealloc --master --dbpath ~/mongodata/master > ~/mongodata/master/log.txt 2>&1 &

Use the --oplogSize and --noprealloc options to limit the initial allocated disk space (the default is about 1 GB). The --rest option allows you to monitor your database via the HTTP interface (found at the server port number plus 1000). The default MongoDB server port is 27017; to run the server on a different port, use the --port option. Note, however, that to run the MongoDB shell against a non-default port you also need to pass it the --port option:

% mongo --port 27017
MongoDB shell version: 2.0.1
connecting to: test
>

For the MongoDB Opa drivers we recommend version 1.6.0 or greater since much of the current functionality was mature by that version. We always recommend the current MongoDB stable version (at the time of writing 2.0.2) but for the most part the driver is quite stable with respect to backwards compatibility.

Overview

The Opa support for MongoDB consists of a hierarchy of modules leading to successively higher-level programming.

Bson

Support for the BSON binary format is provided by the Bson module; all other modules are built on top of this one. In general, BSON values are handled by the Bson.document Opa data-type but we also provide the Bson.opa2doc and Bson.doc2opa functions to allow conversion between Opa types and BSON documents.

MongoCommon

This contains general support routines for dealing with replies from the MongoDB server. These include:

  • printing results to meaningful strings
  • testing results for error status
  • handling tag lists instead of bit-mapped integers
  • extracting fields and Opa types from MongoDB replies

MongoConnection

The code which talks to the MongoDB server is in the private MongoDriver module. This includes support for replica sets with automatic reconnection on fail-over and cursors but for programming at this level we provide a single all-purpose module called MongoConnection.

Advanced programmers wishing to use some of the more obscure features of MongoDB can use the driver code directly but this is not recommended. MongoDB has a complex API involving over 70 functions and many of the simple access commands have numerous options. Our intention with this driver is to make accessing MongoDB databases as simple and logical as possible while still exposing the power and flexibility of the MongoDB engine.

MongoCommands

As an adjunct to the low-level programming interface we provide a module called MongoCommands, which contains a large (but still incomplete) subset of the MongoDB command set. These encompass most functions that will be required for meta-programming the MongoDB database, such as dropDatabase, repairDatabase, createCollection and so on, plus functions associated with normal database access operations such as getLastError. The more advanced MongoDB functionality is also supported here, including findAndModify and the very powerful mapReduce function.

These commands come in two flavors: those which return Bson.document values and those which convert their results into Opa types. If you are only looking for a single value out of a large and complex reply document then using the Bson module access functions on the raw BSON may be more efficient. If you intend complex analysis of the reply then the Opa types may be more convenient. At the present time only partial support is provided for Opa types. Some command results may never be treated this way because they include arbitrary field names which we can't safely convert into Opa types.

MongoCollection

This module represents a type-safe view of the low-level routines in MongoConnection. Here, we insist upon Opa types as arguments and results from MongoDB operations. This necessarily limits what we can put into the database since the BSON documents stored in the database have to be consistent with the Opa types they represent.

To achieve this, we have implemented the MongoSelect and MongoUpdate modules which enforce a type discipline upon the arguments to, for example, MongoCollection.insert. The type safety is implemented as run-time type checks so there is a significant performance penalty for using these routines. In the future, however, we will provide fully type-safe compile-time type checks along the lines of the Opa internal database.

Programming

Here, we provide some notes on programming with the Opa MongoDB driver. The full interface is too large for complete coverage here; refer to the online Opa API documentation for detailed notes on each function.

Using BSON types in Opa

The full Opa BSON data-type is as follows:

/**
 * A BSON value encapsulates the types used by MongoDB.
 **/
type Bson.value =
    { float Double }
 or { string String }
 or { Bson.document Document }
 or { Bson.document Array }
 or { string Binary }
 or { string ObjectID }
 or { bool Boolean }
 or { Date.date Date }
 or { Null }
 or { (string, string) Regexp }
 or { string Code }
 or { string Symbol }
 or { (string, Bson.document) CodeScope }
 or { int Int32 }
 or { int32 RealInt32 }
 or { (int, int) Timestamp }
 or { int Int64 }
 or { int64 RealInt64 }
 or { Min }
 or { Max }

/**
 * A BSON element is a named value.
 **/
type Bson.element = { string name, Bson.value value }

/**
 * The main exported type, a BSON document is just a list of elements.
 */
type Bson.document = list(Bson.element)

While values of this type can be constructed manually:

doc = Bson.document
      [{name: "$eval", value: {Code:"function(x,y) \{return x*y;}"}},
       {name: "args", value:{Array:[{name:"0", value:{Int32:6}},
                                    {name:"1", value:{Int32:7}}]}}]

there are two more convenient ways of constructing BSON values. Firstly, we provide a set of abbreviations in the Bson.Abbrevs module:

H = Bson.Abbrevs
doc = Bson.document [H.code("$eval","function(x,y) \{return x*y;}"),
                     H.valarr("args",[{Int32:6},{Int32:7}])]

Secondly, we can construct the values in Opa and use Bson.opa2doc:

doc = Bson.opa2doc({`$eval`:(Bson.code "function(x,y) \{return x*y;}"),
                    args:(list(Bson.int32) [6,7])})

Notice that to get a field name with non-alphanumeric characters we have to back-quote the field name in the Opa value. To control the representation in the BSON type we can apply helper types: for example, Bson.code is just a string but it instructs Bson.opa2doc to treat it as code. Remember also to escape curly brackets in strings. Finally, to get Int32 values you need the Bson.int32 type; the default for int is actually Bson.int64.

There are several such types provided by the Bson module but some merit special mention:

  • Optional types have a special significance with respect to Bson.doc2opa: if a field value is missing in the document, it will appear in the Opa type as {none}. The reverse direction does not apply: {none} values are represented in the BSON document as { none : null }.
  • We take this one step further, however, with the Bson.register type, which behaves much as option('a) except that when we call Bson.opa2doc any {absent} values are omitted from the resulting document altogether. Note that there is a module Bson.Register which provides the same functionality for Bson.register as the Option module does for type option. The type is defined as:
type Bson.register('a) = {'a present} or {absent}
  • Care should be taken in dealing with integer values which may have been placed into the database outside of Opa. Internally, Opa uses the OCaml integer representation int, which is actually 31 bits wide on 32-bit systems and 63 bits wide on 64-bit systems (the spare bit is reserved by the garbage collector). MongoDB, however, uses full 32-bit and 64-bit integers, which means that it is possible to find an integer value in a MongoDB database which is too large for the Opa representation (all values generated by Opa and stored in the database are guaranteed to be within range). Currently, Opa only has 32-bit and 64-bit integers as abstract values: they can be stored in Opa as an external type (int32 and int64) but no operations are possible on them (they are sometimes needed by external libraries). We handle this situation in the MongoDB driver by automatically detecting overflow values and storing them as RealInt32 and RealInt64 when returning Bson.document types from the driver. While these values are invisible to Bson module functions such as find_int, you can detect overflows by inspecting the document values:
match (value) {
  case {RealInt32:_}: error("overflow");
  case {Int32:i}: i;
  default: error("not an int");
}
  • The Bson.meta type is intended to support situations where MongoDB can return a field of different types depending upon the nature of the command executed. A good example of this is the out option to the mapReduce function which can be either a string or a document type. We cast the parameter as Bson.meta which allows us to control the type at the function's application. We can also apply this trick to the result type from mapReduce calls:
mr = MongoCommands.mapReduceSimple(mongodb,map,reduce,{String:"example1"})

/* or */

mr = MongoCommands.mapReduceSimple(mongodb,map,reduce,{Document:[H.str("reduce","session_stat")]})
  • Two other cases should be mentioned. Both list and intmap are mapped onto Array values in BSON. The difference is that list is mapped to consecutive-numbered elements in the Array document whereas intmap allows sparse arrays.
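
The integer-range caveat in the list above can be made concrete with a small range check. The following is a minimal sketch in Python rather than Opa, and it is not the driver's code: the function names `fits_opa_int` and `int_tag` are hypothetical, and the only facts taken from the text are that a native integer loses one bit to the runtime and that out-of-range values are tagged RealInt32 or RealInt64.

```python
def fits_opa_int(value: int, word_bits: int) -> bool:
    """Return True if value fits the native int of a word_bits-bit OCaml runtime."""
    usable_bits = word_bits - 1              # one bit is reserved by the runtime
    low = -(1 << (usable_bits - 1))          # smallest signed value
    high = (1 << (usable_bits - 1)) - 1      # largest signed value
    return low <= value <= high


def int_tag(value: int, bson_bits: int, word_bits: int) -> str:
    """Choose the Bson.value tag for an integer read back from the server."""
    base = "Int32" if bson_bits == 32 else "Int64"
    # Values outside the native range keep their width but get the Real* tag.
    return base if fits_opa_int(value, word_bits) else "Real" + base


# The largest BSON Int32 overflows a 31-bit native int but not a 63-bit one.
tag_on_32_bit = int_tag(2**31 - 1, 32, 32)
tag_on_64_bit = int_tag(2**31 - 1, 32, 64)
```

Under these assumptions an Int32 read on a 64-bit system always fits, so RealInt32 can only appear on 32-bit systems, while RealInt64 can appear on either.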

As a rough guide to Bson.opa2doc and Bson.doc2opa, the following simple schema shows the mapping:

/* We use a "natural" mapping of constant types */
float <-> Double
string <-> String
Bson.binary <-> Binary
Bson.oid <-> ObjectID
bool <-> Boolean
Date.date <-> Date
void <-> Null
Bson.regexp <-> Regexp
Bson.code <-> Code
Bson.symbol <-> Symbol
Bson.codescope <-> CodeScope
Bson.int32 <-> Int32
Bson.realint32 <-> Int32
Bson.timestamp <-> Timestamp
Bson.realint64 <-> Int64
Bson.min <-> Min
Bson.max <-> Max

 /* Basic record scheme */
{a:'a; b:'b} <-> { a: 'a, b: 'b }

 /* Sum types */
{a:'a} / {b:'b} <-> { a: 'a } <or> { b: 'b }

 /* Non-record types are called "value" */
'a <-> { value: 'a }

 /* Special cases */

 /* Default for int is Int64 */
int <-> Int64

 /* Overflow */
Bson.realint32 <- Int32 /* when integer exceeds range */
Bson.realint64 <- Int64 /* when integer exceeds range */

 /* Options */
option('a):
  {some=a} <-> { some : 'a }
  {none} <-> { none : null }
  {none} <- { }

 /* Registers */
Bson.register('a):
  {present=a} <-> { present : 'a }
  {absent} <- { absent : null }
  {absent} <-> { }

 /* Lists are consecutive arrays */
list('a) <-> { Array=(<label>,{ 0:'a; 1:'a; ... }) }

 /* Intmaps are non-consecutive arrays */
ordered_map(int,'a) <or>
intmap('a) <-> { Array=(<label>,{ 1:'a; 3:'a; ... }) }

 /* Bson.document is treated verbatim (including labels) */
Bson.document <-> Bson.document

 /* Bson.meta is treated as a variable type */
int:Bson.meta <-> { Int64:int }
string:Bson.meta <-> { String:string }
bool:Bson.meta <-> { Boolean:bool }
etc.

Notes:

  • For ObjectID values, there are a couple of routines which convert between (hex value) strings and the BSON representation, Bson.oid_of_string and Bson.oid_to_string. You can also create a BSON-style OID value with Bson.new_oid.
  • Bson.document types are completely write-through, i.e. they are not processed at all.
  • In case you're wondering, Min and Max are used in sharded databases to indicate infimum and supremum bounds on sharding regions, respectively.

Using the low-level interface

Connecting to and using the low-level drivers should be done using the MongoConnection module. This gathers together various low-level features in a single module.

Opening a connection to the MongoDB server

The preferred method is to use the system of named connections, which can be defined from the command line or set up internally using the Mongo.param type and the MongoConnection.add_named_connection function.

Initially, there is one default connection (called "default") which is set to localhost:27017, the default port for MongoDB servers on the local machine. To open this connection use:

mongodb =
  match (MongoConnection.open("default")) {
    case {success:mongodb}: mongodb
    case {~failure}: ... /* take action on error */
  }

/* or */

mongodb = MongoConnection.openfatal("default")

The MongoConnection.open function returns an outcome of either the connection or the standard Mongo.failure type whereas the MongoConnection.openfatal function returns just the connection but treats a failed connection as a fatal error.

To set up the connection from the command line the following options are defined:

| Option | Abbrev | Type | Description |
|---|---|---|---|
| `--mongo-name` | `--mn` | `<string>` | Name for the MongoDB server connection |
| `--mongo-repl-name` | `--mr` | `<string>` | Replica set name for the MongoDB server |
| `--mongo-buf-size` | `--mb` | `<int>` | Hint for initial MongoDB connection buffer size |
| `--mongo-socket-pool` | `--mp` | `<int>` | Number of sockets in socket pool (>=2 enables socket pool) |
| `--mongo-seed` | `--ms` | `<host>{:<port>}` | Add a seed to a replica set, allows multiple seeds |
| `--mongo-host` | `--mh` | `<host>{:<port>}` | Host name of a MongoDB server, overwrites any previous hosts |
| `--mongo-log` | `--ml` | `<bool>` | Enable MongoLog logging |
| `--mongo-log-type` | `--mt` | `<string>` | Type of logging: stdout, stderr, logger, none |
| `--mongo-auth` | `--ma` | `<user>:<password>@<dbname>` | Define user name and password for database dbname |

So, for example, to point the default connection at machinexyz:12345 you would use:

% prog.js --mh machinexyz:12345

This remains a single connection. To connect to a replica set you also need to define a name for the replica set plus some seeds:

% prog.js --mn blort --mr blort --ms machinexyz:27017 --ms machineuvw:27017

Here we have defined a connection called "blort" to a replica set also called "blort" with two seed machines. Remember that you only really need one seed which is active in the set: the connection logic queries the seeds for the actual host list and then polls the hosts until it finds the current primary server. From then on, reconnection will be attempted if the current primary goes down.
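
The seed logic described above can be pictured with a small model. This is a sketch in Python, not the driver's implementation: the `Cluster` mapping, the `find_primary` function and the host `machineabc` are all hypothetical, and the sketch checks each host once instead of polling repeatedly.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory model of a replica set: each reachable host maps to
# the member list it reports and a flag saying whether it is the primary.
# A host that is missing from the mapping models an unreachable server.
Cluster = Dict[str, dict]


def find_primary(seeds: List[str], cluster: Cluster) -> Optional[str]:
    """Return the address of the current primary, or None if none is found."""
    hosts: List[str] = []
    # Step 1: ask each seed in turn until one returns the host list.
    for seed in seeds:
        member = cluster.get(seed)
        if member is not None:
            hosts = member["hosts"]
            break
    # Step 2: check the reported hosts for the one that is primary.
    for host in hosts:
        member = cluster.get(host)
        if member is not None and member["primary"]:
            return host
    return None


members = ["machinexyz:27017", "machineuvw:27017", "machineabc:27017"]
cluster = {
    # machinexyz is down, so it does not appear here at all.
    "machineuvw:27017": {"hosts": members, "primary": False},
    "machineabc:27017": {"hosts": members, "primary": True},
}
primary = find_primary(["machinexyz:27017", "machineuvw:27017"], cluster)
```

In this model the first seed is unreachable, yet the second seed is enough to discover a primary that was never listed as a seed, which is why a single active seed suffices.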

Note that you can define as many named connections as you like; this example still retains the default connection.

Note also that you can clone a connection such that the connection itself will not be closed until all clones have already been closed.

Handling concurrency within an Opa program is done by a socket pool. This means that a pool of open connections is maintained to the same server such that blocking only occurs if there are no more available connections in the pool (set with --mp 2, for example). If you ensure that the pool size is at least as big as the number of threads in your code then no blocking will occur.
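
The blocking behaviour of a socket pool can be illustrated with a queue of free sockets. This is a conceptual sketch in Python, not the Opa driver: `SocketPool`, `acquire`, `try_acquire` and `release` are hypothetical names, and the sockets are plain strings.

```python
import queue


class SocketPool:
    """A fixed-size pool: callers block only when every socket is in use."""

    def __init__(self, size: int) -> None:
        self._free: "queue.Queue[str]" = queue.Queue()
        for index in range(size):
            self._free.put(f"socket-{index}")   # stand-ins for open sockets

    def acquire(self) -> str:
        """Take a socket, waiting until one is released if the pool is empty."""
        return self._free.get()

    def try_acquire(self):
        """Take a socket without waiting; return None if the pool is empty."""
        try:
            return self._free.get_nowait()
        except queue.Empty:
            return None

    def release(self, sock: str) -> None:
        """Hand a socket back so that a waiting caller can proceed."""
        self._free.put(sock)


pool = SocketPool(2)           # comparable in spirit to --mp 2
first = pool.acquire()
second = pool.acquire()
third = pool.try_acquire()     # pool exhausted: a blocking acquire would wait here
```

With two sockets and two concurrent users nobody ever waits; a third user would block in `acquire` until one of the first two calls `release`.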

Named connections can also be defined within the program:

MongoConnection.add_named_connection({
  name: "blort",
  replname: {some: "blort"},
  bufsize: 50*1024,
  pool_max: 2,
  log: false,
  seeds:[("localhost",10001),("localhost",10002)],
  auth:[{dbname:"mydb",user:"me",password:"secret"}]
})

mongodb2 = MongoConnection.openfatal("blort")

Once a connection has been opened, it can be pointed to different databases and collections using a functional interface. The default database is "db" and the default collection is "collection" but we can make a connection to a different collection without re-opening the connection as follows:

mongodb_wiki = MongoConnection.namespace(mongodb,"db","wiki")

This mechanism also applies to the flags that some of the MongoDB operations can take, for example to set the Upsert flag for all insert operations:

mongodb3 = MongoConnection.upsert(mongodb)

This method is quite flexible since you can define these flags once when the connection is made, making the flags globally persistent, or you can add these function calls at the point of calling the operation, i.e. locally defined flags (there are examples below). All of the MongoDB flags are supported in this way.

One particular flag is worth mentioning, the log flag which can be set on the command line and can actually be overridden in this way allowing you to generate logs for targeted sections of code. In fact, you can change any of the command line options this way but bear in mind that some of them, for example, seed lists, will not take effect until the connection is reconnected.

Authentication

As you can see, you can add the MongoDB authentication parameters for a given database either on the command line using the --mongo-auth argument which is of the form: user:password@database_name or by placing the authentication parameters in the auth field in the add_named_connection function argument.
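
To make the shape of the --mongo-auth argument concrete, here is a minimal Python sketch that splits a `user:password@database_name` string into the three fields used in the auth record. The function `parse_mongo_auth` is hypothetical and is not how the Opa command-line parser is written; in particular, splitting on the last `@` and the first `:` is an assumption about how ambiguous passwords would be handled.

```python
def parse_mongo_auth(argument: str) -> dict:
    """Split 'user:password@database_name' into its three components."""
    # Assumption: the database name follows the last '@' ...
    credentials, at_sign, dbname = argument.rpartition("@")
    # ... and the user name ends at the first ':'.
    user, colon, password = credentials.partition(":")
    if not at_sign or not colon or not user or not dbname:
        raise ValueError(f"expected user:password@database_name, got {argument!r}")
    return {"dbname": dbname, "user": user, "password": password}


auth = parse_mongo_auth("me:secret@mydb")
```

The resulting dictionary has the same three fields (dbname, user, password) as the entries of the auth list passed to add_named_connection.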

Alternatively, you can call the MongoCommands.authenticate function to perform an additional, external authentication. Note that if you are connecting to a replica set then the driver needs to re-authenticate after connecting to the new host so the authentication parameters are built into the low-level Mongo datatype. This means that if you call this function you should perform all subsequent operations on the returned Mongo datatype, not on the original which won't have the parameters built in.

Remember that authentication in MongoDB is to a database, not to a connection, so you can have multiple user names and passwords associated with a single connection. If you want to authenticate with all of the databases over a connection you need to authenticate with the admin database, which acts a bit like "root" access for databases.

Basic operations

The basic database access operations are the same as the MongoDB protocol operations, i.e. insert, update, query, get_more, delete, kill_cursors and msg. So, for example, to insert a document:

/* A couple of documents */
p1 = [H.str("name","Joe1"), H.i32("age",44)]
p2 = [H.str("name","Joe2"), H.i32("age",55)]

/* Insert the documents */
MongoConnection.insert(mongodb,p1)
MongoConnection.insert_batch(mongodb,[p1,p2])

The basic write operations come in three types:

  • insert is the write-and-forget operation where the insert message is sent and a boolean value is returned which simply states that the correct number of bytes were written to the socket.
  • inserte is a "safe" operation where the insert message has a getlasterror query piggy-backed onto it and then the raw optional reply is returned.
  • insert_result does an inserte and then analyzes the reply, turning it into a standard Mongo.result type.

All of the basic write operations have these three forms. The Mongo.result type is an outcome of either success as a Bson.document type or failure as a Mongo.failure type. The Mongo.failure type looks like:

type Mongo.failure =
    {OK}
 or {string Error}
 or {Bson.document DocError}
 or {Incomplete}
 or {NotFound}

This defines either a raw document error {DocError:doc} which is an error as reported by the MongoDB server, a driver error {Error:str} which is a message generated by the Opa driver or a few special-purpose errors returned under specific circumstances ({OK} is simply a connection that has never been used).

Post-processing of results may include checking for errors:

error = MongoConnection.insert_result(MongoConnection.upsert(mongodb),[H.i32("i",n)])
println("insert error={MongoCommon.is_error(error)}")

or extracting specific fields from the reply:

println("errmsg={MongoCommon.result_string(error,"errmsg")}")

noting that we also support the MongoDB dot notation syntax:

println("indexSizes._id_={MongoCommon.dotresult_int(collStats,"indexSizes._id_")}")
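
The dot notation can be pictured as a walk through nested documents, one path segment at a time. The sketch below is in Python and models documents as plain dictionaries, which is an assumption made for brevity (the Opa driver works on Bson.document lists); `dot_lookup` and the sample statistics are hypothetical.

```python
def dot_lookup(document: dict, path: str):
    """Follow a dotted path such as 'indexSizes._id_' through nested documents."""
    current = document
    for key in path.split("."):
        # Stop as soon as the path leaves the document structure.
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Made-up reply document, loosely shaped like a collection statistics reply.
coll_stats = {"ns": "db.collection", "indexSizes": {"_id_": 8176}}
index_size = dot_lookup(coll_stats, "indexSizes._id_")
```

A path that does not exist yields None here, whereas the real functions report missing fields through their own result types.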

Closing a connection is as simple as:

MongoConnection.close(mongodb)

Remember that the connection will only close once all of the clones have also been closed.
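
The clone rule amounts to reference counting on the shared socket. The following Python sketch shows the idea under that assumption; the `Connection` class and its fields are hypothetical and do not mirror the Opa Mongo datatype.

```python
class Connection:
    """Handles that share one underlying socket, closed with the last handle."""

    def __init__(self, shared=None) -> None:
        # All clones point at the same mutable state.
        self._shared = shared if shared is not None else {"handles": 0, "open": True}
        self._shared["handles"] += 1
        self._closed = False

    def clone(self) -> "Connection":
        """Create another handle on the same socket."""
        return Connection(self._shared)

    def close(self) -> None:
        """Release this handle; the socket closes when no handles remain."""
        if self._closed:
            return                      # closing a handle twice is harmless
        self._closed = True
        self._shared["handles"] -= 1
        if self._shared["handles"] == 0:
            self._shared["open"] = False

    @property
    def socket_open(self) -> bool:
        return self._shared["open"]


original = Connection()
copy = original.clone()
original.close()
still_open = copy.socket_open       # the clone keeps the socket alive
copy.close()
finally_open = copy.socket_open     # the last handle is gone, so it is closed
```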

Cursors

Handling queries in MongoDB has the complication that, for efficiency, cursors are stored on the server which entails tracking them at the client side. While the bare MongoConnection.query and MongoConnection.get_more operations can be used to handle queries in conjunction with the reply support code in MongoCommon they are a bit inconvenient.

For this purpose we have defined cursor operations in the MongoCursor module and re-exported the most important ones into the MongoConnection.Cursor module. A cursor object itself contains all the parameters needed to manage the cursor at the server side and, in fact, duplicates some of the information in the connection object. Using the re-exported functions reduces the number of parameters to the basic functions since this information can be lifted from the connection into the cursor object.

Here is an example of a low-level cursor dialog:

cursor = MongoConnection.Cursor.init(mongodb)
cursor = MongoConnection.Cursor.set_query(cursor,{some:[H.str("name","Joe")]})
cursor = MongoConnection.Cursor.set_limit(cursor,3)
cursor = MongoConnection.Cursor.set_fields(cursor,{some:[H.i32("_id",0)]})
cursor = MongoConnection.Cursor.next(cursor)
result = MongoConnection.Cursor.check_cursor_error(cursor)
println("result 1 = {MongoCommon.pretty_of_result(result)}")
println("valid 1 ={MongoConnection.Cursor.valid(cursor)}")
cursor = MongoConnection.Cursor.next(cursor)
result = MongoConnection.Cursor.check_cursor_error(cursor)
println("result 2 = {MongoCommon.pretty_of_result(result)}")
println("valid 2 = {MongoConnection.Cursor.valid(cursor)}")
MongoConnection.Cursor.reset(cursor)

The cursor is initialized with init and then the parameters for the query are set up. The next function generates the query (or get_more) call to the server and places the next document internally in the cursor object along with any error status. The check_cursor_error function is a convenient way of extracting either the current document or the error as a Mongo.result. Subsequent calls to next will either return the next document from the previous reply or issue a get_more call to re-populate the cursor. The end of the matching documents (or the case where no document matches) is signaled with NotFound, and if you try to read past the end of matching documents you will get an "end of data" error from the driver. The valid function is used to poll whether there is any remaining data. Finally, the call to reset is important here because it doesn't just end the query: it issues a kill_cursors operation to the server to tell it to delete the cursor (cursors time out after 10 minutes by default on the MongoDB server).
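
The interplay between next, get_more and reset can be modelled with a cursor that buffers one batch at a time. This is a simplified Python sketch, not the MongoCursor module: `FakeServer`, `Cursor` and their fields are hypothetical, the batch size is arbitrary, and error handling is reduced to a single "NotFound" marker.

```python
class FakeServer:
    """Hypothetical server holding one result set and serving it in batches."""

    def __init__(self, documents, batch_size):
        self.documents = list(documents)
        self.batch_size = batch_size
        self.position = 0
        self.round_trips = 0        # counts query/get_more exchanges
        self.killed = False

    def fetch(self):
        """Return the next batch and whether more documents remain."""
        self.round_trips += 1
        batch = self.documents[self.position:self.position + self.batch_size]
        self.position += len(batch)
        return batch, self.position < len(self.documents)

    def kill(self):
        self.killed = True


class Cursor:
    def __init__(self, server):
        self.server = server
        self.buffer = []            # documents from the last reply
        self.more = True            # the server may still hold documents
        self.current = None
        self.error = None

    def next(self):
        # Only go back to the server when the local buffer is exhausted.
        if not self.buffer and self.more:
            self.buffer, self.more = self.server.fetch()
        if self.buffer:
            self.current, self.error = self.buffer.pop(0), None
        else:
            self.current, self.error = None, "NotFound"
        return self

    def valid(self):
        """True while data remains either locally or on the server."""
        return bool(self.buffer) or self.more

    def reset(self):
        # Ending the dialog also tells the server to drop its cursor.
        self.server.kill()
        self.buffer, self.more = [], False


server = FakeServer([{"name": "Joe1"}, {"name": "Joe2"}, {"name": "Joe3"}], 2)
cursor = Cursor(server)
names = []
cursor.next()
while cursor.error is None:
    names.append(cursor.current["name"])
    cursor.next()
cursor.reset()
```

Three documents with a batch size of two need only two exchanges with the server; the second and third calls to next are served from, or refill, the local buffer.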

This method works fine but this logic has been wrapped up into some convenience functions:

  • find_one returns the first matching document as a Mongo.result
  • find_all gives all the matches as a list of documents (use the limit function to limit the number of replies).

For example:

/* Find all objects in db.session, excluding the _id field */
mongo_session_no_id =
  MongoConnection.fields(MongoConnection.namespace(mongodb,"db","session"),{some:[H.i32("_id",0)]})
println("findAll: {MongoCommon.pretty_of_results(MongoConnection.Cursor.find_all(mongo_session_no_id,[]))}")

You can also define custom loops over the matches using start (or find) in conjunction with next and valid. Note that you must use the MongoConnection.Cursor.for loop instead of the more usual for function in the Opa stdlib: you need to check valid and only call next if the cursor is still valid at that point, otherwise you will miss the last document in the list of matches.

Collections

While you can achieve anything that MongoDB is capable of using the low-level drivers, there are no guarantees of type safety while converting between BSON documents and Opa values. You can of course base your entire project around BSON values and eliminate the need for converting between MongoDB's documents and Opa types altogether but this may not be very convenient depending upon what is happening elsewhere in your application. Secondly, to use the low-level drivers requires an investment in learning MongoDB's powerful but rather complex interface (which may be new to users of relational databases) in order to exploit what MongoDB has to offer. Finally, basing your application on MongoDB's API will tie your application to MongoDB and you may at some point in the future wish to migrate to other database solutions.

Ultimately, the intention is to provide an abstract view of the database which is general enough to encompass several of the existing database solutions, of which MongoDB is an important player, and support this with compiler-generated syntax in the manner of the Opa inbuilt database. This support is still not available but we can offer an intermediate layer of programming MongoDB whereby we assume collections of Opa types and support type-safety by performing run-time type-checks on operations over these collections. This support is in the form of the MongoCollection module plus some support modules for generating values suitable to be applied to these functions.

The collection type

The central idea in the MongoCollection module is a collection (in the MongoDB terminology sense) of Opa values. This is embodied in the Mongo.collection type, which is extremely simple: it is just a MongoConnection value cast to the specific type of the values to be stored in the collection:

type Mongo.collection('a) = {
  Mongo.mongodb db /* the mongodb connection */
}

When a value is stored in the collection it is automatically converted from its Opa type into a matching BSON document and vice versa for queries.

While this sounds simple there are a number of pitfalls to watch out for. We assume that any offline modifications of the collection will not create any incompatible values. If, for example, we add or delete a field from a record then the entry can no longer be represented as an Opa type.

To overcome this problem we place checks in the code to verify the suitability of documents read from the collection and an error will be generated if any such values are found. We also provide features to allow handling of this situation in some specific circumstances, for example, if you type a field in the collection as Bson.register it will allow you to successfully read in values with missing fields but this is not recommended for collections. Ultimately, it is up to the maintainer of the database to ensure that the values stored there are consistent with the application's usage of the collection.

Despite these provisos, using a collection is very simple and gives the programmer the ability to integrate Opa types with the MongoDB system without having to understand the underlying complexity of the database and with a modest level of type-safety. The cost, for the moment, is the overhead of the run-time type-checks which will slow down database operations.

Programming with collections

A simple dialog for creating and manipulating a collection might be as follows:

/* The type of our first collection */
type t = {int i}

/* Create a collection of type t */
Mongo.collection(t) c1 = MongoCollection.openfatal("default","db","collection")

/* Put a single value into the collection */
result = MongoCollection.insert_result(c1,{i:0})

/* Finally, destroy the collection */
MongoCollection.destroy(c1)

We define a type for the collection (type t) so that when we open a connection to the database we can cast the resulting collection object and thus install the correct run-time representation of the type. The openfatal function returns a collection and treats a connection failure as fatal. There are several variants of the open function.

A collection is a pointer to a specific collection in the database (here, db.collection) and we create a connection to the MongoDB server using the connection name (in this instance, default).

Inserting a value into the collection is trivial: the value is simply passed as it is to the insert function (here we use the safe insert_result function which also returns the result of a getlasterror call). The insert has exactly the same effect as a call to MongoConnection.insert but with the value automatically converted into a BSON document using the scheme outlined above.

The call to MongoCollection.destroy should not be forgotten because this closes the underlying connection.

While the insert function is trivial, we need more care with update and delete. The problem is that to maintain our level of type-safety we need to match select (and update) documents with the type of the collection they are applied to. We do this with a system of run-time type-checks applied to the select documents. For example:

/* Create pre-typed select and update generation functions */
(Bson.document -> Mongo.select(t)) createst = MongoSelect.create
(Bson.document -> Mongo.update(t)) createut = MongoUpdate.create

/* Generate the select documents */
select = createst(MongoSelectUpdate.int64(MongoSelectUpdate.empty(),"i",0))
update = createut(MongoSelectUpdate.inc(MongoSelectUpdate.int64(MongoSelectUpdate.empty(),"i",1)))

/* We can now apply update to these documents */
result = MongoCollection.update_result(c1,select,update)

Firstly, we use the MongoSelectUpdate module to generate the basic documents. Note that we could also have used the Bson.opa2doc function to achieve the same result:

select = createst(Bson.opa2doc({i:0}))
update = createut(Bson.opa2doc({`$inc`:{i:1}}))

The choice between these two styles may depend upon the type of document being generated. The Opa type-based versions are more readable but the MongoSelectUpdate ones are much faster since no conversion is required.

The select documents have to be correctly typed for the collection they apply to so we generate a couple of convenience functions createst and createut to do the casting for us.

Secondly, once we have these documents we can apply the update function to them, but note that although a select document is just a typed Bson.document it triggers a set of suitability tests. These tests are complex and probably do not cover all possible MongoDB operations. Briefly, the select document is checked against a knowledge base of MongoDB operators: for example, $inc only applies to updates and $and only applies to selects, whereas $comment can apply to both. Once the status (select/update/both) is determined, the type of the resulting values is determined from the select document and is verified to be a subtype of the type of the collection. So, for example, {int a} is a subtype of {int a, string b} but {int a, bool c} is not. Presently, we only print a suitable warning, but in future, once these routines have fully matured, we may return an error value.
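The two-stage check described above can be modelled outside Opa. The following Python sketch is an illustration only: the operator tables and the names classify, field_types and is_suitable are our assumptions, not the driver's actual knowledge base, and the collection type is reduced to a flat mapping from field names to type names.

```python
# Illustrative model only: these operator tables and helper names are assumptions,
# not the tables used by Opa's MongoDB driver.
SELECT_ONLY = {"$and", "$or", "$where", "$gt", "$lt"}
UPDATE_ONLY = {"$inc", "$set", "$unset", "$push"}
# Any other operator (for example $comment) is treated as valid in both roles.


def classify(doc):
    """Return 'select', 'update' or 'both' depending on the operators used."""
    kinds = set()

    def walk(value):
        if isinstance(value, dict):
            for key, sub in value.items():
                if key in SELECT_ONLY:
                    kinds.add("select")
                elif key in UPDATE_ONLY:
                    kinds.add("update")
                walk(sub)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    walk(doc)
    if kinds == {"select", "update"}:
        raise ValueError("document mixes select-only and update-only operators")
    return kinds.pop() if kinds else "both"


def field_types(doc):
    """Map each data field named at the top of the document to a type name."""
    fields = {}
    for key, value in doc.items():
        if key.startswith("$"):
            # Top-level operator whose argument names the fields, e.g. $inc or $set.
            if isinstance(value, dict):
                fields.update(field_types(value))
        elif isinstance(value, dict) and value and all(k.startswith("$") for k in value):
            # Field constrained by operators, e.g. {"i": {"$gt": 102}}.
            fields[key] = type(next(iter(value.values()))).__name__
        else:
            fields[key] = type(value).__name__
    return fields


def is_suitable(doc, collection_type):
    """True if every field in the document exists in the collection type with the same type."""
    return all(collection_type.get(name) == tname
               for name, tname in field_types(doc).items())


collection_type = {"a": "int", "b": "str"}
print(classify({"$inc": {"a": 1}}))                            # update-only operator
print(is_suitable({"a": 1}, collection_type))                  # fields fit the type
print(is_suitable({"a": 1, "c": True}, collection_type))       # field c is unknown
```

The sketch deliberately ignores nested documents and the fields inside $and or $or lists; a real checker would have to recurse into those as well.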

All of the basic database write operations occur in both send-and-forget and in send-with-getlasterror forms: insert, insert_result, insert_batch, insert_batch_result, update, update_result, delete and delete_result.

As an aside, notice that we use a similar functional interface for flags as for the low-level code:

MongoCollection.delete(MongoCollection.singleRemove(c1),createst(Bson.opa2doc({i:104})))

The select mechanism applies to queries as well but in this case we have to be careful what types we return from the database:

result = MongoCollection.find_one(c1,createst(Bson.opa2doc({`$where`:(Bson.code "this.i > 106")})))
match (result) {
  case {success:{~i}}: println("i={i}")
  case {~failure}: println("error={MongoCommon.string_of_failure(failure)}")
}

This example returns the first value in the collection for which i is greater than 106; it expresses the select as a JavaScript expression. Many of the MongoDB query methods, such as the $where example here, are perfectly safe with collections. Some methods are not safe, however, in that they return documents which contain fields other than those in the Opa type. A good example is the $explain documents (http://www.mongodb.org/display/DOCS/Explain), which are a set of statistical data concerning the given query (see the Mongo.explainType type in MongoCommands). In general, we attempt to support such features with special purpose functions rather than via the normal database operations.

The usual simplified query functions are present in MongoCollection: find_one and find_all. There are also two functions which return the bare Bson.document representation of the result, find_one_doc and find_all_doc, which may be useful in the above situation where the result of the query is not compatible with Opa types. For more general query scanning, the cursor-based routines are available. For example, the following code scans the results of a MongoCollection query:

query = createst(Bson.opa2doc({i:{`$gt`:102, `$lt`:106}}))
match (MongoCollection.query(MongoCollection.limit(c1,0),query)) {
  case {success:cc1}:
    cc1 =
      while(cc1,(function(cc1) {
                   match (MongoCollection.next(cc1)) {
                     case (cc1,{success:{~i}}):
                        println("i={i}")
                        (cc1,MongoCollection.has_more(cc1))
                     case (cc1,{~failure}):
                        println("error={MongoCommon.string_of_failure(failure)}")
                        (cc1,false)
                   }
                 }))
    MongoCollection.kill(cc1)
  case {~failure}:
    println("error={MongoCommon.string_of_failure(failure)}")
}

In this code, we create a Mongo.collection_cursor object using MongoCollection.query to which we can then apply the collection-specific cursor functions MongoCollection.next and MongoCollection.has_more. This allows arbitrary processing of collection queries. Remember, as with the low-level cursors above, that the MongoCollection.kill function does not just end the scan, it also sends a kill_cursors message to the MongoDB server to tell it to destroy the cursor.
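The shape of this loop, in which each call returns an updated cursor that is threaded into the next call, can be shown with a small Python model. This is a sketch under stated assumptions: Cursor, next_doc, has_more, kill and scan are hypothetical names, the cursor holds its documents in memory, and no server is involved. Unlike the Opa while helper, the model tests has_more before the first fetch.

```python
from typing import NamedTuple


class Cursor(NamedTuple):
    """Immutable stand-in for a collection cursor (hypothetical, in-memory only)."""
    docs: tuple           # documents not yet delivered
    killed: bool = False  # set once the cursor has been destroyed


def next_doc(cur):
    """Return (new_cursor, result), mirroring the (cursor, outcome) pairs in the text."""
    if cur.killed or not cur.docs:
        return cur, {"failure": "no more documents"}
    return Cursor(cur.docs[1:]), {"success": cur.docs[0]}


def has_more(cur):
    return not cur.killed and bool(cur.docs)


def kill(cur):
    """Discard pending documents; a real driver would also notify the server."""
    return Cursor((), True)


def scan(cur):
    """Collect every document, stop on the first failure, and always kill the cursor."""
    seen = []
    more = has_more(cur)
    while more:
        cur, result = next_doc(cur)
        if "success" in result:
            seen.append(result["success"])
            more = has_more(cur)
        else:
            more = False
    return kill(cur), seen


final, values = scan(Cursor(({"i": 103}, {"i": 104}, {"i": 105})))
print([v["i"] for v in values], final.killed)
```

Returning the killed cursor from scan makes it hard to forget the clean-up step, which is the same discipline the Opa code follows by calling MongoCollection.kill after the loop.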

Another aside in this code is that we set the limit value to 0, which means "use the default number of documents per reply". If we had set this to 1 we would only ever get one document in the reply because MongoDB treats this as a special case, i.e. "just return one document".

Again, to help with the situation where return values may be incompatible with Opa types, we provide the _unsafe variants of the query functions. These functions (for example, query_unsafe) take an additional boolean flag, ignore_incomplete, which instructs the driver to simply ignore any returned documents which have missing fields and are thus not compatible with Opa types. MongoDB will actually return partial documents if a document matches the query document but does not contain all of the fields (an exception is the _id field, which is always returned unless specifically excluded with the return field selector document). These functions should be used with care.
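The effect of the ignore_incomplete flag can be pictured with a short Python model. It is only a sketch: decode_all and the field set are hypothetical, documents are plain dictionaries, and the real driver works on BSON and Opa types rather than on key sets.

```python
def decode_all(docs, required, ignore_incomplete):
    """Project each document onto the required fields.

    Documents lacking a required field are skipped when ignore_incomplete
    is true and rejected with an error otherwise.
    """
    values = []
    for doc in docs:
        missing = required - set(doc)
        if missing:
            if ignore_incomplete:
                continue
            raise ValueError(f"incomplete document, missing {sorted(missing)}")
        # Extra fields such as _id are dropped: they are not part of the value type.
        values.append({name: doc[name] for name in required})
    return values


replies = [
    {"_id": 1, "i": 1, "s": "one"},
    {"_id": 2, "i": 2},               # partial document: no "s" field
]
print(decode_all(replies, {"i", "s"}, ignore_incomplete=True))
```

Silently skipping documents hides data from the caller, which is one reason to use the flag with care.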

Apart from the support described here, the MongoCollection module also provides a few convenience functions, such as creating indexes using collection objects, and some direct support for some of the aggregation functions (count, distinct and group). Finally, two of the variants of the open function, openpkg and openpkgfatal, supply a set of pre-cast versions of MongoSelect.create and MongoUpdate.create.

Example: Hello, MongoDB wiki

In this section, we describe how to convert the hello_wiki example described in the previous chapter to use the MongoDB database. This is actually a simple process and uses MongoDB as a simple key-value storage database.

// TODO: more realistic example

The first task is to open a connection to the database. We are going to use collections and in fact, we will use the version of open which also gives us the casting functions for selects:

/**
 * The basic info. about the database and table location.
 */
type page = {
  string _id,
  Bson.int32 _rev,
  string content
}

/**
 * We work at level 1, run-time type-checked storage of a collection of Opa values.
 * The Mongo.pkg type provides convenience functions for building select and update documents.
 **/
Mongo.pkg(page) (wiki_collection,wiki_pkg) = MongoCollection.openpkgfatal("default","db","wiki");
function pageselect(v) { wiki_pkg.select(Bson.opa2doc(v)); }
function pageupdate(v) { wiki_pkg.update(Bson.opa2doc(v)); }

The _rev field is declared as Bson.int32 so we can use 32-bit integers for this field (it is unlikely we will ever have more than about 2 billion revisions of any value in the database!). We then open our connection using the default named connection and connect to the collection db.wiki. This returns a collection object plus a package of values which we use to build our select documents.

Next we are actually going to search for documents including the _rev field so we can't just use the default index for our collection (the _id field):

/**
 * Indexes aren't automatic in MongoDB apart from the non-removable _id index.
 * Since we're searching on _rev as well, we need a separate index.
 **/
MongoCollection.create_index(wiki_collection, "db.wiki", Bson.opa2doc({_id:1, _rev:1}), 0)

The get_content function can then be modified using a simple call to MongoCollection.find_one:

function get_content(docid) {
  default_page = "This page is empty. Double-click to edit."
  function extract_content(page record) { record.content }
  /* Order by reverse _rev to get highest numbered _rev. */
  orderby = {some:Bson.opa2doc({_rev:-1})}
  match (MongoCollection.find_one(MongoCollection.orderby(wiki_collection,orderby),pageselect({_id:docid}))) {
    case {success:page}: extract_content(page)
    case {failure:{NotFound}}: default_page
    case {~failure}:
      jlog("hello_wiki_mongo: failure={MongoCommon.string_of_failure(failure)}")
      default_page
  }
}

We search the database for the given _id value but we want the highest-numbered _rev field, so we sort in descending order on that field (the default ordering for numerical fields is increasing). A missing document is signaled by the NotFound failure condition; other failure values are errors.
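The lookup logic, match on _id, order by descending _rev and take the first hit, can be checked against an in-memory list of documents. The Python below is a hedged model, not the driver: find_one and get_content here are local stand-ins written for this sketch, and equality matching is the only kind of select they understand.

```python
DEFAULT_PAGE = "This page is empty. Double-click to edit."


def find_one(docs, select, orderby=None):
    """Return the first matching document after sorting, or a NotFound failure."""
    matches = [d for d in docs if all(d.get(k) == v for k, v in select.items())]
    if orderby:
        # Sort by the last key first; the sort is stable, so the first key dominates.
        for field, direction in reversed(list(orderby.items())):
            matches.sort(key=lambda d, f=field: d[f], reverse=(direction < 0))
    return {"success": matches[0]} if matches else {"failure": "NotFound"}


def get_content(docs, docid):
    """Content of the highest-numbered revision, or the default page text."""
    result = find_one(docs, {"_id": docid}, orderby={"_rev": -1})
    if "success" in result:
        return result["success"]["content"]
    return DEFAULT_PAGE


pages = [
    {"_id": "home", "_rev": 1, "content": "first"},
    {"_id": "home", "_rev": 3, "content": "third"},
    {"_id": "home", "_rev": 2, "content": "second"},
]
print(get_content(pages, "home"))
print(get_content(pages, "missing"))
```

The model stores several revisions under one _id purely to exercise the ordering; a real collection cannot do this, because MongoDB requires _id to be unique.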

Finally, the save_source function becomes a call to MongoCollection.update_result:

exposed function save_source(topic, source) {
  select = pageselect({_id:topic})
  update = pageupdate({`$set`:{content:source}, `$inc`:{_rev:(Bson.int32 1)}})
  /* Upsert this so we create it if it isn't there */
  result = MongoCollection.update_result(MongoCollection.upsert(wiki_collection),select,update);
  if (MongoCommon.is_error(result)) {
    <>Error: {MongoCommon.pretty_of_result(result)}</>
  } else {
    load_rendered(topic)
  }
}

In this case, we select only the _id field and we update the document by setting the content field and incrementing the _rev field. Note that we use the Upsert flag, which tells MongoDB to insert the document if it isn't already present in the collection. We test the result for errors using the safe update operation, but apart from that the code is identical to the existing "Hello wiki" example.
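To see why a single upserting update is enough for both the first save and later edits, the following Python sketch models the semantics of $set and $inc against an in-memory list. The functions update_one and save_source are hypothetical stand-ins written for this illustration; only equality selects and these two operators are modelled.

```python
def update_one(store, select, update, upsert=False):
    """Apply $set and $inc to the first matching document.

    With upsert=True a missing document is created from the select's
    equality fields before the operators are applied.
    Returns True if a document was updated or inserted.
    """
    for doc in store:
        if all(doc.get(k) == v for k, v in select.items()):
            target = doc
            break
    else:
        if not upsert:
            return False
        target = dict(select)
        store.append(target)
    for field, value in update.get("$set", {}).items():
        target[field] = value
    for field, amount in update.get("$inc", {}).items():
        # $inc on a missing field starts from zero, so a new page gets _rev 1.
        target[field] = target.get(field, 0) + amount
    return True


def save_source(store, topic, source):
    return update_one(store, {"_id": topic},
                      {"$set": {"content": source}, "$inc": {"_rev": 1}},
                      upsert=True)


wiki = []
save_source(wiki, "home", "first draft")
save_source(wiki, "home", "second draft")
print(wiki)
```

After the two calls the list holds a single document for "home" whose content is the latest text and whose _rev has been incremented twice.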
