Skip to content

Query Builder

snej edited this page Dec 16, 2014 · 16 revisions

The CBLQueryBuilder class provides a medium-level interface for defining database queries, somewhat like Core Data's NSFetchRequest. Instead of thinking about map functions and key ranges, you provide "select", "where" and "order by" expressions; the query builder uses these to define a view and configure the query. However, it's not as high-level as a full query language like SQL or N1QL.

CBLQueryBuilder does not make CBLView obsolete! It doesn't support everything you can do with map/reduce, so there are cases where you'll run into its limits and need to define your own view and queries. But it does take care of most common types of queries.

Compatibility: CBLQueryBuilder is a new class in Couchbase Lite 1.1. (An earlier version was formerly available in version 1.0.3 as an unsupported 'extra', but not built into the library.) It is iOS/Mac specific, since it's based on Cocoa classes like NSPredicate and NSSortDescriptor. Comparable functionality will be available on other platforms too, with APIs that are idiomatic for those platforms.

1. Overview

A CBLQueryBuilder defines a query or a family of related queries (since it can contain variables whose values are substituted later.) It has four attributes:

  1. The database to query.
  2. A "where" predicate (boolean-valued function) that chooses which documents match the query.
  3. An array of values to select from the matching documents. Each value is an expression based on properties of the document.
  4. An optional array of sort descriptors that determine the order the results appear.

(This may look something like a SQL SELECT statement, but the details are different.)

Based on these attributes, the query builder will create a view and register its map function (or reuse an existing compatible view.) You don't need to worry about this. You just ask the query builder to create a pre-configured CBLQuery object for you, and all you need to do is run it and use the result.

Often a query's predicate will depend on specific values that aren't known until runtime, such as a minimum or maximum property value. You can specify these as as placeholder variables; in a predicate string they're $-prefixed identifiers. When creating a query, you provide specific values for these variables.

IMPORTANT: The best practice is to create a query builder once, ahead of time, keeping a reference to it in a global variable or long-lived instance variable. This reduces the overhead of parsing the attributes and creating the view. If you're creating a query builder each time you run a query, you're probably doing it wrong.

2. Example

Data model

Let's assume a simple blog database that we want to search for posts tagged with a given keyword. The posts will look like this:

{
    "type": "post",
    "title": "Introducing CBLQueryBuilder",
    "author": "Jens Alfke",
    "date": "2014-12-11",
    "body": "Yo, 'sup. Check out this new class...",
    "tags": ["couchbase lite", "iOS", "queries", "views", "API"]
}

Creating the query builder

When the application's search controller object is initialized, it creates a query builder:

    self.tagQueryBuilder = [[CBLQueryBuilder alloc]
            initWithDatabase: self.database
                      select: @[@"title", @"body", @"author", @"date"]
                       where: @"type == 'post' and tags contains $TAG"
                     orderBy: @[@"-date"]
                       error: &error];

We are limiting the search to documents whose type property is "post", since there are likely to be other types of documents (comments, authors...) Also, we don't know the specific tag to search for, so we leave it as a variable $TAG.

Note: The language used in the where: parameter is Apple's predicate syntax, not SQL or N1QL or any other query language.

The select: parameter value is an array of the values we need from the query to display the search results. (It's more efficient to get these values directly from the CBLQueryRow than to ask each row to load and parse its CBLDocument.) We've simply given them as property name strings, but they could be more complicated expressions given as NSExpression objects.

The orderBy: array is using the string @"-date" as a shortcut meaning "in descending order of date property". This could also have been specified as an NSSortDescriptor object.

Querying

Then when a tag is given, the controller can perform a query:

- (void) queryForTag: (NSString*)tag {
    NSError* error;
    CBLQueryEnumerator* e = [self.tagQueryBuilder runQueryWithContext: @{@"TAG": tag}
                                                                error: &error];
    if (e) {
        [self displayRows: e.allObjects];
    } else {
        [self handleError: error];
    }
}

To create a query, we have to give a context dictionary that provides a value for each variable, in this case TAG.

Important: The $ prefix used in the predicate is not part of the variable name itself, and isn't used in the dictionary key. If you put it in by accident, you'll get a runtime exception complaining that the variable TAG doesn't have a value.

Interpreting the query results

Since we asked the query builder to return four results (@"title", @"body", @"author", @"date"), each CBLQueryRow's value property will be an array of four objects corresponding to those four values. (If we'd asked for only a single result, the value would just be the result, not wrapped in an array.)

3. Configuring Query Builders

As stated before, a query builder has a predicate, an array of values to select, and an optional array of sort descriptors.

Predicate ("where")

The predicate is a boolean-valued function implemented as an NSPredicate object. It takes a document's properties as input and returns true if the document should appear in the query results, or false if it shouldn't.

NSPredicate is a complex class. An NSPredicate is actually the root of a tree of NSExpression objects and nested NSPredicates, representing a parse tree or Abstract Syntax Tree (AST) of a predicate in Apple's syntax. Fortunately you usually don't have to worry about this: although you can construct predicates by hand from trees of objects, you usually just give them in a string form that gets parsed into the tree automatically.

The basics of creating the predicate are pretty simple:

  • Refer to document properties by name; for example tags in the predicate refers to the value of a document JSON property named "tags".
  • Standard relational operators (<, ==, etc.) are available and work as you'd expect on numbers and strings.
  • You can combine multiple conditions with AND or &&.
  • You can look for a value in an array property using CONTAINS (as in the example).
  • There are several operators for string matching. BEGINSWITH can be implemented efficiently as a prefix match; the others require post-filtering, including ENDSWITH, LIKE (simple pattern matching using * and ?) and MATCHES (full ICU regex matching).
  • To check for string containment, use the IN operator with the property on the right and the variable substring you're looking for on the left, e.g. $KEYWORD IN title.
  • String comparisons and matching are, by default, case and diacritic sensitive. You can modify an operator using the key characters c and d within square braces to specify case and diacritic insensitivity respectively, for example firstName BEGINSWITH[cd] $FIRST_NAME.
  • You can see if a value matches multiple candidates using IN (for example, dept IN $departments, where $departments is an array supplied at query time.)

See the Apple documentation for details.

Note: The CONTAINS operator is a bit confusing. NSPredicate allows the left-hand side to be either a string or an array at runtime, and does a substring match or an array lookup accordingly. But CBLQueryBuilder has to plan the map function and query differently for those two cases, so it has to decide ahead of time. We chose to make CONTAINS imply array containment, leaving the synonymous (but reversed) IN operator for strings. If you forget and use CONTAINS for string matching, you'll get a warning at runtime when the map function runs.

Unsupported or inefficient predicate features

Some NSPredicate features aren't supported, either because they can't be performed efficiently or because we just haven't had time to implement them yet.

  • You can't combine sub-predicates using OR or NOT (only AND) if they refer to variables.
  • You can't use the CONTAINS operator on strings; the query builder assumes you mean it as array containment and will generate the map function accordingly. (For example, title CONTAINS $TXT will not do what you expect since it will assume title is an array.)

Values ("select")

The select: parameter is an array describing the values that you want from a query: it directly determines what each CBLQueryRow's value property will contain.

Each item in the array can be an NSString or an NSExpression.

An NSString simply represents a property name or key path, relative to the document. So @"price" denotes the value of the price property, and @"address.city" denotes the city sub-property of the JSON object in the address property. Key paths also support some aggregation operators that can be used with array-valued document properties. (This is not the same as aggregation of multiple rows, aka reducing, which is described below.)

An NSExpression is an object similar to an NSPredicate (in fact, expressions appear as child nodes in a predicate's tree.) It represents an expression that can be evaluated at map time from the properties of the document. Like a predicate, it can be parsed from a string or constructed as an object tree.

When you iterate over the result rows, each one's value property will correspond to the builder's select: parameter -- it will be an array whose elements correspond one-to-one. The exception is if you only specified a single value; in that case the row's value property will simply be that value, not an array.

Computing aggregates, or, reducing

You can compute an aggregate numeric value across all rows, such as an average, by specifying a value as an NSExpression that invokes one of the supported reduce functions: count, sum, min, max, average, median, stddev. For example:

    NSExpression* avg = [NSExpression expressionWithFormat: @"average(price)"];
    CBLQueryBuilder* b = [[CBLQueryBuilder alloc] initWithDatabase: db
                                                            select: @[avg]
                                                             where: @"type == 'book'"
                                                           orderBy: nil
                                                             error: &error];

(You can't just use the string literal @"average(price)" in the where: parameter; it has to be parsed into an NSExpression.)

Behind the scenes this creates a view with a map function that emits doc.price, and a reduce function that computes the average of the values. The query will return one row whose value is the average. You'd run it like so:

    CBLQueryEnumerator* e = [b runQueryWithContext: nil error: &error];
    double average = [[e.nextObject value] doubleValue];

Sort Descriptors

You can optionally specify an ordering for the query result rows. If you don't care about the order, leave the orderBy: parameter nil; this can make the query faster.

Sorting is specified in Cocoa fashion, using an array of NSSortDescriptor objects. The first one represents the primary sort order, the second one would be a secondary order, etc. As above, the key paths in the sort descriptors are interpreted relative to the document.

4. Troubleshooting

Errors

If the CBLQueryBuilder initializer returns nil, that means it can't support the specific query you asked for. In this case check the localizedDescription in the NSError returned via the final parameter; this should hopefully identify what it wasn't able to handle. (See above for some discussion of unsupported predicate features.)

If you need to enhance an existing query builder but find that you've exceeded the limits of what CBLQueryBuilder can handle, you can use the explanation property (see below) of the existing query as a guide to re-implement it using CBLView and CBLQuery, and then make modifications to the map function or the query setup to reach your goal.

Warnings

CBLQueryBuilder: '%@ contains...' expected %@ to be an array but it's a string; remember not to use 'contains' as a string operation! -- You'll get this warning during view indexing if you use the CONTAINS predicate operator to test for string containment instead of array containment. NSPredicate allows the operator to do both, but CBLQueryBuilder has to implement the two differently, so it reserves CONTAINS for arrays. To help make the mistake more obvious, it logs this warning if it sees that the left-hand side of CONTAINS is a string. Solution: Instead of string CONTAINS $substr, use $substr IN string.

Wrong Results

If the resulting query returns the wrong results, it can be helpful to look at the query builder's explanation property: this returns a string containing a human-readable multi-line description of how the query is implemented. It gives a pseudo-code listing of the underlying view's map function, and the values of some of the query properties like startKey.

Here's the explanation of the query builder used in the previous example section:

// view "builder-+nWvuTVG43pphUMxD0HgzuDYhYU=":
view.map = {
    if (type == "post")
        for (i in tags)
            emit([i, date], [title, body, author]);
};
query.startKey = [$TAG];
query.endKey = [$TAG];
query.prefixMatchLevel = 1;

From this you can tell that the view index's key consists of an array containing a tag and a date, and the corresponding value contains the title, body and author. The query will return the rows from the index that start with the requested tag.

Note: If the explanation looks incorrect, it's possible CBLQueryBuilder hasn't translated your input correctly. Please file an issue and we'll investigate.

Performance

The explanation string can give you some clues for performance problems:

  • If the map function has no if test, it's going to emit a row or rows for every document in the database. This is usually not what you want, unless the query genuinely applies to all documents. It's usually a side effect of specifying a predicate that can't be resolved during indexing and has to be deferred to a postFilter.
  • A query.postFilter predicate runs at query time; it's called on every row and can slow down the query if the result set is large.
  • Similarly, a query.sortDescriptor is also applied after the query completes (and after any postFilter), and can be slow if there are a lot of rows returned.

Some NSPredicate features are implemented by post-processing the results from the index, which could make the query slower if a lot of results are returned and have to be checked one by one:

  • Multiple inequality tests against variables (e.g. date >= $SINCE and price < $MAXPRICE) -- only the first one will be handled by a range lookup from the index, the other(s) will be checked afterwards.
  • The string operators ENDSWITH, LIKE or MATCHES, where the right-hand side is a variable (e.g. name like $PATTERN).
Clone this wiki locally