babbage is a lightweight implementation of an OLAP-style database query tool for PostgreSQL. Given a database schema and a logical model of the data, it can be used to perform analytical queries against that data, programmatically or via a web API. It is heavily inspired by Cubes, but has less ambitious goals: there is no pre-computation of aggregates and no support for multiple storage backends.
babbage is not specific to government finances, and could easily be used, for example, for ReGENESIS, a project that makes German national statistics available via an API. The API works by interpreting modelling metadata generated by the user (measures and dimensions).
babbage will normally be included as a PyPI dependency, or installed via pip:
$ pip install babbage
People interested in contributing to the package should instead check out the source repository and then use the provided Makefile to install the library (this requires virtualenv to be installed):
$ git clone https://github.com/openspending/babbage.git
$ cd babbage
$ make install
To run the test suite, install tox, point BABBAGE_TEST_DB at a PostgreSQL database and run the tests:
$ pip install tox
$ export BABBAGE_TEST_DB=postgresql://postgres@localhost:5432/postgres
$ make test
babbage is used to query a set of existing database tables through an abstract, logical model. A sample logical model can be found in tests/fixtures/models/cra.json, and a JSON schema specifying the model format is available in babbage/schema/model.json.
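To give a flavour of what such a model contains, below is a rough sketch of a model definition, written as a Python dict for readability (on disk it is plain JSON). The field names and layout here are illustrative assumptions, not copied from the fixture; consult tests/fixtures/models/cra.json and babbage/schema/model.json for the authoritative structure.
model = {
    'fact_table': 'procurement',          # the physical table to query
    'measures': {
        'total_value': {                  # numeric facts that can be aggregated
            'column': 'total_value',
            'label': 'Total Value'
        }
    },
    'dimensions': {
        'supplier': {                     # categorical attributes to group and filter by
            'label': 'Supplier',
            'attributes': {
                'code': {'column': 'supplier_code', 'label': 'Code'},
                'label': {'column': 'supplier_name', 'label': 'Name'}
            },
            'key_attribute': 'code',
            'label_attribute': 'label'
        }
    }
}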
The central unit of babbage is a Cube, i.e. an OLAP cube that uses the provided model metadata to construct queries against a database table. Additionally, the application supports managing multiple cubes at the same time via a CubeManager, which can be subclassed to enable application-specific ways of defining cubes and of storing their metadata.
Further, babbage includes a Flask Blueprint that can be used to expose a standard API via HTTP. This API is consumed by the JavaScript babbage.ui package and is very closely modelled on the Cubes and OpenSpending HTTP APIs.
Let's assume you have an existing database table of procurement data and want to query it using babbage in a Python shell. A session might look like this:
import json
from sqlalchemy import create_engine
from babbage.cube import Cube
from babbage.model import Measure
engine = create_engine('postgresql://localhost/procurement')
model = json.load(open('procurement_model.json', 'r'))
cube = Cube(engine, 'procurement', model)
facts = cube.facts(page_size=5)
# There are 17201 rows in the table:
assert facts['total_fact_count'] == 17201
# There's a field called 'total_value':
assert 'total_value' in facts['fields']
# We can get metadata about it:
concept = cube.model['total_value']
assert isinstance(concept, Measure)
assert concept.label == 'Total Value'
# And there's some actual data:
assert len(facts['data']) == 5
fact_0 = facts['data'][0]
assert 'total_value' in fact_0
# For dimensions, we can get all the distinct values:
members = cube.members('supplier', cut='year:2015', page_size=500)
assert len(members['data']) <= 500
assert members['total_member_count']
# And, finally, we can aggregate by specific dimensions:
aggregate = cube.aggregate(aggregates='total_value.sum',
                           drilldowns='supplier|authority',
                           cut='year:2015|authority.country:GB',
                           page_size=500)
# This translates to:
# Aggregate the procurement data by summing up the 'total_value'
# for each unique pair of values in the 'supplier' and 'authority'
# dimensions, and filter for only those entries where the 'year'
# dimension's key attribute is '2015' and the 'authority' dimension's
# 'country' attribute is 'GB'. Return the first 500 results.
assert aggregate['total_cell_count']
assert len(aggregate['cells']) <= 500
aggregate_0 = aggregate['cells'][0]
assert 'total_value.sum' in aggregate_0
# Note that these attribute names are made up for this example, they
# should be reflected from the model:
assert 'supplier.code' in aggregate_0
assert 'supplier.label' in aggregate_0
assert 'authority.code' in aggregate_0
assert 'authority.label' in aggregate_0
The HTTP API for babbage is a simple Flask Blueprint used to expose a small set of calls that correspond to the cube functions listed above. To include it in an existing Flask application, you would need to create a CubeManager and then configure the API like this:
from flask import Flask
from sqlalchemy import create_engine
from babbage.manager import JSONCubeManager
from babbage.api import configure_api
app = Flask('demo')
engine = create_engine('postgresql://localhost/procurement')  # example connection string
models_directory = 'models/'
manager = JSONCubeManager(engine, models_directory)
blueprint = configure_api(app, manager)
app.register_blueprint(blueprint, url_prefix='/api/babbage')
app.run()
Of course, you can define your own CubeManager, for example if you wish to retrieve model metadata from a database.
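As a rough sketch of what that could look like, here is a hypothetical manager that reads models from a database table. The overridden method names (list_cubes, has_cube, get_cube_model) and the base constructor signature are assumptions modelled on how JSONCubeManager is used above; check babbage/manager.py for the actual interface before relying on this.
import json
from sqlalchemy import text
from babbage.manager import CubeManager

class DatabaseCubeManager(CubeManager):
    """Loads cube models from a hypothetical table of (name, model_json) rows."""

    def __init__(self, engine, models_table='babbage_models'):
        # Assumes the base constructor accepts the engine, like JSONCubeManager does.
        super(DatabaseCubeManager, self).__init__(engine)
        self.engine = engine
        self.models_table = models_table

    def list_cubes(self):
        # Return the names of all cubes for which a model is stored.
        with self.engine.connect() as conn:
            rows = conn.execute(text('SELECT name FROM %s' % self.models_table))
            return [row[0] for row in rows]

    def has_cube(self, name):
        return name in self.list_cubes()

    def get_cube_model(self, name):
        # Load and parse the stored JSON model for the named cube.
        with self.engine.connect() as conn:
            row = conn.execute(
                text('SELECT model_json FROM %s WHERE name = :name' % self.models_table),
                {'name': name}).fetchone()
            return json.loads(row[0])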
When enabled, the API will expose a number of JSON(P) endpoints relative to the given url_prefix:
/ returns the system status and version.
/cubes returns a list of the available cubes (name only).
/cubes/<name>/model returns the full metadata for a given cube (i.e. measures, dimensions, aggregates etc.).
/cubes/<name>/facts is used to return individual entries from the cube in a non-aggregated form. Supports filters (cut), a set of fields to return and a sort (field_name:direction), as well as page and page_size.
/cubes/<name>/members is used to return the distinct set of values for a given dimension, e.g. all the suppliers mentioned in a procurement dataset. Supports filters (cut) and a sort (field_name:direction), as well as page and page_size.
/cubes/<name>/aggregate is the main endpoint for generating aggregate views of the data. Supports specifying the aggregates to include, the drilldowns to aggregate by, a set of filters (cut) and a sort (field_name:direction), as well as page and page_size.
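To illustrate how these endpoints might be called from a client, here is a small sketch using the requests library. The base URL, the port, the cube name and the exact query parameter spellings are assumptions based on the Flask example and the parameter names listed above, so adjust them to your deployment.
import requests

BASE = 'http://localhost:5000/api/babbage'  # assumed host/port and url_prefix

# List the available cubes.
cubes = requests.get(BASE + '/cubes').json()

# Aggregate 'total_value' by supplier for 2015, mirroring the Python session above.
params = {
    'aggregates': 'total_value.sum',
    'drilldowns': 'supplier',
    'cut': 'year:2015',
    'page_size': 100,
}
aggregate = requests.get(BASE + '/cubes/procurement/aggregate', params=params).json()
for cell in aggregate.get('cells', []):
    print(cell.get('supplier.label'), cell.get('total_value.sum'))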