Building real systems means caring about quality control, robustness and correctness. With the right quality assurance mechanisms in place, well-written code can feel like a precision machine, with all functions performing their tasks exactly as specified. There is no sloppiness around the edges, and the final result can be code that is self-explanatory, obviously correct – the kind of code that inspires confidence.
In Haskell, we have several tools at our disposal for building such precise systems. The most obvious tool, and one built into the language itself, is the expressive type system, which allows complicated invariants to be enforced statically–making it impossible to write code violating chosen constraints. In addition, purity and polymorphism encourage a style of code that is modular, refactorable and testable. This is the kind of code that just doesn’t go wrong.
Testing plays a key role in keeping code on the straight-and-narrow path. The main testing mechanisms in Haskell are traditional unit testing (via the HUnit library) and its more powerful descendant: type-based “property” testing, with QuickCheck, an open source testing framework for Haskell. Property-based testing encourages a high-level approach to testing, in the form of abstract invariants that functions should satisfy universally, with the actual test data generated for the programmer by the testing library. In this way code can be hammered with thousands of tests that would be infeasible to write by hand, often uncovering subtle corner cases that wouldn’t be found otherwise.
In this chapter we’ll look at how to use QuickCheck to establish invariants in code and then re-examine the pretty printer developed in previous chapters, testing it with QuickCheck. We’ll also see how to guide the testing process with GHC’s code coverage tool: HPC.
To get an overview of how property-based testing works, we’ll begin with a simple scenario: you’ve written a specialised sorting function and want to test its behaviour.
First, we import the QuickCheck library, and any other modules we need:
import Test.QuickCheck
import Data.List
And the function we want to test–a custom sort routine:
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
    where lhs = filter (< x) xs
          rhs = filter (>= x) xs
This is the classic Haskell sort implementation: a study in functional programming elegance, if not efficiency (this isn’t an in-place sort). Now, we’d like to check that this function obeys the basic rules a good sort should follow. One useful invariant to start with, and one that comes up in a lot of purely functional code, is /idempotency/–applying a function twice has the same result as applying it only once. For our sort routine, a stable sort algorithm, this should certainly be true, or things have gone horribly wrong! This invariant can be encoded as a property simply:
prop_idempotent xs = qsort (qsort xs) == qsort xs
We’ll use the QuickCheck convention of prefixing test properties with prop_ to distinguish them from normal code. This idempotency property is written simply as a Haskell function stating an equality that must hold for any input list we sort. We can check that this makes sense for a few simple cases by hand:
ghci> prop_idempotent []
True
ghci> prop_idempotent [1,1,1,1]
True
ghci> prop_idempotent [1..100]
True
ghci> prop_idempotent [1,5,2,1,2,0,9]
True
Looking good. However, writing out the input data by hand is tedious, and violates the moral code of the efficient functional programmer: let the machine do the work! To automate this, the QuickCheck library comes with a set of data generators for all the basic Haskell data types. QuickCheck uses the Arbitrary type class to present a uniform interface to (pseudo-)random data generation, with the type system used to resolve which generator to use. QuickCheck normally hides the data generation plumbing; however, we can also run the generators by hand to get a sense for the distribution of data QuickCheck produces. For example, to generate a random list of boolean values:
ghci> :m +Test.QuickCheck.Arbitrary
ghci> :m +Test.QuickCheck.Gen
ghci> :m +Test.QuickCheck.Random
ghci> unGen arbitrary (mkQCGen 2) 10 :: [Bool]
[False,True,True,True,False,True,False,False,True,False]
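If threading a seed and size by hand feels clumsy, QuickCheck also exports a sample function that prints a handful of freshly generated values (the output is random, so it will differ from run to run):

ghci> sample (arbitrary :: Gen [Bool])

This is a quick way to eyeball the distribution a generator produces before relying on it in tests.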
QuickCheck generates test data like this and passes it to the property of our choosing, via the quickCheck function. The type of the property itself determines which data generator is used. quickCheck then checks that the property is satisfied for all the test data produced. Now, since our idempotency test is polymorphic in the list element type, we need to pick a particular type to generate test data for, which we write as a type annotation on the property. To run the test, we just call quickCheck with our property function, fixed to the required data type (otherwise the list element type will default to the uninteresting () type):
ghci> :type quickCheck
quickCheck :: Testable prop => prop -> IO ()
ghci> quickCheck (prop_idempotent :: [Integer] -> Bool)
+++ OK, passed 100 tests.
For the 100 different lists generated, our property held–great!
When developing tests, it is often useful to see the actual data
generated for each test. To do this, we would replace quickCheck
with its sibling, verboseCheck
, to see (verbose) output for each
test. Now, let’s look at more sophisticated properties that our
function might satisfy.
Good libraries consist of a set of orthogonal primitives having sensible relationships to each other. We can use QuickCheck to specify the relationships between functions in our code, helping us find a good library interface by developing functions that are interrelated via useful properties. QuickCheck in this way acts as an API “lint” tool–it provides machine support for ensuring our library API makes sense.
The list sorting function should certainly have a number of
interesting properties that tie it to other list operations. For
example: the first element in a sorted list should always be the
smallest element of the input list. We might be tempted to specify
this intuition in Haskell, using the List
library’s minimum
function:
prop_minimum xs = head (qsort xs) == minimum xs
Testing this, though, reveals an error:
ghci> quickCheck (prop_minimum :: [Integer] -> Bool)
*** Failed! Exception: 'Prelude.head: empty list' (after 1 test): []
The property failed when sorting an empty list–for which head and minimum aren’t defined, as we can see from their definitions:
head :: [a] -> a
head (x:_) = x
head [] = error "Prelude.head: empty list"
minimum :: (Ord a) => [a] -> a
minimum [] = error "Prelude.minimum: empty list"
minimum xs = foldl1 min xs
So this property will only hold for non-empty lists. QuickCheck, thankfully, comes with a full embedded language for writing properties, so we can specify our invariants more precisely, filtering out values we don’t want to consider. For the empty list case, we really want to say: if the list is non-empty, then the first element of the sorted result is the minimum. This is done using the (==>) implication function, which filters out invalid data before running the property:
prop_minimum' xs = not (null xs) ==> head (qsort xs) == minimum xs
The result is quite clean. By separating out the empty list case, we can now confirm the property does in fact hold:
ghci> quickCheck (prop_minimum' :: [Integer] -> Property)
+++ OK, passed 100 tests.
Note that we had to change the type of the property from a simple Bool result to the more general Property type (the property is now a conditional check that discards empty lists before testing, rather than a plain boolean-valued function).
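As an aside, newer QuickCheck releases also ship test-data modifiers that avoid discarding cases in the first place. A minimal sketch, assuming the NonEmptyList modifier re-exported by Test.QuickCheck:

-- NonEmptyList only ever wraps non-empty lists, so no generated case
-- is thrown away by a (==>) filter.
prop_minimum'' :: NonEmptyList Integer -> Bool
prop_minimum'' (NonEmpty xs) = head (qsort xs) == minimum xs

Because the element type is fixed in the signature, this one can be run as plain quickCheck prop_minimum''.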
We can now complete the basic property set for the sort function with some other invariants that it should satisfy: that the output is ordered (each element should be smaller than, or equal to, its successor); that the output is a permutation of the input (which we check via the list difference function, (\\)); that the last element of the sorted result should be the largest element; and that if we append two lists and sort the result, its first element should be the smaller of the two lists’ minimum elements. These properties can be stated as:
prop_ordered xs = ordered (qsort xs)
    where ordered []       = True
          ordered [x]      = True
          ordered (x:y:xs) = x <= y && ordered (y:xs)

prop_permutation xs = permutation xs (qsort xs)
    where permutation xs ys = null (xs \\ ys) && null (ys \\ xs)

prop_maximum xs =
    not (null xs) ==>
        last (qsort xs) == maximum xs

prop_append xs ys =
    not (null xs) ==>
    not (null ys) ==>
        head (qsort (xs ++ ys)) == min (minimum xs) (minimum ys)
Another technique for gaining confidence in some code is to test it against a model implementation. We can tie our implementation of list sort to the reference sort function in the standard list library, and, if they behave the same, we gain confidence that our sort does the right thing.
prop_sort_model xs = sort xs == qsort xs
This kind of model-based testing is extremely powerful. Often developers will have a reference implementation or prototype that, while inefficient, is correct. This can then be kept around and used to ensure optimised production code conforms to the reference. By building a large suite of these model-based tests, and running them regularly (on every commit, for example), we can cheaply ensure the precision of our code. Large Haskell projects often come bundled with property suites comparable in size to the project itself, with thousands of invariants tested on every change, keeping the code tied to the specification, and ensuring it behaves as required.
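As a sketch of the pattern (our own illustration, separate from the pretty printer code), here is how a hypothetical optimised duplicate-removal function might be pinned to the naive nub from Data.List:

import Data.List (nub)
import qualified Data.Set as Set

-- A hypothetical "optimised" nub that tracks seen elements in a Set.
fastNub :: Ord a => [a] -> [a]
fastNub = go Set.empty
  where
    go _    []     = []
    go seen (x:xs)
        | x `Set.member` seen = go seen xs
        | otherwise           = x : go (Set.insert x seen) xs

-- The model test: the optimised version must agree with the reference.
prop_nub_model :: [Integer] -> Bool
prop_nub_model xs = fastNub xs == nub xs

The reference nub is quadratic but obviously correct; the property keeps the faster version honest.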
Testing individual functions for their natural properties is one of the basic building blocks that guides development of large systems in Haskell. We’ll look now at a more complicated scenario: taking the pretty printing library developed in earlier chapters, and building a test suite for it.
Recall that the pretty printer is built around Doc, an algebraic data type that represents well-formed documents:
data Doc = Empty
         | Char Char
         | Text String
         | Line
         | Concat Doc Doc
         | Union Doc Doc
           deriving (Show, Eq)
The library itself is implemented as a set of functions that build and transform values of this document type, before finally rendering the finished document to a string.
QuickCheck encourages an approach to testing where the developer specifies invariants that should hold for any data we can throw at the code. To test the pretty printing library, then, we’ll need a source of input data. To do this, we take advantage of the small combinator suite for building random data that QuickCheck provides via the Arbitrary class. The class provides a function, arbitrary, to generate data of each type, and with it we can define generators for our own data types.[fn:1]
class Arbitrary a where
    arbitrary :: Gen a
One thing to notice is that the generators run in a Gen
environment, indicated by the type. This is a simple state-passing
monad that is used to hide the random number generator state that
is threaded through the code. We’ll look thoroughly at monads in
later chapters, but for now it suffices to know that, as Gen
is
defined as a monad, we can use do
syntax to write new generators
that access the implicit random number source. To actually write
generators for our custom type we use any of a set of functions
defined in the library for introducing new random values and
gluing them together to build up data structures of the type we’re
interested in. The types of the key functions are:
elements :: [a] -> Gen a
choose :: Random a => (a, a) -> Gen a
oneof :: [Gen a] -> Gen a
The function elements
, for example, takes a list of values, and
returns a generator of random values from that list. choose
and
oneof
we’ll use later. With this, we can start writing
generators for simple data types. For example, if we define a new
data type for ternary logic:
import Test.QuickCheck
data Ternary
    = Yes
    | No
    | Unknown
    deriving (Eq, Show)
we can write an Arbitrary instance for the Ternary type by defining a function that picks elements from a list of Ternary’s possible values:
instance Arbitrary Ternary where
    arbitrary = elements [Yes, No, Unknown]
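We can spot-check this generator interactively; recent QuickCheck versions provide generate, which runs a generator in IO (the value drawn is random, so repeated calls give different results):

ghci> generate (arbitrary :: Gen Ternary)

Each call should print one of Yes, No or Unknown.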
Another approach to data generation is to generate values for one
of the basic Haskell types and then translate those values into
the type you’re actually interested in. We could have written the
Ternary
instance by generating integer values from 0 to 2
instead, using choose
, and then mapping them onto the ternary
values:
instance Arbitrary Ternary where
    arbitrary = do
        n <- choose (0, 2) :: Gen Int
        return $ case n of
                   0 -> Yes
                   1 -> No
                   _ -> Unknown
For simple sum types this approach works well, as the integers map directly onto the constructors of the data type. For product types (such as structures and tuples), we instead need to generate each component of the product separately (and recursively for nested types), and then combine the components. For example, to generate random pairs of random values:
instance (Arbitrary a, Arbitrary b) => Arbitrary (a, b) where
    arbitrary = do
        x <- arbitrary
        y <- arbitrary
        return (x, y)
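The same recipe applies to our own product types. A quick sketch (the Person type here is hypothetical, invented purely for illustration):

-- Generate each field, then combine them with the constructor.
data Person = Person { name :: String, age :: Int }
              deriving (Show)

instance Arbitrary Person where
    arbitrary = do
        n <- arbitrary        -- a random String for the name
        a <- choose (0, 120)  -- keep the age in a plausible range
        return (Person n a)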
So let’s now write a generator for all the different variants of the Doc type. We’ll start by breaking the problem down: first we pick a random constructor, then, depending on the result, we generate the components of its fields. We choose a random integer to represent which document variant to generate, and then dispatch on the result. To generate Concat or Union document nodes, we just recurse on arbitrary, letting type inference determine which instance of Arbitrary we mean:
module QC where
import Prettify
import Data.List
import Test.QuickCheck
instance Arbitrary Doc where
    arbitrary = do
        n <- choose (1, 6) :: Gen Int
        case n of
          1 -> return Empty
          2 -> do x <- arbitrary
                  return (Char x)
          3 -> do x <- arbitrary
                  return (Text x)
          4 -> return Line
          5 -> do x <- arbitrary
                  y <- arbitrary
                  return (Concat x y)
          6 -> do x <- arbitrary
                  y <- arbitrary
                  return (Union x y)
That was fairly straightforward, and we can clean it up some more by using the oneof function, whose type we saw earlier, to pick between different generators in a list (we can also use the monadic combinators liftM and liftM2 to avoid naming the intermediate results from each generator):
import Control.Monad (liftM, liftM2)   -- add to the imports at the top of QC.hs

instance Arbitrary Doc where
    arbitrary =
        oneof [ return Empty
              , liftM  Char arbitrary
              , liftM  Text arbitrary
              , return Line
              , liftM2 Concat arbitrary arbitrary
              , liftM2 Union  arbitrary arbitrary ]
The latter is more concise, just picking from a list of generators, but the two instances describe the same data either way. We can check that the output makes sense by generating a list of random documents (seeding the pseudo-random generator with an initial seed of 2):
ghci> unGen arbitrary (mkQCGen 2) 10 :: [Doc]
[Empty,Union (Char 't') Line,Line,Union Line Empty,Concat (Char '\9930')
(Text "\DEL"),Line,Text "\263060\ACKJ@e",Empty,Char '\367759',Concat Line
(Text ")\385036N\332758D(")]
Looking at the output, we see a good mix of simple base cases and some more complicated nested documents. We’ll be generating hundreds of these on each test run, so the generator should exercise the library well. We can now write some generic properties for our document functions; first, though, a brief aside on the generator itself.
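Nothing in the recursive Concat and Union cases bounds how deeply a generated document can nest, so very large documents are possible, if unlikely. One common refinement–a sketch of our own, not part of the library code, using QuickCheck’s sized combinator and a helper we’ve called arbDoc–shrinks the size parameter at each level of recursion:

-- A size-bounded alternative to the instance above (a type can have
-- only one instance, so this would replace it, not supplement it).
instance Arbitrary Doc where
    arbitrary = sized arbDoc

arbDoc :: Int -> Gen Doc
arbDoc n | n <= 0 =
    oneof [ return Empty
          , liftM Char arbitrary
          , liftM Text arbitrary
          , return Line ]
arbDoc n =
    oneof [ return Empty
          , liftM Char arbitrary
          , liftM Text arbitrary
          , return Line
          , liftM2 Concat sub sub
          , liftM2 Union  sub sub ]
  where sub = arbDoc (n `div` 2)

Halving the size at each level keeps the expected nesting depth roughly logarithmic in the size parameter.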
Two of the basic functions on documents are the null document
constant (a nullary function), empty
, and the append function.
empty :: Doc
(<>) :: Doc -> Doc -> Doc
Together, these should have a nice property: appending or prepending the empty document onto a second document should leave the second document unchanged. We can state this invariant as a property:
prop_empty_id x = empty <> x == x && x <> empty == x
Confirming that this is indeed true, we’re now underway with our testing:
ghci> quickCheck prop_empty_id
+++ OK, passed 100 tests.
To look at the actual test documents that were generated, we can replace quickCheck with verboseCheck; doing so shows a good mixture of both simple and complicated cases being generated. If desirable, we can refine the data generation further, with constraints on the proportion of each kind of document, as sketched below.
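For example, QuickCheck’s frequency combinator is a weighted version of oneof. A sketch (this would replace the Doc instance in QC.hs rather than sit alongside it) that makes leaf documents six times as likely as Concat or Union nodes:

-- frequency takes (weight, generator) pairs instead of bare generators.
instance Arbitrary Doc where
    arbitrary =
        frequency [ (6, return Empty)
                  , (6, liftM  Char arbitrary)
                  , (6, liftM  Text arbitrary)
                  , (6, return Line)
                  , (1, liftM2 Concat arbitrary arbitrary)
                  , (1, liftM2 Union  arbitrary arbitrary) ]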
Other functions in the API are also simple enough to have their behaviour fully described via properties. By doing so we can maintain an external, checkable description of the function’s behaviour, so later changes won’t break these basic invariants.
prop_char c = char c == Char c
prop_text s = text s == if null s then Empty else Text s
prop_line = line == Line
prop_double d = double d == text (show d)
These properties are enough to fully test the structure returned by the basic document operators. To test the rest of the library will require more work.
Higher-order functions are the basic glue of reusable programming, and our pretty printer library is no exception–a custom fold function is used internally to implement both document concatenation and the interleaving of separators between document chunks. The fold defined for documents takes a list of document pieces and glues them all together with a supplied combining function:
fold :: (Doc -> Doc -> Doc) -> [Doc] -> Doc
fold f = foldr f empty
We can easily write tests in isolation for specific instances of fold. Horizontal concatenation of documents, for example, is easy to specify by writing a reference implementation on lists:
prop_hcat xs = hcat xs == glue xs
    where
        glue []     = empty
        glue (d:ds) = d <> glue ds
It is a similar story for punctuate
, where we can model
inserting punctuation with list interspersion (from Data.List
,
intersperse
is a function that takes an element and interleaves
it between other elements of a list):
prop_punctuate s xs = punctuate s xs == intersperse s xs
While this looks fine, running it reveals a flaw in our reasoning:
ghci> quickCheck prop_punctuate
*** Failed! Falsifiable (after 4 tests):
Empty
[Char '\DC3',Empty]
The pretty printing library optimises away redundant empty documents, something the model implementation doesn’t do, so we’ll need to augment our model to match reality. First, we intersperse the punctuation text throughout the document list, then run a little loop to clean up the Empty documents scattered through it, like so:
prop_punctuate' s xs = punctuate s xs == combine (intersperse s xs)
    where
        combine []           = []
        combine [x]          = [x]
        combine (x:Empty:ys) = x : combine ys
        combine (Empty:y:ys) = y : combine ys
        combine (x:y:ys)     = x `Concat` y : combine ys
Running this in GHCi, we can confirm the result. It is reassuring to have the test framework spot the flaws in our reasoning about the code–exactly what we’re looking for:
ghci> quickCheck prop_punctuate'
+++ OK, passed 100 tests.
We can put all these tests together in a single file, and run them simply by using one of QuickCheck’s driver functions. Several exist, including elaborate parallel ones. The basic batch driver is often good enough, however. All we need do is set up some default test parameters, and then list the functions we want to test:
import QC
import Test.QuickCheck
import Control.Monad (forM_)

options :: Args
options = stdArgs { maxSuccess = 200, maxSize = 200 }

type Run = Args -> IO ()

run :: Testable prop => prop -> Run
run = flip quickCheckWith

runTests :: String -> Args -> [Run] -> IO ()
runTests name opts tests =
    putStrLn ("Running " ++ name ++ " tests:") >>
    forM_ tests (\rn -> rn opts)

main :: IO ()
main = do
    runTests "simple" options
        [ run prop_empty_id
        , run prop_char
        , run prop_text
        , run prop_line
        , run prop_double
        ]

    runTests "complex" options
        [ run prop_hcat
        , run prop_punctuate'
        ]
We’ve structured the code here as a separate, standalone test script, with instances and properties in their own file, separate from the library source. This is typical for library projects, where the tests are kept apart from the library itself and import the library via the module system. The test script can then be compiled and executed:
ghci> :l Run.hs
[1 of 4] Compiling SimpleJSON ( SimpleJSON.hs, interpreted )
[2 of 4] Compiling Prettify ( Prettify.hs, interpreted )
[3 of 4] Compiling QC ( QC.hs, interpreted )
[4 of 4] Compiling Main ( Run.hs, interpreted )
Ok, four modules loaded.
*Main> main
Running simple tests:
+++ OK, passed 200 tests.
+++ OK, passed 200 tests.
+++ OK, passed 200 tests.
+++ OK, passed 1 tests.
+++ OK, passed 200 tests.
Running complex tests:
+++ OK, passed 200 tests.
+++ OK, passed 200 tests.
A total of 1201 individual test cases were run, which is comforting. We can increase the depth easily enough, but to find out exactly how well the code is being tested, we should turn to GHC’s built-in code coverage tool, HPC, which can state precisely what is going on.
HPC (Haskell Program Coverage) is an extension to the compiler that records which parts of the code were actually executed during a given program run. This is useful in the context of testing, as it lets us observe precisely which functions, branches and expressions were evaluated. The result is precise, easily obtained knowledge about how much of the code was actually tested. HPC comes with a simple utility for generating useful graphs of program coverage, making it easy to zoom in on weak spots in the test suite.
To obtain test coverage data, all we need to do is add the -fhpc flag to the command line when compiling the tests:
$ ghc -fhpc Run.hs --make
Then run the tests as normal:
$ ./Run
Running simple tests:
+++ OK, passed 200 tests.
+++ OK, passed 200 tests.
+++ OK, passed 200 tests.
+++ OK, passed 1 tests.
+++ OK, passed 200 tests.
Running complex tests:
+++ OK, passed 200 tests.
+++ OK, passed 200 tests.
During the test run, the trace of the program is written to .tix and .mix files in the current directory. Afterwards, these files are used by the command line tool, hpc, to display various statistics about what happened. The basic interface is textual. To begin, we can get a summary of the code tested during the run using the report command of hpc. We’ll exclude the test programs themselves (using the --exclude flag), so as to concentrate only on code in the pretty printer library. Entering the following into the console:
$ hpc report Run --exclude=Main --exclude=QC
 17% expressions used (30/176)
  0% boolean coverage (0/3)
       0% guards (0/3), 3 unevaluated
     100% 'if' conditions (0/0)
     100% qualifiers (0/0)
 17% alternatives used (8/46)
  0% local declarations used (0/4)
 30% top-level declarations used (10/33)
we see, from the last line, that 30% of top-level definitions were evaluated during the test run. Not too bad for a first attempt. As we test more and more functions from the library, this figure will rise. The textual version is useful for a quick summary, but to really see what’s going on, it is best to look at the marked-up output. To generate this, use the markup command instead:
$ hpc markup Run --exclude=Main --exclude=QC
This will generate one HTML file for each Haskell source file, along with some index files. Loading the file hpc_index.html into a browser, we can see some pretty graphs of the code coverage.
Not too bad. Clicking through to the Prettify module itself, we see the actual source of the program, marked up in bold yellow for code that wasn’t tested, while code that was executed appears simply in bold.
It is important to remove the old .tix file after you modify and recompile the code, or an error will occur when HPC tries to combine the statistics from separate runs:
$ ghc -fhpc Run.hs --make -no-recomp
$ ./Run
in module 'Prettify'
Hpc failure: module mismatch with .tix/.mix file hash number
(perhaps remove Run.tix file?)
$ rm *.tix
$ ./Run
Running simple tests:
+++ OK, passed 200 tests.
+++ OK, passed 200 tests.
+++ OK, passed 200 tests.
+++ OK, passed 1 tests.
+++ OK, passed 200 tests.
Running complex tests:
+++ OK, passed 200 tests.
+++ OK, passed 200 tests.
Another two hundred tests were added to the suite, and our coverage statistic improves to 52 percent of the code base.
HPC ensures that we’re honest in our testing, as anything less than 100% coverage will be pointed out in glaring color. In particular, it ensures the programmer has to think about error cases, and complicated branches with obscure conditions, all forms of code smell. When combined with a saturating test generation system, like QuickCheck’s, testing becomes a rewarding activity, and a core part of Haskell development.
[fn:1] The Arbitrary class historically also carried a second method, coarbitrary, which, given a value of some type, transforms a generator of another type; it is needed only for generating random values of function type, so we can disregard it here. In the QuickCheck 2 series, coarbitrary lives in a separate CoArbitrary class, so omitting it from our Arbitrary instances does not even provoke a warning.