Hailstorm is a distributed stream computation system that provides exactly-once semantics.
Written by Thomas Dimson (@cosbynator) and Milind Ganjoo (@mganjoo).
The architecture of Hailstorm is based on Apache Storm (which is also the inspiration for the name).
The exactly-once semantics implemented in Hailstorm are based on a high-level description in an essay by @jasonjckn.
Hailstorm requires Apache Zookeeper, its C bindings, and its Haskell bindings, hzk, to run.
On OS X, the zookeeper package on Homebrew contains the binaries and C bindings for Zookeeper. You can install it as follows:
brew install --c zookeeper
On Ubuntu, we recommend following the official instructions to obtain and set up the Zookeeper binaries. To install the C bindings:
sudo apt-get install libzookeeper-mt-dev
Finally, to build and install hzk on Mac OS X, run the following command in your cabal sandbox:
cabal install --extra-include-dirs=/usr/local/include/zookeeper hzk
The above command is required on Mac OS X because of the non-standard include directory location. On Ubuntu, cabal install hzk should work.
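The platform difference above can be captured in a small script. This is a minimal sketch, assuming that `uname -s` reports `Darwin` on Mac OS X and that every other system uses the standard include path. The function name `hzk_install_cmd` is hypothetical, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the cabal command for installing hzk on a given OS.
# Pass an OS name as reported by `uname -s`, or omit it to detect the current one.
hzk_install_cmd() {
    os="${1:-$(uname -s)}"
    case "$os" in
        Darwin)
            # Mac OS X: the Zookeeper headers sit in a non-standard include directory.
            echo "cabal install --extra-include-dirs=/usr/local/include/zookeeper hzk"
            ;;
        *)
            # Ubuntu (assumed for any non-Darwin system): the default include path should work.
            echo "cabal install hzk"
            ;;
    esac
}
```

Calling `hzk_install_cmd` inside your cabal sandbox directory prints the command to run for the current platform.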
Hailstorm requires Apache Kafka to be installed and operating. See the official instructions for details.
Hailstorm uses Haskakafka, the Haskell bindings for Kafka written by our very own @cosbynator. Haskakafka, in turn, depends on librdkafka (see the Haskakafka project page for installation instructions).
Haskakafka itself is not yet available on Cabal, so install cabalg and c2hs into your sandbox, then use cabalg to build Haskakafka from its Git repository:
cabal install c2hs
cabal install cabalg
.cabal-sandbox/bin/cabalg https://github.com/cosbynator/haskakafka.git
On OS X, you may get 'stdio.h' errors, in which case you should try:
.cabal-sandbox/bin/cabalg https://github.com/cosbynator/haskakafka.git -- --with-gcc=gcc-4.8
And you are done!
First, start an instance of Zookeeper (zkServer start on Mac or zkServer.sh start on Ubuntu).
Before running Hailstorm for the first time, you will have to initialize your topology using the zk_init subcommand:
hailstorm zk_init
Finally, run a sample topology:
hailstorm -f data/test.txt run_sample
While it is running, you can extract debug metadata by executing:
hailstorm zk_show
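The startup steps above can be collected into one place. The following is a sketch only: the function names `zk_start_cmd` and `quickstart_steps` are hypothetical, the OS check assumes `uname -s` reports `Darwin` on Mac, and the functions print the commands rather than executing them.

```shell
#!/bin/sh
# Hypothetical helper: print the Zookeeper start command for a given OS
# (as reported by `uname -s`; detected automatically when omitted).
zk_start_cmd() {
    case "${1:-$(uname -s)}" in
        Darwin) echo "zkServer start" ;;     # Mac
        *)      echo "zkServer.sh start" ;;  # Ubuntu (assumed for non-Darwin systems)
    esac
}

# Hypothetical helper: print the first-run steps in the order described above.
quickstart_steps() {
    zk_start_cmd "$1"                             # 1. start Zookeeper
    echo "hailstorm zk_init"                      # 2. initialize the topology (first run only)
    echo "hailstorm -f data/test.txt run_sample"  # 3. run the sample topology
}
```

Calling `quickstart_steps` prints the three commands for review. Since zk_init is only needed before the first run, it can be skipped on later runs.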