HyperLogLog in golang. Docs.
A go implementation of HypeLogLog data structure with a twist. See HyperLogLog in Practice paper by Stefan Heule, Marc Nunkesser, Alex Hall.
There is no need to serialize/deserialize hll. Everything is stored in a byte slice, which can be memory mapped, passed around over the network as is etc.
- sparse representation. this implementation does exact counting for small sets.
- fixed memory usage (even for empty HLL). HLL of a given precision P uses fixed (8 + 3*2^(P-2), 8 byte header + 6 bits per register) size in bytes.
- thresholds are tuned. different from Sub-Algorithm Threshold.
I wanted an HLL implementation that is
- simple
- reasonably fast
- (almost) non-allocating
- exact when number of unique elements is small
- memory-mapped file friendly
- well tested (90+% coverage)
Get go-hll:
go get github.com/sasha-s/go-hll
Use it:
s, err := SizeByP(16)
if err != nil {
log.Panicln(err)
}
h := make(HLL, s)
...
for _, x := range []string{"alpha", "beta"} {
h.Add(siphash.Hash(2, 57, []byte(x)))
}
log.Println(h.EstimateCardinality())
Use good hash (otherwise accuracy would be poor). Some options:
Benchmark results on my MacBook Pro (Mid 2014).
Add-8 9.68ns ± 1%
Estimate-8 27.3µs ± 1%
Merge-8 38.0µs ± 1%
AddDense-8 6.73ns ± 3%
MergeDense-8 37.9µs ± 1%
EstimateDense-8 22.9µs ± 1%
Sort-8 108µs ± 1%
AddSparse-8 10.2ns ± 3%
Merge/Estimate etc. are done for P=14.