Simple algorithm to estimate distinct elements in an unbounded stream using bounded space. The estimate is the upper bound on the element's actual count.
Add it to you mix.exs
deps
{:spacesaving, "~> 0.0.2"}
Init with 3 spaces, so we track 3 elements
import Spacesaving
state = init(3)
Push some elements
state = state
|> push(:foo) |> push(:foo) |> push(:foo) |> push(:foo)
|> push(:bar) |> push(:bar) |> push(:bar)
|> push(:baz) |> push(:baz)
|> push(:buzz)
Get the top k elements
top(state, 2) # This will be [foo: 4, bar: 3]
top(state, 3) # This will be [foo: 4, bar: 3, buzz: 3], so the inaccuracy starts to come into play when an element is kicked out, and the estimate is the upper bound
Merge two states
left = init(4) |> push(:foo) |> push(:bar)
right = init(4) |> push(:foo) |> push(:baz)
merge(left, right)
|> top(3) # Would be [foo: 2, bar: 1, baz: 1]