Skip to content

rozap/spacesaving

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spacesaving

Simple algorithm to estimate distinct elements in an unbounded stream using bounded space. The estimate is the upper bound on the element's actual count.

Docs on hex

Usage

Add it to you mix.exs deps

{:spacesaving, "~> 0.0.2"}

Init with 3 spaces, so we track 3 elements

import Spacesaving

state = init(3)

Push some elements

state = state
|> push(:foo) |> push(:foo) |> push(:foo) |> push(:foo)
|> push(:bar) |> push(:bar) |> push(:bar)
|> push(:baz) |> push(:baz)
|> push(:buzz)

Get the top k elements

top(state, 2) # This will be [foo: 4, bar: 3]
top(state, 3) # This will be [foo: 4, bar: 3, buzz: 3], so the inaccuracy starts to come into play when an element is kicked out, and the estimate is the upper bound

Merge two states

left  = init(4) |> push(:foo) |> push(:bar)
right = init(4) |> push(:foo) |> push(:baz)

merge(left, right)
|> top(3) # Would be [foo: 2, bar: 1, baz: 1]

References

Original Paper

About

stream count distinct element estimation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages