Remove Paths type and Iterator interface (#21)
Breaking changes:
* Removed type Paths
* Remove interface Iterator
* Rename type bytesTreeIter to Iterator
* Rename type BytesIterator to Stepper

Rename/deprecate:
* Rename type Byte to Tree

Other:
* Reorganize benchmarks
* Update README and examples
* Download benchmark data from the Internet
gammazero authored Aug 4, 2022
1 parent 82e763b commit 0a3cb48
Showing 17 changed files with 1,691 additions and 2,977 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -22,3 +22,6 @@ _testmain.go
 *.test
 *.prof
 *.out
+
+web2
+web2a
20 changes: 12 additions & 8 deletions README.md
@@ -6,17 +6,22 @@
 [![codecov](https://codecov.io/gh/gammazero/radixtree/branch/master/graph/badge.svg)](https://codecov.io/gh/gammazero/radixtree)
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 
-Package `radixtree` implements multiple forms of an Adaptive [Radix Tree](https://en.wikipedia.org/wiki/Radix_tree), aka compressed [trie](https://en.wikipedia.org/wiki/Trie) or compact prefix tree. This data structure is useful to quickly lookup data by key, find find data whose keys have a common prefix, or find data whose keys are a prefix (i.e. found along the way) of a search key.
+Package `radixtree` implements an Adaptive [Radix Tree](https://en.wikipedia.org/wiki/Radix_tree), aka compressed [trie](https://en.wikipedia.org/wiki/Trie) or prefix tree. It is adaptive in the sense that nodes are not constant size, having as few or many children as needed to branch to all subtrees.
 
-The implementations are optimized for Get performance and allocate 0 bytes of heap memory for any read operation (Get, Walk, WalkPath, etc.); therefore no garbage to collect. Once a radix tree is built, it can be repeatedly searched quickly. Concurrent searches are safe since these do not modify the data structure. Access is not synchronized (not concurrent safe with writes), allowing the caller to synchronize, if needed, in whatever manner works best for the application.
+This package implements a radix-256 tree where each key symbol (radix) is a byte, allowing up to 256 possible branches to traverse to the next node.
+
+The implementation is optimized for Get performance and allocates 0 bytes of heap memory per Get; therefore no garbage to collect. Once the radix tree is built, it can be repeatedly searched quickly. Concurrent searches are safe since these do not modify the radixtree. Access is not synchronized (not concurrent safe with writes), allowing the caller to synchronize, if needed, in whatever manner works best for the application.
+
+Package `radixtree` implements an Adaptive [Radix Tree](https://en.wikipedia.org/wiki/Radix_tree), aka compressed [trie](https://en.wikipedia.org/wiki/Trie) or compact prefix tree. This data structure is useful to quickly lookup data by key, find values whose keys have a common prefix, or find values whose keys are a prefix (i.e. found along the way) of a search key.
+
+The implementation is optimized for Get performance and allocate 0 bytes of heap memory for any read operation (Get, Walk, WalkPath, etc.); therefore no garbage to collect. Once a radix tree is built, it can be repeatedly searched quickly. Concurrent searches are safe since these do not modify the data structure. Access is not synchronized (not concurrent safe with writes), allowing the caller to synchronize, if needed, in whatever manner works best for the application.
 
 This radix tree offers the following features:
 
-- Multiple types of radix tree: Bytes, Paths
-- Efficient: Operations for all types of radix tree are O(k). Zero memory allocation for all read operations.
-- Compact: When values are stored using keys that have a common prefix, the common part of the key is only stored once. Consider this when keys are similar to a timestamp, OID, filepath, geohash, network address, etc. Nodes that do not branch or contain values are compressed out of the tree.
-- Adaptive: This radix tree is adaptive in the sense that nodes are not constant size, having only as many children that are needed, from zero to the maximum possible number of different key segments.
-- Iterators: An iterator for each type of radix tree allows a tree to be traversed one key segment at a time. This is useful for incremental lookup. Iterators can be copied in order to branch a search, and iterate the copies concurrently.
+- Efficient: Operations are O(k). Zero memory allocation for all read operations.
+- Compact: When values are stored using keys that have a common prefix, the common part of the key is only stored once. Consider this when keys are similar to a timestamp, OID, filepath, geohash, network address, etc. Only the minimum number of nodes are kept to branch at the points where keys differ.
+- Adaptive: This radix tree is adaptive in the sense that nodes are not constant size, having only as many children that are needed. This is radix-256 tree where each key symbol (radix) is a byte, allowing from zero up to 256 branches to traverse to the next node.
+- Iterator: An `Iterator` returns each key-value pair in the tree. A `Stepper` type of iterator traverses the tree one specified byte at a time. It is useful for incremental lookup, and can be copied in order to branch a search and iterate the copies concurrently.
 - Able to store nil values: Get differentiates between nil value and missing value.
 - Ordered iteration: Walking and iterating the tree is done in lexical order, making the output deterministic.
 
@@ -86,4 +91,3 @@ func main() {
 ## License
 
 [MIT License](LICENSE)
-
234 changes: 91 additions & 143 deletions bench_test.go
@@ -2,88 +2,74 @@ package radixtree
 
 import (
 	"bufio"
+	"fmt"
+	"io"
+	"net/http"
 	"os"
 	"testing"
 )
 
 const (
-	wordsPath = "/usr/share/dict/words"
-	web2aPath = "/usr/share/dict/web2a"
+	// web2: Webster's Second International Dictionary, all 234,936 words worth.
+	web2URL = "https://raw.githubusercontent.com/openbsd/src/master/share/dict/web2"
+	web2Path = "web2"
+	// web2a: hyphenated terms as well as assorted noun and adverbial
+	// phrasesfrom Webster's Second International Dictionary.
+	web2aURL = "https://raw.githubusercontent.com/openbsd/src/master/share/dict/web2a"
+	web2aPath = "web2a"
 )
 
-//
-// Benchmarks
-//
-func BenchmarkWordsBytesGet(b *testing.B) {
-	benchmarkBytesGet(wordsPath, b)
-}
-
-func BenchmarkWordsBytesPut(b *testing.B) {
-	benchmarkBytesPut(wordsPath, b)
-}
-
-func BenchmarkWordsBytesWalk(b *testing.B) {
-	benchmarkBytesWalk(wordsPath, b)
-}
-
-func BenchmarkWordsBytesWalkPath(b *testing.B) {
-	benchmarkBytesWalkPath(wordsPath, b)
-}
+func BenchmarkGet(b *testing.B) {
+	err := getWords()
+	if err != nil {
+		b.Skip(err.Error())
+	}
 
-// ----- Web2a -----
-func BenchmarkWeb2aBytesGet(b *testing.B) {
-	benchmarkBytesGet(web2aPath, b)
-}
+	b.Run("Words", func(b *testing.B) {
+		benchmarkGet(b, web2Path)
+	})
 
-func BenchmarkWeb2aBytesPut(b *testing.B) {
-	benchmarkBytesPut(web2aPath, b)
+	b.Run("Web2a", func(b *testing.B) {
+		benchmarkGet(b, web2aPath)
+	})
 }
 
-func BenchmarkWeb2aBytesWalk(b *testing.B) {
-	benchmarkBytesWalk(web2aPath, b)
-}
+func BenchmarkPut(b *testing.B) {
+	b.Run("Words", func(b *testing.B) {
+		benchmarkPut(b, web2Path)
+	})
 
-func BenchmarkWeb2aBytesWalkPath(b *testing.B) {
-	benchmarkBytesWalkPath(web2aPath, b)
+	b.Run("Web2a", func(b *testing.B) {
+		benchmarkPut(b, web2aPath)
+	})
 }
 
-func BenchmarkWeb2aPathsPut(b *testing.B) {
-	benchmarkPathsPut(web2aPath, b)
-}
+func BenchmarkWalk(b *testing.B) {
+	b.Run("Words", func(b *testing.B) {
+		benchmarkWalk(b, web2Path)
+	})
 
-func BenchmarkWeb2aPathsGet(b *testing.B) {
-	benchmarkPathsGet(web2aPath, b)
+	b.Run("Web2a", func(b *testing.B) {
+		benchmarkWalk(b, web2aPath)
+	})
 }
 
-func BenchmarkWeb2aPathsWalk(b *testing.B) {
-	benchmarkPathsWalk(web2aPath, b)
-}
+func BenchmarkWalkPath(b *testing.B) {
+	b.Run("Words", func(b *testing.B) {
+		benchmarkWalkPath(b, web2Path)
+	})
 
-func BenchmarkWeb2aPathsWalkPath(b *testing.B) {
-	benchmarkPathsWalkPath(web2aPath, b)
+	b.Run("Web2a", func(b *testing.B) {
+		benchmarkWalkPath(b, web2aPath)
+	})
 }
 
-func benchmarkBytesPut(filePath string, b *testing.B) {
+func benchmarkGet(b *testing.B, filePath string) {
 	words, err := loadWords(filePath)
 	if err != nil {
 		b.Skip(err.Error())
 	}
-	b.ResetTimer()
-	b.ReportAllocs()
-	for n := 0; n < b.N; n++ {
-		tree := new(Bytes)
-		for _, w := range words {
-			tree.Put(w, w)
-		}
-	}
-}
-
-func benchmarkBytesGet(filePath string, b *testing.B) {
-	words, err := loadWords(filePath)
-	if err != nil {
-		b.Skip(err.Error())
-	}
-	tree := new(Bytes)
+	tree := new(Tree)
 	for _, w := range words {
 		tree.Put(w, w)
 	}
@@ -98,95 +84,27 @@ func benchmarkBytesGet(filePath string, b *testing.B) {
 	}
 }
 
-func benchmarkBytesWalk(filePath string, b *testing.B) {
-	words, err := loadWords(filePath)
-	if err != nil {
-		b.Skip(err.Error())
-	}
-	tree := new(Bytes)
-	for _, w := range words {
-		tree.Put(w, w)
-	}
-	b.ResetTimer()
-	b.ReportAllocs()
-	var count int
-	for n := 0; n < b.N; n++ {
-		count = 0
-		tree.Walk("", func(k string, value interface{}) bool {
-			count++
-			return false
-		})
-	}
-	if count != len(words) {
-		panic("wrong count")
-	}
-}
-
-func benchmarkBytesWalkPath(filePath string, b *testing.B) {
-	words, err := loadWords(filePath)
-	if err != nil {
-		b.Skip(err.Error())
-	}
-	tree := new(Bytes)
-	for _, w := range words {
-		tree.Put(w, w)
-	}
-	b.ResetTimer()
-	b.ReportAllocs()
-	var count int
-	for n := 0; n < b.N; n++ {
-		count = 0
-		for _, w := range words {
-			tree.WalkPath(w, func(key string, value interface{}) bool {
-				count++
-				return false
-			})
-		}
-	}
-	if count <= len(words) {
-		panic("wrong count")
-	}
-}
-
-func benchmarkPathsPut(filePath string, b *testing.B) {
+func benchmarkPut(b *testing.B, filePath string) {
 	words, err := loadWords(filePath)
 	if err != nil {
 		b.Skip(err.Error())
 	}
 	b.ResetTimer()
 	b.ReportAllocs()
 	for n := 0; n < b.N; n++ {
-		tree := new(Paths)
+		tree := new(Bytes)
 		for _, w := range words {
 			tree.Put(w, w)
 		}
 	}
 }
 
-func benchmarkPathsGet(filePath string, b *testing.B) {
+func benchmarkWalk(b *testing.B, filePath string) {
 	words, err := loadWords(filePath)
 	if err != nil {
 		b.Skip(err.Error())
 	}
-	tree := new(Paths)
-	for _, w := range words {
-		tree.Put(w, w)
-	}
-	b.ResetTimer()
-	b.ReportAllocs()
-	for n := 0; n < b.N; n++ {
-		for _, w := range words {
-			tree.Get(w)
-		}
-	}
-}
-
-func benchmarkPathsWalk(filePath string, b *testing.B) {
-	words, err := loadWords(filePath)
-	if err != nil {
-		b.Skip(err.Error())
-	}
-	tree := new(Paths)
+	tree := new(Tree)
 	for _, w := range words {
 		tree.Put(w, w)
 	}
@@ -199,34 +117,33 @@ func benchmarkPathsWalk(filePath string, b *testing.B) {
 			count++
 			return false
 		})
-		if count != len(words) {
-			panic("wrong count")
-		}
 	}
+	if count != len(words) {
+		b.Fatalf("Walk wrong count, expected %d got %d", len(words), count)
+	}
 }
 
-func benchmarkPathsWalkPath(filePath string, b *testing.B) {
+func benchmarkWalkPath(b *testing.B, filePath string) {
 	words, err := loadWords(filePath)
 	if err != nil {
 		b.Skip(err.Error())
 	}
-	tree := new(Paths)
+	tree := new(Tree)
 	for _, w := range words {
 		tree.Put(w, w)
 	}
 	b.ResetTimer()
 	b.ReportAllocs()
-	var count int
 	for n := 0; n < b.N; n++ {
-		count = 0
+		found := false
 		for _, w := range words {
 			tree.WalkPath(w, func(key string, value interface{}) bool {
-				count++
+				found = true
 				return false
 			})
 		}
-		if count < len(words) {
-			panic("wrong count")
+		if !found {
+			b.Fatal("Walk did not find word")
 		}
 	}
 }
@@ -239,13 +156,11 @@ func loadWords(wordsFile string) ([]string, error) {
 	defer f.Close()
 
 	scanner := bufio.NewScanner(f)
-	var word string
 	var words []string
 
 	// Scan through line-dilimited words.
 	for scanner.Scan() {
-		word = scanner.Text()
-		words = append(words, word)
+		words = append(words, scanner.Text())
 	}
 
 	if err := scanner.Err(); err != nil {
@@ -254,3 +169,36 @@ func loadWords(wordsFile string) ([]string, error) {
 
 	return words, nil
 }
+
+func getWords() error {
+	err := downloadFile(web2URL, web2Path)
+	if err != nil {
+		return err
+	}
+	return downloadFile(web2aURL, web2aPath)
+}
+
+func downloadFile(fileURL, filePath string) error {
+	_, err := os.Stat(filePath)
+	if err == nil {
+		return nil
+	}
+	rsp, err := http.Get(fileURL)
+	if err != nil {
+		return err
+	}
+	defer rsp.Body.Close()
+
+	if rsp.StatusCode != 200 {
+		return fmt.Errorf("error response getting file: %d", rsp.StatusCode)
+	}
+
+	file, err := os.Create(filePath)
+	if err != nil {
+		return err
+	}
+	defer file.Close()
+
+	_, err = io.Copy(file, rsp.Body)
+	return err
+}