Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SeekLowerBound not working correctly #37

Closed
tonistiigi opened this issue Jun 16, 2021 · 9 comments · Fixed by #39
Closed

SeekLowerBound not working correctly #37

tonistiigi opened this issue Jun 16, 2021 · 9 comments · Fixed by #39

Comments

@tonistiigi
Copy link

I was trying to replace our custom Seek behavior in my fork with SeekLowerBound described in here #29 (comment) but hit some issues.

First thing I discovered was that an empty key (root) confuses the seeker. Tried to fix this in 30c5337

But then I discovered some more cases that behave weirdly. Added tests for them in 5bed3aa

    --- FAIL: TestIterateLowerBound/case015 (0.00s)
        iradix_test.go:1712: keys=[ aaa bbb] search=aa want=[aaa bbb]
        iradix_test.go:1730: mis-match: key=aa
              got=[]
              want=[aaa bbb]
    --- FAIL: TestIterateLowerBound/case016 (0.00s)
        iradix_test.go:1712: keys=[ aaa bbb] search=aaab want=[bbb]
        iradix_test.go:1730: mis-match: key=aaab
              got=[]
              want=[bbb]
    --- PASS: TestIterateLowerBound/case017 (0.00s)
        iradix_test.go:1712: keys=[ aaa bbb] search= want=[ aaa bbb]
    --- FAIL: TestIterateLowerBound/case018 (0.00s)
        iradix_test.go:1712: keys=[a aa bbb] search=aa want=[aa bbb]
        iradix_test.go:1730: mis-match: key=aa
              got=[bbb]
              want=[aa bbb]
    --- FAIL: TestIterateLowerBound/case019 (0.00s)
        iradix_test.go:1712: keys=[a aaa bbb] search=aa want=[aaa bbb]
        iradix_test.go:1730: mis-match: key=aa
              got=[bbb]
              want=[aaa bbb]

@banks @thaJeztah

@thaJeztah
Copy link
Contributor

@tonistiigi Thanks! Perhaps worth opening a PR for the first one, and a draft PR for the second (to discuss a fix)?

@banks
Copy link
Member

banks commented Jun 16, 2021

@tonistiigi thanks for sharing this!

I think both of these are due to a quirk of the design of iradix that has for some reason never been clearly documented.

It's described here:

// This specifies a property where each call adds a new random key to the radix
// tree (with a null byte appended since our tree doesn't support one key
// being a prefix of another and treats null bytes specially).

Basically keys can never be an exact prefix of each other. Many iradix implementations have the property but some hide it by automatically appending a null byte to each key. C-based implementations have that property naturally due to the convention of null-terminated strings. I don't know why we didn't document it or why we decided to go with that surprising API, but my guess would be that we choose to leave that to the user to avoid quietly reallocating and copying key strings on every operation - callers that arrange to have keys null terminated some other way can therefore choose where to pay that cost and potentially optimize the allocations in a way they choose. I can only guess why it was never documented - this lib was built as the lower level to hashicorp/go-memdb which does always append null bytes so perhaps it was just missed in the public API?

For some reason we've never gotten around to documenting that properly but we should!

That explains both of your cases I think - the empty string is invalid since it is a prefix of every other key in the tree which is not allowed, the other examples all contain keys where one is prefix of another: the first three have an empty string key, the last one has both a and aaa.

I think changing the design to allow for prefixes would involve major rewrite or major performance regression so the best approach is to actually document this major limitation properly!

I've updated the issue title to reflect that but I'll work on at least some basic callouts in the docs for this!

@banks
Copy link
Member

banks commented Jun 16, 2021

I'm actually not super sure about all that I just said above now. It possibly turns out that the library didn't have this limitation before SeekLowerBound was added. I need to remember the context there, will update the PR/this issue one way or the other!

@banks
Copy link
Member

banks commented Jun 16, 2021

Ha ha, what a learning journey. Thanks @tonistiigi for the report. My PR above should fix this and we in fact don't have a horrible gotcha that went undocumented as I'd previously thought 😌 .

@banks banks closed this as completed in #39 Jun 28, 2021
@banks
Copy link
Member

banks commented Jun 28, 2021

I just merged and created https://github.com/hashicorp/go-immutable-radix/releases/tag/v1.3.1 to include these fixes. Thanks again for the report!

@vivekpatani
Copy link

@banks @tonistiigi so how does one fetch all the keys from the tree? I've tried using seekLowerBound/largestPrefix with an empty key but I'm not getting the iterator to all keys that the tree consists of.

@banks
Copy link
Member

banks commented Jul 28, 2022

@vivekpatani without thinking about it too hard, that sounds like a bug. A lower bound of an empty byte slice should sort before any valid key an iterator pointing to the minimum key. If it doesn't, please open another issue.

That said, the canonical way to iterate the whole tree is not to use this method anyway and instead use tree.Walk().

@vivekpatani
Copy link

@banks Thanks for your response.

I did open another issue and closed (because I figured out how it works): #45

From that issue, quoting myself here:

func GetAObjs() (objs []*treeObj) {
	it := tree.Root().Iterator()
	for _, val, ok := it.Next(); ok; _, val, ok = it.Next() {
		if val != nil {
			objs = append(objs, val.(*treeObj))
		}
	}
	return objs
}

This is what I did to capture all the keys and store them in a list, is there a downside to doing this?^

@banks
Copy link
Member

banks commented Aug 2, 2022

That should work fine. It's equivalent to use Walk in the sense that an Iterator at the root node that has not had Seek* called will simply traverse all child nodes in order. It does use a different code path to Walk and I'm not 100% sure we have regression tests covering that behaviour, but I don't think that's a big deal.

The alternative implementation for the record would be:

func GetAObjs (objs []*treeObj) {
    tree.Root().Walk(func(k []byte, v interface{}) bool {
        // You shouldn't need a nil check unless you are explicitly storing nil values for keys
        objs = append(objs, v.(*treeObj))
        return false
    })
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants