
Make BTree's internals safer and do more checks at compile time instead of run time #19782

Merged 1 commit on Dec 16, 2014

Conversation

gereeter (Contributor)

Before:

```
test btree::map::bench::find_rand_100                      ... bench:        12 ns/iter (+/- 0)
test btree::map::bench::find_rand_10_000                   ... bench:        13 ns/iter (+/- 1)
test btree::map::bench::find_seq_100                       ... bench:        11 ns/iter (+/- 0)
test btree::map::bench::find_seq_10_000                    ... bench:        11 ns/iter (+/- 1)
test btree::map::bench::insert_rand_100                    ... bench:       106 ns/iter (+/- 1)
test btree::map::bench::insert_rand_10_000                 ... bench:       326 ns/iter (+/- 8)
test btree::map::bench::insert_seq_100                     ... bench:       198 ns/iter (+/- 1)
test btree::map::bench::insert_seq_10_000                  ... bench:       312 ns/iter (+/- 3)
test btree::map::bench::iter_1000                          ... bench:     16563 ns/iter (+/- 173)
test btree::map::bench::iter_100000                        ... bench:   1686508 ns/iter (+/- 108592)
test btree::map::bench::iter_20                            ... bench:       365 ns/iter (+/- 25)
```

After:

```
test btree::map::bench::find_rand_100                      ... bench:        12 ns/iter (+/- 0)
test btree::map::bench::find_rand_10_000                   ... bench:        12 ns/iter (+/- 0)
test btree::map::bench::find_seq_100                       ... bench:        11 ns/iter (+/- 0)
test btree::map::bench::find_seq_10_000                    ... bench:        11 ns/iter (+/- 0)
test btree::map::bench::insert_rand_100                    ... bench:        89 ns/iter (+/- 1)
test btree::map::bench::insert_rand_10_000                 ... bench:       121 ns/iter (+/- 3)
test btree::map::bench::insert_seq_100                     ... bench:       149 ns/iter (+/- 0)
test btree::map::bench::insert_seq_10_000                  ... bench:       228 ns/iter (+/- 1)
test btree::map::bench::iter_1000                          ... bench:     16965 ns/iter (+/- 220)
test btree::map::bench::iter_100000                        ... bench:   1687836 ns/iter (+/- 18746)
test btree::map::bench::iter_20                            ... bench:       366 ns/iter (+/- 21)
```

@gereeter (Contributor, author)

cc @gankro

Also, the docs on this change are coming, but not yet done.

Gankra (Contributor) commented Dec 12, 2014

😲

CC @huonw @cgaebel

```rust
    top: node::Handle<*mut Node<K, V>, KV, LeafOrInternal>,
}

struct LeafifiedSearchStack<'a, K:'a, V:'a> {
```
Reviewer (Contributor):
This struct probably needs a comment, and a better name. Does LeafSearchStack still describe what this does?

gereeter (Contributor, author):
Yes, though LeafOccupiedSearchStack would be more accurate. However, maybe I should just make a single generic FullSearchStack with two extra parameters for the type of handle at the top.
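A hypothetical sketch of what that single generic stack could look like (all names and types here are illustrative stand-ins, not the actual libcollections definitions):

```rust
// Zero-sized markers for what the top handle points at; stand-ins for the
// marker types in the PR.
struct Edge;
struct KV;
struct Leaf;
struct LeafOrInternal;

// A simplified handle: a node reference plus an index, tagged by marker types.
struct Handle<NodeRef, Type, NodeType> {
    node: NodeRef,
    index: usize,
    _marker: std::marker::PhantomData<(Type, NodeType)>,
}

// One generic stack replaces the zoo of concrete stack structs: the two extra
// parameters describe the kind of handle sitting at the top of the stack.
struct FullSearchStack<NodeRef, TopType, TopNodeType> {
    path: Vec<usize>,
    top: Handle<NodeRef, TopType, TopNodeType>,
}

// The old names become mere aliases of the generic stack.
type SearchStack<NodeRef> = FullSearchStack<NodeRef, KV, LeafOrInternal>;
type LeafEdgeStack<NodeRef> = FullSearchStack<NodeRef, Edge, Leaf>;

fn demo_depth() -> usize {
    let stack: SearchStack<usize> = FullSearchStack {
        path: vec![0, 2],
        top: Handle { node: 7, index: 1, _marker: std::marker::PhantomData },
    };
    stack.path.len()
}

fn main() {
    println!("depth = {}", demo_depth());
}
```

The aliases keep call sites readable while the invariants live in one definition.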

@gereeter (Contributor, author)

Fixed @cgaebel's issues and added docs.

cgaebel (Contributor) commented Dec 12, 2014

-32 net LoC AND it's faster? Welp, I'm sold.

```diff
@@ -428,8 +428,10 @@ impl<K: Clone, V: Clone> Clone for Node<K, V> {
     }
 }

-/// A reference to a key/value pair in the middle of a `Node`. Methods are provided for removing
-/// the pair and accessing the pair and the adjacent edges.
+/// A reference to a something in the middle of a `Node`. There are two `Type`s of `Handle`s,
```
Reviewer (Contributor):
A reference to a something?

cgaebel (Contributor) commented Dec 12, 2014

Random comment on the benchmarks:

It's really weird that find_rand_10_000 takes 12 ns, but iter_100_000 takes 17 ns per element. Does that imply that we could make iteration faster by just calling find in a loop? That seems a little crazy to me.

And since iter_1000 would be iterating through ~24 KB, and iter_100_000 would be iterating through ~2.4 MB, I'd expect a bigger timing difference, since the larger run crosses two cache-size boundaries. Since we don't see one, this leads me to believe iteration is speed-limited by something other than cache, and is therefore a good candidate for optimization. Alternatively, Intel could just be prediction wizards and there are simply no cache misses.
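The "find in a loop" idea can at least be sanity-checked against today's std `BTreeMap`; this sketch only shows that the two strategies compute the same result, and says nothing about their relative speed (that needs a real benchmark harness):

```rust
use std::collections::BTreeMap;

// Sum all values by ordinary external iteration.
fn sum_iter(map: &BTreeMap<u32, u32>) -> u64 {
    map.values().map(|&v| v as u64).sum()
}

// Sum all values by "find in a loop": one full root-to-leaf search per key,
// as speculated above. Same answer, very different work per element.
fn sum_find(map: &BTreeMap<u32, u32>) -> u64 {
    (0..map.len() as u32).map(|k| *map.get(&k).unwrap() as u64).sum()
}

fn main() {
    let map: BTreeMap<u32, u32> = (0..1000).map(|i| (i, i * 2)).collect();
    assert_eq!(sum_iter(&map), sum_find(&map));
    println!("both strategies agree: {}", sum_iter(&map));
}
```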

```rust
PartialSearchStack {
    map: self.map,
    stack: self.stack,
    next: edge.edge_mut() as *mut _,
```
Reviewer (Contributor):
Are these by-value semantics necessary/useful anymore?

gereeter (Contributor, author):
Yes, simply because Pusher is distinct from PartialSearchStack.

@gereeter (Contributor, author)

On that subject, why are the sequential runs slower than the random runs?

cgaebel (Contributor) commented Dec 12, 2014

Maybe it's an artifact of btree splitting that's degrading into worst-case performance? Someone should probably break out pencil and paper and figure out why that's happening.

Hmm, maybe it's because the nodes' linear searches always hit their longest case.

Gankra (Contributor) commented Dec 12, 2014

Sequential insertion isn't necessarily something the (academic) BTree is good at. You fill up a node, split it, then never touch the left child again. This creates a big pool of half-empty nodes, and more allocations. Whereas random insertion should produce "fuller" nodes.

I am vaguely aware that Google's BTree implementation does some tricks to optimize for sequential insertions. A substantial refactor would also potentially consider the fancy kind of BTree that shares data between pairs of children, where each child must be ~2/3 full or something. Can't remember the name.
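The half-empty-nodes effect can be illustrated with a toy leaf-fill model (this simulates only textbook leaf splitting with a made-up capacity, not the real BTreeMap):

```rust
const CAP: usize = 6; // toy leaf capacity; real B-trees use a larger B

// Simulate leaf fill under strictly sequential insertion: new keys always
// land in the rightmost leaf, and a full leaf splits into two halves.
fn sequential_fill(n: usize) -> Vec<usize> {
    let mut leaves: Vec<usize> = vec![0]; // element count per leaf
    for _ in 0..n {
        let last = leaves.len() - 1;
        if leaves[last] == CAP {
            // Textbook split: half the elements stay, half move right.
            leaves[last] = CAP / 2;
            leaves.push(CAP / 2);
        }
        let last = leaves.len() - 1;
        leaves[last] += 1;
    }
    leaves
}

fn main() {
    let leaves = sequential_fill(1000);
    // Every leaf except the rightmost is stuck at half capacity forever:
    // once split, the left half is never touched again.
    assert!(leaves[..leaves.len() - 1].iter().all(|&c| c == CAP / 2));
    println!("{} leaves, all but the last {}/{} full", leaves.len(), CAP / 2, CAP);
}
```

Twice the leaves means twice the allocations and a colder cache, which is the "big pool of half-empty nodes" described above.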

Gankra (Contributor) commented Dec 12, 2014

The iterator code is definitely not optimized. iirc if you toss in some #[inline]s it picks up in the benchmark, but that's not necessarily accurate for real-world usage.

I have a suspicion that BTree might be a tragic example of external double-ended iteration being inferior to internal iteration.
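The internal-vs-external contrast can be shown in miniature on a stand-in tree type (not the real BTree nodes): internal iteration is plain recursion the optimizer can see through, while the external iterator must carry an explicit stack of where it is.

```rust
// A toy tree standing in for BTree nodes.
enum Tree {
    Leaf(i32),
    Internal(Vec<Tree>),
}

// Internal iteration: the tree drives the traversal.
fn for_each(tree: &Tree, f: &mut impl FnMut(i32)) {
    match tree {
        Tree::Leaf(x) => f(*x),
        Tree::Internal(children) => children.iter().for_each(|c| for_each(c, f)),
    }
}

// External iteration: the caller drives, so the iterator must keep explicit
// bookkeeping (a stack of pending nodes) across every `next` call.
struct TreeIter<'a> {
    stack: Vec<&'a Tree>,
}

impl<'a> Iterator for TreeIter<'a> {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        while let Some(node) = self.stack.pop() {
            match node {
                Tree::Leaf(x) => return Some(*x),
                // Push children in reverse so they pop in left-to-right order.
                Tree::Internal(children) => self.stack.extend(children.iter().rev()),
            }
        }
        None
    }
}

fn internal_sum(t: &Tree) -> i32 {
    let mut s = 0;
    for_each(t, &mut |x| s += x);
    s
}

fn external_sum(t: &Tree) -> i32 {
    TreeIter { stack: vec![t] }.sum()
}

fn main() {
    let tree = Tree::Internal(vec![
        Tree::Leaf(1),
        Tree::Internal(vec![Tree::Leaf(2), Tree::Leaf(3)]),
        Tree::Leaf(4),
    ]);
    assert_eq!(internal_sum(&tree), external_sum(&tree));
    println!("sum = {}", internal_sum(&tree)); // 10
}
```

Whether the bookkeeping actually dominates on the real BTree is exactly what the benchmarks in this thread are probing.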

```rust
let right_edge_raw = right_edge.as_raw();
match right_edge.edge_mut() {
    None => {
        match top.leaf() {
```
Reviewer (Contributor):
You don't seem to be using the Result-ness at all. I'd much rather have a custom enum for this if that's the case. Possibly rename the method to `force`, returning `Leaf(..)` or `Internal(..)`.

gereeter (Contributor, author):
My one worry with creating a new enum with Leaf and Internal variants is the sheer amount of type/value namespace madness that would imply. There already is the type Edge separate from the enum variant Edge, though, so I guess it would be all right.
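A sketch of the suggested design: `force` and the `Leaf`/`Internal` variants follow the review comment, while `ForceResult`, `LeafHandle`, and `InternalHandle` are hypothetical stand-in names and types.

```rust
// Stand-in handle types; the real ones would wrap node references.
struct LeafHandle { index: usize }
struct InternalHandle { index: usize }

// A dedicated enum instead of abusing Result: both outcomes get real names.
enum ForceResult {
    Leaf(LeafHandle),
    Internal(InternalHandle),
}

// A handle whose node kind is not yet known statically.
struct Handle {
    index: usize,
    is_leaf: bool,
}

impl Handle {
    // "Force" the handle to reveal which kind of node it points into.
    fn force(self) -> ForceResult {
        if self.is_leaf {
            ForceResult::Leaf(LeafHandle { index: self.index })
        } else {
            ForceResult::Internal(InternalHandle { index: self.index })
        }
    }
}

fn describe(h: Handle) -> &'static str {
    match h.force() {
        ForceResult::Leaf(_) => "leaf",
        ForceResult::Internal(_) => "internal",
    }
}

fn main() {
    println!("{}", describe(Handle { index: 0, is_leaf: true }));
    println!("{}", describe(Handle { index: 0, is_leaf: false }));
}
```

Matching on `ForceResult` reads much more clearly at call sites than `Ok`/`Err` with no error semantics.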

@gereeter (Contributor, author)

Actually, looking at the benchmarking code, there seem to be many more differences between insert_seq_n and insert_rand_n than I originally thought:

  • Before the benchmark, the map being mutated is created sequentially in insert_seq_n while it is created randomly in insert_rand_n.
  • The map being mutated has exactly n elements in the sequential case, but at most n elements in the random case.
  • The map being mutated is likely to become very dense in the random case, as the keys range from 0 to n and n of them are being inserted.
  • In the sequential case, the element to be inserted and removed moves along sequentially, while the element is chosen randomly in the random case. (This is what I thought would be the only difference).
  • In the sequential case, every search will be a hit until the element being searched for ends up larger than any key, at which point every search will be a miss. In contrast, the keys for the random case are chosen independently of creation, and so some will hit and some will miss.
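The first three bullets can be made concrete with a toy reconstruction of the two setups (a tiny LCG stands in for the benchmark's RNG; this is not the actual benchmark code):

```rust
use std::collections::BTreeMap;

// Tiny LCG so the sketch needs no external crate; illustrative RNG only.
fn lcg(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

// insert_seq_n-style setup: keys 0..n in order, so exactly n elements.
fn setup_seq(n: u64) -> BTreeMap<u64, u64> {
    (0..n).map(|k| (k, k)).collect()
}

// insert_rand_n-style setup: n random keys drawn from 0..n. Duplicates
// collapse, so the map holds at most n elements and becomes dense in 0..n.
fn setup_rand(n: u64, seed: u64) -> BTreeMap<u64, u64> {
    let mut state = seed;
    (0..n)
        .map(|_| {
            let k = lcg(&mut state) % n;
            (k, k)
        })
        .collect()
}

fn main() {
    let (seq, rand) = (setup_seq(10_000), setup_rand(10_000, 42));
    assert_eq!(seq.len(), 10_000);
    assert!(rand.len() <= 10_000);
    println!("seq: {} keys, rand: {} distinct keys", seq.len(), rand.len());
}
```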

```rust
        }
    };
}
```
Reviewer (Contributor):
Hmm... it occurs to me that the handle API loses some of the value of the SearchStack model. The index-based model meant the Stack could guarantee that it had a complete path, but with this design one can follow handles down a few nodes and then push that handle, violating the guarantee, right?

gereeter (Contributor, author):
No - this is exactly the problem I spent most of my time on the handle API solving, via IdRef and InvariantLifetime. Pusher::push requires the handle being pushed to have an IdRef-based reference to the node with exactly the correct lifetime. However, if you tried to move down the node tree, you would no longer have access to an IdRef, as those are only provided by PartialSearchStack::with - an attempt to push a node further down in the tree would fail statically with something like "expected IdRef<'id, Node<K, V>>, found &'a mut Node<K, V>". Even if you did manage to get another IdRef (e.g. through another call to PartialSearchStack::with), it would have the wrong lifetime, and the use of InvariantLifetime means that the lifetimes have to match exactly.

I encourage you to try to break the API - it might become more clear how safety is ensured.
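The branding trick described above can be sketched in miniature. `InvariantLifetime` and the `with`-style entry point follow the names in the comment; `Tag`, `Handle`, and `Stack` are simplified stand-ins for `IdRef`, the node handles, and `Pusher`, not the PR's actual code.

```rust
use std::marker::PhantomData;

// 'id must be invariant so two different brands can never unify;
// *mut T is invariant in T, hence PhantomData<*mut &'id ()>.
#[derive(Clone, Copy)]
struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);

// A brand source, handed out only by `with`; the IdRef stand-in.
#[derive(Clone, Copy)]
struct Tag<'id>(InvariantLifetime<'id>);

impl<'id> Tag<'id> {
    fn handle(&self, index: usize) -> Handle<'id> {
        Handle { index, _brand: self.0 }
    }
}

// A handle branded with the lifetime of the stack it belongs to.
struct Handle<'id> {
    index: usize,
    _brand: InvariantLifetime<'id>,
}

struct Stack<'id> {
    path: Vec<usize>,
    _brand: InvariantLifetime<'id>,
}

impl<'id> Stack<'id> {
    // Only handles carrying exactly this stack's brand type-check here.
    fn push(&mut self, h: Handle<'id>) {
        self.path.push(h.index);
    }
}

// The only way to get a Tag: the higher-ranked bound forces a fresh,
// unnameable 'id per call, so brands from different calls can never mix.
fn with<R>(f: impl for<'id> FnOnce(Tag<'id>, Stack<'id>) -> R) -> R {
    let brand = InvariantLifetime(PhantomData);
    f(Tag(brand), Stack { path: Vec::new(), _brand: brand })
}

fn demo() -> usize {
    with(|tag, mut stack| {
        stack.push(tag.handle(0));
        stack.push(tag.handle(3));
        // A handle minted under a *different* brand is rejected statically:
        // with(|other_tag, _| stack.push(other_tag.handle(1))); // does not compile
        stack.path.len()
    })
}

fn main() {
    println!("pushed {} handles", demo());
}
```

Uncommenting the nested `with` call reproduces the "lifetimes must match exactly" error described above.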

Reviewer (Contributor):
Ah okay, awesome! I got confused about how exactly the 'id manages to live on through the search call, but I went over the code again and see now how it works. The handle wraps exactly the node reference you give to search, and if you tried to actually "follow" the handle to the next child, the 'id wouldn't come with it.

Pretty slick!

Gankra (Contributor) commented Dec 13, 2014

@gereeter I'm not sure it matters, since different benches aren't really intended to be compared to each other, but I suppose you could get the "all misses" behaviour by making a sequential Vec and shuffling it.
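The shuffled-sequential setup could look like this std-only sketch (the LCG-driven Fisher-Yates is an illustrative stand-in for a real RNG, not the bench harness's shuffle):

```rust
// Fisher-Yates shuffle driven by a tiny LCG, so no external rand crate.
fn shuffle(v: &mut [u64], mut seed: u64) {
    for i in (1..v.len()).rev() {
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
        let j = (seed >> 33) as usize % (i + 1);
        v.swap(i, j);
    }
}

// Same n distinct keys as the sequential bench, just in scrambled order.
fn shuffled_keys(n: u64, seed: u64) -> Vec<u64> {
    let mut keys: Vec<u64> = (0..n).collect();
    shuffle(&mut keys, seed);
    keys
}

fn main() {
    let keys = shuffled_keys(10, 42);
    let mut sorted = keys.clone();
    sorted.sort();
    // Still a permutation of 0..10: dense key set, random access order.
    assert_eq!(sorted, (0..10).collect::<Vec<u64>>());
    println!("{:?}", keys);
}
```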

cgaebel (Contributor) commented Dec 13, 2014

I just hacked up an intrusive version of btree::iter:

```
test btree::map::bench::intrusive_iter_1000                ... bench:      2543 ns/iter (+/- 666)
test btree::map::bench::intrusive_iter_100000              ... bench:    343225 ns/iter (+/- 117773)
test btree::map::bench::intrusive_iter_20                  ... bench:        53 ns/iter (+/- 17)
test btree::map::bench::iter_1000                          ... bench:     15217 ns/iter (+/- 3867)
test btree::map::bench::iter_100000                        ... bench:   1432242 ns/iter (+/- 428693)
test btree::map::bench::iter_20                            ... bench:       351 ns/iter (+/- 69)
```

Use that information how you will. I think this is strong evidence for inclusion in the API. It composes poorly, but is 6 times faster at iterating over 20 elements, and 4 times faster at iterating over 100k elements.

(sorry, this PR just so happens to be a forum with everyone who cares about this sorta stuff. Don't mean to hijack the discussion!)

I'll prepare a PR for discussion.

cgaebel (Contributor) commented Dec 13, 2014

#19796

Gankra (Contributor) commented Dec 13, 2014

I'm currently skimming through their source for details, but here's at least a fragment of the sequential insertion optimization I was talking about before: https://code.google.com/p/cpp-btree/source/browse/btree.h#1552

cgaebel (Contributor) commented Dec 13, 2014

They sure use a lot of unsafe code.

That optimization doesn't look too hard to add. We should do that.
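The optimization can be sketched with the same kind of toy leaf model as a textbook split (illustrative only, not Google's code): when the incoming key is larger than everything in a full rightmost leaf, bias the split so the old leaf stays completely full and a fresh leaf starts empty.

```rust
const CAP: usize = 6; // toy leaf capacity

// Leaf fill under sequential insertion with a biased split: since each new
// key is larger than everything already present, leave the full leaf intact
// and open an empty rightmost leaf instead of splitting down the middle.
fn biased_fill(n: usize) -> Vec<usize> {
    let mut leaves = vec![0usize]; // element count per leaf
    for _ in 0..n {
        if *leaves.last().unwrap() == CAP {
            leaves.push(0); // biased "split": old leaf stays 100% full
        }
        *leaves.last_mut().unwrap() += 1;
    }
    leaves
}

fn main() {
    let leaves = biased_fill(1000);
    // Every leaf except the rightmost ends up completely full, instead of
    // the ~50% fill an even split leaves behind under sequential insertion.
    assert!(leaves[..leaves.len() - 1].iter().all(|&c| c == CAP));
    println!("{} leaves, all but the last completely full", leaves.len());
}
```

The cost is that this heuristic can hurt non-sequential workloads, which is the trade-off debated below.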

@gereeter (Contributor, author)

I'd like to leave the biasing idea out of this for now - while it probably is a good idea in certain cases, it will hurt some cases, and I don't think we have the benchmarks and usage data to say whether it truly is valuable. In contrast, I think that the change in this PR should help all use cases.

cgaebel (Contributor) commented Dec 13, 2014

Oh sorry I meant in the future. Definitely not in this PR.

Human/IP is a shitty protocol.

Gankra (Contributor) commented Dec 13, 2014

Yes agreed. I simply noted it for the sake of discussion.

@gereeter gereeter force-pushed the cleanup-btree-node branch 2 times, most recently from d92c67c to a4e5422 Compare December 13, 2014 04:51
@gereeter (Contributor, author)

I think I've fixed all the issues brought up so far.

cgaebel (Contributor) commented Dec 13, 2014

LGTM.

…ome runtime checks in favor of newly gained static safety
Gankra (Contributor) commented Dec 13, 2014

r=me with squash

@gereeter (Contributor, author)

Squashed.

bors added a commit that referenced this pull request Dec 14, 2014
Make BTree's internals safer and do more checks at compile time instead of run time

Reviewed-by: Gankro
bors added a commit that referenced this pull request Dec 16, 2014
@bors bors merged commit 808eeff into rust-lang:master Dec 16, 2014
@gereeter gereeter deleted the cleanup-btree-node branch December 17, 2015 01:30