
Make BTree's internals safer and do more checks at compile time instead of run time #19782

Merged 1 commit on Dec 16, 2014

Conversation

gereeter (Contributor)

Before:

```
test btree::map::bench::find_rand_100                      ... bench:        12 ns/iter (+/- 0)
test btree::map::bench::find_rand_10_000                   ... bench:        13 ns/iter (+/- 1)
test btree::map::bench::find_seq_100                       ... bench:        11 ns/iter (+/- 0)
test btree::map::bench::find_seq_10_000                    ... bench:        11 ns/iter (+/- 1)
test btree::map::bench::insert_rand_100                    ... bench:       106 ns/iter (+/- 1)
test btree::map::bench::insert_rand_10_000                 ... bench:       326 ns/iter (+/- 8)
test btree::map::bench::insert_seq_100                     ... bench:       198 ns/iter (+/- 1)
test btree::map::bench::insert_seq_10_000                  ... bench:       312 ns/iter (+/- 3)
test btree::map::bench::iter_1000                          ... bench:     16563 ns/iter (+/- 173)
test btree::map::bench::iter_100000                        ... bench:   1686508 ns/iter (+/- 108592)
test btree::map::bench::iter_20                            ... bench:       365 ns/iter (+/- 25)
```

After:

```
test btree::map::bench::find_rand_100                      ... bench:        12 ns/iter (+/- 0)
test btree::map::bench::find_rand_10_000                   ... bench:        12 ns/iter (+/- 0)
test btree::map::bench::find_seq_100                       ... bench:        11 ns/iter (+/- 0)
test btree::map::bench::find_seq_10_000                    ... bench:        11 ns/iter (+/- 0)
test btree::map::bench::insert_rand_100                    ... bench:        89 ns/iter (+/- 1)
test btree::map::bench::insert_rand_10_000                 ... bench:       121 ns/iter (+/- 3)
test btree::map::bench::insert_seq_100                     ... bench:       149 ns/iter (+/- 0)
test btree::map::bench::insert_seq_10_000                  ... bench:       228 ns/iter (+/- 1)
test btree::map::bench::iter_1000                          ... bench:     16965 ns/iter (+/- 220)
test btree::map::bench::iter_100000                        ... bench:   1687836 ns/iter (+/- 18746)
test btree::map::bench::iter_20                            ... bench:       366 ns/iter (+/- 21)
```

@gereeter (Contributor, author)

cc @gankro

Also, the docs on this change are coming, but not yet done.

Gankra (Contributor) commented Dec 12, 2014

😲

CC @huonw @cgaebel

```rust
    top: node::Handle<*mut Node<K, V>, KV, LeafOrInternal>,
}

struct LeafifiedSearchStack<'a, K:'a, V:'a> {
```
Reviewer (Contributor):
This struct probably needs a comment, and a better name. Does LeafSearchStack still describe what this does?

gereeter (Contributor, author):
Yes, though LeafOccupiedSearchStack would be more accurate. However, maybe I should just make a single generic FullSearchStack with two extra parameters for the type of handle at the top.
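A hypothetical sketch of what that single generic stack could look like (all names and types here are illustrative stand-ins, not the actual libcollections definitions):

```rust
// Zero-sized markers for what the top handle points at; stand-ins for the
// marker types in the PR.
struct Edge;
struct KV;
struct Leaf;
struct LeafOrInternal;

// A simplified handle: a node reference plus an index, tagged by marker types.
struct Handle<NodeRef, Type, NodeType> {
    node: NodeRef,
    index: usize,
    _marker: std::marker::PhantomData<(Type, NodeType)>,
}

// One generic stack replaces the zoo of concrete stack structs: the two extra
// parameters describe the kind of handle sitting at the top of the stack.
struct FullSearchStack<NodeRef, TopType, TopNodeType> {
    path: Vec<usize>,
    top: Handle<NodeRef, TopType, TopNodeType>,
}

// The old names become mere aliases of the generic stack.
type SearchStack<NodeRef> = FullSearchStack<NodeRef, KV, LeafOrInternal>;
type LeafEdgeStack<NodeRef> = FullSearchStack<NodeRef, Edge, Leaf>;

fn demo_depth() -> usize {
    let stack: SearchStack<usize> = FullSearchStack {
        path: vec![0, 2],
        top: Handle { node: 7, index: 1, _marker: std::marker::PhantomData },
    };
    stack.path.len()
}

fn main() {
    println!("depth = {}", demo_depth());
}
```

The aliases keep call sites readable while the invariants live in one definition.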

@gereeter (Contributor, author)

Fixed @cgaebel's issues and added docs.

cgaebel (Contributor) commented Dec 12, 2014

-32 net LoC AND it's faster? Welp, I'm sold.

```diff
@@ -428,8 +428,10 @@ impl<K: Clone, V: Clone> Clone for Node<K, V> {
     }
 }

-/// A reference to a key/value pair in the middle of a `Node`. Methods are provided for removing
-/// the pair and accessing the pair and the adjacent edges.
+/// A reference to a something in the middle of a `Node`. There are two `Type`s of `Handle`s,
```
Reviewer (Contributor):
A reference to a something?

cgaebel (Contributor) commented Dec 12, 2014

Random comment on the benchmarks:

It's really weird that find_rand_10_000 takes 12 ns, but iter_100_000 takes 17 ns per element. Does that imply that we could make iteration faster by just calling find in a loop? That seems a little crazy to me.

And since iter_1000 would be iterating through ~24 KB, and iter_100_000 would be iterating through ~2.4 MB, I'd expect a bigger timing difference, since the larger run crosses two cache-size boundaries. Since we don't see one, this leads me to believe iteration is speed-limited by something other than cache, and is therefore a good candidate for optimization. Alternatively, Intel could just be prediction wizards and there are simply no cache misses.
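The "find in a loop" idea can at least be sanity-checked against today's std `BTreeMap`; this sketch only shows that the two strategies compute the same result, and says nothing about their relative speed (that needs a real benchmark harness):

```rust
use std::collections::BTreeMap;

// Sum all values by ordinary external iteration.
fn sum_iter(map: &BTreeMap<u32, u32>) -> u64 {
    map.values().map(|&v| v as u64).sum()
}

// Sum all values by "find in a loop": one full root-to-leaf search per key,
// as speculated above. Same answer, very different work per element.
fn sum_find(map: &BTreeMap<u32, u32>) -> u64 {
    (0..map.len() as u32).map(|k| *map.get(&k).unwrap() as u64).sum()
}

fn main() {
    let map: BTreeMap<u32, u32> = (0..1000).map(|i| (i, i * 2)).collect();
    assert_eq!(sum_iter(&map), sum_find(&map));
    println!("both strategies agree: {}", sum_iter(&map));
}
```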

```rust
PartialSearchStack {
    map: self.map,
    stack: self.stack,
    next: edge.edge_mut() as *mut _,
```
Reviewer (Contributor):
Are these by-value semantics necessary/useful anymore?

gereeter (Contributor, author):
Yes, simply because Pusher is distinct from PartialSearchStack.

@gereeter (Contributor, author)

On that subject, why are the sequential runs slower than the random runs?

cgaebel (Contributor) commented Dec 12, 2014

Maybe it's an artifact of btree splitting that's degrading into worst-case performance? Someone should probably break out pencil and paper and figure out why that's happening.

Hmm, maybe it's because the nodes' linear searches always hit their longest case.

Gankra (Contributor) commented Dec 12, 2014

Sequential insertion isn't necessarily something the (academic) BTree is good at. You fill up a node, split it, then never touch the left child again. This creates a big pool of half-empty nodes, and more allocations. Whereas random insertion should produce "fuller" nodes.

I am vaguely aware that Google's BTree implementation does some tricks to optimize for sequential insertions. A substantial refactor would also potentially consider the fancy kind of BTree that shares data between pairs of children, where each child must be ~2/3 full or something. Can't remember the name.
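The half-empty-nodes effect can be illustrated with a toy leaf-fill model (this simulates only textbook leaf splitting with a made-up capacity, not the real BTreeMap):

```rust
const CAP: usize = 6; // toy leaf capacity; real B-trees use a larger B

// Simulate leaf fill under strictly sequential insertion: new keys always
// land in the rightmost leaf, and a full leaf splits into two halves.
fn sequential_fill(n: usize) -> Vec<usize> {
    let mut leaves: Vec<usize> = vec![0]; // element count per leaf
    for _ in 0..n {
        let last = leaves.len() - 1;
        if leaves[last] == CAP {
            // Textbook split: half the elements stay, half move right.
            leaves[last] = CAP / 2;
            leaves.push(CAP / 2);
        }
        let last = leaves.len() - 1;
        leaves[last] += 1;
    }
    leaves
}

fn main() {
    let leaves = sequential_fill(1000);
    // Every leaf except the rightmost is stuck at half capacity forever:
    // once split, the left half is never touched again.
    assert!(leaves[..leaves.len() - 1].iter().all(|&c| c == CAP / 2));
    println!("{} leaves, all but the last {}/{} full", leaves.len(), CAP / 2, CAP);
}
```

Twice the leaves means twice the allocations and a colder cache, which is the "big pool of half-empty nodes" described above.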

Gankra (Contributor) commented Dec 12, 2014

The iterator code is definitely not optimized. iirc if you toss in some #[inline]s it picks up in the benchmark, but that's not necessarily accurate for real-world usage.

I have a suspicion that BTree might be a tragic example of external double-ended iteration being inferior to internal iteration.
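The internal-vs-external contrast can be shown in miniature on a stand-in tree type (not the real BTree nodes): internal iteration is plain recursion the optimizer can see through, while the external iterator must carry an explicit stack of where it is.

```rust
// A toy tree standing in for BTree nodes.
enum Tree {
    Leaf(i32),
    Internal(Vec<Tree>),
}

// Internal iteration: the tree drives the traversal.
fn for_each(tree: &Tree, f: &mut impl FnMut(i32)) {
    match tree {
        Tree::Leaf(x) => f(*x),
        Tree::Internal(children) => children.iter().for_each(|c| for_each(c, f)),
    }
}

// External iteration: the caller drives, so the iterator must keep explicit
// bookkeeping (a stack of pending nodes) across every `next` call.
struct TreeIter<'a> {
    stack: Vec<&'a Tree>,
}

impl<'a> Iterator for TreeIter<'a> {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        while let Some(node) = self.stack.pop() {
            match node {
                Tree::Leaf(x) => return Some(*x),
                // Push children in reverse so they pop in left-to-right order.
                Tree::Internal(children) => self.stack.extend(children.iter().rev()),
            }
        }
        None
    }
}

fn internal_sum(t: &Tree) -> i32 {
    let mut s = 0;
    for_each(t, &mut |x| s += x);
    s
}

fn external_sum(t: &Tree) -> i32 {
    TreeIter { stack: vec![t] }.sum()
}

fn main() {
    let tree = Tree::Internal(vec![
        Tree::Leaf(1),
        Tree::Internal(vec![Tree::Leaf(2), Tree::Leaf(3)]),
        Tree::Leaf(4),
    ]);
    assert_eq!(internal_sum(&tree), external_sum(&tree));
    println!("sum = {}", internal_sum(&tree)); // 10
}
```

Whether the bookkeeping actually dominates on the real BTree is exactly what the benchmarks in this thread are probing.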

```rust
let right_edge_raw = right_edge.as_raw();
match right_edge.edge_mut() {
    None => {
        match top.leaf() {
```
Reviewer (Contributor):
You don't seem to be using the Result-ness at all. I'd much rather have a custom enum for this if that's the case. Possibly rename the method to `force`, returning `Leaf(..)` or `Internal(..)`.

gereeter (Contributor, author):
My one worry with creating a new enum with Leaf and Internal variants is the sheer amount of type/value namespace madness that would imply. There already is the type Edge separate from the enum variant Edge, though, so I guess it would be all right.
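A sketch of the suggested design: `force` and the `Leaf`/`Internal` variants follow the review comment, while `ForceResult`, `LeafHandle`, and `InternalHandle` are hypothetical stand-in names and types.

```rust
// Stand-in handle types; the real ones would wrap node references.
struct LeafHandle { index: usize }
struct InternalHandle { index: usize }

// A dedicated enum instead of abusing Result: both outcomes get real names.
enum ForceResult {
    Leaf(LeafHandle),
    Internal(InternalHandle),
}

// A handle whose node kind is not yet known statically.
struct Handle {
    index: usize,
    is_leaf: bool,
}

impl Handle {
    // "Force" the handle to reveal which kind of node it points into.
    fn force(self) -> ForceResult {
        if self.is_leaf {
            ForceResult::Leaf(LeafHandle { index: self.index })
        } else {
            ForceResult::Internal(InternalHandle { index: self.index })
        }
    }
}

fn describe(h: Handle) -> &'static str {
    match h.force() {
        ForceResult::Leaf(_) => "leaf",
        ForceResult::Internal(_) => "internal",
    }
}

fn main() {
    println!("{}", describe(Handle { index: 0, is_leaf: true }));
    println!("{}", describe(Handle { index: 0, is_leaf: false }));
}
```

Matching on `ForceResult` reads much more clearly at call sites than `Ok`/`Err` with no error semantics.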

@gereeter (Contributor, author)

Actually, looking at the benchmarking code, there seem to be many more differences between insert_seq_n and insert_rand_n than I originally thought:

  • Before the benchmark, the map being mutated is created sequentially in insert_seq_n while it is created randomly in insert_rand_n.
  • The map being mutated has exactly n elements in the sequential case, but at most n elements in the random case.
  • The map being mutated is likely to become very dense in the random case, as the keys range from 0 to n and n of them are being inserted.
  • In the sequential case, the element to be inserted and removed moves along sequentially, while the element is chosen randomly in the random case. (This is what I thought would be the only difference).
  • In the sequential case, every search will be a hit until the element being searched for ends up larger than any key, at which point every search will be a miss. In contrast, the keys for the random case are chosen independently of creation, and so some will hit and some will miss.
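The first three bullets can be made concrete with a toy reconstruction of the two setups (a tiny LCG stands in for the benchmark's RNG; this is not the actual benchmark code):

```rust
use std::collections::BTreeMap;

// Tiny LCG so the sketch needs no external crate; illustrative RNG only.
fn lcg(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

// insert_seq_n-style setup: keys 0..n in order, so exactly n elements.
fn setup_seq(n: u64) -> BTreeMap<u64, u64> {
    (0..n).map(|k| (k, k)).collect()
}

// insert_rand_n-style setup: n random keys drawn from 0..n. Duplicates
// collapse, so the map holds at most n elements and becomes dense in 0..n.
fn setup_rand(n: u64, seed: u64) -> BTreeMap<u64, u64> {
    let mut state = seed;
    (0..n)
        .map(|_| {
            let k = lcg(&mut state) % n;
            (k, k)
        })
        .collect()
}

fn main() {
    let (seq, rand) = (setup_seq(10_000), setup_rand(10_000, 42));
    assert_eq!(seq.len(), 10_000);
    assert!(rand.len() <= 10_000);
    println!("seq: {} keys, rand: {} distinct keys", seq.len(), rand.len());
}
```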

```rust
        }
    };
}
```
Reviewer (Contributor):
Hmm... it occurs to me that the handle API loses some of the value of the SearchStack model. The index-based model meant the Stack could guarantee that it had a complete path, but with this design one can follow handles down a few nodes and then push that handle, violating the guarantee, right?

gereeter (Contributor, author):
No - this is exactly the problem I spent most of my time on the handle API solving, via IdRef and InvariantLifetime. Pusher::push requires the handle being pushed to have an IdRef-based reference to the node with exactly the correct lifetime. However, if you tried to move down the node tree, you would no longer have access to an IdRef, as those are only provided by PartialSearchStack::with - an attempt to push a node further down in the tree would fail statically with something like "expected IdRef<'id, Node<K, V>>, found &'a mut Node<K, V>". Even if you did manage to get another IdRef (e.g. through another call to PartialSearchStack::with), it would have the wrong lifetime, and the use of InvariantLifetime means that the lifetimes have to match exactly.

I encourage you to try to break the API - it might become more clear how safety is ensured.
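The branding trick described above can be sketched in miniature. `InvariantLifetime` and the `with`-style entry point follow the names in the comment; `Tag`, `Handle`, and `Stack` are simplified stand-ins for `IdRef`, the node handles, and `Pusher`, not the PR's actual code.

```rust
use std::marker::PhantomData;

// 'id must be invariant so two different brands can never unify;
// *mut T is invariant in T, hence PhantomData<*mut &'id ()>.
#[derive(Clone, Copy)]
struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);

// A brand source, handed out only by `with`; the IdRef stand-in.
#[derive(Clone, Copy)]
struct Tag<'id>(InvariantLifetime<'id>);

impl<'id> Tag<'id> {
    fn handle(&self, index: usize) -> Handle<'id> {
        Handle { index, _brand: self.0 }
    }
}

// A handle branded with the lifetime of the stack it belongs to.
struct Handle<'id> {
    index: usize,
    _brand: InvariantLifetime<'id>,
}

struct Stack<'id> {
    path: Vec<usize>,
    _brand: InvariantLifetime<'id>,
}

impl<'id> Stack<'id> {
    // Only handles carrying exactly this stack's brand type-check here.
    fn push(&mut self, h: Handle<'id>) {
        self.path.push(h.index);
    }
}

// The only way to get a Tag: the higher-ranked bound forces a fresh,
// unnameable 'id per call, so brands from different calls can never mix.
fn with<R>(f: impl for<'id> FnOnce(Tag<'id>, Stack<'id>) -> R) -> R {
    let brand = InvariantLifetime(PhantomData);
    f(Tag(brand), Stack { path: Vec::new(), _brand: brand })
}

fn demo() -> usize {
    with(|tag, mut stack| {
        stack.push(tag.handle(0));
        stack.push(tag.handle(3));
        // A handle minted under a *different* brand is rejected statically:
        // with(|other_tag, _| stack.push(other_tag.handle(1))); // does not compile
        stack.path.len()
    })
}

fn main() {
    println!("pushed {} handles", demo());
}
```

Uncommenting the nested `with` call reproduces the "lifetimes must match exactly" error described above.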

Reviewer (Contributor):
Ah okay, awesome! I got confused about how exactly the 'id manages to live on through the search call, but I went over the code again and see now how it works. The handle wraps exactly the node reference you give to search, and if you tried to actually "follow" the handle to the next child, the 'id wouldn't come with it.

Pretty slick!

Gankra (Contributor) commented Dec 13, 2014

@gereeter I'm not sure it matters, since different benches aren't really intended to be compared to each other, but I suppose you could get the "all misses" behaviour by making a sequential Vec and shuffling it.
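The shuffled-sequential setup could look like this std-only sketch (the LCG-driven Fisher-Yates is an illustrative stand-in for a real RNG, not the bench harness's shuffle):

```rust
// Fisher-Yates shuffle driven by a tiny LCG, so no external rand crate.
fn shuffle(v: &mut [u64], mut seed: u64) {
    for i in (1..v.len()).rev() {
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
        let j = (seed >> 33) as usize % (i + 1);
        v.swap(i, j);
    }
}

// Same n distinct keys as the sequential bench, just in scrambled order.
fn shuffled_keys(n: u64, seed: u64) -> Vec<u64> {
    let mut keys: Vec<u64> = (0..n).collect();
    shuffle(&mut keys, seed);
    keys
}

fn main() {
    let keys = shuffled_keys(10, 42);
    let mut sorted = keys.clone();
    sorted.sort();
    // Still a permutation of 0..10: dense key set, random access order.
    assert_eq!(sorted, (0..10).collect::<Vec<u64>>());
    println!("{:?}", keys);
}
```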

cgaebel (Contributor) commented Dec 13, 2014

I just hacked up an intrusive version of btree::iter:

```
test btree::map::bench::intrusive_iter_1000                ... bench:      2543 ns/iter (+/- 666)
test btree::map::bench::intrusive_iter_100000              ... bench:    343225 ns/iter (+/- 117773)
test btree::map::bench::intrusive_iter_20                  ... bench:        53 ns/iter (+/- 17)
test btree::map::bench::iter_1000                          ... bench:     15217 ns/iter (+/- 3867)
test btree::map::bench::iter_100000                        ... bench:   1432242 ns/iter (+/- 428693)
test btree::map::bench::iter_20                            ... bench:       351 ns/iter (+/- 69)
```

Use that information how you will. I think this is strong evidence for inclusion in the API. It composes poorly, but is 6 times faster at iterating over 20 elements, and 4 times faster at iterating over 100k elements.

(sorry, this PR just so happens to be a forum with everyone who cares about this sorta stuff. Don't mean to hijack the discussion!)

I'll prepare a PR for discussion.

cgaebel (Contributor) commented Dec 13, 2014

#19796

Gankra (Contributor) commented Dec 13, 2014

I'm currently skimming through their source for details, but here's at least a fragment of the sequential insertion optimization I was talking about before: https://code.google.com/p/cpp-btree/source/browse/btree.h#1552

cgaebel (Contributor) commented Dec 13, 2014

They sure use a lot of unsafe code.

That optimization doesn't look too hard to add. We should do that.
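The optimization can be sketched with the same kind of toy leaf model as a textbook split (illustrative only, not Google's code): when the incoming key is larger than everything in a full rightmost leaf, bias the split so the old leaf stays completely full and a fresh leaf starts empty.

```rust
const CAP: usize = 6; // toy leaf capacity

// Leaf fill under sequential insertion with a biased split: since each new
// key is larger than everything already present, leave the full leaf intact
// and open an empty rightmost leaf instead of splitting down the middle.
fn biased_fill(n: usize) -> Vec<usize> {
    let mut leaves = vec![0usize]; // element count per leaf
    for _ in 0..n {
        if *leaves.last().unwrap() == CAP {
            leaves.push(0); // biased "split": old leaf stays 100% full
        }
        *leaves.last_mut().unwrap() += 1;
    }
    leaves
}

fn main() {
    let leaves = biased_fill(1000);
    // Every leaf except the rightmost ends up completely full, instead of
    // the ~50% fill an even split leaves behind under sequential insertion.
    assert!(leaves[..leaves.len() - 1].iter().all(|&c| c == CAP));
    println!("{} leaves, all but the last completely full", leaves.len());
}
```

The cost is that this heuristic can hurt non-sequential workloads, which is the trade-off debated below.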

@gereeter (Contributor, author)

I'd like to leave the biasing idea out of this for now - while it probably is a good idea in certain cases, it will hurt some cases, and I don't think we have the benchmarks and usage data to say whether it truly is valuable. In contrast, I think that the change in this PR should help all use cases.

cgaebel (Contributor) commented Dec 13, 2014

Oh sorry I meant in the future. Definitely not in this PR.

Human/IP is a shitty protocol.

Gankra (Contributor) commented Dec 13, 2014

Yes agreed. I simply noted it for the sake of discussion.

@gereeter gereeter force-pushed the cleanup-btree-node branch 2 times, most recently from d92c67c to a4e5422 Compare December 13, 2014 04:51
@gereeter (Contributor, author)

I think I've fixed all the issues brought up so far.

cgaebel (Contributor) commented Dec 13, 2014

LGTM.

…ome runtime checks in favor of newly gained static safety
Gankra (Contributor) commented Dec 13, 2014

r=me with squash

@gereeter (Contributor, author)

Squashed.

bors added a commit that referenced this pull request Dec 14, 2014
Make BTree's internals safer and do more checks at compile time instead of run time

Reviewed-by: Gankro
bors added a commit that referenced this pull request Dec 16, 2014
@bors bors merged commit 808eeff into rust-lang:master Dec 16, 2014
@gereeter gereeter deleted the cleanup-btree-node branch December 17, 2015 01:30