Metabug: Libcollections TODO #18009

Gankra · 2014-10-13T16:55:54Z

As part of the stabilization process, we'd like our standard collection implementations to be good. Maybe even great. Wouldn't that be nice? The overall quality of each implementation is pretty varied, partially just because the language has dramatically shifted out from underneath them. Vec and HashMap are in a good place because they're The Big Dogs of data structuring, and have gotten substantial attention as a result. Other less-used collections are in worse shape, and need some serious attention.

If you're interested in working on a Rust project, or collections in particular, here's a big list of things to do experiment with on libcollections.

DList's internal "safe" API is totally busted. It's not safe at all. Possibly simply because RawLink is Copyable. This allows a reference to a RawLink to be converted into a value, allowing mutation of an immutable DList.
Investigate implementing a proper bidirectional cursor on DList that can insert/remove values. It's borderline useless without this.
Full code review of BitV and BitVSet. They've had a lot of bugs because the code base has seen a lot of churn. Some legacy references to the bad-old-day of BitV being an enum still exist (such as a benchmark that just tests how fast bitshifting a uint is).
Bitv uses architecture-sized uints for backing storage #16736 BitV's internal representation is a Vec of uints. It should almost certainly be a Vec of u32's, or some other non-machine-sized type.
RingBuf should not be a Vec of Options.
Investigate whether accepting total unsafety in TreeMap and using RawPtrs everywhere is worth it. May be necessary in a future full of scoped allocators.
Replace the bulky triple of Vecs used in BTreeMap's Node struct with a single manually allocated
buffer, in the same vein as HashMap's RawTable.
Investigate a richer default B-selection algorithm for BTreeMap's default constructor than "6"
Investigate alternative node-search schemes over trivial linear search for BTreeMap. Binary search? skip-k-at-a-time linear search? Dynamic (or static!) selection of these algorithms based on B/K/V?
Seriously optimize BTreeMap at all. At all.
Investigate the possibility of replacing the heap-allocated Vec<*mut Node<T>>-based search-stacks of BTreeMap, TreeMap, and TrieMap with a stack-allocated [*mut Node<T>, ..uint::BITS] (TreeMap might need more, it's an AA-tree, dunno how tall they can get). Note that a preliminary experiment on BTree gave very poor results, though it has the least to gain from such a design from it's current design (tracks depth, can use with_capacity.
Investigate the possibility of reusing the SearchStack concept used in BTreeMap to more safely wrap the rawptr-based search stacks of TreeMap and TrieMap.
Investigate the possibility of making the iterators for TreeMap and TrieMap DoubleEnded, rather than having separate forward/backward iterators.
At very least, get rid of the old-style internal backwards iterator on TrieMap.
Investigate the possibility of cutting out a lot of the macro-madness from TreeMap and TrieMap (BTreeMap has no macros and little-to-no code duplication, just hardcore generics). Possibly using the Deref/DerefMut trick in HashMap.
Investigate completely replacing the unsafe rawptr search-stacks in the iterators of TreeMap and TrieMap with safe stacks of iterators over the nodes themselves, like BTreeMap does.
Investigate arbitrary sub-range iterators for sorted sets/maps.
Investigate functionality augmentations to a lot of collections iterators. For instance DList's iterator provides peek_next and remove_next methods for soundly removing elements from the list during iteration. At least Vec, RingBuf, and HashMap could all support this soundly, I believe. Vec, RingBuf, and DList could also support insert_next. Special iterator-like objects dedicated to these tasks might be better ideas, though (cursors, zippers, etc).
Implement "entry" API for maps #17320 Implement the new Entry API on TreeMap and TrieMap.
Investigate different hashing architectures and mechanisms for hash algorithm selection than the current one. For instance, our current stream-based architecture is poorly designed for small keys (e.g. u64's) which can and should be hashed in a single step.
Add a min heap option to PriorityQueue #15947 Investigate making ordering-based collections comparator based rather than directly Ord-based using unboxed closures. Ord types should be supported through some kind of "natural comparator" that is reversible (for supporting reverse orderings). PriorityQueue in particular would like this to simplify min-vs-max-heap.
Investigate optimizing PriorityQueue.extend with a modified heapify operation.
Consider more slice utility methods. slice.rotate_left/right? slice.shift_and_replace_left/right?
Consider providing split(&mut self, at: Index) -> Self on some collections.
Investigate tricks used in other languages to optimize for common/special access patterns.
Tests! Code coverage!
Docs! Examples!
Detailed discussions of what individual collections are good/bad for.
Performance comparison tables (probably just assymptoticss
There was only one other checked box and it looked lonely so I made it a friend. :D

The text was updated successfully, but these errors were encountered:

aturon · 2014-10-13T16:57:07Z

cc me

Gankra · 2014-10-13T16:58:24Z

I've added this mostly verbatim from the reddit thread I originally posted this to after prompting from several developers. Some things were added. This is still pretty heavily biased towards things I've poked at. Maybe TrieSet is a disaster. Maybe LruCache is a nightmare. No clue.

aturon · 2014-10-13T17:11:55Z

I'd suggest using this issue to coordinate work on some of these subitems. When a particular item needs additional discussion, you should spin off and link to an actual subissue for it.

Also, just a note that most of the TODO items here are not strictly needed for stabilization, which is about API commitments; a lot of these are important suggestions for implementation and documentation improvements.

Gankra · 2014-10-13T17:15:31Z

Some of these actually have issues. Working on collecting those up now.

gamazeps · 2014-10-13T19:57:21Z

Looking into 'BitV's internal representation is a Vec of uints. It should almost certainly be a Vec of u32's, or some other non-machine-sized type.'

Seems like you already fixed it 4 days ago @gankro ;) (it simply wasn't merged yet)

Gankra · 2014-10-13T20:23:32Z

@gamazeps Huh? BitV still uses uints.

I did start up a branch expecting it to be a quick CTRL+H job, but got too distracted in the bigger "needs a full review" issue as I went over all the changes to make sure I wasn't messing something up. In particular, reading over the code, it's really hard to tell locally what should be a uint, and what should be a u32 (is that value a word, or an index?). I've been pretty scatter-brained/sick lately, and don't have enough time/energy to ensure I'm not screwing the whole API up. I've been sticking to writing/design stuff for the past-while since it's easier for me to squeeze in short bursts when I'm not doing school stuff. Also fever-dream prose is just hard to read, not a bug.

If you want to pick up and finish the work on my branch, feel free!

gamazeps · 2014-10-13T20:32:50Z

Ok ok ^^
I thought since you started it you already finished it ^^

So I'll do that (but I'm not experimented enough to do the code review of the whole BitV for now).

Gankra · 2014-10-13T20:54:56Z

Yeah, that can wait. @apoelstra has volunteered to help review any BitV PRs, but is also a bit swamped.

reem · 2014-10-13T22:22:08Z

@gankro Have you thought a lot about how to fix DList? I'd be interested in iterating on it.

Gankra · 2014-10-13T22:45:32Z

@reem I actually started up some work on this 2 months ago when I didn't understand Rust as well. It's mixed in with my Cursor work on this branch: https://github.com/Gankro/rust/compare/dlist-seek

The first commit is the refactor work, though it also features more sweeping reform in anticipation of introducing cursors: Gankra@4e259d2

At the time, I thought it was hopeless, but I realize now that it should mostly "just work" if you manually mark the RawLink's NoCopy. Skimming over it again now, I also came to the conclusion that the fact that we make frequently make invalid RawLinks transiently, so that makes dereffing them unsafe. That... might be valid?

This kind of goes hand-in-hand with some unsafe code conventions I've been mulling over in my mind. Specifically, is internal code reasonable to mark safe if assumes pre-conditions? That is, if it assumes it's only called when the structure is in a "valid" state, and not a transient "invalid" one?

reem · 2014-10-13T22:47:10Z

Ideally, I think that unsafe code should only marked safe if it works in any case, i.e. there should never be a way to cause undefined behavior in safe code, even if you used unsafe code earlier.

I think that we should discuss Zippers and Cursors separately, and should probably make an RFC to introduce them. I'd be happy to chat on IRC or here about what they should like.

Gankra · 2014-10-13T22:48:32Z

Yeah, then we need to waste cycles ensuring links are always valid, or we need to mark way more of DList straight-up unsafe.

reem · 2014-10-13T22:49:03Z

I think that (ideally) no operations on DList should let you leave it in an unsafe state.

Gankra · 2014-10-13T22:53:38Z

All of public operations are legit, it's jut trivial to make/deref a dangling pointer using the current internal API. Presumably Node or RawLink could be beefed up with more logic so that this doesn't happen. e.g. put all the unsafe link-mutating code on Node with a safe interface for DList to safely use. Node.insert_left(Node), Node.insert_right(Node), Node.cut_out(), Node.next() Node.prev(), etc.

Actually I think I mostly did some of this with the insert_before_node insert_after_node etc code I added to DList in the linked commit. (It's been a long time)

reem · 2014-10-13T22:54:59Z

Hmmm. For internal code I think the distinction is less clear. I'm attempting a refactor now, and my goal is to move as much unsafe as possible into safe abstractions, but we'll see how it goes.

I'll take a look at your refactor.

We should spin this off into a subissue.

Danger: fn resolve<'a>(&mut self) -> Option<&'a mut T>...

One of the main things I'm doing is introducing the invariant to RawLink that the internal pointer is always valid, then keeping it behind an Option. I think this makes things a bit simpler.

gamazeps · 2014-10-13T22:57:35Z

If you are going to delve deeper into it on IRC, I'm interested ;)

Gankra · 2014-10-13T22:58:40Z

Remember, I wrote it at the start of August, and I first started learning Rust in July 😅

(I see you, old transmute hack I loved)

gamazeps · 2014-10-13T23:55:45Z

The u32 for the BitV seems to be fine (running benchs to measure).

Looking into "RingBuf should not be a Vec of Options." now :)

gereeter · 2014-10-14T01:47:07Z

I've got a slimmer BTree Node almost working and have started on a faster PriorityQueue.extend().

Gankra · 2014-10-14T02:00:22Z

You're all amazing ❤️

Closes rust-lang#16736 linked to rust-lang#18009

Closes #16736 linked to #18009

reem · 2014-10-15T07:08:00Z

@gankro I'd love to take a crack at discussing what certain data structures are best for. (I'm also still looking at refactoring DList).

Where do you see that going/what format do you think would be best for it?

Gankra · 2014-10-15T16:45:27Z

@reem You can check out my BTree docs: http://doc.rust-lang.org/std/collections/struct.BTreeMap.html

and the high-levels docs I added to libcollections: http://doc.rust-lang.org/std/collections/index.html

for the sort of things I had in mind. That said, I haven't thought about it too hard, so if there's something you think should or shouldn't be done that isn't or is, feel free to pitch in ideas. I'm not a documentation expert.

Linked to rust-lang#18009

I was going to write some doc in order to remove the #[allow(missing_doc)] but there was actually none missing. I also removed a warning i didn't see in my last commit #18018 Linked to #18009

Gankra · 2014-10-29T16:40:35Z

CC #18424 collections reform has passed! Lots of work to do in there too!

…=bstrie I've implemented the new collection views API for TrieMap. I more or less followed the approach set out by @gankro in BTreeMap, by using a `SearchStack`. There's quite a bit of unsafe code, but I've wrapped it safely where I think is appropriate. I've added tests to ensure everything works, and performance seems quite good. ``` test trie::bench_map::bench_find ... bench: 67879 ns/iter (+/- 4192) test trie::bench_map::bench_find_entry ... bench: 186814 ns/iter (+/- 18748) test trie::bench_map::bench_insert_large ... bench: 716612 ns/iter (+/- 160121) test trie::bench_map::bench_insert_large_entry ... bench: 851219 ns/iter (+/- 20331) test trie::bench_map::bench_remove ... bench: 838856 ns/iter (+/- 27998) test trie::bench_map::bench_remove_entry ... bench: 981711 ns/iter (+/- 53046) ``` Using an entry is slow compared to a plain find, but is only ~15% slower for inserts and removes, which is where this API is most useful. I'm tempted to remove the standalone `remove` function in favour of an entry-based approach (to cut down on complexity). I've added some more comments to the general part of the code-base, which will hopefully help the next person looking over this. I moved the three key structures to the top of the file so that the nesting structure is clearly visible, and renamed `Child<T>` to `TrieNode<T>` and `TrieNode<T>` to `InternalNode<T>` to improve clarity. If these changes are creeping, I'm happy to revert them. Let me know if my use of `fail!` is ok, I was a little unsure of how specific to be. Some of the data-structures have various invariants that shouldn't be broken, so using `fail!` seemed appropriate. ## Still to do * Modernise iterators (make them double-ended). * Make the keys generic, or rename this data-structure (see: #14902). * Possibly move this code out of libcollections. [Searching Github for TrieMap turns up very few real results.][triemap-search] Related issues: #18009 and #17320 [triemap-search]: https://github.com/search?utf8=%E2%9C%93&q=TrieMap+language%3ARust&type=Code&ref=searchresults

Fix for task in Metabug #18009 (Rebased version of #18170) This changes much of about how RingBuf functions. `lo`, `nelts` are replaced by a more traditional `head` and`tail`. The `Vec<Option<T>>` is replaced by a bare pointer that is managed by the `RingBuf` itself. This also expects the ring buffer to always be size that is a power of 2. This change also includes a number of new tests to cover the some areas that could be of concern with manual memory management. The benchmarks have been reworked since the old ones were benchmarking of the Ring buffers growth rather then the actual test. The unit test suite have been expanded, and exposed some bugs in `fn get()` and `fn get_mut()` ## Benchmark **Before:** ``` test ring_buf::tests::bench_grow_1025 ... bench: 8919 ns/iter (+/- 87) test ring_buf::tests::bench_iter_1000 ... bench: 924 ns/iter (+/- 28) test ring_buf::tests::bench_mut_iter_1000 ... bench: 918 ns/iter (+/- 6) test ring_buf::tests::bench_new ... bench: 15 ns/iter (+/- 0) test ring_buf::tests::bench_pop_100 ... bench: 294 ns/iter (+/- 9) test ring_buf::tests::bench_pop_front_100 ... bench: 948 ns/iter (+/- 32) test ring_buf::tests::bench_push_back_100 ... bench: 291 ns/iter (+/- 16) test ring_buf::tests::bench_push_front_100 ... bench: 311 ns/iter (+/- 27 ``` **After:** ``` test ring_buf::tests::bench_grow_1025 ... bench: 2209 ns/iter (+/- 169) test ring_buf::tests::bench_iter_1000 ... bench: 534 ns/iter (+/- 27) test ring_buf::tests::bench_mut_iter_1000 ... bench: 515 ns/iter (+/- 28) test ring_buf::tests::bench_new ... bench: 11 ns/iter (+/- 0) test ring_buf::tests::bench_pop_100 ... bench: 170 ns/iter (+/- 5) test ring_buf::tests::bench_pop_front_100 ... bench: 171 ns/iter (+/- 11) test ring_buf::tests::bench_push_back_100 ... bench: 172 ns/iter (+/- 13) test ring_buf::tests::bench_push_front_100 ... bench: 158 ns/iter (+/- 12) ```

...ated buffer. Before: test btree::map::bench::find_rand_100 ... bench: 29 ns/iter (+/- 2) test btree::map::bench::find_rand_10_000 ... bench: 83 ns/iter (+/- 6) test btree::map::bench::find_seq_100 ... bench: 30 ns/iter (+/- 1) test btree::map::bench::find_seq_10_000 ... bench: 50 ns/iter (+/- 3) test btree::map::bench::insert_rand_100 ... bench: 186 ns/iter (+/- 30) test btree::map::bench::insert_rand_10_000 ... bench: 377 ns/iter (+/- 8) test btree::map::bench::insert_seq_100 ... bench: 299 ns/iter (+/- 10) test btree::map::bench::insert_seq_10_000 ... bench: 368 ns/iter (+/- 12) test btree::map::bench::iter_1000 ... bench: 20956 ns/iter (+/- 479) test btree::map::bench::iter_100000 ... bench: 2060899 ns/iter (+/- 44325) test btree::map::bench::iter_20 ... bench: 560 ns/iter (+/- 63) After: test btree::map::bench::find_rand_100 ... bench: 28 ns/iter (+/- 2) test btree::map::bench::find_rand_10_000 ... bench: 74 ns/iter (+/- 3) test btree::map::bench::find_seq_100 ... bench: 31 ns/iter (+/- 0) test btree::map::bench::find_seq_10_000 ... bench: 46 ns/iter (+/- 0) test btree::map::bench::insert_rand_100 ... bench: 141 ns/iter (+/- 1) test btree::map::bench::insert_rand_10_000 ... bench: 273 ns/iter (+/- 12) test btree::map::bench::insert_seq_100 ... bench: 255 ns/iter (+/- 17) test btree::map::bench::insert_seq_10_000 ... bench: 340 ns/iter (+/- 3) test btree::map::bench::iter_1000 ... bench: 21193 ns/iter (+/- 1958) test btree::map::bench::iter_100000 ... bench: 2203599 ns/iter (+/- 100491) test btree::map::bench::iter_20 ... bench: 614 ns/iter (+/- 110) This code could probably be a fair bit cleaner, but it works. Part of #18009.

Gankra · 2014-12-17T19:39:02Z

TreeMap and TrieMap are on the way out, so I've marked all their issues as "resolved". A lot of the other issues have prototypes in collect-rs (where TreeMap and TrieMap live now).

Also @gereeter's excellent efforts have resolved a lot of the BTree issues!

@csherratt's work has resolved the RingBuf issue!

A lot of other stuff is in rust-lang/rfcs#509

steveklabnik · 2015-10-27T15:24:57Z

it looks like this bug may be done now? It seems like this was mostly "do work for 1.0". @gankro is that true?

Gankra · 2015-10-27T15:49:49Z

Eh, it's good enough.

Gankra · 2015-10-27T15:50:25Z

(most of the outstanding issues have their own stuff somewhere in the tracker these days)

flip111 · 2017-11-10T08:42:56Z

In the original reddit thread multi indexed collections were mentioned. Was this included? Where can i find it? I'm refering to functionality which is also available in other languages:

http://www.boost.org/doc/libs/1_62_0/libs/multi_index/doc/index.html
https://pandas.pydata.org/pandas-docs/stable/advanced.html
https://hackage.haskell.org/package/ixset-typed

It's useful to model data structures as can be done in relational databases (some documentation linked above explains it).

fix: do not assume rustup is installed in xtask codegen take 2 7d9e4fcc07e5de94e37b73436147cdbbaa35dbdc broke this on rustup toolchains, the `cmd` command is trying to be too smart here

gamazeps mentioned this issue Oct 13, 2014

libcollection : Switches from uint to u32 in BitV and BitVSet #18018

Merged

gereeter mentioned this issue Oct 14, 2014

Slimmify BTree by replacing the three Vecs in Node with a manually alloc... #18028

Merged

gamazeps pushed a commit to gamazeps/rust that referenced this issue Oct 14, 2014

Switches from uint to u32 in BitV and BitVSet

1ef5e38

Closes rust-lang#16736 linked to rust-lang#18009

bors added a commit that referenced this issue Oct 14, 2014

auto merge of #18018 : gamazeps/rust/isuue16736, r=cmr

40b2449

Closes #16736 linked to #18009

gamazeps added a commit to gamazeps/rust that referenced this issue Oct 16, 2014

Removes useless confs from bitv.rs

c1f7a19

Linked to rust-lang#18009

gamazeps mentioned this issue Oct 16, 2014

Removes useless confs from bitv.rs #18089

Closed

ghost mentioned this issue Oct 19, 2014

Remove use of Option<T> inside of RingBuf #18170

Closed

michaelsproul mentioned this issue Oct 24, 2014

Implement the collection views API for TrieMap. #18287

Merged

ghost mentioned this issue Nov 7, 2014

Ringbuf remove option<T> #18747

Merged

steveklabnik added the metabug Issues about issues themselves ("bugs about bugs") label Jan 29, 2015

steveklabnik added A-libs metabug Issues about issues themselves ("bugs about bugs") and removed metabug Issues about issues themselves ("bugs about bugs") labels Oct 27, 2015

Gankra closed this as completed Oct 27, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metabug: Libcollections TODO #18009

Metabug: Libcollections TODO #18009

Gankra commented Oct 13, 2014

aturon commented Oct 13, 2014

Gankra commented Oct 13, 2014

aturon commented Oct 13, 2014

Gankra commented Oct 13, 2014

gamazeps commented Oct 13, 2014

Gankra commented Oct 13, 2014

gamazeps commented Oct 13, 2014

Gankra commented Oct 13, 2014

reem commented Oct 13, 2014

Gankra commented Oct 13, 2014

reem commented Oct 13, 2014

Gankra commented Oct 13, 2014

reem commented Oct 13, 2014

Gankra commented Oct 13, 2014

reem commented Oct 13, 2014

gamazeps commented Oct 13, 2014

Gankra commented Oct 13, 2014

gamazeps commented Oct 13, 2014

gereeter commented Oct 14, 2014

Gankra commented Oct 14, 2014

reem commented Oct 15, 2014

Gankra commented Oct 15, 2014

Gankra commented Oct 29, 2014

Gankra commented Dec 17, 2014

steveklabnik commented Oct 27, 2015

Gankra commented Oct 27, 2015

Gankra commented Oct 27, 2015

flip111 commented Nov 10, 2017

Metabug: Libcollections TODO #18009

Metabug: Libcollections TODO #18009

Comments

Gankra commented Oct 13, 2014

aturon commented Oct 13, 2014

Gankra commented Oct 13, 2014

aturon commented Oct 13, 2014

Gankra commented Oct 13, 2014

gamazeps commented Oct 13, 2014

Gankra commented Oct 13, 2014

gamazeps commented Oct 13, 2014

Gankra commented Oct 13, 2014

reem commented Oct 13, 2014

Gankra commented Oct 13, 2014

reem commented Oct 13, 2014

Gankra commented Oct 13, 2014

reem commented Oct 13, 2014

Gankra commented Oct 13, 2014

reem commented Oct 13, 2014

gamazeps commented Oct 13, 2014

Gankra commented Oct 13, 2014

gamazeps commented Oct 13, 2014

gereeter commented Oct 14, 2014

Gankra commented Oct 14, 2014

reem commented Oct 15, 2014

Gankra commented Oct 15, 2014

Gankra commented Oct 29, 2014

Gankra commented Dec 17, 2014

steveklabnik commented Oct 27, 2015

Gankra commented Oct 27, 2015

Gankra commented Oct 27, 2015

flip111 commented Nov 10, 2017