-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking Issue for BTreeMap cursors #107540
Comments
The key method on the Cursor and CursorMut types should be unsafe. If the key is interiorly mutable you could mutate it through the shared reference and break the invariants of BtreeMap. For example if the key is of type Cell<u32> I could call key.set and change the value from 1 to 1 billion and break potentially both the "Key must be unique" and "Key must be correctly ordered" invariants of BTreeMap. |
As I understand it, the BTreeMap invariants are not safety invariants. If the key is changed from underneath the map, it can lead to bugs. But crucially not to memory safety violations. The standard library docs make this distinction clear:
|
But the key_mut_unchecked method on CursorMut is unsafe because you might break the invariants of BTreeMap |
Unless there is a memory safety invariant with |
The invariant here is that if a function receives a |
You should clarify this in the docs. It's not really clear why key_mut_unchecked is unsafe reading the docs IMO |
FWIW, I agree that the docs are unclear. Copied directly from rust-lang/libs-team#141: /// Returns a mutable reference to the of the element that the cursor is
/// currently pointing to.
///
/// This can be used to modify the key, but you must ensure that the
/// `BTreeMap` invariants are maintained. Specifically:
///
/// * The key must remain unique within the tree.
/// * The key must remain in sorted order with regards to other elements in
/// the tree.
unsafe fn key_mut_unchecked(&mut self) -> Option<&mut K>; This does not explain what exactly is "unchecked". It appears to just repeat the invariants listed in the BTreeMap docs, which claim there is no possibility of UB by breaking them. |
So, I've been trying to create my own "range map" using this API and it feels… extremely clunky to me. Since I'm mostly working with the mutable API, I'll be talking about The handling of "ghost elements" is extremely clunky to me. From what I understand, these only can show up at the beginning and end of the map, since all entries into the API are via In that line of reasoning, I think that While I think that the guarantees for the |
There are several advantages in having a ghost element:
With that said, I can see how it can be hard to work with. I'd be happy to try out a better API if you want to sketch one out. |
I assume this is a non-starter for performance reasons, but the most "natural" feeling interface for me would be one that can point to all real elements and all points between/next to them (including start/end). There would then be convenience methods for skipping over whatever you're not interested in so you can stay in "element space" or "gap space" if you only care about one or the other most of the time. But again, I imagine this would make the API too slow because the compiler couldn't optimise away visiting all the gaps — does that sound right? (Edit: if this doesn't make sense, I'd be happy to sketch what I imagine would be the core API, and what would be the optional convenience layers.) (Edit2: or you could have separate cursor types for when you're pointing at an element vs a gap, and have the navigation functions consume self and convert between them where relevant. I think that would be on par for performance with the current design?) |
Honestly, the kind of modification I was thinking of was to leverage the existing |
I have been implementing a TCP segment reordering buffer (where you're storing ranges and they wrap around u32::MAX). So I have been looking into building a wrapping iterator over a BTreeMap and it seems that implementing a wrapping mutating cursor based on this is surprisingly nontrivial: you'd have to ... somehow ... swap between a &mut BTreeMap (to create a new CursorMut when you wrap) and a CursorMut and when you get rid of the CursorMut to replace it, put the &mut BTreeMap back. This is kind of nonobvious how to do when hand rolling a generator as one does while writing these things. I guess my feedback on this API is that it might need a function to move it back to the start or to a specified element. |
Can we document the potential hazard that In other words, when porting C++ to Rust,
FWIW, I'm strongly in favor of both API's 🚀 |
Are you sure about this? They should map the same way.
|
I'm more confused than I am sure, but I still get the impression that there is a hazard when porting C++ to Rust. C++ Rust So, here are the results on an example map:
My code#include #include int main () { std::map mymap; std::map::iterator itlow,itup; mymap[1]=1; mymap[2]=2; mymap[3]=3; mymap[4]=4; mymap[5]=5; itup=mymap.upper_bound (3); std::cout << itup->first << ' ' << itup->second << '\n'; itlow=mymap.lower_bound (3); std::cout << itlow->first << ' ' << itlow->second << '\n'; return 0; } #![feature(btree_cursors)] fn main() { use std::collections::BTreeMap; use std::ops::Bound; let mut a = BTreeMap::new(); a.insert("k1", "v1"); a.insert("k2", "v2"); a.insert("k3", "v3"); a.insert("k4", "v4"); a.insert("k5", "v5"); let cursor = a.upper_bound(Bound::Excluded(&"k3")); println!("upper exc {:?}", cursor.key()); let cursor = a.upper_bound(Bound::Included(&"k3")); println!("upper inc {:?}", cursor.key()); let cursor = a.lower_bound(Bound::Excluded(&"k3")); println!("lower exc {:?}", cursor.key()); let cursor = a.lower_bound(Bound::Included(&"k3")); println!("lower inc {:?}", cursor.key()); } @Amanieu thoughts? |
I see, you are indeed correct. Considering the feedback received so far, I am wondering if changing Cursors to represent a point between 2 elements rather than pointing at one element would be better:
Let me know if this sounds promising, and I'll try and sketch out a new API with this design. |
It's hard to say which is better without some short before & after code examples i.e. to do X, you do Anyway, after realizing the confusion, the API's were both intuitive and useful for the porting I was doing today and the only thing I'm asking for is for the docs to point out the porting hazard (alternatively, the function names could be changed to avoid matching those used in C++). |
Here's an API sketch for a cursor that resides at a point between 2 elements. Elements are read by "passing over" them with a cursor. impl<K, V> BTreeMap<K, V> {
fn lower_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
fn lower_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
fn upper_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
fn upper_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
where
K: Borrow<Q> + Ord,
Q: Ord;
}
struct Cursor<'a, K: 'a, V: 'a>;
impl<'a, K, V> Cursor<'a, K, V> {
fn next(&mut self) -> Option<(&'a K, &'a V)>;
fn prev(&mut self) -> Option<(&'a K, &'a V)>;
fn peek_next(&self) -> Option<(&'a K, &'a V)>;
fn peek_prev(&self) -> Option<(&'a K, &'a V)>;
}
struct CursorMut<'a, K: 'a, V: 'a>;
impl<'a, K, V> CursorMut<'a, K, V> {
fn next(&mut self) -> Option<(&K, &mut V)>;
fn prev(&mut self) -> Option<(&K, &mut V)>;
unsafe fn next_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
unsafe fn prev_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
fn peek_next(&self) -> Option<(&K, &V)>;
fn peek_prev(&self) -> Option<(&K, &V)>;
fn peek_next_mut(&mut self) -> Option<(&K, &mut V)>;
fn peek_prev_mut(&mut self) -> Option<(&K, &mut V)>;
unsafe fn peek_next_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
unsafe fn peek_prev_with_mut_key(&mut self) -> Option<(&mut K, &mut V)>;
fn as_cursor(&self) -> Cursor<'_, K, V>;
// Inserts at current point, cursor is moved to before the inserted element.
fn insert_after(&mut self, key: K, value: V);
// Inserts at current point, cursor is moved to after the inserted element.
fn insert_before(&mut self, key: K, value: V);
unsafe fn insert_after_unchecked(&mut self, key: K, value: V);
unsafe fn insert_before_unchecked(&mut self, key: K, value: V);
fn remove_next(&mut self) -> Option<(K, V)>;
fn remove_prev(&mut self) -> Option<(K, V)>;
} |
Conceptually, I think that I do like this API more, but I still think that the combinatorical explosion of options here could be more easily avoided by integrating properly with the Essentially, the position of the cursor between the elements is still unchanged. But rather than making the cursor this all-powerful entity, we give it equal footing with other various elements. To accomplish this, I propose a particular sleight of hand. We retain the meaning of a "gap" as the position in the map between elements. A vacant entry points to a particular gap, and an occupied entry points to a particular element and its adjacent gaps. When mutating a key for either kind of entry, you must uphold that the entry upholds the By leaving the semantics of the key specifically in the entry and requiring the entry for mutation of elements, the semantics of all the various mutable methods becomes more clear. Here's what the conceptual API would look like: type Position = /* we'll talk about this later */;
impl<'a, K, V> Cursor<'a, K, V> { // and `CursorMut`
// looks around the cursor
fn peek_next(&self) -> Option<(&K, &V)>;
fn peek_prev(&self) -> Option<(&K, &V)>;
// move the cursor
fn next(&mut self) -> Position;
fn prev(&mut self) -> Position;
}
impl<'a, K, V> CursorMut<'a, K, V> {
// attempts to insert at this spot in the cursor
fn insert(self, key: K) -> Result<VacantEntry<'a, K, V>, CursorMut<'a, K, V>>;
unsafe fn insert_unchecked(self, key: K) -> VacantEntry<'a, K, V>;
// mutates around the cursor
fn enter_next(&mut self) -> Option<OccupiedEntry<'_, K, V>>;
fn enter_prev(&mut self) -> Option<OccupiedEntry<'_, K, V>>;
}
impl<'a, K, V> OccupiedEntry<'a, K, V> { // and `VacantEntry` and `Entry` too
// requires upholding invariants mentioned
unsafe fn key_mut(&mut self) -> &mut K;
}
impl<'a, K, V> VacantEntry<'a, K, V> {
// if we want the ability to convert back into a cursor
fn seek_elsewhere(self) -> (K, CursorMut<'a, K, V>);
fn insert_and_seek_after(self, value: V) -> CursorMut<'a, K, V>;
fn insert_and_seek_before(self, value: V) -> CursorMut<'a, K, V>;
} Part of the reason why I think Note the bit I said about That's the bit I'm fuzziest on. But the rest is my basic idea on how to use entries to solve the complexity and the semantics. We don't need any weird |
A side note on the naming of |
While I'm not familiar enough with the cursor code to refactor everything to only point between elements, I did submit #112896 which adds the |
I'd like to also comment on the method names, if I can. I've participated in several programming competitions in C++, where binary searches and searches on binary trees are common. Every time I've had to perform one, I had to stare at the documentation for several minutes, struggling to figure out what Rust's current |
I have the same problem. I was also confused by how bounds are interpreted. I think the term "above" is misleading. Finally, after quite a bit of time I came up with the following rules. let mut a = BTreeMap::new();
a.insert(1, "a");
a.insert(2, "b");
a.insert(3, "c");
a.insert(4, "d");
// a: [1, 2, 3, 4]
// The range of Included(&2) to Unbounded: [.. , 2 , ..]
// [2, +∞) ---------
let cursor = a.lower_bound(Bound::Included(&2));
// The actual cursor: [1 , 2 , 3 , ..]
// ^
assert_eq!(cursor.peek_prev(), Some((&1, &"a")));
assert_eq!(cursor.peek_next(), Some((&2, &"b")));
// The range of Excluded(&2) to Unbounded: [.. , 2 , ..]
// (2, +∞) -----
let cursor = a.lower_bound(Bound::Excluded(&2));
// The actual cursor: [1 , 2 , 3 , ..]
// ^
assert_eq!(cursor.peek_prev(), Some((&2, &"b")));
assert_eq!(cursor.peek_next(), Some((&3, &"c")));
// Conclusion: for lower_bound, the next node of the cursor is within the range.
// The range of Unbounded to Included(&3): [.. , 3 , ..]
// (-∞, 3] ---------
let cursor = a.upper_bound(Bound::Included(&3));
// The actual cursor: [.. , 2 , 3 , 4]
// ^
assert_eq!(cursor.peek_prev(), Some((&3, &"c")));
assert_eq!(cursor.peek_next(), Some((&4, &"d")));
// The range of Unbounded to Excluded(&3): [.. , 3 , ..]
// (-∞, 3) -----
let cursor = a.upper_bound(Bound::Excluded(&3));
// The actual cursor: [.. , 2 , 3 , 4]
// ^
assert_eq!(cursor.peek_prev(), Some((&2, &"b")));
assert_eq!(cursor.peek_next(), Some((&3, &"c")));
// Conclusion: for upper_bound, the previous node of the cursor is within the range. |
Feel free to submit a PR if you think you can make the documentation better. We can further refine the wording in that PR. |
I had a go at improving some of the documentation in #120936 |
improve `btree_cursors` functions documentation As suggested by `@Amanieu` (and others) in rust-lang#107540 (rust-lang#107540 (comment)) Improvements: - Document exact behavior of `{upper/lower}_bound{,_mut}` with each of the three `Bound` types using unambigous words `{greatest,greater,smallest,smaller,before,after}`. - Added another doc-example for the `Bound::Unbounded` for each of the methods - Changed doc-example to use From<[T; N]> rather than lots of `insert()`s which requires a mutable map which clutters the example when `mut` may not be required for the method (such as for `{upper,lower}_bound`. - Removed `# Panics` section from `insert_{before,after}` methods since they were changed to return an error instead a while ago. - Reworded some phrases to be more consistent with the more regular `BTreeMap` methods such as calling entries "key-value" rather than "element"s.
improve `btree_cursors` functions documentation As suggested by ``@Amanieu`` (and others) in rust-lang#107540 (rust-lang#107540 (comment)) Improvements: - Document exact behavior of `{upper/lower}_bound{,_mut}` with each of the three `Bound` types using unambigous words `{greatest,greater,smallest,smaller,before,after}`. - Added another doc-example for the `Bound::Unbounded` for each of the methods - Changed doc-example to use From<[T; N]> rather than lots of `insert()`s which requires a mutable map which clutters the example when `mut` may not be required for the method (such as for `{upper,lower}_bound`. - Removed `# Panics` section from `insert_{before,after}` methods since they were changed to return an error instead a while ago. - Reworded some phrases to be more consistent with the more regular `BTreeMap` methods such as calling entries "key-value" rather than "element"s.
Rollup merge of rust-lang#120936 - ripytide:master, r=Amanieu improve `btree_cursors` functions documentation As suggested by ``@Amanieu`` (and others) in rust-lang#107540 (rust-lang#107540 (comment)) Improvements: - Document exact behavior of `{upper/lower}_bound{,_mut}` with each of the three `Bound` types using unambigous words `{greatest,greater,smallest,smaller,before,after}`. - Added another doc-example for the `Bound::Unbounded` for each of the methods - Changed doc-example to use From<[T; N]> rather than lots of `insert()`s which requires a mutable map which clutters the example when `mut` may not be required for the method (such as for `{upper,lower}_bound`. - Removed `# Panics` section from `insert_{before,after}` methods since they were changed to return an error instead a while ago. - Reworded some phrases to be more consistent with the more regular `BTreeMap` methods such as calling entries "key-value" rather than "element"s.
improve `btree_cursors` functions documentation As suggested by ``@Amanieu`` (and others) in #107540 (rust-lang/rust#107540 (comment)) Improvements: - Document exact behavior of `{upper/lower}_bound{,_mut}` with each of the three `Bound` types using unambigous words `{greatest,greater,smallest,smaller,before,after}`. - Added another doc-example for the `Bound::Unbounded` for each of the methods - Changed doc-example to use From<[T; N]> rather than lots of `insert()`s which requires a mutable map which clutters the example when `mut` may not be required for the method (such as for `{upper,lower}_bound`. - Removed `# Panics` section from `insert_{before,after}` methods since they were changed to return an error instead a while ago. - Reworded some phrases to be more consistent with the more regular `BTreeMap` methods such as calling entries "key-value" rather than "element"s.
improve `btree_cursors` functions documentation As suggested by ``@Amanieu`` (and others) in #107540 (rust-lang/rust#107540 (comment)) Improvements: - Document exact behavior of `{upper/lower}_bound{,_mut}` with each of the three `Bound` types using unambigous words `{greatest,greater,smallest,smaller,before,after}`. - Added another doc-example for the `Bound::Unbounded` for each of the methods - Changed doc-example to use From<[T; N]> rather than lots of `insert()`s which requires a mutable map which clutters the example when `mut` may not be required for the method (such as for `{upper,lower}_bound`. - Removed `# Panics` section from `insert_{before,after}` methods since they were changed to return an error instead a while ago. - Reworded some phrases to be more consistent with the more regular `BTreeMap` methods such as calling entries "key-value" rather than "element"s.
… r=tgross35 Add missing periods on `BTreeMap` cursor `peek_next` docs Tracking issue: rust-lang#107540
Sorry, I did not read the entire discussion. Could anyone, please, briefly summarize why the decision was made to enable key mutation via a separate |
… r=tgross35 Add missing periods on `BTreeMap` cursor `peek_next` docs Tracking issue: rust-lang#107540
Rollup merge of rust-lang#128310 - kmicklas:btree-map-peek-next-docs, r=tgross35 Add missing periods on `BTreeMap` cursor `peek_next` docs Tracking issue: rust-lang#107540
Implement cursors for `BTreeSet` Tracking issue: rust-lang#107540 This is a straightforward wrapping of the map API, except that map's `CursorMut` does not make sense, because there is no value to mutate. Hence, map's `CursorMutKey` is wrapped here as just `CursorMut`, since it's unambiguous for sets and we don't normally speak of "keys". On the other hand, I can see some potential for confusion with `CursorMut` meaning different things in each module. I'm happy to take suggestions to improve that. r? `@Amanieu`
Implement cursors for `BTreeSet` Tracking issue: rust-lang#107540 This is a straightforward wrapping of the map API, except that map's `CursorMut` does not make sense, because there is no value to mutate. Hence, map's `CursorMutKey` is wrapped here as just `CursorMut`, since it's unambiguous for sets and we don't normally speak of "keys". On the other hand, I can see some potential for confusion with `CursorMut` meaning different things in each module. I'm happy to take suggestions to improve that. r? ``@Amanieu``
Implement cursors for `BTreeSet` Tracking issue: rust-lang#107540 This is a straightforward wrapping of the map API, except that map's `CursorMut` does not make sense, because there is no value to mutate. Hence, map's `CursorMutKey` is wrapped here as just `CursorMut`, since it's unambiguous for sets and we don't normally speak of "keys". On the other hand, I can see some potential for confusion with `CursorMut` meaning different things in each module. I'm happy to take suggestions to improve that. r? ```@Amanieu```
Implement cursors for `BTreeSet` Tracking issue: rust-lang#107540 This is a straightforward wrapping of the map API, except that map's `CursorMut` does not make sense, because there is no value to mutate. Hence, map's `CursorMutKey` is wrapped here as just `CursorMut`, since it's unambiguous for sets and we don't normally speak of "keys". On the other hand, I can see some potential for confusion with `CursorMut` meaning different things in each module. I'm happy to take suggestions to improve that. r? ````@Amanieu````
Rollup merge of rust-lang#128309 - kmicklas:btreeset-cursor, r=Amanieu Implement cursors for `BTreeSet` Tracking issue: rust-lang#107540 This is a straightforward wrapping of the map API, except that map's `CursorMut` does not make sense, because there is no value to mutate. Hence, map's `CursorMutKey` is wrapped here as just `CursorMut`, since it's unambiguous for sets and we don't normally speak of "keys". On the other hand, I can see some potential for confusion with `CursorMut` meaning different things in each module. I'm happy to take suggestions to improve that. r? ````@Amanieu````
@GrigorenkoPV according to the 8ee9693 commit message:
Maybe there is more discussion somewhere? I think one should explore a |
Could it be possible to add functions to For example: impl<'a, K, V> CursorMut<'a, K, V> {
fn peek_next(&mut self) -> Option<(&K, &mut V)>;
fn peek_prev(&mut self) -> Option<(&K, &mut V)>;
// TODO: a less silly name
fn peek_next_lifetime(&mut self) -> Option<(&'a K, &'a mut V)>;
fn peek_prev_lifetime(&mut self) -> Option<(&'a K, &'a mut V)>;
} My use case is to have an interface that allows to add an object to the TreeMap if the key is vacant or to leave it unmodified if it already exists and then return a mutable reference to the object. This is almost doable with the current As a reference, this is a simplified version of my code: pub fn add_symbol<'a>(
map: &'a mut BTreeMap<u32, SymbolMetadata>,
vram: u32,
) -> &'a mut SymbolMetadata {
let mut cursor = map.upper_bound_mut(Bound::Included(&vram));
let must_insert_new = if let Some((sym_vram, sym)) = cursor.peek_prev() {
if vram == *sym_vram {
// Symbol already exists
false
} else {
// Create new symbol if new `vram` is outside the range of the last symbol that's lower than `vram`
vram >= *sym_vram + sym.size()
}
} else {
// Create new if it doesn't exist
true
};
if must_insert_new {
cursor
.insert_before(vram, SymbolMetadata::new(vram))
.expect("This should not be able to panic");
}
// Hack to workaround lifetimes issues
let ptr = cursor.peek_prev().unwrap().1 as *mut SymbolMetadata;
unsafe { &mut *ptr }
} |
The proposed signature (with fn into_prev_and_next(self) -> (Option<…>, Option<…>) |
Yeah, you are right, I didn't realize Rust would allow multiple Your impl<'a, K, V> CursorMut<'a, K, V> {
fn into_prev_and_next(self) -> (Option<(&'a K, &'a mut V)>, Option<(&'a K, &'a mut V)>);
} This is an implementation I made to test the idea, and seems to work fine. fn into_prev_and_next<'a, K, V>(mut cursor: btree_map::CursorMut<'a, K, V>) -> (Option<(&'a K, &'a mut V)>, Option<(&'a K, &'a mut V)>) {
let prev: Option<(&'a K, &'a mut V)> = cursor.peek_prev().map(|(k, v)| {
let ptr_k = k as *const K;
let ptr_v = v as *mut V;
unsafe {
(&*ptr_k, &mut *ptr_v)
}
});
let next: Option<(&'a K, &'a mut V)> = cursor.peek_next().map(|(k, v)| {
let ptr_k = k as *const K;
let ptr_v = v as *mut V;
unsafe {
(&*ptr_k, &mut *ptr_v)
}
});
(prev, next)
} I suppose |
Yes, that’s the signature I meant. The implementation sketch makes me nervous because it looks like the sort of code that can run afoul of aliasing models (stacked borrows, tree borrows, or future alternatives - try testing it under Miri). But I’m sure that’s just an artifact of implementing it in terms of the existing methods, std can definitely implement it without such problems. |
Feature gate:
#![feature(btree_cursors)]
ACP: rust-lang/libs-team#141
This is a tracking issue for the
Cursor
andCursorMut
types forBTreeMap
.A Cursor is like an iterator, except that it can freely seek back-and-forth, and can safely mutate the tree during iteration. This is because the lifetime of its yielded references is tied to its own lifetime, instead of just the underlying tree. A cursor either points to an element in the tree, or to a "ghost" non-element that is logically located after the last element and before the first element.
Public API
Steps / History
Unresolved Questions
Footnotes
https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html ↩
The text was updated successfully, but these errors were encountered: