-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BTreeMap Raw Entry API #73
Comments
I am willing to mentor implementation work on this API. Matching the |
@CAD97 let me see if I got your suggestion right. The single key comparator that you propose will be used internally to implicitly compare current items of the map and find the first element that does not return I also support trying to have the APIs as homogeneous as possible. In fact it's one thing I like about |
Yep, exactly. The only difference is |
Sounds good to me actually and less verbose. And you are probably right about avoiding arguing over how would |
Given that the HashMap raw_entry API is not quite something we're sure we want to stabilize (IIUC), I'm not sure mirroring it for BTreeMap is the right approach. CC @Amanieu |
It's actually a shame that this API would be removed from the |
The ability/desire to support cursor operations on the entry API for BTreeMap give at least a reason to have the API on BTreeMap even if it's not available on HashMap. But that also said, I think I have a vague idea of how The two-step builder process isn't necessary; we can just have the (But it would be nice if |
@mokhaled2992 is this still something you're interested in pushing forward? If not, is this something I can help move? Getting some form of external comparators is extremely important for my use case (a SQL-esque database, where for example string keys in an index can be compared either case-insensitively or case-sensitively depending on the index, but storing case-sensitivity on every single string key is unacceptably high overhead). |
@glittershark I'm still interested in that, I think having the support for custom on the fly comparison functions enables alot of optimizations as I explained in detail in the main issue text and without the need to carry too much unnecessary information in the key like weak pointers for instance. On a parallel dimension, in my day to day work we use the corresponding Hasbrown hashmap almost always instead of the standard hashtable because it enables custom hashing/equality on the fly. It would be great to see that added one day finally to the standrad hashmap and of course to the sorted map which is what this detailed issue about. |
That's great to hear! To be clear: I believe this is likely to be something I will be able to work on full-time for at least a couple of weeks if not more. Is this in a state at the moment where me throwing my time at it is likely to help get it landed, even if it's only as a nightly unstable API? I guess that's a question for @CAD97 as well. |
I provided all what I believe is necessary but I don't have enough background about the internals to be able to support with implementation or any kind of details about implementation. The PRs of the corresponding hashmap raw entry API might be helpful to look at as a start. |
I brought this up in the Zulip to ask about if we can try to land this to nightly: https://rust-lang.zulipchat.com/#narrow/stream/327149-t-libs-api.2Fapi-changes/topic/BTreeMap.20raw.20entry I'm formally volunteering to both mentor and review an unstable implementation of a raw entry API for To summarize per my understanding: The proposed API is a "raw entry" API for The API surface which I wish to second is the following (though of course open to bikeshed): // mod std::collections::btree_map
impl<K, V, A> BTreeMap<K, V, A>
where
A: Allocator + Clone,
{
// existing:
pub fn get_key_value<Q>(&self, k: &Q) -> Option<(&K, &V)>
where
K: Borrow<Q> + Ord,
Q: Ord + ?Sized,
;
pub fn entry(&mut self, key: K) -> Entry<'_, K, V, A>
where
K: Ord,
;
// new, feature(btree_raw_entry):
pub fn raw_entry<F>(&self, cmp: F) -> Option<(&K, &V)>
where
F: FnMut(&K) -> Ordering,
;
pub fn raw_entry_mut<F>(&self, cmp: F) -> RawEntryMut<'_, K, V, A>
where
F: FnMut(&K) -> Ordering,
;
}
pub enum RawEntryMut<'a, K, V, A = Global>
where
K: 'a,
V: 'a,
A: Allocator + Clone,
{
Vacant(RawVacantEntryMut<'a, K, V, A>),
Occupied(RawOccupiedEntryMut<'a, K, V, A>),
}
impl<'a, K, V, A> RawEntryMut<'a, K, V, A>
where
A: Allocator + Clone,
{
pub fn or_insert_with<F>(self, default: F) -> (&'a mut K, &'a mut V)
where
F: FnOnce() -> (K, V),
;
pub fn and_modify<F>(self, f: F) -> RawEntry<'a, K, V, A>
where
F: FnOnce(&mut K, &mut V),
;
}
pub struct RawVacantEntryMut<'a, K, V, A = Global>
where
A: Allocator + Clone,
{ /* private fields */ }
impl<'a, K, V, A> RawVacantEntryMut<'a, K, V, A>
where
A: Allocator + Clone,
{
pub fn insert(self, key: K, value: V) -> (&'a mut K, &'a mut V);
}
pub struct RawOccupiedEntryMut<'a, K, V, A = Global>
where
A: Allocator + Clone,
{ /* private fields */ }
impl<'a, K, V, A> RawOccupiedEntryMut<'a, K, V, A>
where
A: Allocator + Clone,
{
pub fn key(&self) -> &K;
pub fn key_mut(&mut self) -> &mut K;
pub fn into_key(self) -> &'a mut K;
pub fn get(&self) -> &V;
pub fn get_mut(&mut self) -> &mut V;
pub fn into_mut(self) -> &'a mut V;
pub fn get_key_value(&self) -> (&K, &V);
pub fn get_key_value_mut(&mut self) -> (&mut K, &mut V);
pub fn into_key_value(self) -> (&'a mut K, &'a mut V);
pub fn insert(&mut self, value: V) -> V;
pub fn insert_key(&mut self, key: K) -> K;
pub fn remove(self) -> V;
pub fn remove_entry(self) -> (K, V);
} Some alternative thoughtsWhile writing this up, I had an interesting idea: it may be useful to return
Additional utility could be provided by giving I've skipped the Observations on the raw entry APIThere's a choice to be made here:
It's my belief, however, that the main benefit of the raw entry API is not the minor optimization benefits, but rather the dependency injection. It comes down to which is more desired: serving more use cases (modal raw entry) or a simpler API (always full injection) which still covers the whole problem domain, just with a bit more manual work for the use cases between normal non-raw Given Rust's std biases towards providing more helpers, I might actually bias towards the modal API if it can be shown to pull its weight. The patterns the dependency injection abilities of the raw entry API unlock imho justify the raw entry API, but the questionable potential performance benefit of the non-dependency-injection modes are much harder to justify. |
I think the raw API ought to be Right now, for example, I can trust that a This is, in my opinion, a completely different issue than the fact that I can't trust |
Do we have an API that could be used to get non-overlapping muts without UB? Or is this to avoid blocking off some future API |
If I'm not mistaken this Safety issue was discussed in the Hashmap version of the API (rust-lang/rust#54043) and the discussion was steering in the direction that Keys inserted in the wrong position in a hashmap is not considered unsafe in the Rust sense of safety and we can extend that to the BTreeMap's API. Honestly I do not mind if it is marked as |
See #141 where I propose an API for |
We are currently moving moving away from The proper solution for comparator support is either:
|
It's also worth mentioning the existence of https://github.com/eggyal/copse which is another external crate port of |
Proposal
Problem statement
We need an API that will allow the users to
get
,insert
: so two lookups.BTreeMap
itself (as inC++
). Additionally those custom comparators would allow the users to store entries in an order that can rely on some external contexts. Currently this is only possible through one of these two unpleasant alternatives:2.1. Making your external context a global mutable and having to deal with the hassle that comes out of that because of threading/synchronization etc....just no
2.2. Making your context part of the key. This requires the context to be some
Rc<RefCell<....>>
and the key to have someWeak
pointer to that. Obviously a big unnecessary overhead.Motivation, use-cases
Imagine implementing a dictionary where the strings are stored contiguously in some vector and a you want to establish some sorted lightweight index on that vector. The index would be a
BTreeMap
ofusize
keys which are an index into the vector. Obviously the sorting order would then have to be different from the natural sorting order of theusize
and it will need to rely on the actual strings vector, we want alphabetical order sorting for our index. By making the raw entry API take a closure we achieve two things: enabling mixed type comparisons, and enabling comparators to consider external contexts during comparisons and only during comparisons. In fact this is even better than theC++
alternative where the comparatorlambda
has to be passed to themap
constructor to be stored as member variable in themap
and keep whatever references it needs throughout the whole lifetime of themap
.Furthermore, the raw entry API would enable the users to efficiently tackle the very common case of "get or insert", in which the caller wants to get a reference to the value if it exists or insert a new value immediately at the right position without having to do the lookup twice; once to check for existence and another to insert.
Below is a basic user code that summarizes the use case that was described above
Solution sketches
The proposed API looks like this
RawEntryMut
can be an enumVacant
/Occupied
where theVacant
variant points to where that key would be inserted, or usingC++
terms:Nevertheless, this is an implementation detail of
Vacant
, the user obviously will just callinsert
to insert an entry in a vacant location.Finally, I'm aware of the cursor APIs discussions. I believe future work can extend the methods of
RawEntryMut
to add things likeinsert_(before|after)
and so on.Links and related work
I gathered some links showing the demand for such a feature
Furthermore,
This use case is fully achievable in
C++
(C++ Use case code snippet) and I demonstrate that below.What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.
The text was updated successfully, but these errors were encountered: