Tracking issue for HashMap::raw_entry #56167

sfackler · 2018-11-22T16:51:47Z

Added in #54043.

As of 6ecad33 / 2019-01-09, this feature covers:

impl<K, V, S> HashMap<K, V, S>
    where K: Eq + Hash,
          S: BuildHasher
{
    pub fn raw_entry(&self) -> RawEntryBuilder<K, V, S> {…}
    pub fn raw_entry_mut(&mut self) -> RawEntryBuilderMut<K, V, S> {…}
}

pub struct RawEntryBuilder<'a, K: 'a, V: 'a, S: 'a> {…} // Methods return Option<(&'a K, &'a V)>
pub struct RawEntryBuilderMut<'a, K: 'a, V: 'a, S: 'a> {…} // Methods return RawEntryMut<'a, K, V, S>
pub enum RawEntryMut<'a, K: 'a, V: 'a, S: 'a> {
    Occupied(RawOccupiedEntryMut<'a, K, V>),
    Vacant(RawVacantEntryMut<'a, K, V, S>),
}
pub struct RawOccupiedEntryMut<'a, K: 'a, V: 'a> {…}
pub struct RawVacantEntryMut<'a, K: 'a, V: 'a, S: 'a> {…}

… as well as Debug impls for each of the 5 new types, and their inherent methods.

Amanieu · 2018-11-26T16:49:12Z

What is the motivation for having separate from_hash and search_bucket methods? It seems that the only difference is whether the hash value is checked before calling is_match. However if the table does not store full hashes (i.e. hashbrown) then there is no difference between these methods.

Could we consider merging these methods into a single one? Or is there some use case where the difference in behavior is useful?

Gankra · 2018-11-27T03:08:55Z

I am also extremely confused by this distinction, as my original designs didn't include them (I think?) and the documentation that was written is very unclear.

Amanieu · 2018-11-27T21:26:59Z

cc @fintelia

fintelia · 2018-11-27T23:10:33Z

The reason I added search_bucket was because I wanted to be able to delete a random element from a HashMap in O(1) time, without storing an extra copy of all the keys. Basically, instead of doing something like this:

let key = map.iter().nth(rand() % map.len()).unwrap().0.clone();
map.remove(&key);

I wanted to just be able to pick a random "bucket" and then get an entry/raw entry to the first element in it if any:

loop {
    if let Occupied(o) = map.raw_entry_mut().search_bucket(rand(), |_| true) {
        o.remove();
        break;
    }
}

(the probabilities aren't uniform in the second version, but close enough for my purposes)

Gankra · 2018-11-28T02:26:24Z

I continue to not want to support the "random deletion" usecase in std's HashMap. You really, really, really, should be using a linked hashmap or otherwise ordered map for that.
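For readers wondering what an "otherwise ordered" structure for random deletion might look like, here is a minimal sketch using only std. The `IndexedSet` type and its methods are hypothetical names made up for illustration; they are not part of std or hashbrown. Note that, unlike the goal stated earlier in the thread, this sketch does store every key twice (once in the `Vec`, once in the `HashMap`), which is the price it pays for O(1) removal by position.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: a set of keys that supports O(1) removal by position.
/// Each key is stored twice: in `keys` (dense, indexable) and in `positions`.
struct IndexedSet<K> {
    keys: Vec<K>,
    positions: HashMap<K, usize>,
}

impl<K: Eq + Hash + Clone> IndexedSet<K> {
    fn new() -> Self {
        IndexedSet { keys: Vec::new(), positions: HashMap::new() }
    }

    /// Inserts `key`; returns false if it was already present.
    fn insert(&mut self, key: K) -> bool {
        if self.positions.contains_key(&key) {
            return false;
        }
        self.positions.insert(key.clone(), self.keys.len());
        self.keys.push(key);
        true
    }

    /// Removes the key at position `random % len`. The caller supplies the
    /// random number, so this sketch needs no RNG dependency.
    fn remove_at(&mut self, random: usize) -> Option<K> {
        if self.keys.is_empty() {
            return None;
        }
        let index = random % self.keys.len();
        let removed = self.keys.swap_remove(index);
        self.positions.remove(&removed);
        // swap_remove moved the former last key into `index` (unless the
        // removed key was itself the last one), so record its new position.
        if let Some(moved) = self.keys.get(index) {
            *self.positions.get_mut(moved).expect("moved key is tracked") = index;
        }
        Some(removed)
    }
}
```

With a uniformly distributed `random` argument, every key is equally likely to be removed, which the bucket-probing loop above does not guarantee.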

Amanieu · 2018-12-09T00:10:34Z

I have removed this method in the hashbrown PR (#56241). Your code snippet for random deletion won't work with hashbrown anyways since it always checks the hash as part of the search process.

gdzx · 2019-03-01T16:51:36Z

I can avoid unnecessary clones inherent to the original entry API which is nice. But unless I'm mistaken, the current raw_entry API seems to hash the keys twice in this simple use case:

#![feature(hash_raw_entry)]

use std::collections::HashMap;

fn main() {
    let mut map = HashMap::new();

    // The key is hashed once by from_key and again by or_insert.
    map.raw_entry_mut()
        .from_key("poneyland")
        .or_insert("poneyland", 3);
}

Currently I use the following function to hash once and automatically provide an owned key if necessary (somewhat similar to what was discussed in rust-lang/rfcs#1769):

use std::borrow::Borrow;
use std::collections::hash_map::RawEntryMut;
use std::hash::{BuildHasher, Hash, Hasher};

fn get_mut_or_insert_with<'a, K, V, Q, F>(
    map: &'a mut HashMap<K, V>,
    key: &Q,
    default: F,
) -> &'a mut V
where
    K: Eq + Hash + Borrow<Q>,
    Q: Eq + Hash + ToOwned<Owned = K> + ?Sized,
    F: FnOnce() -> V,
{
    let mut hasher = map.hasher().build_hasher();
    key.hash(&mut hasher);
    let hash = hasher.finish();

    match map.raw_entry_mut().from_key_hashed_nocheck(hash, key) {
        RawEntryMut::Occupied(entry) => entry.into_mut(),
        RawEntryMut::Vacant(entry) => {
            entry
                .insert_hashed_nocheck(hash, key.to_owned(), default())
                .1
        }
    }
}

Given k1 and k2 with the same type K such that hash(k1) != hash(k2), is there a use-case for calling RawEntryBuilderMut::from_key_hashed_nocheck with hash(k1), &k1 and then inserting with RawVacantEntryMut::insert using k2?

If there isn't, why not save the hash in RawVacantEntryMut and use it inside RawVacantEntryMut::insert? It would even be possible to assert in debug builds that the owned key has indeed the same hash as the borrowed key used to look up the entry.
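To make the suggestion concrete, here is a rough sketch of the idea using only stable std. `VacantSlot`, `lookup` and `hash_for_insert` are hypothetical names invented for this illustration and do not exist in std or hashbrown; the sketch only models the "remember the lookup hash, verify it in debug builds" part, not the actual table insertion. It relies on `BuildHasher::hash_one`, which is available on stable Rust 1.71 and later.

```rust
use std::hash::{BuildHasher, Hash};

/// Hypothetical stand-in for a vacant raw entry that remembers the hash
/// computed during lookup, so insertion does not need to hash again.
struct VacantSlot<'a, S> {
    hash_builder: &'a S,
    hash: u64,
}

impl<'a, S: BuildHasher> VacantSlot<'a, S> {
    /// Hashes the borrowed key once and stores the result.
    fn lookup<Q: Hash + ?Sized>(hash_builder: &'a S, key: &Q) -> Self {
        VacantSlot { hash_builder, hash: hash_builder.hash_one(key) }
    }

    /// Returns the stored hash for use at insertion time. In debug builds
    /// only, it re-hashes the owned key to check that it matches the key
    /// that was used for the lookup.
    fn hash_for_insert<K: Hash>(&self, owned_key: &K) -> u64 {
        debug_assert_eq!(
            self.hash,
            self.hash_builder.hash_one(owned_key),
            "owned key hashes differently from the lookup key"
        );
        self.hash
    }
}
```

In release builds the owned key is never hashed, so a lookup with a `&str` followed by an insert of the corresponding `String` costs a single hash computation.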

timvermeulen · 2019-04-13T20:20:56Z

I'm not yet very familiar with this API, but what @gdouezangrard suggested seems like a great idea to me. What even happens currently if the two hashes don't match, is the key-value pair then inserted into the wrong bucket? It's not clear to me from (quickly) reading the source code.

sujayakar · 2019-04-26T17:34:02Z

I submitted rust-lang/hashbrown#54 to support using a K that doesn't implement Hash via the raw entry API. See rust-lang/hashbrown#44 for the original motivation. Now that hashbrown is merged into std, could we expose this functionality on the std::collections::hash_map types as well?

If so, I'd be happy to submit a PR!

thomcc · 2020-04-11T20:53:32Z

This is a really great API, it's also what keeps crates (hashlink for example) using hashbrown instead of the stdlib hash map -- since hashbrown exposes this.

What could be next steps here towards stabilization?

sanbox-irl · 2020-11-07T21:15:20Z

Just gonna add another ping here -- what's blocking this right now?

Amanieu · 2020-11-12T17:21:48Z

I see a few things that need to be resolved:

We need proper documentation for the API functions, including examples.
hashbrown has a few extensions to RawEntry, such as https://docs.rs/hashbrown/0.9.1/hashbrown/hash_map/enum.RawEntryMut.html#method.and_replace_entry_with. We should decide whether to port these to std as well.
The name feels a bit weird since it isn't clear what "raw" means. But I don't have any better ideas.

I would recommend prototyping in the hashbrown crate first, which can then be ported back into the std HashMap.

KamilaBorowska · 2021-02-04T14:21:24Z

I find raw_entry and raw_entry_mut methods unnecessary - unlike entry method, they don't take any parameters, they just provide access to methods that could as well be in HashMap itself. I think I would consider getting rid of those and putting raw entry APIs directly in HashMap. .raw_entry().from_key(...) is also unnecessary, unless I'm missing something it's identical to already stabilized .get_key_value(...).
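As a point of comparison, the stable method mentioned above can be exercised like this. This is only a small sketch; `stored_pair` is a hypothetical wrapper written for the example, while `HashMap::get_key_value` itself is a stable std method.

```rust
use std::collections::HashMap;

/// Looks up `name` using a borrowed `&str` and returns references to the
/// key and value as stored in the map, if present.
fn stored_pair<'a>(map: &'a HashMap<String, u32>, name: &str) -> Option<(&'a String, &'a u32)> {
    map.get_key_value(name)
}
```

Like the read-only raw entry lookup, this returns an Option of a key/value reference pair and needs no owned key for the lookup.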

I also would like to point out that RawVacantEntryMut doesn't really do much other than providing an API that allows insertion which provides a reference to inserted key and value. This structure doesn't store anything other than a mutable reference to a hash map. This particular API can be used to create unrelated keys, like in this example.

#![feature(hash_raw_entry)]

use std::collections::HashMap;

fn main() {
    let mut map = HashMap::new();
    map.raw_entry_mut().from_key(&42).or_insert(1, 2);
    println!("{}", map[&1]);
}

This is a bit like calling insert after determining an entry is vacant. I think raw_entry_mut APIs could return Options just like raw_entry APIs.

#![feature(hash_raw_entry)]

use std::collections::hash_map::{HashMap, RawEntryMut};

fn main() {
    let mut map = HashMap::new();
    if let RawEntryMut::Vacant(_) = map.raw_entry_mut().from_key(&42) {
        map.insert(1, 2);
    }
    println!("{}", map[&1]);
}

I think raw entry API is useful, but I don't think its API should be conflated with entry API.

tkaitchuck · 2021-03-28T23:28:50Z

As discussed here: rust-lang/hashbrown#232
Allowing the user to specify the hashed value with the contract that it is generated in the same way that the map computes that hash has two drawbacks:

It locks in the implementation of the Hashmap to never changing how that code is invoked. In particular this prohibits hashmap from ever using specialization. This is leaving significant performance gains on the table for types with fixed lengths and short strings. (This makes raw_entry a non-zero-cost-abstraction because the cost is incurred even if the feature is not used.)
It creates an opportunity for bugs in applications that accidentally do something different. If, for example, an application takes advantage of this to create a memoized hash or similar, and its calculation is different in some cases, the results will be unexpected and lack a clear error message.

If the feature of a user specified hash is needed, it may be useful to instead provide a method on the raw entry to hash a key. That way the hashmap can implement this however it sees fit and the application code is less error prone because there is an unambiguous way to obtain the hash value if it is not known in advance.
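A sketch of the difference between the two styles, using only stable std: `hash_manually` spells out the build/feed/finish sequence that user code has to replicate today, while `hash_via_builder` delegates to `BuildHasher::hash_one` (stable since Rust 1.71), which lets the hasher builder decide how the hash is produced. Both function names are hypothetical, and `hash_one` lives on the hasher builder rather than on the raw entry, so this is an approximation of the proposal rather than the API being suggested.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// The user re-implements the hashing sequence and must keep it in sync
/// with whatever the map does internally.
fn hash_manually<K: Hash, V, S: BuildHasher>(map: &HashMap<K, V, S>, key: &K) -> u64 {
    let mut hasher = map.hasher().build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// The hasher builder is asked for the hash, so there is a single,
/// unambiguous way to obtain it.
fn hash_via_builder<K: Hash, V, S: BuildHasher>(map: &HashMap<K, V, S>, key: &K) -> u64 {
    map.hasher().hash_one(key)
}
```

For the default RandomState the two functions agree, because `hash_one`'s provided implementation performs the same three steps; a custom BuildHasher is allowed to override it.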

mokhaled2992 · 2022-07-14T21:08:53Z

FYI

https://internals.rust-lang.org/t/interface-improvements-for-btreemap/16957

https://internals.rust-lang.org/t/comparator-for-btree-map-set-s/17011

mqudsi · 2022-11-03T16:16:01Z

For anyone reading this RFC while exploring a successor/better alternative to hash_raw_entry, please keep in mind a very common pattern that wasn't cleanly covered by the prototype raw entry api: cleanly bubbling back whether or not a key was entered into the map, should that information be needed down the line.

The (un)released .raw_entry[_mut]().or_insert_with(|| ...) method returned a tuple of (&K, &V) but I think returning (&K, &V, bool) would have reduced boilerplate code without incurring too much of a burden (since one of &K or &V in the tuple is oftentimes already masked because only the other is required), in lieu of the following:

let mut new_pool: bool = false;
let (_, old_status) = cached_status.raw_entry_mut()
    .from_key(pool)
    .or_insert_with(|| { new_pool = true; (pool.to_owned(), *new_status) });

if new_pool || old_status != new_status {
    eprintln!("{}: {:?}", pool, new_status);
    *old_status = *new_status;
}

which could have just become

let (_, old_status, new_pool) = cached_status.raw_entry_mut()
    .from_key(pool)
    .or_insert_with(|| (pool.to_owned(), *new_status));

if new_pool || old_status != new_status {
    eprintln!("{}: {:?}", pool, new_status);
    *old_status = *new_status;
}

The "regular" HashMap::insert() returns an Option from which it is possible to deduce whether the key already existed, while the unstable HashMap::try_insert() returns a Result that is an Err if the key already existed. (The "regular" non-raw entry API suffers from a similar flaw.)
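For comparison, with the stable (non-raw) entry API the "was it inserted?" flag can be recovered by matching on the entry explicitly. The following is a minimal sketch; `get_or_insert_flagged` is a hypothetical helper name, and unlike the raw entry version it requires an owned key up front.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::Hash;

/// Returns a mutable reference to the value for `key`, plus a flag that is
/// true when the value was freshly inserted by this call.
fn get_or_insert_flagged<K, V>(
    map: &mut HashMap<K, V>,
    key: K,
    default: impl FnOnce() -> V,
) -> (&mut V, bool)
where
    K: Eq + Hash,
{
    match map.entry(key) {
        Entry::Occupied(entry) => (entry.into_mut(), false),
        Entry::Vacant(entry) => (entry.insert(default()), true),
    }
}
```

The explicit match is exactly the boilerplate a returned bool would avoid, but it does sidestep the mutable flag captured by the closure.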

zopsicle · 2023-06-10T14:05:00Z

Is there a reason for the HashMap::raw_entry and HashMap::raw_entry_mut methods to require S: BuildHasher? Perhaps this bound can be moved to RawEntryBuilder::from_key and RawEntryBuilderMut::from_key. This would make it possible to use a hash map without an intrinsic hasher builder (i.e. S = ()), accessing entries purely through the raw entry API.

vlovich · 2023-11-18T01:38:11Z

Is there a reason to overload this into std::collections::HashMap? If instead we had a std::collections::RawHashMap that has a dedicated API, does that simplify the need to merge cleanly in between existing non-raw methods?

I think that could get rid of the (at least to me) confusingly named raw_entry_mut and raw_entry which don't actually return entries like you might expect with sibling methods named entry but instead return views to access entries in a raw way (raw_entries / raw_entries_mut might be better names if bikeshedding). I'm wondering if instead you have a RawHashMap that has the methods of raw_entry/raw_entry_mut exposed as normal top-level methods. That way RawHashMap would be a clean break in that it wouldn't even need an S as the API contract there is that the hash is always managed externally (i.e. the HashMap interface can remain unchanged).

In fact, I don't think you'd even need a RawHashMap<K, V> - it could just be RawHashTable<V> or RawHashSet<V> where the key is the hash and V either implements PartialEq OR there's a function to determine whether two V instances are equivalent for a colliding hash. The std HashMap could then just be a wrapper on top of RawHashTable<(K, V)> (although the tuple is actually probably a named struct that implements PartialEq evaluating just K). RawHashTable could also then be used for both HashMap and HashSet which is nice in terms of reducing maintenance (i.e. don't have to reimplement the "raw" APIs for HashSet which is currently missing from nightly).
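To illustrate the shape of such an interface, here is a deliberately naive sketch of a table whose only notion of a key is a caller-supplied hash plus an equality closure. `ToyHashTable` and its methods are hypothetical and exist only for this example; it uses fixed separate chaining with no resizing and is in no way how hashbrown or std are implemented.

```rust
/// Hypothetical toy table: values are located purely by a caller-supplied
/// hash and a caller-supplied equality check. No hasher is stored.
struct ToyHashTable<T> {
    buckets: Vec<Vec<(u64, T)>>,
}

impl<T> ToyHashTable<T> {
    /// Creates a table with a fixed number of buckets (never resized here).
    fn with_buckets(count: usize) -> Self {
        assert!(count > 0, "at least one bucket is required");
        ToyHashTable { buckets: (0..count).map(|_| Vec::new()).collect() }
    }

    fn bucket_index(&self, hash: u64) -> usize {
        (hash % self.buckets.len() as u64) as usize
    }

    /// Finds a value whose stored hash equals `hash` and for which `eq`
    /// returns true. `eq` resolves collisions between equal hashes.
    fn find(&self, hash: u64, mut eq: impl FnMut(&T) -> bool) -> Option<&T> {
        self.buckets[self.bucket_index(hash)]
            .iter()
            .find(|(stored_hash, value)| *stored_hash == hash && eq(value))
            .map(|(_, value)| value)
    }

    /// Inserts without checking for duplicates; callers are expected to
    /// call `find` first if they need uniqueness.
    fn insert_unique(&mut self, hash: u64, value: T) -> &T {
        let index = self.bucket_index(hash);
        let bucket = &mut self.buckets[index];
        bucket.push((hash, value));
        &bucket.last().expect("value was just pushed").1
    }
}
```

A map layer would instantiate this with `T = (K, V)`, compute the hash from `K` with its own hasher, and pass an `eq` closure that compares only the key half of the tuple; a set layer would use `T = K`.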

Thoughts? I'm curious who's driving this / who I'd need to coordinate with if there's interest in this alternative path.

SkiFire13 · 2023-11-18T09:25:24Z

Is there a reason to overload this into std::collections::HashMap?

Many people and APIs already use HashMap/HashSet, so for this to be useful it needs to be available on HashMap/HashSet. Otherwise people could just switch to a more flexible solution, i.e. hashbrown::HashMap/hashbrown::HashTable.

but instead return views to access entries in a raw way (raw_entries / raw_entries_mut might be better names if bikeshedding)

+1 on this. To be more intuitive I would also rename the from_ methods to find_by_ since we're calling them on a RawEntries, not a single RawEntryBuilder.

I'm wondering if instead you have a RawHashMap

IMO this would be useful only if it was accessible from an instance of HashMap, but at that point it seems another renaming of raw_entry rather than an entirely separate type.

where the key is the hash and V either implements PartialEq OR there's a function to determine whether two V instances are equivalent for a colliding hash.

Definitely need a way to use an external function since many usecases rely on that.

The std HashMap could then just be a wrapper on top of RawHashTable<(K, V)>

Currently HashMap is a wrapper on top of hashbrown::HashMap and forwards most calls to it. Making it a wrapper over another wrapper of hashbrown::HashTable would likely increase maintenance costs since all the HashMap and HashSet methods would now have to be reimplemented on top of it.

vlovich · 2023-11-18T23:13:32Z

What I'm suggesting is that HashMap and HashSet would just expose a raw/raw_mut to return the underlying RawTable. As is, anyone interested in using the raw interface still has to play this dance of pretending there's a hasher, which makes things confusing to maintain (in case someone in your codebase doesn't realize you're only using the raw interface and decides to store keys directly).

Doing a lift and shift doesn't honestly seem like that much work and I don't see why long term maintenance work would increase with this approach.

Amanieu · 2024-03-20T11:59:21Z

I would like to deprecate and remove raw_entry in favor of the low-level HashTable API that has been added to hashbrown. I am not suggesting adding HashTable to the standard library: people who need it should just use it directly from hashbrown.

HashTable provides a cleaner approach since it is an entirely separate type and doesn't need to deal with the restrictions of HashMap such as the need for separate key & value types and the need for an explicit hasher.

Note that HashTable is not convertible to/from HashMap (via an as_raw method). However I don't believe there are any use cases where this matters and which could not just use HashTable directly instead.

@rfcbot fcp close

rfcbot · 2024-03-20T11:59:23Z

Team member @Amanieu has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

vlovich · 2024-03-20T15:04:37Z

Doesn't this decision mean binary bloat because std::hashmap will exist because some dependency somewhere uses it and a duplicate implementation would be pulled in through hashbrown for those that need HashTable? Is there a reason not to standardize HashTable?

EDIT: To be clear, I'm not against this, but I'd like to understand if there's an interest in standardizing HashTable

sanbox-irl · 2024-03-20T16:20:31Z

I haven't been following this that closely, so for the benefit of those not paying a lot of attention, let's say that a user wants to do the raw_entry thing with an expensive key that they only want to construct if the map doesn't have it (ie, https://stackoverflow.com/questions/51542024/how-do-i-use-the-entry-api-with-an-expensive-key-that-is-only-constructed-if-the).

Is the answer of using HashTable just....don't use HashMap and use HashTable if you want to do that? Just to be clear, I haven't been following very much, so I appreciate anyone's help walking me through what the solution that Amanieu is proposing is.

SkiFire13 · 2024-03-20T17:09:24Z

Is the answer of using HashTable just....don't use HashMap and use HashTable if you want to do that?

hashbrown's answer to that is the entry_ref method, which works like entry except it constructs the key only when calling insert with a vacant entry. For now I don't see proposals to add a similar API to the stdlib though.

HashTable would instead be the solution for those that need an extremely flexible API that lets them manually supply the hash. In raw_entry terms this would be the equivalent of RawEntryBuilder{Mut}::from_hash and RawEntryBuilder{Mut}::from_key_hashed_nocheck.
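For the expensive-key case on stable std alone (no hashbrown, no raw_entry), one commonly used workaround is to look up with the borrowed key first and build the owned key only on a miss. The sketch below assumes that paying for extra hash computations is acceptable; `get_or_insert_owned` is a hypothetical helper name, not a std or hashbrown API.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;

/// Returns a mutable reference to the value for `key`, constructing the
/// owned key (and the default value) only if the key is absent.
/// Trade-off: the key is hashed two times on a hit and three on a miss.
fn get_or_insert_owned<'a, K, V, Q>(
    map: &'a mut HashMap<K, V>,
    key: &Q,
    default: impl FnOnce() -> V,
) -> &'a mut V
where
    K: Eq + Hash + Borrow<Q>,
    Q: Eq + Hash + ToOwned<Owned = K> + ?Sized,
{
    if !map.contains_key(key) {
        // Only this branch pays for `to_owned` (e.g. a String allocation).
        map.insert(key.to_owned(), default());
    }
    map.get_mut(key).expect("key is present after the check above")
}
```

The redundant lookups are the cost that raw_entry, and hashbrown's entry_ref, are designed to avoid.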

rfcbot · 2024-08-06T16:24:36Z

🔔 This is now entering its final comment period, as per the review above. 🔔

joshtriplett · 2024-08-06T16:24:58Z

Checking my box here, and hoping that HashTable and some portion of RawTable (enough for dashmap for instance) can be stabilized in the future.

rfcbot · 2024-08-16T16:28:20Z

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

sfackler added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Nov 22, 2018

sfackler mentioned this issue Nov 22, 2018

Hashmap raw_entry crash #56158

Closed

Amanieu mentioned this issue Nov 27, 2018

Support for the raw_entry API rust-lang/hashbrown#23

Closed

Amanieu added a commit to Amanieu/rust that referenced this issue Dec 11, 2018

Remove search_bucket from raw_entry

44a9ce8

It doesn't work in hashbrown anyways (see rust-lang#56167)

jonas-schievink added the A-collections Area: `std::collection` label Jun 9, 2019

boxdot mentioned this issue Aug 19, 2019

Improve string lookup: avoid allocations boxdot/osmflat-rs#37

Merged

kmeisthax mentioned this issue Jul 18, 2020

AVM2 interpreter ruffle-rs/ruffle#404

Merged

17 tasks

KodrAus added I-nominated Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 29, 2020

dekellum mentioned this issue Nov 12, 2020

Add HashMap.entry_or_clone() method rust-lang/rfcs#1203

Open

sanbox-irl mentioned this issue Nov 19, 2020

Working on RawEntry for stabilization in the std rust-lang/hashbrown#212

Open

3 tasks

m-ou-se removed the I-nominated label Dec 8, 2020

cuviper mentioned this issue Dec 16, 2020

Expose RawEntry API indexmap-rs/indexmap#166

Closed

marioortizmanero mentioned this issue Jul 26, 2022

RHashMap::raw_entry[_mut] support rodrimati1992/abi_stable_crates#83

Open

6 tasks

allada mentioned this issue Jul 16, 2023

Implement the WaitExecution TraceMachina/nativelink#177

Merged

cuviper mentioned this issue Dec 29, 2023

Entry API equivalent for Sets rust-lang/rfcs#1490

Closed

kyren mentioned this issue Jan 8, 2024

Convert to hashbrown::HashTable kyren/hashlink#21

Merged

cuviper mentioned this issue Jan 24, 2024

Add an opt-in trait for "unstable" raw entries indexmap-rs/indexmap#300

Merged

kennytm mentioned this issue Feb 7, 2024

Improving Entry API to get the keys back when they are unused rust-lang/rfcs#690

Open

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. labels Mar 20, 2024

rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Aug 6, 2024

GoldsteinE mentioned this issue Aug 20, 2024

#[may_dangle], a refined dropck escape hatch (tracking issue for RFC 1327) #34761

Open

7 tasks

apiraino removed the to-announce Announce this issue on triage meeting label Aug 22, 2024

SkiFire13 mentioned this issue Sep 10, 2024

bevy_reflect: Replace HashTable with HashMap bevyengine/bevy#15149

Closed
