Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put HashMap's hasher in an Result and Default::default() it to Err(Default::default) #2551

Open
SoniEx2 opened this issue Sep 23, 2018 · 24 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@SoniEx2
Copy link

SoniEx2 commented Sep 23, 2018

HashMap implements Clone, but cloning HashMap::default() should NOT produce the same hasher state. Instead, there should be a distinction between a default HashMap and an empty/cleared HashMap. Both are empty, but the former also has the hasher set to Err(Default::default).

This is important because one could easily do:

let v: Vec<HashSet<&str>> = vec![];
// some stuff here
if v.len() < my.len() {
    v.resize(my.len(), HashSet::default());
}

and accidentally introduce a DoS exploit to their code. (this would be partially solved by resize_default, but I don't think it's a proper solution.)

@mbrubeck
Copy link
Contributor

@rust-lang/libs This suggestion would help prevent one possible cause of denial-of-service vulnerabilities. Is it worth the cost of adding a Option to HashMap (which I think would increase its size from 5 words to 6)? Or is there an alternate mitigation without that cost?

@scottmcm
Copy link
Member

There's no issue if one writes

v.resize_with(my.len(), HashSet::default);

right?

Is this just in the category with .or_default(foo()) where function calls should lint to use the *_with version? We could presumably have something like an attribute on the parameter to say #[suggest_if_direct_call("resize_with")] or something.

@SoniEx2
Copy link
Author

SoniEx2 commented Sep 29, 2018

I prefer "right by default". It seems dangerous to hide a potential DoS exploit behind an innocent-looking lint that can be easily disabled.

(resize_with isn't stable yet, either)

@burdges
Copy link

burdges commented Sep 29, 2018

We could always replace the hasher when cloning empty HashMaps @mbrubeck so no new Option required. I've no idea if RandomState has reseeding, etc. like rand, but at worst this incurs one syscall per clone.

Is this worse than v.resize(my.len(), ["dog"].iter().collect()) though? We could actually replace the hasher when cloning any HashMap, at the cost of rehashing everything.

I do like @scottmcm answer that lints should encourage explicitness, but actually no answer above handles the perfectly reasonable Cow<'a,HashMap<K,V>> well, so..

We could add a clone mode via an optional const parameter to HashMap or a associated constant in BuildHasher:

pub enum RekeyingMode {
    FastClone,  // Never reseed
    CheapClones,  // Reseed only if empty, probably unnecessary
    SecureClone,  // Reseed even if non-empty
}

I dislike this because although explicit it mostly just shifts blame onto users.

Can we instead simply lint against using Clone for HashMap entirely, even when buried inside Cow or whatever? If you need to clone a HashMap then maybe you should use some wrapper with the desirable behavior.

@scottmcm
Copy link
Member

Looking at this again, I don't think the original proposed change is allowable, regardless of whether we want to, because there's no way to get an S: BuildHasher from None in insert and friends:

impl<K, V, S> HashMap<K, V, S> where K: Eq + Hash, S: BuildHasher {
    pub fn insert(&mut self, k: K, v: V) -> Option<V>;
}

It would additionally need something like S: Default, which is a breaking change and not desirable anyway. Not to mention that Option<S> isn't enough, since HashMap is Send, so

pub fn hasher(&self) -> &S

means it would need instead to be something like Mutex<Option<S>> to lazy-init safely.

@SoniEx2
Copy link
Author

SoniEx2 commented Sep 29, 2018

Add some dependent typing and we can easily solve this. It would only be None in cases where Default was called, or in HashMap::new().

Altho I do wonder if you could have two impls of HashMap, one where it is Default and one where it isn't.

@burdges
Copy link

burdges commented Oct 3, 2018

Instances where this becomes dangerous sound rare, thanks to the strong protections provided by SipHash24. If the user switch to weaker hash functions like say fnv for performance, then clone becomes more risky. If the user justifies their weaker hash functions by keying them separately, then clone actually becomes dangerous.

We should just warn against using HashMap's Clone so I started writing an RFC for that:

Summary

Allow implementations of traits to warn against unexpected behavior or limitations by specifing lints against their own use. Also, provide a warning that cloning a HashMap also clones its BuildHasher.

Motivation

We encounter scenarions where implementing some trait makes sense, and sems desirable in general, but caveats about specific use cases still exist. We should expect such scenarios to encourage abuse of heavy handed techniques like hidden trait implementations (RFC #2529).

As an example, we want HashMap: Clone when possible, but if the HashMap uses a non-cryptographic hasher then this may create a DoS vector, even if no individual clone does so. We could address this shortcoming:

#[derive(Clone)]
pub struct HashMap<K, V, S = RandomState> {
    ...
}

#[caveat("A cloned HashMap can become a DoS vector, provided its Hasher is not cryptographic, because cloned BuildHasher instance can facilitate exposing the Hasher's key.  RandomState has cryptographic key protections, making clone safer.")]
default impl<K: Clone, V: Clone, S: Clone> Clone for HashMap<K, V, S>

#[no_caveat]
impl<K: Clone, V: Clone> Clone for HashMap<K, V, RandomState>

@SoniEx2
Copy link
Author

SoniEx2 commented Oct 7, 2018

Hmm... If we're gonna be using another word for it anyway, we could always go with Result<H, fn() -> H>?

fn new() -> Foo {
    Foo { hasher: Err(<H as Default>::default) }
}

@SoniEx2 SoniEx2 changed the title Put HashMap's hasher in an Option and Default::default() it to None Put HashMap's hasher in an Result and Default::default() it to Default::default Oct 7, 2018
@SoniEx2 SoniEx2 changed the title Put HashMap's hasher in an Result and Default::default() it to Default::default Put HashMap's hasher in an Result and Default::default() it to Err(Default::default) Oct 7, 2018
@theastrallyforged
Copy link

@burdges Bikeshed: add names to #[caveat]s, in case one impl ends up with more than one:

#[caveat("hashmap_clone_dos", "A cloned HashMap can become a DoS vector, [etc]")] default impl<K: Clone, V: Clone, S: Clone> Clone for HashMap<K, V, S> { .. }

#[no_caveat("hashmap_clone_dos")] impl<K: Clone, V: Clone> Clone for HashMap<K, V, RandomState> { .. }

@SoniEx2
Copy link
Author

SoniEx2 commented Oct 7, 2018

This is not about caveats in impls, please keep that separate.

@burdges
Copy link

burdges commented Oct 7, 2018

We already established that your proposal cannot work @SoniEx2 You'd require some BuildBuildHasher machinery.

Afaik, there is no actual danger here so long as you use hashers like SipHasher24 that provide cryptographic security assurances for their key material, so afaik RandomState remains safe.

I'd argue running any adversary controlled data through some weak hasher like fnv inherently creates a DoS vector. If you do that while arguing otherwise based on an ephemeral hasher, then you could create exactly the same problem by cloning a full hashmap, like by using Cow<HashMap<K,V>>.

Afaik, there is nothing special about empty hash maps here and no nice way to exploit them even if so.

@SoniEx2
Copy link
Author

SoniEx2 commented Oct 7, 2018

"We" already established you can't use Option, because there's no function pointer to make the BuildHasher.

I said you can use an Result<S, fn() -> S> instead of an Option and it'd work. https://play.rust-lang.org/?gist=9c724f67bb9379c9f89568bbbfc8bb3c&version=stable&mode=debug&edition=2015

@burdges
Copy link

burdges commented Oct 7, 2018

You're now proposing what I called a "buildbuildhasher" scheme, but..

We've no strong reason to believe this improves security over using the current default of S = RandomState. Also, these schemes cannot cleanly address cloning non-empty HashMaps when S != RandomState either. In principle, these scheme also create vulnerabilities or new DoS vectors in low entropy settings too.

The right answer is to warn anyone using the HashMap: Clone bound with S != RandomState.

@SoniEx2
Copy link
Author

SoniEx2 commented Oct 7, 2018

RandomState doesn't guarantee DoS-proofing. And, in fact, RandomState isn't affected by low-entropy settings! See here.

So, yes, it would improve things. At the very least, it would be no worse than what we currently have.

@burdges
Copy link

burdges commented Oct 8, 2018

Very nice! :) I hadn't know about these attacks outside the siphash threat model, so scheme like you propose actually do help, but only with empty hash maps.

I'm afraid lints against the HashMap: Clone bound remain the only solution that covers all cases, although now we cannot exclude S = RandomState so the lints become more intrusive.

@scottmcm
Copy link
Member

scottmcm commented Oct 8, 2018

you could create exactly the same problem by cloning a full hashmap

Well, what would it take to solve that? We could re-hash everything in the hashmap on clone, right? If S::clone changed the hashing seed and we re-hashed everything, I think it would be fine.

But here's an interesting observation: There's no requirement from HashMap that S::clone return a new BuildHasher that produces Hashers that compute the same hashes as the original one!

BuildHasher specifically says that Hashers from the same instance produce the same hashes, but doesn't say anything about other instances, even Clones. Hash has the "if you're implementing this and Eq" rule on it, but there's nothing that says what Clone is supposed to mean.

So arguably it's a bug today that we're not re-hashing everything on HashMap::clone...

@burdges
Copy link

burdges commented Oct 8, 2018

Yes exactly @scottmcm but..

Right now HashMap: Clone costs only an allocation and a memcpy. If you change the Hasher then you must rehashing everything, which becomes extremely slow in comparison. If you want that, then you should just write old_hash_map.iter().collect::<HashMap<_,_,_>>(). Also, there are scenarios where quickly coping a HashMap makes sense, so another method must provide Clone

I still like your original idea about using lints, but we should lint against the HashMap: Clone bound itself, not just methods like Vec::resize.

I also like lints here because hidden impls #2529 make me nervous. I'd hope lints giving caveats about impls should help prevent over usage of hidden impls.

@scottmcm
Copy link
Member

scottmcm commented Oct 8, 2018

@burdges

Right now HashMap: Clone costs only an allocation and a memcpy.

That seems implausible to me. HashMaps can have Strings in them.

@SoniEx2
Copy link
Author

SoniEx2 commented Oct 8, 2018

Again this isn't about lints please leave those elsewhere.

@burdges
Copy link

burdges commented Oct 8, 2018

@scottmcm Yes, HashMap: Clone actually clones all the entries, whatever that means, but if those entries are simple enough then it should just copy memory, but rehashing a large table could be quite expensive.

@burdges
Copy link

burdges commented Oct 8, 2018

As an aside, BuildHasherDefault is kinda an anti-pattern since people commonly envision Default as constant. And the example even uses #[derive(Default)].

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Oct 15, 2018
@SoniEx2
Copy link
Author

SoniEx2 commented Oct 28, 2018

hmm, is it a logic error for a key's Clone to change its hash?

@Ixrec
Copy link
Contributor

Ixrec commented Oct 28, 2018

From rereading the official documentation for Clone and Hash, no it doesn't seem to be a logic error. So I think HashMap's clone() does need to rerun hash() on every key value just in case the hashes changed.

I think there might even be legitimate use cases for that. Say the key type is some kind of Box or Gc type or whatever that contains a pointer. In that case, hash() simply returning the wrapped pointer would probably be a pretty decent implementation, and that pointer value would obviously change on clone(). The part I'm not sure about is why you'd ever want to use smart pointers as HashMap keys, but I suspect someone has an answer to that.

In order for it to be a logic error, you'd need a special rule just for types implementing both Hash and Clone, like the rule we have for Hash and Eq. I don't think you could add a Clone rule and a Hash rule that naturally combine to form this guarantee, because there's not a lot Clone can guarantee about a cloned value's semantics (for instance, a rule like "a value and its clone must have no observable difference in behavior" would immediately invalidate all smart pointers, and probably tons of other stuff).

@SoniEx2
Copy link
Author

SoniEx2 commented Oct 28, 2018

okay, so it's (IMO) reasonable to make BuildHasher also change on Clone. it'd be a better solution to this security issue and doesn't make HashMap any bigger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

7 participants