
Implement adaptive hashing using specialization #5

Open · wants to merge 24 commits into master from adaptive_hashing
Conversation


@pczarn pczarn commented Mar 22, 2016

Adaptive hashing provides fast and complexity-safe hashing for hashmaps with simple key types. The user doesn't need to change any code to get speedups from adaptive hashing, in contrast to the use of FnvHasher.
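As a rough sketch of the idea (illustrative only, not the actual AdaptiveState implementation): hash simple keys with a cheap multiplicative mix by default, and switch the map to randomly seeded SipHash, rehashing all entries, the first time a probe chain looks degenerate.

```
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Minimal sketch of the adaptive idea (not the PR's actual AdaptiveState):
// a cheap hash for honest input, with a one-way switch to randomly seeded
// SipHash the first time the map suspects an attack.
enum AdaptiveSketch {
    Fast,              // cheap multiplicative mix, fine for honest input
    Safe(RandomState), // randomly seeded SipHash, used once an attack is suspected
}

impl AdaptiveSketch {
    fn hash_u64(&self, key: u64) -> u64 {
        match self {
            // Fibonacci-style multiplicative mix; the constant is 2^64 / phi.
            AdaptiveSketch::Fast => key.wrapping_mul(0x9E37_79B9_7F4A_7C15),
            AdaptiveSketch::Safe(state) => {
                let mut h = state.build_hasher();
                key.hash(&mut h);
                h.finish()
            }
        }
    }

    // Called by the map when it observes a suspiciously long probe chain;
    // the map then rehashes all existing entries with the new state.
    fn switch_to_safe(&mut self) {
        *self = AdaptiveSketch::Safe(RandomState::new());
    }
}

fn main() {
    let mut s = AdaptiveSketch::Fast;
    println!("fast: {:x}", s.hash_u64(42));
    s.switch_to_safe(); // the map would rehash all entries at this point
    println!("safe: {:x}", s.hash_u64(42));
}
```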

```
@@ -54,6 +59,8 @@ use table::BucketState::{
     Full,
 };

+pub use adaptive_map::HashMapInterface;
```


Does this need to be public?

pczarn (Author):

Yes, this exists temporarily. As long as specialization doesn't work for inherent impls, users of adaptive hashing must import this trait.
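For example (the crate path and method set here are illustrative; the point is only that the trait must be in scope):

```
// Hypothetical usage sketch: until specialization works for inherent impls,
// the specialized methods are reached through the trait, so users must
// import it (crate and path names are illustrative, not real API).
use hashmap2::HashMap;
use hashmap2::HashMapInterface; // without this import, the adaptive
                                // specializations are not reachable

fn main() {
    let mut map: HashMap<u64, &str> = HashMap::new();
    map.insert(1, "one"); // dispatches through HashMapInterface
}
```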


It seems like the inherent methods on HashMap could defer to HashMapInterface, no?

pczarn (Author):

I'll try that, it's a great idea!


pczarn commented Mar 22, 2016

Now, some benchmarks, run with black_box, in the benches directory. This benchmark indicates the real differences most clearly:

Adaptive:

```
find_existing     bench: 13,655 ns/iter (+/- 2,068)
find_nonexisting  bench: 15,241 ns/iter (+/- 411)
get_remove_insert bench:     68 ns/iter (+/- 8)
grow_by_insertion bench:    131 ns/iter (+/- 11)
hashmap_as_queue  bench:     60 ns/iter (+/- 8)
new_drop          bench:      3 ns/iter (+/- 0)
new_insert_drop   bench:     84 ns/iter (+/- 2)
```

SipHash-2-4 (outdated!! the 2-4 variant is no longer used):

```
find_existing     bench: 26,208 ns/iter (+/- 916)
find_nonexisting  bench: 24,833 ns/iter (+/- 5,082)
get_remove_insert bench:    118 ns/iter (+/- 0)
grow_by_insertion bench:    159 ns/iter (+/- 5)
hashmap_as_queue  bench:     82 ns/iter (+/- 0)
new_drop          bench:     76 ns/iter (+/- 3)
new_insert_drop   bench:    167 ns/iter (+/- 7)
```

```
@@ -317,7 +323,7 @@ fn test_resize_policy() {
 /// }
 /// ```
 #[derive(Clone)]
-pub struct HashMap<K, V, S = RandomState> {
+pub struct HashMap<K, V, S = AdaptiveState> {
```


Changing this would be a breaking change at the API level, right?

pczarn (Author):

Yes, this change can't be added to the std library. Is this library a drop-in replacement for std's HashMap?


I think the purpose of this repo is to iterate outside std, but with the ultimate goal of incorporating any worthwhile changes back into std.


pczarn commented Mar 23, 2016

Two problems remain.

First, adaptive maps are not yet safeguarded against DoS by insertion through Entry. The first step is tracking nightly (#6).

Second, the DoS safeguard ignores the effect of fn robin_hood. The safeguard is simple; consider a situation where the user does an insertion followed by searches. The insertion of an element ensures that searches for that element in the future won't take more than a limited number of iterations (a threshold of 128). However, in between element insertion and searches for that element, other unrelated insertions may displace that element.

It seems clear that fn robin_hood by definition can't increase the displacement of an "unfortunate" ousted element beyond the displacement of another, more "fortunate" element. Since the latter is already below the threshold of 128, it follows that the former won't be pushed above 128.

This should be closely reconsidered and documented in the form of a proof.
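As a sketch of the invariant being relied on (illustrative constant and signature, not the real code):

```
// Sketch of the lookup-side safeguard (illustrative, not the PR's code).
// A successful insertion guarantees the key sits within THRESHOLD buckets
// of its ideal position, so a search may give up after THRESHOLD probes
// and ask the map to switch to SipHash and rehash.
const THRESHOLD: usize = 128;

/// Returns Err(()) when the probe chain is suspiciously long.
fn find_bounded(table: &[Option<(u64, u64)>], hash: u64, key: u64) -> Result<Option<u64>, ()> {
    let mask = table.len() - 1; // table length is a power of two
    for i in 0..=THRESHOLD {
        match table[(hash as usize).wrapping_add(i) & mask] {
            Some((k, v)) if k == key => return Ok(Some(v)),
            None => return Ok(None), // an empty bucket ends the chain
            _ => {}                  // displaced entry; keep probing linearly
        }
    }
    Err(()) // threshold exceeded: switch hash functions and rehash
}
```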

(pczarn force-pushed the adaptive_hashing branch several times, March 23-24, 2016)

pczarn commented Mar 24, 2016

Updated and rebased.

/cc @contain-rs/publishers @bill-myers @pnkfelix @arthurprs @Jurily @gereeter @ticki @divyekapoor
rust-lang/rfcs#631, rust-lang/rust#11783 "Implement fast HashMap"
rust-lang/rust#29754 "Change Siphash to use one of the faster variants of the algorithm (Siphash13, Highwayhash)"
rust-lang/rust#28044 "WIP: Hash and Hasher update for faster common case hashing"
Source of the idea: rust-lang/rust#11783 (comment)

@arthurprs

The difference is brutal.

Does this apply to &str/String as well?


pczarn commented Mar 24, 2016

> Does this apply to &str/String as well?

Not yet, because this requires explicit one-shot hashing with a hasher such as FarmHash, which probably needs an RFC.


ticki commented Mar 24, 2016

Wonderful, @pczarn. Great work!


pczarn commented Jul 13, 2016

The difference is now smaller, because SipHash-1-3 is used by default.


pczarn commented Jul 13, 2016

I'm currently writing an RFC.

@arthurprs

We should investigate using the integers as the hash in the Adaptive path.


pczarn commented Jul 13, 2016

@arthurprs I'm sure there's no good way of using integers as the hash. The adaptive path allows us to use a non-cryptographic hash, but the hash should still be statistically good.

However, we could use hashes that are cheaply converted back to integer keys. That is, a reversible function for hashing. That would save some space at the expense of slower access to integer keys. Unfortunately, HashMap exposes iterators that must borrow keys stored somewhere inside the structure. It's a dead end.

There's more to be gained from using 32-bit hashes on 32-bit platforms. Perhaps 48-bit hashes on 64-bit platforms, too.


bstrie commented Sep 14, 2016

@pczarn If the collision threshold is N, what's to stop an attacker from attempting to collide many different buckets N-1 times in order to DoS the server without triggering the collision detector?


ticki commented Sep 14, 2016

@bstrie How different would that be from just spamming random keys to fill the hash table up? Clearly, there is a limit to how much you can delay a particular key.


comex commented Sep 15, 2016

@ticki Very different, as the expected number of probes per lookup for random keys is constant - something like 3, AFAIK, depending on the load factor - no matter how big the hash table gets, while the worst case before hitting the limit is 127 probes.

If the u64 hash values are different (only equal modulo the table size), that's just 127 u64 comparisons, which is a bit slower but no big deal. If they're the same, though, that could be 127 comparisons of really long strings. In theory (as is mentioned in one of the todo comments) this could be a problem even with a single chain that gets repeatedly looked up - lookups aren't much cheaper than insertions, so AFAICS there's no real need to blow up the table...

This should be solvable by specifically checking for equal hashes. Set a much lower chain length limit for the fast path, like 16 or so - which should still be very uncommon. Every insertion that exceeds that chain length triggers a separate function that scans the chain to see if many keys (say, also 16) have equal hashes. If this is true, perform the switch, and of course still do it if the chain length exceeds 128 even with unequal hashes.
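A sketch of what I mean (names and constants are illustrative):

```
// Sketch of the two-level check proposed above (illustrative, not PR code).
const FAST_CHAIN_LIMIT: usize = 16;  // chain length at which the map calls this scan
const HARD_CHAIN_LIMIT: usize = 128; // triggers the switch unconditionally
const EQUAL_HASH_LIMIT: usize = 16;  // full 64-bit collisions needed to switch

// Called when an insertion's probe chain exceeds FAST_CHAIN_LIMIT; `chain`
// holds the full 64-bit hashes of the buckets that were probed.
fn should_switch_to_siphash(chain: &[u64], new_hash: u64) -> bool {
    if chain.len() >= HARD_CHAIN_LIMIT {
        return true; // degenerate even without full collisions
    }
    // A chain of hashes that are merely equal modulo the table size is cheap
    // to walk; only many *fully* equal hashes indicate an attack (or a badly
    // broken Hash impl).
    let full_collisions = chain.iter().filter(|&&h| h == new_hash).count();
    full_collisions >= EQUAL_HASH_LIMIT
}
```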

If the hasher truly behaves randomly (as it really should for non-attack scenarios), the chance of even one 64-bit collision should be rather low, and every additional collision required divides the probability by a factor on the order of 2^64.

Well, technically not just attack scenarios: it's possible to end up with a non-malicious but systemic source of full collisions, such as if someone's custom Hash implementation hashes only part of the object. That usually means the input to the hasher is the same, so with enough collisions the hash table is doomed to failure no matter what hash it uses (or if anything, switching to SipHash may be beneficial). But if there are only a few objects with equal hasher input, my proposal would make a useless switch more likely. Since this would already be a serious bug and SipHash is not that slow, I don't think that's a big deal.


ticki commented Sep 15, 2016

@comex

That's not really my point. Say you want to attack key K; then you generate a bunch of preimages of hash(K) % E. You can maximally insert N of these before switching hash function. Due to probing, doing the same to another key simultaneously would require that the entry is at minimum N entries away from any other attacked keys. This means that you can at most attack E / N entries, i.e. you slow down E / N keys. Well, you'd need N insertions for each attack, hence E insertions to complete such an attack. That's no different from just repeatedly inserting without any knowledge about the internals.

The only advantage over the uninformed attack is that you partially get to choose which keys to slow down, but even then you can only slow them down by N probes.

@pczarn

An idea: When reallocating the table, it should check the highest probe length and conditionally switch back to the fast hash function.


ticki commented Sep 15, 2016

I'm fine with having N = 16. A random hash function in a table of 1024 entries should exhibit such behavior naturally with probability less than 0.0001% (from the collision counting equation).

Edit: My calculation was off.


Veedrac commented Sep 15, 2016

@comex

> Set a much lower chain length limit for the fast path, like 16 or so - which should still be very uncommon.

Sadly strong chain length bounds only apply with the double-probing variant of Robin Hood hashing. Chains regularly get to length ~46 for million-long maps with the linear probing variant.

Insertions have much more worrisome behaviour, since they can end up shifting 1k elements even with purely random elements in the worst case.


ticki commented Sep 15, 2016

@Veedrac Yeah, clustering can have a real, negative effect. One possible solution is to use quadratic + Robin Hood. That should make it much less likely to happen.


comex commented Sep 15, 2016

@ticki Slowing down isn't all or nothing. If you insert N keys with the same hash-modulo-table-size, the first takes 0 probes, the second takes 1, ..., the Nth takes N-1; total number of probes is N(N-1)/2. That's for each chain; the current E doesn't really matter, since you can just repeat this an arbitrary number of times (as long as you know where in the table there's free space) and let the table be grown, ideally having the hash also be equal modulo the new table size - in which case each resize repeats all the work done so far, asymptotically doubling the total number of probes.

The average number of probes per insertion (which we understand as a cost to the attacker) is thus roughly (N-1)/2 without growth - for N=128, that's 63.5 - while inserting keys randomly keeps it under 3. Double both numbers to factor in growth.
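In symbols, restating the arithmetic above:

$$
\text{total probes for one chain} = \sum_{i=0}^{N-1} i = \frac{N(N-1)}{2},
\qquad
\text{average per insertion} = \frac{N-1}{2} = 63.5 \ \text{for } N = 128.
$$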

Anyway, that's the worst case for the attacker, if (a) they can only perform an arbitrary number of insertions, not lookups, and (b) the inserted keys are required to be unique (e.g. the program refuses the request if an insertion finds an existing key). If they're able to either keep looking up the same key or keep re-inserting it, they can just hammer the last key in the chain, N probes for each operation. In this case the only incentive to create multiple chains is to pessimize cache behavior.

@Veedrac Hrm. If my suggestion to have an intermediate step of checking for fully equal keys is followed, having a few chains >= 16 but < 128 is not the end of the world. I guess it depends how cheap the check can be made...

But if an alternate probing scheme can avoid the code bloat of adding that logic while improving performance in general, it sounds like a good idea even at the expense of some code bloat of its own. Pure double hashing has bad cache behavior, but what about starting with linear probing and switching to double hashing after some low iteration count (like 4)? That is, the probe locations would be h1, h1+1, h1+2, h1+3, h1+h2, h1+2h2, etc. Or maybe ..., h1+3, h1+h2, h1+h2+1, h1+h2+2, h1+h2+3, h1+2h2, ...
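A sketch of that probe sequence (forcing the stride odd keeps it coprime with the power-of-two table size, so every bucket is eventually visited):

```
// Sketch of the hybrid probing scheme proposed above: linear probing for the
// first few slots (cache-friendly), double hashing afterwards.
fn probe(h1: usize, h2: usize, i: usize, mask: usize) -> usize {
    const LINEAR: usize = 4;
    if i < LINEAR {
        // h1, h1+1, h1+2, h1+3
        h1.wrapping_add(i) & mask
    } else {
        // h1 + h2', h1 + 2*h2', ... with h2' forced odd so the stride is
        // coprime with the power-of-two table size
        let step = h2 | 1;
        h1.wrapping_add((i - LINEAR + 1).wrapping_mul(step)) & mask
    }
}
```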


ticki commented Sep 15, 2016

@comex You're forgetting that collisions alone are not enough. You have to deal with probe length, i.e. you need to keep a distance of N from other hashes, which reduces the damage.


pczarn commented Sep 15, 2016

> Why is this only for simple key types?

The code doesn't cover strings yet. The algorithm will eventually work just fine for strings and slices.

> can you think of any way to minimize the code-complexity of this proposal?

I can't think of any such way, despite lots of consideration. This proposal is already reasonably simple. If someone can come up with a simplification, I would be impressed and grateful. I still have to write an RFC.

> This should be solvable by specifically checking for equal hashes. Set a much lower chain length limit for the fast path, like 16 or so - which should still be very uncommon. Every insertion that exceeds that chain length triggers a separate function that scans the chain to see if many keys (say, also 16) have equal hashes. If this is true, perform the switch, and of course still do it if the chain length exceeds 128 even with unequal hashes.

When a chain turns out not to have many equal hashes, you need to resume the insertion. The hard part is implementing that resumption in a way that won't harm code generation. A recursive call is not ideal for generated code, unless it gets tail call optimization.

Should I be more concerned about equal hashes? I think applications should at least ensure their inputs are not too long. Web servers certainly check the length of query keys.

Typical DDoS is devastating for many hashmaps, e.g. in Java, because every key comparison requires reading a heap-allocated object, which loads a cache line. With large strings, we're reading contiguous memory.

> An idea: When reallocating the table, it should check the highest probe length and conditionally switch back to the fast hash function.

I think it's not necessary. Why implement switching back, when the algorithm is meant to never switch in practice?


comex commented Sep 15, 2016

@ticki I don't know what you mean. It should be possible to fill up the hash table without gaps, up to the maximum number of entries before resize - like this, imagining N were 4:

```
slot  0 1 2 3 4 5 6 7 8 9 a b
hash  0 0 0 0 4 4 4 4 8 8 8 8 ...
```


ticki commented Sep 15, 2016

@comex. Good point.

@pczarn

> I think it's not necessary. Why implement switching back, when the algorithm is meant to never switch in practice?

Well, let's consider a strictly hypothetical scenario:

You use the hash table for a long-running KV store (say, a server), and some attack generates a lot of collisions and inserts them into the hash table. Now, the hash table will move on to a secure hash function. When the attacker realizes that their approach is inefficient, they might discontinue the attack. If the hash table keeps growing, and then reallocates, nothing is lost by switching back to the old hash function.

Of course, this gives a new attack vector: do the collision attack, then let the table reallocate another time, switching back to the old function, and repeat indefinitely. It is worth noting, though, that getting a table to reallocate is not exactly trivial, since it requires a large (exponentially growing) number of insertions.

Generally, I think the best way of modeling the security in this case is to compare the attack to the naïve one in which a lot of random keys are spammed. Given what this technique requires in order to revert the table back to an insecure hash function, it is only marginally worse than a "never-go-back" approach.

The question is this one: How big is the gain, and is it worth it? Unfortunately, that cannot be measured by micro benchmarks.


pczarn commented Oct 30, 2016

I changed the way the safeguard works. Previously, it allowed for a huge cost of map creation, only disallowing costly lookups. Now, the safeguard is a part of insertion instead of every lookup.

The code is a bit less complex.

Two tasks done:

  • found the equation for the probability that a randomly picked bucket is in a chain (island) of length X.
  • wrote an RFC.

@arthurprs

Very cool. It can be seen as an adaptive load factor now, which is great. Maybe it should take the resizing policy into account though.


funny-falcon commented Nov 28, 2016

It is quite easy to add a seed to this mix function:

https://en.wikipedia.org/wiki/Xor-encrypt-xor

> In 1991, motivated by Rivest's DESX construction, Even and Mansour proposed a much simpler scheme (the "two-key Even-Mansour scheme"), which they suggested was perhaps the simplest possible block cipher: XOR the plaintext with a prewhitening key, apply a publicly known unkeyed permutation (in practice, a pseudorandom permutation) to the result, and then XOR a postwhitening key to the permuted result to produce the final ciphertext.[3][4]

So if we have a 128-bit seed, then we can simply do:

```
self.hash = mix(msg_data ^ seed.k0) ^ seed.k1;
```

This construction will be strong enough to not use fallback to SipHash at all (for simple keys).

(Note: the Even-Mansour scheme relies on a "strong pseudorandom permutation". The mix function is a pseudorandom permutation of unknown "strength", but I'm pretty sure it is "strong enough" for this use case, i.e. as a hash function in a hash table.)
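For concreteness, a sketch with the splitmix64 finalizer standing in for mix (purely illustrative; any invertible 64-bit permutation fits the same pattern):

```
// Sketch of the seeded Even-Mansour (XEX) construction described above. The
// inner permutation is the splitmix64 finalizer, used here purely for
// illustration; the thread does not pin down the actual mix function.
struct Seed {
    k0: u64, // prewhitening key
    k1: u64, // postwhitening key
}

// splitmix64 finalizer: an unkeyed, bijective 64-bit mix.
fn mix(mut z: u64) -> u64 {
    z ^= z >> 30;
    z = z.wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z ^= z >> 27;
    z = z.wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^= z >> 31;
    z
}

// XEX: whiten, permute, whiten again - 128 bits of key material in total.
fn seeded_hash(seed: &Seed, msg_data: u64) -> u64 {
    mix(msg_data ^ seed.k0) ^ seed.k1
}

fn main() {
    let seed = Seed { k0: 0x0123_4567_89AB_CDEF, k1: 0xFEDC_BA98_7654_3210 };
    println!("{:016x}", seeded_hash(&seed, 42));
}
```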

@funny-falcon

> This construction will be strong enough to not use fallback to SipHash at all (for simple keys).

Then there is no need for Adaptive Hashing for simple keys.
But Adaptive Hashing could still be useful for "more complex" keys (i.e. strings, structs, etc.), so we just need a fast hash function for them.

bors added a commit to rust-lang/rust that referenced this pull request Feb 16, 2017
Adaptive hashmap implementation

All credits to @pczarn who wrote rust-lang/rfcs#1796 and contain-rs/hashmap2#5

**Background**

The Rust std lib hashmap puts a strong emphasis on security. We made some improvements in #37470, but in some very specific cases and for non-default hashers it's still vulnerable (see #36481).

This is a simplified version of rust-lang/rfcs#1796 proposal sans switching hashers on the fly and other things that require an RFC process and further decisions. I think this part has great potential by itself.

**Proposal**
This PR adds code checking for extra-long probe and shift lengths (see code comments and rust-lang/rfcs#1796 for details); when those are encountered, the hashmap will grow (even if the capacity limit is not reached yet), _greatly_ attenuating the degenerate performance case.

We need a lower bound on the minimum occupancy that may trigger the early resize; otherwise, in extreme cases, it's possible to turn the CPU attack into a memory attack. The PR code puts that lower bound at half of the max occupancy (defined by ResizePolicy). This reduces the protection (it could potentially be exploited between 0-50% occupancy) but makes it completely safe.
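Schematically, the insert-path check looks like this (illustrative names and thresholds, not the exact PR code; the real constants derive from the fill factor, as noted below under drawbacks):

```
// Sketch of the insertion-time safeguard described above (illustrative
// names and thresholds; see the actual PR for the real constants).
const MAX_PROBE_LEN: usize = 128; // suspiciously long forward probe
const MAX_SHIFT_LEN: usize = 512; // suspiciously long Robin Hood shift

struct Guard {
    len: usize,      // current number of entries
    capacity: usize, // max occupancy allowed by ResizePolicy
}

impl Guard {
    // Called on the insert path with the probe/shift lengths just observed.
    fn should_resize_early(&self, probe_len: usize, shift_len: usize) -> bool {
        let degenerate = probe_len > MAX_PROBE_LEN || shift_len > MAX_SHIFT_LEN;
        // Only grow early above half of max occupancy, so an attacker cannot
        // turn the CPU attack into a memory attack on a nearly empty table.
        degenerate && self.len >= self.capacity / 2
    }
}
```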

**Drawbacks**

* May interact badly with poor hashers.  Maps using those may not use the desired capacity.
* It adds 2-3 branches to the common insert path, luckily those are highly predictable and there's room to shave some in future patches.
* May complicate exposure of ResizePolicy in the future as the constants are a function of the fill factor.

**Example**

Example code that exploits the exposure of iteration order and a weak hasher.

```
// Note: this benchmark lives inside a macro; `$hashmap` is the macro
// parameter, instantiated with both the stdlib and the patched HashMap.
const MERGE: usize = 10_000usize;
#[bench]
fn merge_dos(b: &mut Bencher) {
    // Both maps use the weak-but-fast FNV hasher, so their iteration order
    // correlates with bucket positions in the table.
    let first_map: $hashmap<usize, usize, FnvBuilder> = (0..MERGE).map(|i| (i, i)).collect();
    let second_map: $hashmap<usize, usize, FnvBuilder> = (MERGE..MERGE * 2).map(|i| (i, i)).collect();
    b.iter(|| {
        // Inserting the second map's entries in its iteration order into a
        // clone of the first provokes the degenerate clustering case.
        let mut merged = first_map.clone();
        for (&k, &v) in &second_map {
            merged.insert(k, v);
        }
        ::test::black_box(merged);
    });
}
```

_91 is stdlib and _ad is patched (the end capacity in both cases is the same)

```
running 2 tests
test _91::merge_dos              ... bench:  47,311,843 ns/iter (+/- 2,040,302)
test _ad::merge_dos              ... bench:     599,099 ns/iter (+/- 83,270)
```