Amortize the cost of small tables #47

Closed
Amanieu opened this issue Feb 25, 2019 · 7 comments · Fixed by #162

Comments

@Amanieu
Member

Amanieu commented Feb 25, 2019

When growing a table, hashbrown starts with a table of size 2 (which can hold only 1 element due to the load factor) and then doubles every time the load factor is reached.

| Table size | Capacity | Memory, sizeof(T) = 4 | Memory, sizeof(T) = 8 | Memory, sizeof(T) = 16 |
|-----------:|---------:|----------------------:|----------------------:|-----------------------:|
| (empty)    | 0        | 0                     | 0                     | 0                       |
| 2          | 1        | 26 (32)               | 34 (40)               | 50 (56)                 |
| 4          | 3        | 36 (40)               | 52 (56)               | 84 (88)                 |
| 8          | 7        | 56                    | 88                    | 152                     |
| 16         | 14       | 96                    | 160                   | 288                     |
| 32         | 28       | 176                   | 304                   | 560                     |

(All figures are in bytes; values in parentheses are the sizes after rounding up to a multiple of 8 for allocator alignment.)

In comparison, the current std HashMap starts off empty but then immediately grows to a table size of 32 (capacity 29) as soon as the first element is added.

We want to balance the memory usage of small tables against the allocation overhead incurred while growing a table to its final size.
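For reference, a minimal sketch of how the capacity column above can be derived from the table size, assuming a 7/8 load factor for tables of 8+ buckets and a `size - 1` rule for smaller tables (the function name is illustrative, not hashbrown's actual API):

```rust
/// Usable capacity for a given power-of-two table size, assuming a
/// 7/8 load factor for 8+ buckets and `size - 1` below that. This
/// reproduces the Capacity column of the table above.
fn capacity_for(table_size: usize) -> usize {
    if table_size == 0 {
        0
    } else if table_size < 8 {
        table_size - 1
    } else {
        table_size / 8 * 7
    }
}

fn main() {
    for size in [0, 2, 4, 8, 16, 32] {
        println!("table size {:2} -> capacity {:2}", size, capacity_for(size));
    }
}
```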

@flip111

flip111 commented Feb 25, 2019

An exponential strategy (doubling) works well to avoid frequent reallocation. However, when memory is limited, doubling might not be possible. Would it be possible to provide (and change) a maximum growth size?

@Amanieu
Member Author

Amanieu commented Feb 25, 2019

Unlike a vector, the size of the hash table must be a power of two, since we use a bit mask on the hash to map it to a table index. There are table designs that use non-power-of-two sizes, but they are generally much slower.
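A quick illustration of why the power-of-two constraint matters (a sketch, not hashbrown's internals): with a power-of-two size, reducing a hash to a bucket index is a single AND, whereas any other size would require a much slower division or modulo.

```rust
// With a power-of-two table size, `hash mod size` simplifies to
// `hash & (size - 1)`, which is a single AND instruction.
fn bucket_index(hash: u64, table_size: usize) -> usize {
    debug_assert!(table_size.is_power_of_two());
    (hash as usize) & (table_size - 1)
}

fn main() {
    // Index of an example hash within a 32-bucket table.
    println!("{}", bucket_index(0x71f3_a9b2, 32));
}
```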

@ghost

ghost commented Feb 25, 2019

One thing we can infer here is that doubling the table size of a small hash table does not really double its size in bytes. This is very much unlike Vec, where doubling the buffer always makes it exactly twice as large in bytes.

I think it would be more appropriate to skip some powers of two when growing small hash tables. Growing table size as 0 -> 4 -> 16 -> 32 -> 64 -> 128 -> ... seems like a fine strategy to me, for example.
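A sketch of what such a skipping growth policy could look like; `next_table_size` is a hypothetical helper, not hashbrown's actual resize logic:

```rust
// Skip some small powers of two (0 -> 4 -> 16), then double as usual.
fn next_table_size(current: usize) -> usize {
    match current {
        0 => 4,
        4 => 16,
        n => n * 2,
    }
}

fn main() {
    // Prints: 4 -> 16 -> 32 -> 64 -> 128 -> 256 -> ...
    let mut size = 0;
    for _ in 0..6 {
        size = next_table_size(size);
        print!("{} -> ", size);
    }
    println!("...");
}
```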

@Amanieu
Member Author

Amanieu commented Feb 25, 2019

The reason the size doesn't exactly double is that each allocation carries a fixed cost of 16 bytes: there are 16 + N control bytes and N element slots, so the allocation size is 16 + N + sizeof(T) * N.
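A minimal sketch of this formula, which reproduces the sizeof(T) = 8 column of the first table above:

```rust
use std::mem::size_of;

// Allocation size per the formula above: (16 + n) control bytes
// plus n element slots of size_of::<T>() bytes each.
fn table_bytes<T>(n: usize) -> usize {
    if n == 0 { 0 } else { 16 + n + n * size_of::<T>() }
}

fn main() {
    // Prints 34, 52, 88, 160, 304 bytes for sizeof(T) = 8.
    for n in [2, 4, 8, 16, 32] {
        println!("n = {:2}: {} bytes", n, table_bytes::<u64>(n));
    }
}
```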

@Amanieu
Member Author

Amanieu commented Feb 25, 2019

By comparison, here is what the std HashMap does:

| Table size | Capacity | Memory, sizeof(T) = 4 | Memory, sizeof(T) = 8 | Memory, sizeof(T) = 16 |
|-----------:|---------:|----------------------:|----------------------:|-----------------------:|
| (empty)    | 0        | 0                     | 0                     | 0                       |
| 32         | 29       | 384                   | 512                   | 768                     |
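These figures are consistent with the old Robin Hood layout storing one 64-bit hash per bucket alongside each element slot; a sketch under that assumption (not the actual std implementation):

```rust
use std::mem::size_of;

// Assumed layout of the pre-hashbrown std HashMap: one 64-bit stored
// hash plus one element slot per bucket, e.g. 32 * (8 + 4) = 384.
fn old_std_table_bytes<T>(buckets: usize) -> usize {
    buckets * (8 + size_of::<T>())
}

fn main() {
    // Matches the sizeof(T) = 4 column of the table above.
    println!("{} bytes", old_std_table_bytes::<u32>(32));
}
```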

@nnethercote
Contributor

I think jumping from 0 to 4 or even 8 would be fine.

@workingjubilee
Member

From my perspective as a mentor on Exercism, and my observations of new Rustaceans:

HashMaps, probably because they have no literal syntax, rarely seem to be declared for just two values. Even people who come from languages that throw hashmaps around everywhere use them less often here, because of the mental overhead involved. When people want a tiny number of k/v pairs, I often see them juggle some tuples or even declare tiny structs. HashMaps are favored when (from the logical perspective) there could be Many values and dynamic access is wanted.

So: yeah, 0 -> (4|8) seems fine.
