Amortize the cost of small tables #47
Exponential strategy (doubling) works well to prevent reallocation. However, when memory space is limited, doubling might not be possible. Is it possible to provide (and change) a maximum growth size?
Unlike a vector, the size of the hash table must be a power of two, since we use a bit mask on the hash to map it to a table index. There are different table designs which use non-power-of-two sizes, but they are generally much slower.
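For illustration, here is a minimal sketch of that mapping (not hashbrown's actual code; the function name is made up). With a power-of-two table size, a single bit mask replaces the much slower modulo:

```rust
// Minimal sketch (not hashbrown's internals): map a hash to a slot index.
// When `table_size` is a power of two, `hash & (table_size - 1)` computes
// the same result as `hash % table_size` with a single AND instruction.
fn slot_index(hash: u64, table_size: usize) -> usize {
    debug_assert!(table_size.is_power_of_two());
    (hash as usize) & (table_size - 1)
}

fn main() {
    let hash = 0x9E37_79B9_7F4A_7C15_u64; // arbitrary example hash
    for size in [2, 4, 8, 16] {
        assert_eq!(slot_index(hash, size), (hash as usize) % size);
        println!("table size {size:2} -> slot {}", slot_index(hash, size));
    }
}
```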
One thing we can infer here is that doubling the table size of a small hash table does not really double its size in bytes. This is very much unlike a `Vec`, where doubling the length does double the size in bytes. I think it would be more appropriate to skip some powers of two when growing small hash tables. Growing the table size as 0 -> 4 -> 16 -> 32 -> 64 -> 128 -> ... seems like a fine strategy to me, for example.
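A quick sketch of that proposed schedule (a hypothetical helper, not anything in hashbrown): sizes stay powers of two so masking keeps working, but the cheap-to-skip sizes 2 and 8 are jumped over for small tables.

```rust
// Hypothetical growth schedule from the proposal above:
// 0 -> 4 -> 16 -> 32 -> 64 -> 128 -> ...
// All sizes remain powers of two, but 2 and 8 are skipped because
// doubling a tiny table barely increases its size in bytes.
fn next_table_size(current: usize) -> usize {
    match current {
        0 => 4,
        4 => 16,
        n => n * 2, // from 16 onward, plain doubling
    }
}

fn main() {
    let mut size = 0;
    for _ in 0..6 {
        size = next_table_size(size);
        print!("{size} ");
    }
    println!(); // prints: 4 16 32 64 128 256
}
```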
The reason why the size doesn't exactly double is that there is a fixed cost of 16 bytes in each allocation. There are 16 + N control bytes and N element slots, so the size is calculated as `16 + N * (1 + sizeof(T))` bytes.
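Plugging that formula into a few element sizes (a sketch of the arithmetic only, not hashbrown's allocation code) shows how small the early doublings really are:

```rust
// Allocation size per the description above: 16 + N control bytes plus
// N element slots of `elem_size` bytes each, i.e. 16 + N * (1 + elem_size),
// before the allocator rounds up for alignment.
fn table_bytes(n: usize, elem_size: usize) -> usize {
    16 + n * (1 + elem_size)
}

fn main() {
    for elem_size in [4, 8, 16] {
        let sizes: Vec<usize> = [2usize, 4, 8, 16]
            .iter()
            .map(|&n| table_bytes(n, elem_size))
            .collect();
        println!("sizeof(T) = {elem_size:2}: {sizes:?} bytes");
    }
}
```

For example, with `sizeof(T) = 4`, growing from table size 2 to 4 only moves the allocation from 26 to 36 bytes, which is why doubling a tiny table buys so little.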
I think jumping from 0 to 4 or even 8 would be fine.
From my perspective as a mentor on Exercism, and from my observations of new Rustaceans: hash maps, probably due to having no literal syntax, seem to rarely be declared for just 2 values. Even people who come from languages which throw hash maps everywhere will not use them as often, because it requires the mental overhead of the `use` statement. So: yeah, 0 -> (4|8) seems fine.
When growing a table, hashbrown will start with a table of size 2 (which can hold 1 element due to the load factor) and then double every time the load factor is reached. Applying the size formula above to the first few doublings (table sizes 2 -> 4 -> 8 -> 16) gives:
- sizeof(T) = 4: 26 -> 36 -> 56 -> 96 bytes
- sizeof(T) = 8: 34 -> 52 -> 88 -> 160 bytes
- sizeof(T) = 16: 50 -> 84 -> 152 -> 288 bytes
(each rounded up to a multiple of 8 for allocator alignment)
In comparison, the current std HashMap starts off empty but then immediately grows to a table size of 32 (capacity 29) as soon as the first element is added.
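One small way to watch this from user code (a sketch using only the standard `HashMap::capacity` API; note that since Rust 1.36 the std HashMap is itself based on hashbrown, so a current toolchain shows the hashbrown growth sequence rather than the old jump to capacity 29):

```rust
use std::collections::HashMap;

// Print the reported capacity as elements are inserted. `capacity()` is the
// number of elements the map can hold before reallocating, i.e. the table
// size scaled down by the load factor; the exact values depend on the
// implementation shipped with your toolchain.
fn main() {
    let mut map: HashMap<u32, u32> = HashMap::new();
    println!("empty: capacity = {}", map.capacity()); // 0: nothing allocated yet
    for i in 0..10 {
        map.insert(i, i);
        println!("len = {:2} -> capacity = {}", map.len(), map.capacity());
    }
}
```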
We want to try to balance memory usage for small tables with the allocation overhead when growing tables up to their final size.