Replace `HashDictionary` with an ordered hash dictionary #13
Codecov Report
@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
+ Coverage   59.10%   63.99%   +4.88%
==========================================
  Files          18       18
  Lines         983     1322     +339
==========================================
+ Hits          581      846     +265
- Misses        402      476      +74
The new implementation preserves insertion order. It may use slightly more memory overall but is much faster to iterate because the indices/values are stored densely (performance comparable to `Vector`, i.e. limited by memory bandwidth). Insertion, deletion, and resizing are also faster because `hash` needs to be called less often (hashes are recorded).
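To make the layout described above concrete, here is a minimal Python sketch of a compact ordered hash map in the style of the Python 3.6 dictionary. It is not the code from this PR: the class name `CompactOrderedMap`, the linear-probing scheme, and the 2/3 load factor are all assumptions chosen for illustration. It shows the three properties mentioned: entries live in dense arrays in insertion order, iteration walks those arrays directly, and resizing reuses the recorded hashes instead of calling `hash` again.

```python
class CompactOrderedMap:
    """Toy insertion-ordered hash map: sparse slot table + dense entry arrays.

    Hypothetical illustration only; names and policies are assumptions.
    """

    FREE = -1  # marker for an unused slot

    def __init__(self, n_slots=8):
        # n_slots must be a power of two so that `h & mask` picks a slot.
        self.slots = [self.FREE] * n_slots  # sparse: slot -> position in dense arrays
        self.hashes = []  # dense: recorded hash of each key
        self.keys = []    # dense: keys in insertion order
        self.values = []  # dense: values in insertion order

    def _probe(self, h, key):
        """Linear probing. Returns (slot, position); position is FREE if absent."""
        mask = len(self.slots) - 1
        slot = h & mask
        while True:
            pos = self.slots[slot]
            if pos == self.FREE:
                return slot, self.FREE
            # Compare the recorded hash first, then the key itself.
            if self.hashes[pos] == h and self.keys[pos] == key:
                return slot, pos
            slot = (slot + 1) & mask

    def __setitem__(self, key, value):
        h = hash(key)  # the only place a key is hashed on insertion
        slot, pos = self._probe(h, key)
        if pos != self.FREE:
            self.values[pos] = value  # existing key: overwrite, order unchanged
            return
        self.slots[slot] = len(self.keys)
        self.hashes.append(h)
        self.keys.append(key)
        self.values.append(value)
        if 3 * len(self.keys) > 2 * len(self.slots):  # assumed 2/3 load factor
            self._grow()

    def __getitem__(self, key):
        _, pos = self._probe(hash(key), key)
        if pos == self.FREE:
            raise KeyError(key)
        return self.values[pos]

    def _grow(self):
        """Double the slot table using the recorded hashes (no calls to hash)."""
        self.slots = [self.FREE] * (2 * len(self.slots))
        mask = len(self.slots) - 1
        for pos, h in enumerate(self.hashes):
            slot = h & mask
            while self.slots[slot] != self.FREE:
                slot = (slot + 1) & mask
            self.slots[slot] = pos

    def items(self):
        """Iterate the dense arrays directly, in insertion order."""
        return zip(self.keys, self.values)
```

Only the small integer slot table is sparse; the keys and values sit contiguously, which is why iteration can approach the speed of scanning a plain array. Deletion is left out of this sketch on purpose.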
Hey Andy, sorry I haven't reviewed yet, this diff is kind of epic and scary! I assume the best strategy would be to read the entirety of HashDictionary.jl and HashIndices.jl? The diffs look pretty useless as it's a big rewrite... and I didn't know the previous code anyway.
Yes, it’s a brand new implementation of a hash map based on a Python 3.6 talk I saw regarding their then-new implementation. My first hash map so it scares me! I need to copy some tests or something.
Still doesn't pass tests though...
OK this is getting super close. Benchmark suite is working, results are mostly good. The delete operation appears to be a bit slower than for
Bravo, this looks epic. Sorry I didn't get around to reviewing it!
The new implementation preserves insertion order. It may use slightly more memory overall but is much faster to iterate because the indices/values are stored (somewhat) densely (performance comparable to `Vector`, i.e. limited by memory bandwidth). Faster insertion/deletion/resizing due to less need to call `hash` (hashes are recorded). Compared to OrderedCollections.jl this implementation (a) does not mutate the object on reads, like iteration (which can't be thread-safe) and (b) may theoretically deal better with deletion without slowdowns. Implementation is somewhat inspired by Python 3.6 ordered dictionary.

This PR now includes new functions `distinct` and `index`. The constructors for `HashIndices` and `HashDictionary` have been improved.

Things to do:

- `copy` optimization
- `filter!` optimization
- `sizehint`s safe!
- `HashIndices` input is uniquified (or throw an error?)
- think about adding a linear probing cut-off, double-check resizing... it's good enough and pretty well tested, can revisit for speed purposes

This PR obviously leaves Dictionaries.jl with only ordered collections - so we are free to add more semantics to `AbstractIndices` and `AbstractDictionary` - things to think about include `sort`, `sortperm`, `permute!`, `push!`/`pop!`/`splice!` (these aren't collections of `Pair`s though).
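The comparison with OrderedCollections.jl in points (a) and (b) can be illustrated with a small Python sketch. This is an assumption about one way to get that behavior, not the PR's code: deletion leaves a hole in the dense arrays, iteration skips holes without touching the structure, and compaction happens only during a later write. The class name `HoleyOrderedMap`, the `_HOLE` sentinel, and the "more holes than live entries" compaction rule are hypothetical, and a built-in `dict` stands in for the hash slot table to keep the example short.

```python
class HoleyOrderedMap:
    """Toy ordered map where deletion leaves a hole instead of shifting entries.

    Hypothetical illustration only; a plain dict stands in for the slot table.
    """

    _HOLE = object()  # sentinel marking a deleted dense entry

    def __init__(self):
        self.index = {}   # key -> position in the dense arrays
        self.keys = []    # dense keys, possibly containing holes
        self.values = []  # dense values, possibly containing holes
        self.holes = 0    # number of deleted entries not yet compacted

    def __setitem__(self, key, value):
        pos = self.index.get(key)
        if pos is not None:
            self.values[pos] = value
            return
        # Compaction is triggered only by a write (assumed policy:
        # compact once holes outnumber live entries).
        if self.holes > len(self.index):
            self._compact()
        self.index[key] = len(self.keys)
        self.keys.append(key)
        self.values.append(value)

    def __delitem__(self, key):
        pos = self.index.pop(key)  # raises KeyError if the key is absent
        self.keys[pos] = self._HOLE
        self.values[pos] = self._HOLE
        self.holes += 1

    def _compact(self):
        """Drop holes and rebuild positions, preserving insertion order."""
        live = [(k, v) for k, v in zip(self.keys, self.values)
                if k is not self._HOLE]
        self.keys = [k for k, _ in live]
        self.values = [v for _, v in live]
        self.index = {k: i for i, k in enumerate(self.keys)}
        self.holes = 0

    def items(self):
        """Read-only iteration: skips holes, never compacts or rehashes."""
        for k, v in zip(self.keys, self.values):
            if k is not self._HOLE:
                yield k, v
```

Because `items` never writes to the structure, concurrent readers see a consistent layout as long as nobody is writing; the cost is that iteration steps over holes until some later insertion compacts them.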