Replace `HashDictionary` with an ordered hash dictionary #13

andyferris · 2020-02-24T12:39:02Z

The new implementation preserves insertion order. It may use slightly more memory overall, but it is much faster to iterate because the indices and values are stored (somewhat) densely: performance is comparable to `Vector`, i.e. limited by memory bandwidth. Insertion, deletion and resizing are faster because hashes are recorded, so `hash` needs to be called less often. Compared to OrderedCollections.jl, this implementation (a) does not mutate the object on reads such as iteration (mutating on reads can't be thread-safe) and (b) may theoretically handle deletion better, without slowdowns. The implementation is somewhat inspired by the Python 3.6 ordered dictionary.
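To make the layout described above concrete, here is a minimal Python sketch of a dense, insertion-ordered hash table in the style of the Python 3.6 dictionary. It is not the code in this PR: the class name `OrderedHashSketch`, the load-factor thresholds, and the tombstone scheme are all assumptions chosen for illustration. It shows a sparse slot table pointing into dense arrays, recorded hashes reused on resize, and lookups and iteration that only read.

```python
from typing import Any, Iterator, List, Tuple

_EMPTY = -1    # slot has never been used
_DELETED = -2  # slot whose entry was removed (tombstone)


class OrderedHashSketch:
    """Hypothetical illustration: sparse slots index into dense, ordered arrays."""

    def __init__(self) -> None:
        self._slots: List[int] = [_EMPTY] * 8  # slot -> position in dense arrays
        self._hashes: List[int] = []           # recorded hashes, reused on rebuild
        self._keys: List[Any] = []
        self._values: List[Any] = []
        self._live: List[bool] = []            # False marks a hole left by a deletion
        self._count = 0

    def _probe(self, h: int) -> Iterator[int]:
        # Linear probing over a power-of-two sized slot table.
        mask = len(self._slots) - 1
        i = h & mask
        while True:
            yield i
            i = (i + 1) & mask

    def _find(self, key: Any, h: int) -> Tuple[int, int]:
        # Read-only lookup: returns (slot, position), or (free slot, -1) if absent.
        free = -1
        for i in self._probe(h):
            pos = self._slots[i]
            if pos == _EMPTY:
                return (free if free >= 0 else i), -1
            if pos == _DELETED:
                if free < 0:
                    free = i
            elif self._hashes[pos] == h and self._keys[pos] == key:
                return i, pos
        raise AssertionError("unreachable: the table always keeps an empty slot")

    def _rebuild(self) -> None:
        # Compact holes and rebuild the slot table from the recorded hashes,
        # so hash() is never called again for existing keys.
        live = [p for p in range(len(self._keys)) if self._live[p]]
        self._hashes = [self._hashes[p] for p in live]
        self._keys = [self._keys[p] for p in live]
        self._values = [self._values[p] for p in live]
        self._live = [True] * len(live)
        n = 8
        while (len(live) + 1) * 3 > n:
            n *= 2
        self._slots = [_EMPTY] * n
        for new_pos, h in enumerate(self._hashes):
            for i in self._probe(h):
                if self._slots[i] == _EMPTY:
                    self._slots[i] = new_pos
                    break

    def __setitem__(self, key: Any, value: Any) -> None:
        h = hash(key)
        slot, pos = self._find(key, h)
        if pos >= 0:
            self._values[pos] = value  # existing key keeps its original position
            return
        if (len(self._keys) + 1) * 3 > len(self._slots) * 2:
            self._rebuild()
            slot, _ = self._find(key, h)
        self._slots[slot] = len(self._keys)
        self._hashes.append(h)
        self._keys.append(key)
        self._values.append(value)
        self._live.append(True)
        self._count += 1

    def __getitem__(self, key: Any) -> Any:
        _, pos = self._find(key, hash(key))
        if pos < 0:
            raise KeyError(key)
        return self._values[pos]

    def __delitem__(self, key: Any) -> None:
        slot, pos = self._find(key, hash(key))
        if pos < 0:
            raise KeyError(key)
        self._slots[slot] = _DELETED  # tombstone keeps later probe chains intact
        self._live[pos] = False       # leave a hole; compacted on the next rebuild
        self._keys[pos] = self._values[pos] = None
        self._count -= 1

    def __len__(self) -> int:
        return self._count

    def items(self) -> Iterator[Tuple[Any, Any]]:
        # Iteration walks the dense arrays in insertion order and mutates nothing.
        for p in range(len(self._keys)):
            if self._live[p]:
                yield self._keys[p], self._values[p]
```

In this sketch a deleted key leaves a hole in the dense arrays, which is why the storage is only "somewhat" dense between rebuilds; re-inserting a deleted key appends it at the end of the order.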

This PR now includes the new functions `distinct` and `index`. The constructors for `HashIndices` and `HashDictionary` have been improved.
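The thread does not spell out what the two new functions do, so the following Python sketch is only a guess at their semantics. It assumes `distinct` collects the unique elements of an iterable in first-seen order, and `index(f, iter)` keys each element `x` by `f(x)`, keeping the first element when a key repeats. The names `distinct_sketch` and `index_sketch` are hypothetical and are not the Dictionaries.jl API.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List


def distinct_sketch(items: Iterable[Hashable]) -> List[Hashable]:
    # Assumed semantics: unique elements, ordered by first appearance.
    seen: Dict[Hashable, None] = {}
    for x in items:
        seen.setdefault(x, None)
    return list(seen)


def index_sketch(f: Callable[[Any], Hashable], items: Iterable[Any]) -> Dict[Hashable, Any]:
    # Assumed semantics: map f(x) -> x, keeping the first x seen for each key.
    out: Dict[Hashable, Any] = {}
    for x in items:
        out.setdefault(f(x), x)
    return out
```

Both helpers rely on an insertion-ordered mapping, which is the property this PR adds to the hash-based containers.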

Things to do:

- `copy` optimization
- `filter!` optimization
- make sizehints safe!
- add performance benchmarks to repository
- ensure `HashIndices` input is uniquified (or throw an error?)
- ~~think about adding a linear probing cut-off, double-check resizing...~~ it's good enough and pretty well tested, can revisit for speed purposes

This PR obviously leaves Dictionaries.jl with only ordered collections, so we are free to add more semantics to `AbstractIndices` and `AbstractDictionary`. Things to think about include `sort`, `sortperm`, `permute!`, and `push!`/`pop!`/`splice!` (these aren't collections of `Pair`s, though).

codecov-io · 2020-02-24T12:47:07Z

Codecov Report

Merging #13 into master will increase coverage by 4.88%.
The diff coverage is 64.78%.

@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
+ Coverage   59.10%   63.99%   +4.88%     
==========================================
  Files          18       18              
  Lines         983     1322     +339     
==========================================
+ Hits          581      846     +265     
- Misses        402      476      +74

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Dictionaries.jl | `100.00% <ø> (ø)` | |
| src/MappedDictionary.jl | `15.00% <0.00%> (+7.30%)` | ⬆️ |
| src/map.jl | `25.86% <16.12%> (-4.45%)` | ⬇️ |
| src/Indices.jl | `71.79% <18.18%> (-11.54%)` | ⬇️ |
| src/Dictionary.jl | `72.50% <25.00%> (-1.86%)` | ⬇️ |
| src/tokens.jl | `25.39% <60.00%> (+9.71%)` | ⬆️ |
| src/HashDictionary.jl | `61.61% <61.22%> (+0.42%)` | ⬆️ |
| src/AbstractDictionary.jl | `83.25% <66.66%> (+9.22%)` | ⬆️ |
| src/AbstractIndices.jl | `61.11% <68.42%> (+28.18%)` | ⬆️ |
| src/insertion.jl | `65.03% <71.42%> (+24.73%)` | ⬆️ |

... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f92b3d4...0d34363.


c42f · 2020-04-24T04:55:15Z

Hey Andy, sorry I haven't reviewed yet, this diff is kind of epic and scary!

I assume the best strategy would be to read the entirety of HashDictionary.jl and HashIndices.jl? The diffs look pretty useless as it's a big rewrite... and I didn't know the previous code anyway.

andyferris · 2020-04-26T21:48:42Z

Yes, it's a brand new implementation of a hash map, based on a talk I saw about Python 3.6's then-new dictionary implementation.

My first hash map so it scares me! I need to copy some tests or something.

Still doesn't pass tests though...

andyferris · 2020-06-01T12:21:39Z

OK, this is getting super close. The benchmark suite is working and the results are mostly good. The delete operation appears to be a bit slower than for `Set` or `Dictionaries.OldHashIndices` (which impacts other operations), so that needs investigating. On the other hand, operations like `filter!` (down to few elements) and reductions are like totally ridiculously awesome :)

Fixes #20

Closes #18

Fixes #11

c42f · 2020-06-12T04:27:10Z

Bravo, this looks epic. Sorry I didn't get around to reviewing it!

andyferris added the enhancement (New feature or request) label Feb 24, 2020

andyferris requested a review from c42f February 24, 2020 12:39

andyferris force-pushed the ajf/order-hash-dictionary branch from 0708ce4 to c075d8a Compare February 25, 2020 13:19

andyferris force-pushed the ajf/order-hash-dictionary branch from c075d8a to 464d167 Compare March 8, 2020 23:07

andyferris force-pushed the ajf/order-hash-dictionary branch 9 times, most recently from 2f42b42 to 3119467 Compare March 11, 2020 00:18

Added Benchmark to CI via github workflows

20d6344

andyferris force-pushed the ajf/order-hash-dictionary branch from 3119467 to 20d6344 Compare March 11, 2020 02:43

tkf mentioned this pull request Apr 19, 2020

Widening-based map() to remove return_type andyferris/StaticArraysLite.jl#1

Open

andyferris added 4 commits May 31, 2020 15:38

Use github actions for tests etc

7330f19

Improve CI, add tests, fix a deleted bug

afa4c38

Still doesn't pass tests though...

Fix the implementation :)

0ae4d60

Fix benchmarks, add union etc.

320cd05

andyferris added 3 commits June 2, 2020 10:50

Add some @inbounds

2fe4c7a

Don't compare hashes, benchmarking progress.

33608fd

Many many changes

0e0a306

This was referenced Jun 8, 2020

Broadcasting dictionaries with unmatched keys gives confusing result #19

Closed

Indexing with tuples as keys #15

Open

This was referenced Jun 8, 2020

Define a divide-and-conquer parallelism API? #14

Open

Constructing a Dictionary from ranges #7

Closed

Merge traits with ArrayInterface. #6

Open

Enabling dynamic and fixed indices. #5

Open

tkf mentioned this pull request Jun 9, 2020

(HashIndices([1, missing]) == HashIndices([1, missing])) == missing? #11

Closed

andyferris added 3 commits June 10, 2020 12:49

Many changes

cf9d945

Fix co-iteration

0dbbbf5

Fix a test

2426c30

Datseris mentioned this pull request Jun 10, 2020

Performance: consider Dictionaries.jl JuliaDynamics/Agents.jl#99

Open

Move OldHashIndices/OldHashDictionaries to contrib/

cb905c6

andyferris force-pushed the ajf/order-hash-dictionary branch from 760e664 to cb905c6 Compare June 11, 2020 02:54

andyferris added 7 commits June 11, 2020 15:17

More tests

5bc53c5

Forbid IteratorSize of HasShape for Indices

bdc07ed

Fixes #20

Fix merge / mergewith

a9a9866

Closes #18

Finalize == semantics

f891d95

Fixes #11

Auto-convert settokenvalue!

456b90a

Small cleanup

ba4bd05

Fix settokenvalue!

0d34363

andyferris force-pushed the ajf/order-hash-dictionary branch from e81e03e to 0d34363 Compare June 11, 2020 06:51

andyferris merged commit 51998b8 into master Jun 11, 2020

andyferris deleted the ajf/order-hash-dictionary branch June 11, 2020 07:18

andyferris mentioned this pull request Jun 11, 2020

MethodError for dictionary #12

Closed

tkf mentioned this pull request Jul 17, 2020

[WIP] allow sorting Dict/Set values in show JuliaLang/julia#33744

Open
