dict: Change to a sticky Copy-On-Write based table. #259

nairb774 · 2017-02-19T06:49:56Z

This improves the speed of the previous dict implementation through a reduction in the number of atomic loads in the read path (max 1 when the dict is read-only - think globals) as well as the number of allocations needed in the write path.

Overall the performance is improved by about 30%.

Some of the major changes are as follows:

The internal table layout was changed from []*dictEntry to []dictEntry reducing a memory indirection as well as hopefully improving the speed of slot probing in insertAbsentEntry as well as lookupEntry.
Many iteration operations which might have needed to grab a relatively expensive lock previously can now do so without locking if the dict is in the read-only mode.
The sizeof(Dict) increased some as a few variables (used and fill) were moved from the dictTable to Dict itself. The addition of a write and misses values to the Dict makes the overall memory usage of Dict generally larger. This is offset by the type change of dictTable and the reduction of additional pointers there. An empty dict weighs in at 304 bytes compared to 176 previously. At 4 elements, both the old and new implementation use 304 bytes of memory. From that point on, the new implementation actually uses less memory.

Benchmark data (unchanged/statistically insignificant results removed):

name                       old time/op    new time/op    delta
DictIterItems/0-elements      515ns ± 0%     503ns ± 0%    -2.45%  (p=0.008 n=5+5)
DictIterItems/1-elements      579ns ± 0%     574ns ± 1%    -0.86%  (p=0.016 n=5+5)
DictIterItems/2-elements      644ns ± 1%     633ns ± 1%    -1.71%  (p=0.008 n=5+5)
DictIterItems/4-elements      782ns ± 2%     839ns ±15%    +7.21%  (p=0.032 n=5+5)
DictIterItems/5-elements      856ns ± 0%     865ns ± 0%    +0.98%  (p=0.008 n=5+5)
DictIterItems/6-elements     1.12µs ± 0%    1.06µs ± 0%    -5.27%  (p=0.008 n=5+5)
DictIterItems/7-elements     1.19µs ± 0%    1.13µs ± 1%    -4.93%  (p=0.008 n=5+5)
DictIterItems/8-elements     1.25µs ± 1%    1.19µs ± 2%    -4.16%  (p=0.008 n=5+5)
DictIterKeys/0-elements       521ns ± 1%     505ns ± 0%    -3.00%  (p=0.008 n=5+5)
DictIterKeys/1-elements       531ns ± 0%     517ns ± 0%    -2.67%  (p=0.008 n=5+5)
DictIterKeys/2-elements       541ns ± 0%     529ns ± 1%    -2.22%  (p=0.008 n=5+5)
DictIterKeys/3-elements       556ns ± 0%     543ns ± 0%    -2.27%  (p=0.008 n=5+5)
DictIterKeys/4-elements       566ns ± 0%     554ns ± 0%    -2.19%  (p=0.008 n=5+5)
DictIterKeys/5-elements       584ns ± 2%     572ns ± 1%    -2.06%  (p=0.016 n=5+5)
DictIterKeys/6-elements       848ns ± 5%     765ns ± 2%    -9.83%  (p=0.008 n=5+5)
DictIterKeys/7-elements       851ns ± 0%     764ns ± 1%   -10.23%  (p=0.008 n=5+5)
DictIterKeys/8-elements       862ns ± 0%     774ns ± 0%   -10.28%  (p=0.008 n=5+5)
DictIterValues/0-elements     518ns ± 1%     506ns ± 2%    -2.35%  (p=0.008 n=5+5)
DictIterValues/1-elements     535ns ± 2%     514ns ± 0%    -3.97%  (p=0.016 n=5+4)
DictIterValues/2-elements     545ns ± 0%     527ns ± 0%    -3.16%  (p=0.008 n=5+5)
DictIterValues/3-elements     559ns ± 1%     543ns ± 1%    -2.79%  (p=0.008 n=5+5)
DictIterValues/4-elements     582ns ± 8%     552ns ± 0%    -5.09%  (p=0.016 n=5+4)
DictIterValues/5-elements     581ns ± 2%     564ns ± 0%    -2.96%  (p=0.008 n=5+5)
DictIterValues/6-elements     843ns ± 2%     752ns ± 1%   -10.77%  (p=0.008 n=5+5)
DictIterValues/7-elements     850ns ± 1%     764ns ± 1%   -10.07%  (p=0.008 n=5+5)
DictIterValues/8-elements     869ns ± 3%     789ns ± 1%    -9.21%  (p=0.008 n=5+5)
CallSimple                    277ms ± 2%     204ms ± 4%   -26.36%  (p=0.008 n=5+5)
CallKeywords                  310ns ± 5%     328ns ± 1%    +5.67%  (p=0.008 n=5+5)
DictCompCreate                457µs ± 3%     423µs ± 9%    -7.45%  (p=0.032 n=5+5)
Fib                          6.00µs ±11%    5.16µs ± 3%   -14.08%  (p=0.008 n=5+5)
FibParallel10                5.81µs ± 1%    5.22µs ± 4%   -10.22%  (p=0.008 n=5+5)
FibParallel11                6.03µs ±12%    5.27µs ± 4%   -12.61%  (p=0.008 n=5+5)
FibParallel2                 5.44µs ± 1%    4.86µs ± 1%   -10.59%  (p=0.016 n=4+5)
FibParallel3                 5.76µs ±15%    4.88µs ± 2%   -15.34%  (p=0.008 n=5+5)
FibParallel4                 5.76µs ± 9%    5.05µs ± 4%   -12.45%  (p=0.008 n=5+5)
FibParallel5                 5.76µs ± 3%    4.99µs ± 3%   -13.32%  (p=0.008 n=5+5)
FibParallel6                 5.62µs ± 5%    5.07µs ± 1%    -9.65%  (p=0.016 n=5+4)
FibParallel8                 5.68µs ± 3%    5.15µs ± 3%    -9.36%  (p=0.016 n=5+4)
FibParallel9                 5.92µs ± 9%    5.18µs ± 7%   -12.53%  (p=0.008 n=5+5)
DictCreate                    953ns ± 4%     838ns ± 4%   -12.07%  (p=0.008 n=5+5)
DictCreateFunc               1.67µs ± 4%    1.19µs ± 3%   -28.67%  (p=0.008 n=5+5)
DictSetItem                   252ns ± 7%     169ns ± 3%   -32.88%  (p=0.008 n=5+5)
DictStringOnlyGetItem         129ns ± 3%     108ns ± 2%   -15.84%  (p=0.008 n=5+5)
DictStringOnlySetItem         245ns ± 4%     172ns ± 5%   -30.02%  (p=0.008 n=5+5)
ListContains3                 432ns ± 2%     456ns ± 3%    +5.51%  (p=0.008 n=5+5)
WhileXRange                   399ns ± 4%     377ns ± 1%    -5.32%  (p=0.008 n=5+5)

name                       old alloc/op   new alloc/op   delta
DictIterItems/0-elements       320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterItems/1-elements       384B ± 0%      528B ± 0%   +37.50%  (p=0.008 n=5+5)
DictIterItems/2-elements       448B ± 0%      592B ± 0%   +32.14%  (p=0.008 n=5+5)
DictIterItems/3-elements       512B ± 0%      656B ± 0%   +28.12%  (p=0.008 n=5+5)
DictIterItems/4-elements       576B ± 0%      720B ± 0%   +25.00%  (p=0.008 n=5+5)
DictIterItems/5-elements       640B ± 0%      784B ± 0%   +22.50%  (p=0.008 n=5+5)
DictIterItems/6-elements       704B ± 0%      848B ± 0%   +20.45%  (p=0.008 n=5+5)
DictIterItems/7-elements       768B ± 0%      912B ± 0%   +18.75%  (p=0.008 n=5+5)
DictIterItems/8-elements       832B ± 0%      976B ± 0%   +17.31%  (p=0.008 n=5+5)
DictIterKeys/0-elements        320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterKeys/1-elements        320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterKeys/2-elements        320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterKeys/3-elements        320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterKeys/4-elements        320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterKeys/5-elements        320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterKeys/6-elements        320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterKeys/7-elements        320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterKeys/8-elements        320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterValues/0-elements      320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterValues/1-elements      320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterValues/2-elements      320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterValues/3-elements      320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterValues/4-elements      320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterValues/5-elements      320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterValues/6-elements      320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterValues/7-elements      320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
DictIterValues/8-elements      320B ± 0%      464B ± 0%   +45.00%  (p=0.008 n=5+5)
ListContains/false-3           336B ± 0%      464B ± 0%   +38.10%  (p=0.008 n=5+5)
ListContains/false-10          336B ± 0%      464B ± 0%   +38.10%  (p=0.008 n=5+5)
TupleContains/false-3          400B ± 0%      528B ± 0%   +32.00%  (p=0.008 n=5+5)
TupleContains/false-10         400B ± 0%      528B ± 0%   +32.00%  (p=0.008 n=5+5)
CallMethod                    244MB ± 0%     244MB ± 0%    -0.07%  (p=0.016 n=4+5)
CallKwargs                     383B ± 0%      415B ± 0%    +8.36%  (p=0.008 n=5+5)
DictCompCreate                142kB ± 0%     154kB ± 0%    +8.21%  (p=0.016 n=5+4)
GeneratorExpCreate             735B ± 0%      863B ± 0%   +17.41%  (p=0.008 n=5+5)
ListCompCreate               40.4kB ± 0%    41.0kB ± 0%    +1.27%  (p=0.008 n=5+5)
DictCreate                     303B ± 0%      335B ± 0%   +10.56%  (p=0.008 n=5+5)
DictCreateFunc                 559B ± 0%      415B ± 0%   -25.76%  (p=0.008 n=5+5)
DictSetItem                   63.0B ± 0%     31.0B ± 0%   -50.79%  (p=0.008 n=5+5)
DictStringOnlySetItem         63.0B ± 0%     31.0B ± 0%   -50.79%  (p=0.008 n=5+5)

name                       old allocs/op  new allocs/op  delta
DictIterItems/0-elements       6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterItems/1-elements       7.00 ± 0%      6.00 ± 0%   -14.29%  (p=0.008 n=5+5)
DictIterItems/2-elements       8.00 ± 0%      7.00 ± 0%   -12.50%  (p=0.008 n=5+5)
DictIterItems/3-elements       9.00 ± 0%      8.00 ± 0%   -11.11%  (p=0.008 n=5+5)
DictIterItems/4-elements       10.0 ± 0%       9.0 ± 0%   -10.00%  (p=0.008 n=5+5)
DictIterItems/5-elements       11.0 ± 0%      10.0 ± 0%    -9.09%  (p=0.008 n=5+5)
DictIterItems/6-elements       12.0 ± 0%      11.0 ± 0%    -8.33%  (p=0.008 n=5+5)
DictIterItems/7-elements       13.0 ± 0%      12.0 ± 0%    -7.69%  (p=0.008 n=5+5)
DictIterItems/8-elements       14.0 ± 0%      13.0 ± 0%    -7.14%  (p=0.008 n=5+5)
DictIterKeys/0-elements        6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterKeys/1-elements        6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterKeys/2-elements        6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterKeys/3-elements        6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterKeys/4-elements        6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterKeys/5-elements        6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterKeys/6-elements        6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterKeys/7-elements        6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterKeys/8-elements        6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterValues/0-elements      6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterValues/1-elements      6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterValues/2-elements      6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterValues/3-elements      6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterValues/4-elements      6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterValues/5-elements      6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterValues/6-elements      6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterValues/7-elements      6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
DictIterValues/8-elements      6.00 ± 0%      5.00 ± 0%   -16.67%  (p=0.008 n=5+5)
ListContains/false-3           7.00 ± 0%      6.00 ± 0%   -14.29%  (p=0.008 n=5+5)
ListContains/false-10          7.00 ± 0%      6.00 ± 0%   -14.29%  (p=0.008 n=5+5)
TupleContains/false-3          8.00 ± 0%      7.00 ± 0%   -12.50%  (p=0.008 n=5+5)
TupleContains/false-10         8.00 ± 0%      7.00 ± 0%   -12.50%  (p=0.008 n=5+5)
CallMethod                    6.74M ± 0%     6.74M ± 0%    -0.01%  (p=0.016 n=4+5)
CallSimple                     6.00 ± 0%      3.80 ±47%   -36.67%  (p=0.008 n=5+5)
CallKwargs                     7.00 ± 0%      3.00 ± 0%   -57.14%  (p=0.008 n=5+5)
DictCompCreate                2.75k ± 0%     1.73k ± 0%   -36.83%  (p=0.008 n=5+5)
GeneratorExpCreate             14.0 ± 0%      13.0 ± 0%    -7.14%  (p=0.008 n=5+5)
ListCompCreate                  745 ± 0%       741 ± 0%    -0.54%  (p=0.008 n=5+5)
DictCreate                     6.00 ± 0%      2.00 ± 0%   -66.67%  (p=0.008 n=5+5)
DictCreateFunc                 10.0 ± 0%       3.0 ± 0%   -70.00%  (p=0.008 n=5+5)
DictSetItem                    1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
DictStringOnlySetItem          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)

This improves the speed of the previous dict implementation through a reduction in the number of atomic loads in the read path (max 1 when the dict is read-only - think globals) as well as the number of allocations needed in the write path. Overall the performance is improved by about 30%. Some of the major changes are as follows: * The internal table layout was changed from []*dictEntry to []dictEntry reducing a memory indirection as well as hopefully improving the speed of slot probing in insertAbsentEntry as well as lookupEntry. * Many iteration operations which might have needed to grab a relatively expensive lock previously can now do so without locking if the dict is in the read-only mode. * The sizeof(Dict) increased some as a few variables (used and fill) were moved from the dictTable to Dict itself. The addition of a `write` and `misses` values to the Dict makes the overall memory usage of Dict generally larger. This is offset by the type change of dictTable and the reduction of additional pointers there. An empty dict weighs in at 304 bytes compared to 176 previously. At 4 elements, both the old and new implementation use 304 bytes of memory. From that point on, the new implementation actually uses less memory.

trotterdylan · 2017-02-20T18:40:29Z

This is really cool. Thanks for preparing this PR. It's a bit involved so I may take a few days to get through it.

At a high level, I wanted to point out that the immutability of dictEntry objects is an important invariant of the current design. I can't remember all the details off hand but I think it ultimately comes down to not being able to update both the key and value atomically. Can you comment on why this is no longer important in this design?

nairb774 · 2017-02-21T01:36:15Z

Sorry about the size. It wouldn't surprise me to take some time - the nice part is that this is a performance change, not new functionality so time is on our side.

As an overview, the implementation is a blending of the table layout of the Python dict implementation, and a concurrency implementation borrowed from golang/sync@54b13b0. In that light, writes to the table are serialized by a mutex, while reads are performed from a immutable table (assuming the map has transitioned to fast reads). This means that for heavy read-only dicts (think globals) most reads only have the cost of a single atomic read. For heavy read/write dicts, the cost mostly the overhead of locking the recursiveMutex.

If anything I write here would make better sense in the code or commit comment, let me know. I'd also be happy to add/modify existing benchmarks if you think that would be good to make sure this is as good an improvement as I think it is. The main goal for me has been to try to get the CallSimple benchmark closer to the speed that CPython is able to execute it (12ms).

trotterdylan

Overall I just have a few style comments. The implementation is really interesting. For mostly-write and mostly-read dicts it makes a lot of sense to me why this would be more efficient. But are there workloads where it would be less efficient? ISTM you may do a lot of copying when you do a write/read/write/read access pattern. Maybe a benchmark around this would be useful.

trotterdylan · 2017-03-05T03:30:36Z

runtime/dict.go

@@ -47,25 +53,16 @@ type dictEntry struct {
 	value *Object
 }

+func (d dictEntry) isEmpty() bool   { return d.key == nil }


Can these be *dictEntry methods, even though we don't explicitly use *dictEntry anymore? My Go understanding here is a little confused. I know that you can call *Mutex methods on var m Mutex so it stands to reason you could call *dictEntry methods on elements of []dictEntry

trotterdylan · 2017-03-05T03:31:06Z

runtime/dict.go

+	// By having it a value type, the compiler is able to make all
+	// `&deletedEntry` effectively constant (avoids a memory read if this
+	// had a pointer type).
+	deletedEntry Object


Maybe this should now be called deletedKey

trotterdylan · 2017-03-05T03:50:47Z

runtime/dict.go

+		}
+		d.mutex.Unlock(f)
+	}
+	return


I prefer return iter

trotterdylan · 2017-03-05T03:52:06Z

runtime/dict.go

 	}
-	return entry
+	return


return entry

trotterdylan · 2017-03-05T03:54:07Z

runtime/dict.go

-		table.insertAbsentEntry(&dictEntry{hashString(key), NewStr(key).ToObject(), value})
+		table.insertAbsentEntry(dictEntry{hashString(key), NewStr(key).ToObject(), value})
+	}
+	d := &Dict{


Why not:

return &Dict{ ... }

trotterdylan · 2017-03-05T03:55:58Z

runtime/dict.go

+	// pointer to d.write (which would be bad).
+	atomic.StorePointer(d.unsafeReadTablePointer(), unsafe.Pointer(&table))
+	d.misses = 0
+	return


return table

trotterdylan · 2017-03-05T16:16:42Z

runtime/dict.go

@@ -323,14 +343,41 @@ func (d *Dict) incVersion() {
 	atomic.AddInt64(&d.version, 1)
 }

+// populateWriteTable makes sure that d.write is populated with the dict's
+// table, possibly copying it from the read table.


Mention that lock must be held during this call

trotterdylan · 2017-03-05T16:24:50Z

runtime/dict.go

+	// then the programmer hasn't bothered to implement proper locking in
+	// their code and in reality they don't know which statement is
+	// happening before and which is happening after (mutator vs. eq).
+	// Additionally, it shouldn't matter


Adjust line wrap formatting

trotterdylan · 2017-03-06T05:46:09Z

runtime/dict.go

+		table = *d.loadReadTable()
+		d.mutex.Unlock(f)
+		goto top
+	} else if d.misses > d.used {


This line could use a comment. It's not clear to me what this comparison means.

If more people is writing than reading, the de facto table is the wrote one, not the read one, IIUC.

trotterdylan suggested changes Mar 6, 2017

View reviewed changes

This was referenced Oct 6, 2018

dict: Change to a sticky Copy-On-Write based table. grumpyhome/grumpy#85

Open

Migrate google/grumpy issues to grumpyhome/grumpy grumpyhome/grumpy#8

Closed

Faster dict grumpyhome/grumpy#103

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dict: Change to a sticky Copy-On-Write based table. #259

dict: Change to a sticky Copy-On-Write based table. #259

nairb774 commented Feb 19, 2017

trotterdylan commented Feb 20, 2017

nairb774 commented Feb 21, 2017

trotterdylan left a comment

trotterdylan Mar 5, 2017

trotterdylan Mar 5, 2017

trotterdylan Mar 5, 2017

trotterdylan Mar 5, 2017

trotterdylan Mar 5, 2017

trotterdylan Mar 5, 2017

trotterdylan Mar 5, 2017

trotterdylan Mar 5, 2017

trotterdylan Mar 6, 2017

alanjds Oct 6, 2018

dict: Change to a sticky Copy-On-Write based table. #259

Are you sure you want to change the base?

dict: Change to a sticky Copy-On-Write based table. #259

Conversation

nairb774 commented Feb 19, 2017

trotterdylan commented Feb 20, 2017

nairb774 commented Feb 21, 2017

trotterdylan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment