
Cached string decoder for map keys (CachedKeyDecoder) #54

Merged
merged 9 commits into master from cached_key_string_decoder on Aug 2, 2019

Conversation

sergeyzenchenko
Collaborator

@sergeyzenchenko sergeyzenchenko commented Jun 3, 2019

Check this out, @gfx. We are using it in our project.

A large share of all strings are map keys, and most of the time the total number of unique keys is small.
For example, in our payload of 260k objects, 67% of all strings are keys with only 100 unique values.
I've tried to cache these values and skip string decoding entirely for keys.
The cache uses the byte representation of the key string and stores the actual decoded value, so instead of decoding a key string we just need to look it up in the cache.
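The idea can be sketched in TypeScript roughly as follows (a simplified illustration, not the PR's actual code; `SimpleCachedKeyDecoder` and its internals are made-up names): cached records are bucketed by key byte length, and a byte-wise comparison finds a previously decoded string before falling back to a fresh UTF-8 decode.

```typescript
type KeyCacheRecord = { bytes: Uint8Array; str: string };

class SimpleCachedKeyDecoder {
  // Buckets indexed by (byte length - 1) keep each candidate list short.
  private caches: Array<Array<KeyCacheRecord>> = [];

  constructor(readonly maxKeyLength = 16) {
    for (let i = 0; i < maxKeyLength; i++) this.caches.push([]);
  }

  private find(bytes: Uint8Array, offset: number, byteLength: number): string | null {
    const records = this.caches[byteLength - 1];
    outer: for (const record of records) {
      for (let j = 0; j < byteLength; j++) {
        if (record.bytes[j] !== bytes[offset + j]) continue outer;
      }
      return record.str; // cache hit: skip string decoding entirely
    }
    return null;
  }

  decode(bytes: Uint8Array, offset: number, byteLength: number): string {
    if (byteLength === 0 || byteLength > this.maxKeyLength) {
      // Keys outside the cacheable range are decoded directly.
      return new TextDecoder().decode(bytes.subarray(offset, offset + byteLength));
    }
    const cached = this.find(bytes, offset, byteLength);
    if (cached !== null) return cached;

    const str = new TextDecoder().decode(bytes.subarray(offset, offset + byteLength));
    // Copy the bytes before caching: the caller may reuse the input buffer.
    this.caches[byteLength - 1].push({ bytes: bytes.slice(offset, offset + byteLength), str });
    return str;
  }
}
```

With payloads like the one above (few unique keys repeated across many objects), the linear scan per bucket stays short while most keys hit the cache.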

Results before:

Benchmark on NodeJS/v12.3.1

operation                                                         |   op   |   ms  |  op/s 
----------------------------------------------------------------- | ------: | ----: | ------:
buf = Buffer.from(JSON.stringify(obj));                           |  557200 |  5000 |  111440
buf = JSON.stringify(obj);                                        | 1078100 |  5000 |  215620
obj = JSON.parse(buf);                                            |  394300 |  5001 |   78844
buf = require("msgpack-lite").encode(obj);                        |  416400 |  5000 |   83280
obj = require("msgpack-lite").decode(buf);                        |  313600 |  5000 |   62720
buf = require("@msgpack/msgpack").encode(obj);                    |  646100 |  5000 |  129220
obj = require("@msgpack/msgpack").decode(buf);                    |  561800 |  5000 |  112360
✨  Done in 36.69s.

After:

Benchmark on NodeJS/v12.3.1

operation                                                         |   op   |   ms  |  op/s 
----------------------------------------------------------------- | ------: | ----: | ------:
buf = Buffer.from(JSON.stringify(obj));                           |  598900 |  5000 |  119780
buf = JSON.stringify(obj);                                        | 1151000 |  5000 |  230200
obj = JSON.parse(buf);                                            |  397600 |  5000 |   79520
buf = require("msgpack-lite").encode(obj);                        |  429000 |  5000 |   85800
obj = require("msgpack-lite").decode(buf);                        |  323800 |  5000 |   64760
buf = require("@msgpack/msgpack").encode(obj);                    |  666800 |  5000 |  133360
obj = require("@msgpack/msgpack").decode(buf);                    |  744000 |  5000 |  148800
✨  Done in 36.95s.

I am not sure we should include this directly in the library, but maybe it can be added as an optional speedup that users can enable if required.

It works better when the Decoder is reused, because the cache needs to be populated first.

Let me know what you think about it.

@gfx
Member

gfx commented Jun 3, 2019

Fascinating idea! 👏 👏 👏

I'll merge the string caching in decoders, but some improvements and changes are needed.

The reuse of Decoder

If one wants to decode in the fastest way, they should reuse a Decoder instance anyway. We should rewrite the benchmark script to reuse the Decoder.

And I think decode() should create a Decoder each time in order to keep configuration easy to control.

Default behavior

I'd like to use caching by default, so it should be merged to Decoder.

A cacheSize: number configuration may be a good idea to control memory usage, because uncontrolled caching is a kind of memory leak.

@sergeyzenchenko
Collaborator Author

Sure, this is just a prototype solution that we use; it's not intended to be merged directly. It requires changes and tests.

I think we can allow users to pass a string decoder via options, so a custom decoder can be supplied if needed.

What do you think about it?

@sergeyzenchenko
Collaborator Author

I am not sure yet how to properly implement cacheSize.
In the worst case, the cache can be filled with rarely used keys if they happen to appear frequently in the first documents.

Maybe I can add a cache.compact() function that the user can call on demand,
or it could be called periodically inside the Decoder.
This function would remove the less frequently used keys.

The current version tracks the number of hits for each key and orders the entries by hit count, so lookups of the most frequently used keys are faster.
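The hit-count bookkeeping and compact() described above might look roughly like this (a hypothetical sketch with illustrative names, not the prototype's actual code):

```typescript
type Rec = { key: string; hits: number };

class HitCountCache {
  records: Rec[] = [];

  // Record a lookup: bump the hit count and keep hot keys near the front.
  touch(key: string): void {
    const r = this.records.find((x) => x.key === key);
    if (r) {
      r.hits++;
      this.records.sort((a, b) => b.hits - a.hits);
    } else {
      this.records.push({ key, hits: 1 });
    }
  }

  // On demand (or periodically), drop the less frequently used half.
  compact(): void {
    this.records.sort((a, b) => b.hits - a.hits);
    this.records.length = Math.ceil(this.records.length / 2);
  }
}
```

The trade-off is visible even in this sketch: the sort on every hit adds overhead to the hot path, which is what the merged version later avoids.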

@sergeyzenchenko
Collaborator Author

Lol @gfx , I've just improved performance more :D Now it's 176660 ops/s

@gfx
Member

gfx commented Jun 4, 2019

Lol @gfx , I've just improved performance more :D Now it's 176660 ops/s

Amazing! 😆


cacheSize is not necessarily correct; it's just a hint.


I think we can allow user to pass string decoder to options, so user can pass custom decoder if needed

You mean decode(buffer, { stringDecoder: sharedCachingStringDecoder })? I prefer { stringCache: sharedStringCache } because its intent is clearer: users can focus on the cache engine rather than the whole string decoder.
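A possible shape for such an option (all names here are hypothetical; the final API was still under discussion at this point): the decoder consults an injected cache for map keys and falls back to plain UTF-8 decoding on a miss.

```typescript
// Hypothetical cache interface: the user supplies only the cache engine,
// not a whole string decoder.
interface StringCache {
  get(bytes: Uint8Array): string | null;
  store(bytes: Uint8Array, value: string): void;
}

function decodeKey(bytes: Uint8Array, cache?: StringCache): string {
  if (cache) {
    const hit = cache.get(bytes);
    if (hit !== null) return hit; // cache hit: no UTF-8 decode needed
  }
  const str = new TextDecoder().decode(bytes);
  // Copy the bytes before storing: the caller may reuse the input buffer.
  cache?.store(bytes.slice(), str);
  return str;
}
```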


BTW I have another idea to share encoders and decoders, not directly related to this PR: #56

@gfx
Member

gfx commented Jul 26, 2019

I'd like to proceed with this patch this week. FYI: Chrome 76 is going to be released at the end of this month, and it includes a JSON.parse performance improvement.

@gfx gfx force-pushed the cached_key_string_decoder branch from 838b277 to 8be4057 Compare August 2, 2019 02:26
@gfx gfx changed the title Experimental cache string decoder for map keys Cached string decoder for map keys (CachedKeyDecoder) Aug 2, 2019
```ts
if (records.length >= this.maxLengthPerKey) {
  // `records` is full!
  // Set `record` at a randomized position.
  records[(Math.random() * records.length) | 0] = record;
```
@gfx
Member

I have replaced the original hit-count-based sorting algorithm with a random-picking algorithm to reduce overhead in #get().
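For illustration, the replacement strategy can be isolated as a small generic helper (hypothetical name `insertBounded`; in the PR the logic lives inline in CachedKeyDecoder): once a bucket reaches its bound, a random slot is overwritten instead of letting the cache grow.

```typescript
// Bounded insert with random eviction: the cache never exceeds
// maxLengthPerKey entries per bucket, addressing the memory concern.
function insertBounded<T>(records: T[], record: T, maxLengthPerKey: number): void {
  if (records.length >= maxLengthPerKey) {
    // Overwrite a randomized position; no hit counts to maintain.
    records[(Math.random() * records.length) | 0] = record;
  } else {
    records.push(record);
  }
}
```

Random replacement keeps the lookup path free of bookkeeping; frequently seen keys are quickly re-cached even if they happen to be evicted.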

@codecov-io

codecov-io commented Aug 2, 2019

Codecov Report

Merging #54 into master will increase coverage by 0.08%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #54      +/-   ##
==========================================
+ Coverage   98.16%   98.25%   +0.08%     
==========================================
  Files          15       16       +1     
  Lines         927      972      +45     
  Branches      189      197       +8     
==========================================
+ Hits          910      955      +45     
  Misses         17       17
Impacted Files          | Coverage Δ
----------------------- | ------------------
src/CachedKeyDecoder.ts | 100% <100%> (ø)
src/Decoder.ts          | 98.88% <100%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a0be621...11803d8. Read the comment docs.

@gfx
Member

gfx commented Aug 2, 2019

Benchmark on NodeJS/v12.7.0

operation                                                         |   op   |   ms  |  op/s 
----------------------------------------------------------------- | ------: | ----: | ------:
buf = Buffer.from(JSON.stringify(obj));                           |  507700 |  5000 |  101540
buf = JSON.stringify(obj);                                        |  958100 |  5000 |  191620
obj = JSON.parse(buf);                                            |  346500 |  5000 |   69300
buf = require("msgpack-lite").encode(obj);                        |  361800 |  5001 |   72345
obj = require("msgpack-lite").decode(buf);                        |  267400 |  5000 |   53480
buf = require("@msgpack/msgpack").encode(obj);                    |  510200 |  5000 |  102040
obj = require("@msgpack/msgpack").decode(buf);                    |  825500 |  5000 |  165100

@gfx
Member

gfx commented Aug 2, 2019

Now, this PR is finished. cc: @sergeyzenchenko

```diff
@@ -59,6 +62,9 @@ export class Decoder {
  headByte = HEAD_BYTE_REQUIRED;
  readonly stack: Array<StackState> = [];

  // TODO: parameterize this property.
  readonly cachedKeyDecoder = sharedCachedKeyDecoder;
```
@gfx
Member

I'm thinking about the interface for customizing a key decoder, so it's not public yet.

@gfx gfx merged commit 0a5966e into master Aug 2, 2019
@gfx gfx deleted the cached_key_string_decoder branch August 2, 2019 05:48
@gfx gfx mentioned this pull request Oct 4, 2020