Rewriting the parser using stacks #25

vincent-163 · 2019-01-22T13:46:01Z

After figuring out that the benchmark was defective, I created a new parser using the technique mentioned in #24.
The benchmark results without benchPool, which should be closer to the real performance:

goos: linux
goarch: amd64
pkg: github.com/valyala/fastjson
BenchmarkParse/small/stdjson-map-4         	  500000	      3824 ns/op	  49.68 MB/s	     960 B/op	      51 allocs/op
BenchmarkParse/small/stdjson-struct-4      	 1000000	      2005 ns/op	  94.75 MB/s	     224 B/op	       4 allocs/op
BenchmarkParse/small/stdjson-empty-struct-4         	 1000000	      1420 ns/op	 133.75 MB/s	     168 B/op	       2 allocs/op
BenchmarkParse/small/fastjson-4                     	 1000000	      2046 ns/op	  92.85 MB/s	    3424 B/op	      11 allocs/op
BenchmarkParse/small/fastjson-get-4                 	  500000	      2178 ns/op	  87.21 MB/s	    3424 B/op	      11 allocs/op
BenchmarkParse/medium/stdjson-map-4                 	   50000	     23996 ns/op	  97.05 MB/s	   10195 B/op	     208 allocs/op
BenchmarkParse/medium/stdjson-struct-4              	   50000	     27107 ns/op	  85.92 MB/s	    9174 B/op	     258 allocs/op
BenchmarkParse/medium/stdjson-empty-struct-4        	  100000	     10688 ns/op	 217.90 MB/s	     280 B/op	       5 allocs/op
BenchmarkParse/medium/fastjson-4                    	  200000	      9703 ns/op	 240.01 MB/s	   17688 B/op	      54 allocs/op
BenchmarkParse/medium/fastjson-get-4                	  200000	      9814 ns/op	 237.30 MB/s	   17688 B/op	      54 allocs/op
BenchmarkParse/large/stdjson-map-4                  	    5000	    335337 ns/op	  83.85 MB/s	  210764 B/op	    2785 allocs/op
BenchmarkParse/large/stdjson-struct-4               	   10000	    140989 ns/op	 199.43 MB/s	   15617 B/op	     353 allocs/op
BenchmarkParse/large/stdjson-empty-struct-4         	   10000	    131005 ns/op	 214.63 MB/s	     280 B/op	       5 allocs/op
BenchmarkParse/large/fastjson-4                     	   10000	    122384 ns/op	 229.75 MB/s	  283200 B/op	     540 allocs/op
BenchmarkParse/large/fastjson-get-4                 	   10000	    121147 ns/op	 232.10 MB/s	  283200 B/op	     540 allocs/op
BenchmarkParse/canada/stdjson-map-4                 	      30	  39510900 ns/op	  56.97 MB/s	12260534 B/op	  392539 allocs/op
BenchmarkParse/canada/stdjson-struct-4              	      50	  40044040 ns/op	  56.21 MB/s	12260139 B/op	  392534 allocs/op
BenchmarkParse/canada/stdjson-empty-struct-4        	     200	   9637495 ns/op	 233.57 MB/s	     291 B/op	       5 allocs/op
BenchmarkParse/canada/fastjson-4                    	      20	 100080750 ns/op	  22.49 MB/s	75844145 B/op	  114252 allocs/op
BenchmarkParse/canada/fastjson-get-4                	      30	  54762833 ns/op	  41.11 MB/s	75844142 B/op	  114252 allocs/op
BenchmarkParse/citm/stdjson-map-4                   	     100	  15829930 ns/op	 109.11 MB/s	 5214145 B/op	   95402 allocs/op
BenchmarkParse/citm/stdjson-struct-4                	     200	   7847090 ns/op	 220.11 MB/s	    1993 B/op	      75 allocs/op
BenchmarkParse/citm/stdjson-empty-struct-4          	     200	   7860580 ns/op	 219.73 MB/s	     281 B/op	       5 allocs/op
BenchmarkParse/citm/fastjson-4                      	     100	  10795730 ns/op	 159.99 MB/s	17601362 B/op	   30574 allocs/op
BenchmarkParse/citm/fastjson-get-4                  	     200	  10633195 ns/op	 162.44 MB/s	17601360 B/op	   30574 allocs/op
BenchmarkParse/twitter/stdjson-map-4                	     200	   5939440 ns/op	 106.33 MB/s	 2187556 B/op	   31264 allocs/op
BenchmarkParse/twitter/stdjson-struct-4             	     500	   2821878 ns/op	 223.79 MB/s	     409 B/op	       6 allocs/op
BenchmarkParse/twitter/stdjson-empty-struct-4       	     500	   2807618 ns/op	 224.93 MB/s	     408 B/op	       6 allocs/op
BenchmarkParse/twitter/fastjson-4                   	     500	   2810480 ns/op	 224.70 MB/s	 5047840 B/op	    4729 allocs/op
BenchmarkParse/twitter/fastjson-get-4               	     500	   2816916 ns/op	 224.19 MB/s	 5047840 B/op	    4729 allocs/op
PASS
ok  	github.com/valyala/fastjson	60.703s

fastjson was even slower than stdjson on some inputs (canada, for example), despite the claimed 15x improvement.
After rewriting the parser using stacks, the results are:

goos: linux
goarch: amd64
pkg: github.com/valyala/fastjson
BenchmarkParse/small/stdjson-map-4         	  300000	      3967 ns/op	  47.89 MB/s	     960 B/op	      51 allocs/op
BenchmarkParse/small/stdjson-struct-4      	 1000000	      1979 ns/op	  95.99 MB/s	     224 B/op	       4 allocs/op
BenchmarkParse/small/stdjson-empty-struct-4         	 1000000	      1422 ns/op	 133.57 MB/s	     168 B/op	       2 allocs/op
BenchmarkParse/small/fastjson-4                     	 1000000	      1458 ns/op	 130.28 MB/s	    2576 B/op	      11 allocs/op
BenchmarkParse/small/fastjson-get-4                 	 1000000	      1585 ns/op	 119.86 MB/s	    2576 B/op	      11 allocs/op
BenchmarkParse/medium/stdjson-map-4                 	  100000	     22231 ns/op	 104.76 MB/s	   10195 B/op	     208 allocs/op
BenchmarkParse/medium/stdjson-struct-4              	   50000	     25822 ns/op	  90.19 MB/s	    9174 B/op	     258 allocs/op
BenchmarkParse/medium/stdjson-empty-struct-4        	  200000	     10289 ns/op	 226.35 MB/s	     280 B/op	       5 allocs/op
BenchmarkParse/medium/fastjson-4                    	  200000	      8781 ns/op	 265.22 MB/s	   17528 B/op	      19 allocs/op
BenchmarkParse/medium/fastjson-get-4                	  200000	      8879 ns/op	 262.29 MB/s	   17528 B/op	      19 allocs/op
BenchmarkParse/large/stdjson-map-4                  	    3000	    337614 ns/op	  83.28 MB/s	  210761 B/op	    2785 allocs/op
BenchmarkParse/large/stdjson-struct-4               	   10000	    146534 ns/op	 191.89 MB/s	   15617 B/op	     353 allocs/op
BenchmarkParse/large/stdjson-empty-struct-4         	   10000	    129590 ns/op	 216.98 MB/s	     280 B/op	       5 allocs/op
BenchmarkParse/large/fastjson-4                     	   20000	     65955 ns/op	 426.32 MB/s	  165288 B/op	      37 allocs/op
BenchmarkParse/large/fastjson-get-4                 	   20000	     65889 ns/op	 426.74 MB/s	  165288 B/op	      37 allocs/op
BenchmarkParse/canada/stdjson-map-4                 	      30	  39993266 ns/op	  56.29 MB/s	12260535 B/op	  392539 allocs/op
BenchmarkParse/canada/stdjson-struct-4              	      50	  40921580 ns/op	  55.01 MB/s	12260137 B/op	  392534 allocs/op
BenchmarkParse/canada/stdjson-empty-struct-4        	     200	   9418370 ns/op	 239.01 MB/s	     280 B/op	       5 allocs/op
BenchmarkParse/canada/fastjson-4                    	     100	  15223640 ns/op	 147.87 MB/s	25831338 B/op	      60 allocs/op
BenchmarkParse/canada/fastjson-get-4                	     100	  14208220 ns/op	 158.43 MB/s	25831338 B/op	      60 allocs/op
BenchmarkParse/citm/stdjson-map-4                   	     100	  16219920 ns/op	 106.49 MB/s	 5213919 B/op	   95401 allocs/op
BenchmarkParse/citm/stdjson-struct-4                	     200	   7732285 ns/op	 223.38 MB/s	    1993 B/op	      75 allocs/op
BenchmarkParse/citm/stdjson-empty-struct-4          	     200	   7791690 ns/op	 221.67 MB/s	     281 B/op	       5 allocs/op
BenchmarkParse/citm/fastjson-4                      	     300	   4030463 ns/op	 428.54 MB/s	 7909291 B/op	      59 allocs/op
BenchmarkParse/citm/fastjson-get-4                  	     300	   3764146 ns/op	 458.86 MB/s	 7909289 B/op	      59 allocs/op
BenchmarkParse/twitter/stdjson-map-4                	     200	   6131945 ns/op	 102.99 MB/s	 2188071 B/op	   31266 allocs/op
BenchmarkParse/twitter/stdjson-struct-4             	     500	   2844898 ns/op	 221.98 MB/s	     409 B/op	       6 allocs/op
BenchmarkParse/twitter/stdjson-empty-struct-4       	     500	   2823320 ns/op	 223.68 MB/s	     408 B/op	       6 allocs/op
BenchmarkParse/twitter/fastjson-4                   	    1000	   1084364 ns/op	 582.38 MB/s	 2359209 B/op	      49 allocs/op
BenchmarkParse/twitter/fastjson-get-4               	    1000	   1068186 ns/op	 591.20 MB/s	 2359208 B/op	      49 allocs/op
PASS
ok  	github.com/valyala/fastjson	57.055s

fastjson is still slower than the original, but it is now genuinely faster than stdjson, instead of relying on fake results obtained by parsing the exact same JSON value again and again.
There is no need to reuse Parser now, since it is always reset before Parse. There may still be room for improvement by reusing Parser, which is worth investigating later.
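
For readers unfamiliar with the approach, the general idea of a stack-based (non-recursive) parser can be sketched as follows. This is not the code from this PR: the `node` type, the `parseArrays` function, and the restricted grammar (nested arrays of non-negative integers) are all assumptions made for illustration. Arrays that are still open are kept on an explicit slice instead of on the Go call stack.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical value type: either a number (leaf) or an array.
type node struct {
	isArray bool
	num     int
	items   []*node
}

// parseArrays parses inputs such as "[1,[2,3],[]]" without recursion.
// Commas and spaces are skipped rather than validated to keep the sketch short.
func parseArrays(s string) (*node, error) {
	var stack []*node // arrays that have been opened but not yet closed
	var root *node

	// attach adds a finished value to the innermost open array,
	// or makes it the root when no array is open.
	attach := func(n *node) error {
		if len(stack) == 0 {
			if root != nil {
				return errors.New("multiple top-level values")
			}
			root = n
			return nil
		}
		top := stack[len(stack)-1]
		top.items = append(top.items, n)
		return nil
	}

	for i := 0; i < len(s); {
		c := s[i]
		switch {
		case c == '[':
			// Push a new open array instead of making a recursive call.
			stack = append(stack, &node{isArray: true})
			i++
		case c == ']':
			if len(stack) == 0 {
				return nil, errors.New("unexpected ]")
			}
			// Pop the finished array and attach it to its parent.
			n := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if err := attach(n); err != nil {
				return nil, err
			}
			i++
		case c == ',' || c == ' ':
			i++
		case c >= '0' && c <= '9':
			v := 0
			for i < len(s) && s[i] >= '0' && s[i] <= '9' {
				v = v*10 + int(s[i]-'0')
				i++
			}
			if err := attach(&node{num: v}); err != nil {
				return nil, err
			}
		default:
			return nil, fmt.Errorf("unexpected byte %q at offset %d", c, i)
		}
	}
	if len(stack) != 0 {
		return nil, errors.New("unclosed [")
	}
	if root == nil {
		return nil, errors.New("empty input")
	}
	return root, nil
}

func main() {
	root, err := parseArrays("[1,[2,3],[]]")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("top-level items:", len(root.items))
}
```

Because nesting depth only grows the `stack` slice, deeply nested input costs heap memory rather than call-stack frames, and the slice can be kept and truncated between parses.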

codecov · 2019-01-22T13:47:36Z

Codecov Report

Merging #25 into master will decrease coverage by 0.23%.
The diff coverage is 94.87%.

@@            Coverage Diff             @@
##           master      #25      +/-   ##
==========================================
- Coverage   93.11%   92.88%   -0.24%     
==========================================
  Files           9        9              
  Lines        1046     1124      +78     
==========================================
+ Hits          974     1044      +70     
- Misses         49       55       +6     
- Partials       23       25       +2

Impacted Files	Coverage Δ
scanner.go	`100% <100%> (ø)`	⬆️
arena.go	`100% <100%> (ø)`	⬆️
update.go	`78.87% <82.35%> (+0.69%)`	⬆️
parser.go	`90.36% <96.9%> (+0.19%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cbbc967...651c591.

vincent-163 · 2019-01-22T15:19:15Z

After reusing the parser, here are the new benchmark results:

goos: linux
goarch: amd64
pkg: github.com/valyala/fastjson
BenchmarkParse/small/stdjson-map-4         	  300000	      4660 ns/op	  40.77 MB/s	     960 B/op	      51 allocs/op
BenchmarkParse/small/stdjson-struct-4      	 1000000	      2362 ns/op	  80.43 MB/s	     224 B/op	       4 allocs/op
BenchmarkParse/small/stdjson-empty-struct-4         	 1000000	      1442 ns/op	 131.69 MB/s	     168 B/op	       2 allocs/op
BenchmarkParse/small/fastjson-4                     	 5000000	       356 ns/op	 532.46 MB/s	     192 B/op	       1 allocs/op
BenchmarkParse/small/fastjson-get-4                 	 3000000	       496 ns/op	 382.78 MB/s	     192 B/op	       1 allocs/op
BenchmarkParse/medium/stdjson-map-4                 	   50000	     25757 ns/op	  90.42 MB/s	   10195 B/op	     208 allocs/op
BenchmarkParse/medium/stdjson-struct-4              	   50000	     30011 ns/op	  77.60 MB/s	    9174 B/op	     258 allocs/op
BenchmarkParse/medium/stdjson-empty-struct-4        	  100000	     11133 ns/op	 209.18 MB/s	     280 B/op	       5 allocs/op
BenchmarkParse/medium/fastjson-4                    	  500000	      3786 ns/op	 615.14 MB/s	    2688 B/op	       1 allocs/op
BenchmarkParse/medium/fastjson-get-4                	  300000	      3373 ns/op	 690.34 MB/s	    2688 B/op	       1 allocs/op
BenchmarkParse/large/stdjson-map-4                  	    3000	    375616 ns/op	  74.86 MB/s	  210749 B/op	    2785 allocs/op
BenchmarkParse/large/stdjson-struct-4               	   10000	    160624 ns/op	 175.05 MB/s	   15617 B/op	     353 allocs/op
BenchmarkParse/large/stdjson-empty-struct-4         	   10000	    136074 ns/op	 206.64 MB/s	     280 B/op	       5 allocs/op
BenchmarkParse/large/fastjson-4                     	   30000	     38770 ns/op	 725.24 MB/s	   28707 B/op	       1 allocs/op
BenchmarkParse/large/fastjson-get-4                 	   30000	     50005 ns/op	 562.30 MB/s	   28707 B/op	       1 allocs/op
BenchmarkParse/canada/stdjson-map-4                 	      20	  56081600 ns/op	  40.14 MB/s	12260568 B/op	  392540 allocs/op
BenchmarkParse/canada/stdjson-struct-4              	      30	  42409066 ns/op	  53.08 MB/s	12260171 B/op	  392534 allocs/op
BenchmarkParse/canada/stdjson-empty-struct-4        	     200	  10085860 ns/op	 223.19 MB/s	     281 B/op	       5 allocs/op
BenchmarkParse/canada/fastjson-4                    	     200	   7093665 ns/op	 317.33 MB/s	 3185747 B/op	       2 allocs/op
BenchmarkParse/canada/fastjson-get-4                	     200	   6163670 ns/op	 365.21 MB/s	 3185753 B/op	       3 allocs/op
BenchmarkParse/citm/stdjson-map-4                   	     100	  18414800 ns/op	  93.79 MB/s	 5214044 B/op	   95402 allocs/op
BenchmarkParse/citm/stdjson-struct-4                	     200	   9872230 ns/op	 174.96 MB/s	    1994 B/op	      75 allocs/op
BenchmarkParse/citm/stdjson-empty-struct-4          	     200	   8464495 ns/op	 204.05 MB/s	     284 B/op	       5 allocs/op
BenchmarkParse/citm/fastjson-4                      	     500	   2291844 ns/op	 753.63 MB/s	 1827241 B/op	       1 allocs/op
BenchmarkParse/citm/fastjson-get-4                  	    1000	   2131119 ns/op	 810.47 MB/s	 1777878 B/op	       1 allocs/op
BenchmarkParse/twitter/stdjson-map-4                	     200	   6208910 ns/op	 101.71 MB/s	 2188398 B/op	   31267 allocs/op
BenchmarkParse/twitter/stdjson-struct-4             	     500	   3027482 ns/op	 208.59 MB/s	     409 B/op	       6 allocs/op
BenchmarkParse/twitter/stdjson-empty-struct-4       	     500	   2823266 ns/op	 223.68 MB/s	     408 B/op	       6 allocs/op
BenchmarkParse/twitter/fastjson-4                   	    2000	    775716 ns/op	 814.10 MB/s	  645840 B/op	       1 allocs/op
BenchmarkParse/twitter/fastjson-get-4               	    2000	    787570 ns/op	 801.85 MB/s	  645840 B/op	       1 allocs/op
PASS
ok  	github.com/valyala/fastjson	62.245s

Now it's much closer to the original performance. Here is the CPU profile:

File: fastjson.test
Type: cpu
Time: Jan 22, 2019 at 11:11pm (CST)
Duration: 21.35s, Total samples = 52.83s (247.49%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 36.16s, 68.45% of 52.83s total
Dropped 141 nodes (cum <= 0.26s)
Showing top 10 nodes out of 60
      flat  flat%   sum%        cum   cum%
     6.45s 12.21% 12.21%     42.34s 80.14%  github.com/valyala/fastjson.(*Parser).parseObject
     5.42s 10.26% 22.47%      8.69s 16.45%  github.com/valyala/fastjson.skipWS
     5.01s  9.48% 31.95%     42.41s 80.28%  github.com/valyala/fastjson.(*Parser).parseValue
     3.59s  6.80% 38.75%      3.59s  6.80%  github.com/valyala/fastjson.parseRawKey
     3.36s  6.36% 45.11%      3.36s  6.36%  runtime.memmove
     3.27s  6.19% 51.30%      3.27s  6.19%  github.com/valyala/fastjson.skipWSSlow
     2.83s  5.36% 56.65%      3.54s  6.70%  runtime.findObject
     2.43s  4.60% 61.25%      2.88s  5.45%  github.com/valyala/fastjson.(*Parser).getValue
     2.01s  3.80% 65.06%     27.18s 51.45%  github.com/valyala/fastjson.(*Parser).parseArray
     1.79s  3.39% 68.45%      5.55s 10.51%  runtime.wbBufFlush1

harshavardhana · 2019-08-25T19:19:12Z

@valyala can this be merged as well?

valyala · 2019-08-28T20:28:50Z

This is a great idea, since it requires less RAM when parsing JSON structs with non-constant structure, but it slows down parsing a bit. I tried playing with this PR in the mem-optimize2 branch, but it isn't ready for merging into master.

azlan · 2020-05-28T09:18:04Z

Any update on this PR?
I like the concept of this library, but I'm having memory leaks when using parserPool, and using it without a pool is slow.

vincent-163 added 3 commits January 22, 2019 19:26

Replace deceptive benchmark code and create new parsers on each run

3a66ca4

Move cache to Parser struct

7a22541

Rewrite stack-based parser

036cdae

Reuse parser

651c591
