Remove use of generator functions #314

Merged 5 commits into master on Oct 23, 2015

Conversation

arthurschreiber (Collaborator)

The switch to a generator-based parser in #285 brought better performance and decreased memory usage with large column values, but it significantly decreased performance when processing a large number of result rows containing small column values. See #303 for more information.


This PR replaces all use of generator functions with a callback-based approach. It is definitely less elegant, but it has a much better performance profile.
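
For illustration, here is a minimal sketch of the difference in shape (simplified, not the actual tedious parser code); both variants read a one-byte length followed by that many ASCII characters:

// Generator style (as in #285): the parser suspends at every read and is
// resumed by the driver; regenerator compiles this into a state machine.
function* genParseString(buffer) {
  const length = buffer.readUInt8(0);
  yield buffer.toString('ascii', 1, 1 + length);
}

// Callback style (this PR): each read helper invokes a continuation
// instead of suspending a generator.
function cbParseString(buffer, callback) {
  const length = buffer.readUInt8(0);
  callback(buffer.toString('ascii', 1, 1 + length));
}

const buf = Buffer.from('\x05hello', 'ascii');
console.log(genParseString(buf).next().value);      // => 'hello'
cbParseString(buf, (value) => console.log(value));  // => 'hello'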

Here are the benchmarks so far:

$ node benchmarks/
Many result rows x 7.19 ops/sec ±1.40% (38 runs sampled)
Memory: 25.03125 MiB
inserting nvarchar(max) with 5242880 chars x 15.00 ops/sec ±1.60% (69 runs sampled)
Memory: 28.88671875 MiB
inserting nvarchar(max) with 4 chars x 70.06 ops/sec ±0.38% (29 runs sampled)
Memory: 7.65234375 MiB
inserting varbinary(4) with 4 bytes x 70.06 ops/sec ±0.41% (29 runs sampled)
Memory: 8.015625 MiB
inserting varbinary(max) with 50 MiB x 4.03 ops/sec ±3.67% (24 runs sampled)
Memory: 90.84765625 MiB
inserting varbinary(max) with 5 MiB x 34.24 ops/sec ±2.63% (57 runs sampled)
Memory: 37.046875 MiB
inserting varbinary(max) with 4 bytes x 70.09 ops/sec ±0.35% (29 runs sampled)
Memory: 7.9765625 MiB

compared to tedious 1.12.3:

$ node benchmarks
Many result rows x 1.09 ops/sec ±1.35% (10 runs sampled)
Memory: 13.9296875 MiB
inserting nvarchar(max) with 5242880 chars x 10.33 ops/sec ±2.11% (51 runs sampled)
Memory: 29.625 MiB
inserting nvarchar(max) with 4 chars x 72.84 ops/sec ±0.52% (40 runs sampled)
Memory: 8.08984375 MiB
inserting varbinary(4) with 4 bytes x 71.34 ops/sec ±0.30% (34 runs sampled)
Memory: 8.1015625 MiB
inserting varbinary(max) with 50 MiB x 2.57 ops/sec ±3.22% (17 runs sampled)
Memory: 97.53515625 MiB
inserting varbinary(max) with 5 MiB x 21.09 ops/sec ±2.60% (51 runs sampled)
Memory: 69.35546875 MiB
inserting varbinary(max) with 4 bytes x 72.04 ops/sec ±0.43% (37 runs sampled)
Memory: 7.890625 MiB

and tedious 1.11.5:

$ node benchmarks/
Many result rows x 18.17 ops/sec ±1.48% (60 runs sampled)
Memory: 23.84375 MiB
inserting nvarchar(max) with 5242880 chars x 1.79 ops/sec ±1.09% (13 runs sampled)
Memory: 29.3515625 MiB
inserting nvarchar(max) with 4 chars x 67.65 ops/sec ±0.32% (19 runs sampled)
Memory: 5.5 MiB
inserting varbinary(4) with 4 bytes x 67.80 ops/sec ±0.28% (21 runs sampled)
Memory: 5.77734375 MiB
inserting varbinary(max) with 50 MiB x 0.08 ops/sec ±0.94% (5 runs sampled)
Memory: 208.00390625 MiB
inserting varbinary(max) with 5 MiB x 6.77 ops/sec ±1.67% (36 runs sampled)
Memory: 133.6640625 MiB
inserting varbinary(max) with 4 bytes x 67.77 ops/sec ±0.32% (21 runs sampled)
Memory: 5.796875 MiB

As you can see, we're still not back to the old performance, but the difference is not as big as before. There's a lot more that can be done to improve the performance even further. Basic profiling shows that a lot of time is currently spent recompiling the same functions over and over again, and there are also quite a few functions that either are never optimized or get deoptimized.

bretcope (Member) commented Sep 8, 2015

What functions in particular are performance bottlenecks?

arthurschreiber (Collaborator, Author)

Basically, it's every function that makes use of a generator. It's partially due to regenerator: the internal invoke function that is called on every generator step gets deoptimized pretty early on, and deoptimization tracing dumps a huge list of related functions that get deoptimized as a result.
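
To make that concrete: regenerator compiles each generator into roughly the following state machine (simplified from its actual output, and it needs the regenerator runtime to run). Every .next() call on the resulting iterator funnels through the runtime's step machinery, the invoke function mentioned above:

// Source generator:
function* tokenParser() {
  yield 1;
}

// Roughly what regenerator emits instead (regeneratorRuntime comes from
// the regenerator runtime package):
var marked = regeneratorRuntime.mark(compiledTokenParser);
function compiledTokenParser() {
  return regeneratorRuntime.wrap(function tokenParser$(context) {
    while (1) switch (context.prev = context.next) {
      case 0:
        context.next = 2;
        return 1;
      case 2:
      case 'end':
        return context.stop();
    }
  }, marked, this);
}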

The current callback-based approach is quite naive, because it goes through the awaitData function every time a read(Something) method is called on the parser. Token parsing can be made much more efficient by reading a bunch of data into a buffer in one swoop and reading from that buffer directly, instead of jumping through multiple layers of indirection.
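
A rough sketch of the difference (the awaitData name is from the description above; the rest is illustrative, not the actual tedious code):

// Current shape: every read goes through awaitData first, even when the
// bytes are already buffered.
function readUInt8(parser, callback) {
  awaitData(parser, 1, function() {
    callback(parser.buffer.readUInt8(parser.position++));
  });
}

function awaitData(parser, length, callback) {
  // The real parser suspends here until `length` bytes have arrived; in
  // this sketch the data is assumed to be buffered already.
  if (parser.position + length <= parser.buffer.length) {
    callback();
  }
}

// Proposed direction: make sure enough bytes are buffered up front, then
// read straight from the buffer with no per-read indirection.
function readUInt8Direct(parser) {
  return parser.buffer.readUInt8(parser.position++);
}

var parser = { buffer: Buffer.from([0x42, 0x43]), position: 0 };
readUInt8(parser, function(value) { console.log(value); }); // => 66
console.log(readUInt8Direct(parser));                       // => 67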

I've learnt from the mistake I made with #303 / #285 and will make sure to add benchmarks for many different parser cases and for each individual token type, so we can benchmark and profile every change we make and pinpoint bottlenecks.
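
For reference, the numbers in this thread are in the output format of the benchmark npm package; a per-token-type benchmark could look roughly like this (the token buffer setup is a placeholder, not part of the tedious API):

var Benchmark = require('benchmark');

// Placeholder input; a real benchmark would feed a buffer of serialized
// DONEPROC tokens through the token parser.
var tokenBuffer = Buffer.alloc(1024);

new Benchmark.Suite()
  .add('parsing `DONEPROC` tokens', function() {
    // parse tokenBuffer here
    tokenBuffer.readUInt8(0);
  })
  .on('cycle', function(event) {
    // prints e.g. "parsing `DONEPROC` tokens x N ops/sec ±x% (n runs sampled)"
    console.log(String(event.target));
  })
  .run();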

arthurschreiber (Collaborator, Author)

Node v4.2.1 / Tedious 1.12.3

$ node benchmarks
Many result rows x 1.38 ops/sec ±1.43% (11 runs sampled)
Memory: 33.6171875 MiB
inserting nvarchar(max) with 5242880 chars x 11.11 ops/sec ±1.76% (57 runs sampled)
Memory: 81.47265625 MiB
inserting nvarchar(max) with 4 chars x 523 ops/sec ±1.63% (86 runs sampled)
Memory: 19.55078125 MiB
inserting varbinary(4) with 4 bytes x 540 ops/sec ±1.41% (85 runs sampled)
Memory: 19.65234375 MiB
inserting varbinary(max) with 50 MiB x 2.53 ops/sec ±1.93% (17 runs sampled)
Memory: 106.3671875 MiB
inserting varbinary(max) with 5 MiB x 22.62 ops/sec ±3.96% (58 runs sampled)
Memory: 59.41015625 MiB
inserting varbinary(max) with 4 bytes x 516 ops/sec ±2.02% (86 runs sampled)
Memory: 19.890625 MiB
parsing `COLMETADATA` tokens x 126 ops/sec ±2.53% (87 runs sampled)
Memory: 14.71484375 MiB
parsing `DONEPROC` tokens x 56.52 ops/sec ±3.38% (70 runs sampled)
Memory: 44.9765625 MiB
parsing tokens for 100 rows x 75.63 ops/sec ±2.66% (75 runs sampled)
Memory: 15.0390625 MiB

Node v4.2.1 / Tedious 1.12.3 plus changes in this branch

$ node benchmarks
Many result rows x 3.93 ops/sec ±1.94% (24 runs sampled)
Memory: 72.91796875 MiB
inserting nvarchar(max) with 5242880 chars x 20.53 ops/sec ±3.04% (54 runs sampled)
Memory: 119.69921875 MiB
inserting nvarchar(max) with 4 chars x 531 ops/sec ±1.33% (83 runs sampled)
Memory: 21.94921875 MiB
inserting varbinary(4) with 4 bytes x 546 ops/sec ±1.61% (84 runs sampled)
Memory: 21.6015625 MiB
inserting varbinary(max) with 50 MiB x 4.64 ops/sec ±2.43% (27 runs sampled)
Memory: 131.46875 MiB
inserting varbinary(max) with 5 MiB x 43.46 ops/sec ±3.82% (72 runs sampled)
Memory: 85.38671875 MiB
inserting varbinary(max) with 4 bytes x 531 ops/sec ±1.37% (86 runs sampled)
Memory: 21.90625 MiB
parsing `COLMETADATA` tokens x 147 ops/sec ±2.17% (80 runs sampled)
Memory: 44.92578125 MiB
parsing `DONEPROC` tokens x 137 ops/sec ±2.07% (75 runs sampled)
Memory: 43.90234375 MiB
parsing tokens for 100 rows x 168 ops/sec ±2.47% (76 runs sampled)
Memory: 45.30078125 MiB

I have some more changes that bring performance further back toward the levels of 1.11.5 across the board, but I'll release the changes in this PR as 1.12.4 first.

arthurschreiber added a commit that referenced this pull request on Oct 23, 2015: Remove use of generator functions
arthurschreiber merged commit 68b6ace into master on Oct 23, 2015
chdh mentioned this pull request on Feb 23, 2017