
Improve allocation speed and reduce memory copying to increase HTTP parser performance #199

Closed. Wants to merge 1 commit.

Conversation

@rudi-cilibrasi

Improve allocation speed and reduce memory copying to increase HTTP parser performance by 1-2 (and occasionally more) percent in many cases, such as benchmark/http/end-vs-write-end.js, based on the slab allocator TODO by Ben.

@bnoordhuis (Member)

Before we go into specifics, am I right to understand that you ran the benchmarks and saw an improvement? Can you post the numbers? I would expect a difference only with benchmarks that split the request headers over multiple TCP chunks and I'm not sure if we actually have those; if not, you are more than welcome to add some. :-)

I should mention that the http benchmarks tend to have fairly high variance. I usually benchmark using median-of-three or median-of-five, with a few seconds cool-off between each run. If the difference is only 1 or 2%, it's statistically insignificant and probably just noise (because of the aforementioned variance) unless it's consistent across many (10, 20 or more) runs.
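A minimal sketch of that median-of-five procedure, in JavaScript: the benchmark command, the parameter filters, and the output parsing (taking the rate after the final colon, matching the numbers quoted later in this thread) are assumptions for illustration, not part of the io.js benchmark suite.

'use strict';
// Run the benchmark five times with a cool-off between runs and report
// the median rate. Unix-only because of the `sleep` call.
const { execSync } = require('child_process');

const CMD = 'node benchmark/http/end-vs-write-end.js type=buf kb=64 method=end';
const rates = [];
for (let i = 0; i < 5; i++) {
  const out = execSync(CMD).toString().trim();
  const lastLine = out.split('\n').pop();
  rates.push(parseFloat(lastLine.split(':').pop())); // rate after final ':'
  execSync('sleep 3'); // a few seconds of cool-off between runs
}
rates.sort((a, b) => a - b);
console.log('median rate:', rates[2]);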

@rudi-cilibrasi (Author)

Yes, I saw the high variance too. My guess is that the new allocator is a
bit faster (1-2%) in many cases and much faster (2-10%) in a few, but I
would appreciate some confirmation. I ran many benchmarks dozens of times,
though only by hand, so I will soon try the automatic median-taking you
suggested. I was already guessing (without enough reading) that we were not
hitting the heavily chunked case, though I did see some improvement in
chunked.js in particular. You are right that that case was the original
main focus of the micro-optimization. Before I post the numbers, I want to
list the four sources of speedup I can imagine; the first is limited as you
said, but the other three may still apply (a rough sketch of the pooling
idea follows the list).

  1. split request headers over multiple TCP chunks (limited, as you said)
  2. allocating five blocks per large new[] allocation
  3. recycling freed blocks so they are reused rather than deallocated and reallocated
  4. reduced variation in memory chunk size
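
To make items 2 and 3 concrete: below is a minimal conceptual sketch of the pooling idea, in JavaScript for consistency with the other code in this thread. The actual patch is C++ inside the HTTP parser binding; every name and constant here is invented for illustration.

'use strict';
// Hypothetical sketch of the slab/free-list idea: carve one large
// allocation into fixed-size blocks (item 2) and recycle released blocks
// instead of freeing them (item 3). Block and slab sizes are invented.
// (Buffer.allocUnsafe is the modern API; the io.js era used new Buffer.)
const BLOCK_SIZE = 64 * 1024;
const BLOCKS_PER_SLAB = 5; // five blocks per large allocation

class BlockPool {
  constructor() {
    this.free = [];
  }
  alloc() {
    if (this.free.length === 0) {
      // One big allocation, split into uniform blocks (item 4: every
      // block has the same size, so there is no size variation).
      const slab = Buffer.allocUnsafe(BLOCK_SIZE * BLOCKS_PER_SLAB);
      for (let i = 0; i < BLOCKS_PER_SLAB; i++)
        this.free.push(slab.slice(i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE));
    }
    return this.free.pop();
  }
  release(block) {
    this.free.push(block); // recycled, never handed back to the allocator
  }
}

const pool = new BlockPool();
const b = pool.alloc(); // 64 KiB block, possibly reused from a prior release
pool.release(b);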

I am not sure, but I think the speedup is there in most cases even without
multiple TCP chunks; it is just not usually significant. In certain cases
it is, though. For example, running

node benchmark/http/end-vs-write-end.js | grep buf

originally yields these measurements:
http/end-vs-write-end.js type=buf kb=64 c=100 method=write: 9277
http/end-vs-write-end.js type=buf kb=64 c=100 method=end : 9113
http/end-vs-write-end.js type=buf kb=128 c=100 method=write: 8540
http/end-vs-write-end.js type=buf kb=128 c=100 method=end : 8456

and after the patch yields

http/end-vs-write-end.js type=buf kb=64 c=100 method=write: 9830
http/end-vs-write-end.js type=buf kb=64 c=100 method=end : 9631
http/end-vs-write-end.js type=buf kb=128 c=100 method=write: 8552
http/end-vs-write-end.js type=buf kb=128 c=100 method=end : 8405

I guess that is pretty typical. It is roughly equivalent in many cases,
sometimes 1-4% better, and in some cases 5-10% better.

I should mention that the TCP fragmentation case was the main motivation
for the patch, so I would love to see such a benchmark, but I am not yet
sure how to write one easily.

Thanks for the feedback.


@bnoordhuis (Member)

I'm running on battery power right now - not an ideal environment for benchmarking - so I can't really confirm if it's faster. I did some quick tests yesterday but the results were inconclusive.

I had a look at the existing benchmarks and I don't think any of them really stress the 'split headers' case. I would suggest adding one that:

  1. Starts an HTTP server on a UNIX socket / pipe instead of TCP, to avoid loopback TCP ACK latency. On Linux, that introduces 40 ms of latency per TCP packet; see orthecreedence/cl-async#98 ("[uv] Feedback about the performance") for an example of the disastrous effect that can have on throughput.
  2. Starts the client in a sub-process and uses the net module to write raw HTTP. If you are feeling adventurous, you can use the raw bindings to remove the overhead of the net streams layer. There are examples in benchmark/net. (A rough sketch combining both points follows.)
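
A minimal sketch of such a benchmark, under stated assumptions: the socket path and request count are made up, server and client share one process for brevity (the real benchmark would spawn the client as a sub-process), and the request headers are deliberately split across two writes to exercise the split-headers path.

'use strict';
// Hedged sketch: HTTP server on a UNIX domain socket, client writing raw
// HTTP via the net module. Counting one response per 'data' event is a
// simplification that holds only because the responses here are tiny.
const fs = require('fs');
const http = require('http');
const net = require('net');

const SOCK = '/tmp/bench-split-headers.sock';
const N = 10000;
const REQUEST = 'GET / HTTP/1.1\r\n' +
                'Host: localhost\r\n' +
                'Connection: keep-alive\r\n\r\n';

try { fs.unlinkSync(SOCK); } catch (e) { /* no stale socket to remove */ }

const server = http.createServer((req, res) => res.end('ok'));

server.listen(SOCK, () => {
  const conn = net.connect(SOCK);
  const start = Date.now();
  let responses = 0;

  function send() {
    // Split the headers over two writes so the parser sees them arrive
    // in multiple chunks, which is the case this patch targets.
    const half = REQUEST.length >> 1;
    conn.write(REQUEST.slice(0, half));
    conn.write(REQUEST.slice(half));
  }

  conn.on('data', () => {
    if (++responses === N) {
      console.log('requests/sec:', N / ((Date.now() - start) / 1000));
      conn.destroy();
      server.close();
    } else {
      send();
    }
  });

  send();
});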

By the way, if you just want to run the buf tests, you can do so like this:

$ node benchmark/http/end-vs-write-end.js type=buf

@rudi-cilibrasi (Author)

Thank you for these great tips. I have already started reading the
benchmark/net examples you mentioned. It looks doable to me, but it may
take me some time because I have not worked with node networking at the
byte level as much as I have in C. I really appreciate the git tips you and
Colin gave me; I have already practiced using them for the first time in
this pull request. Please let me know if I made any mistakes in my first
commit squash. I will work toward putting together a better performance
test and adding it to the branch.


@rudi-cilibrasi (Author)

I pushed the new node JavaScript client and server into the benchmarks
directory. I noticed something confusing and mentioned it in the commit
log. I look forward to your advice, reaction, or information. Best regards,

Rudi


@novacrazy
I think having a tiny memory pool used in just one area is a bad idea, especially with the implementation in the same file. It would probably improve performance some, but if memory pools are to be used in io.js, it should be done everywhere, with a larger subsystem for handling them.

Perhaps something like boost::pool, which is tried and tested. Otherwise, trusting the OS allocator to be fast enough is reasonable, as is simply optimizing the current code to reduce copying and reallocations.

@rudi-cilibrasi (Author)

I wanted to let you know that the leak exists in v0.12 without the new code
I introduced.

In general I agree with you about over-specific memory allocation code. I
also agree that more general is better, and I would want to check out and
compare boost::pool and the like were this to go further. I am more
interested in the apparent memory leak when running the new benchmark code
against the old v0.12 release without my additions to http_parser. I found
some old bug reports against earlier versions suggesting there may be a
leak in http_parser, and that concerns me, since io.js is so
network-oriented and fast enough that a memory leak can make a big
difference to server stability.


@bnoordhuis (Member)

@rudi-cilibrasi Can you make the benchmarks part of the benchmark suite? Take a look at the files in (for example) benchmark/net for guidance; the goal is to make them fully automatic. I'll leave some comments on the PR itself.

if (has_keep_alive) {
WriteWithCRLF(channel, "Connection: keep-alive");
}
WriteWithCRLF(channel, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
@bnoordhuis (Member), reviewing the snippet above:

Please wrap at 80 columns and please use single quotes. (The latter is not a hard requirement but it's the norm in core and code that deviates always looks a little grating.)
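
For illustration, the snippet above might read as follows after those style notes (WriteWithCRLF and channel are the PR's own names, carried over unchanged):

// Same code, wrapped at 80 columns and using single quotes.
if (has_keep_alive) {
  WriteWithCRLF(channel, 'Connection: keep-alive');
}
WriteWithCRLF(channel,
              'Accept: text/html,application/xhtml+xml,application/xml;' +
              'q=0.9,image/webp,*/*;q=0.8');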

Commit message:

This allocator attempts to increase performance by a few
percent in benchmark/http/end-vs-write-end.js and other cases.
This experimental effort was initiated by a TODO comment from Ben.
@Fishrock123 (Contributor)

ping @bnoordhuis

@rudi-cilibrasi (Author)

I'm happy to let this one go if you like, since the performance boost was uncertain the last time I checked and the code is a bit more complex. If somebody likes it or gets good test results, I am happy to see it land too. My feeling is that my other PR, #228, is higher quality and hopefully more likely to go in now, but I'm still waiting for more feedback.

@mscdex added the http label (Issues or PRs related to the http subsystem) on Mar 22, 2015
@brendanashworth added the c++ label (Issues and PRs that require attention from people who are familiar with C++) on May 1, 2015
@Fishrock123 (Contributor)

Since this doesn't look like it's going anywhere, I'm going to close it in light of #1457; let me know if it should be re-opened.

@ChALkeR added the performance label (Issues and PRs related to the performance of Node.js) on Feb 16, 2016
jasongin added a commit to jasongin/nodejs that referenced this pull request Mar 25, 2017
 - Delete the node_api_*internal.h headers
 - Don't crash when casting numbers when
   V8_ENABLE_CHECKS is defined
 - Return 0 when getting NaN value as an integer
 - Always set callback data internal field to prevent
   crashing when V8_ENABLE_CHECKS is defined
 - Style: replace `if (ptr)` with `if (ptr != nullptr)`
 - Fix extern "C" in node_api_async.h
 - Minor doc changes
boingoing pushed a commit to boingoing/node that referenced this pull request Apr 6, 2017 (same commit message as above)