Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: update http.md for consistency #10715

Closed
wants to merge 9 commits into from
99 changes: 58 additions & 41 deletions doc/api/http.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,26 +48,39 @@ list like the following:
added: v0.3.4
-->

The HTTP Agent is used for pooling sockets used in HTTP client
requests.

The HTTP Agent also defaults client requests to using
`Connection: keep-alive`. If no pending HTTP requests are waiting on a
socket to become free the socket is closed. This means that Node.js's
pool has the benefit of keep-alive when under load but still does not
require developers to manually close the HTTP clients using
KeepAlive.

If you opt into using HTTP KeepAlive, you can create an Agent object
with that flag set to `true`. (See the [constructor options][].)
Then, the Agent will keep unused sockets in a pool for later use. They
will be explicitly marked so as to not keep the Node.js process running.
However, it is still a good idea to explicitly [`destroy()`][] KeepAlive
agents when they are no longer in use, so that the Sockets will be shut
down.

Sockets are removed from the agent's pool when the socket emits either
a `'close'` event or a special `'agentRemove'` event. This means that if
An `Agent` is responsible for managing connection persistence
and reuse for HTTP clients. It maintains a queue of pending requests
for a given host and port, reusing a single socket connection for each
until the queue is empty, at which time the socket is destroyed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest:

at which time the socket is either destroyed, or put into a pool where it is kept to be used again for requests to the same host and port. Where it is destroyed or pooled depends on the keepAlive(link to Agent ctor docs) option.

When a socket connection is no longer writable, for example, if a server
Copy link
Contributor

@sam-github sam-github Jan 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is useful, but I think its overly specific (what if the socket is no longer readable?). How about:

Pooled connections have TCP Keep-Alives enabled for them, but servers may still close idle connections, in which case they will be removed from the pool and a new connection will be made when a new HTTP request is made for that host and port. Servers may also refuse to allow multiple requests over the same connection, in which case the connection will be have to be remade for every request and cannot be pooled. The Agent will still make the requests to that server, but each one will occur over a new connection.

closes the connection, the socket is destroyed.

Unless an `Agent` is expressly provided in the [`http.request()`]
options, all HTTP client requests will use the default
[`http.globalAgent`].

In addition to managing connection persistence for queued HTTP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing, they sound like the same thing. I think the concepts here are

  • when there are lots of requests, the connection can be kept open til they are all serviced
  • whether there are lots of requests of not, the connection can be kept open between when there are requests

The thing about describing this in terms of a request queue is that there is no request queue in the API, so its described in terms of something internal that isn't known to the user.

Copy link
Member Author

@lance lance Jan 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's confusing. But this is how I read the code here https://github.com/nodejs/node/blob/master/lib/_http_agent.js#L47-L81. If there are pending requests the sockets are reused and the keepAlive option is ignored. But if no requests are pending and keepAlive is truthy, when a socket emits a 'free' event, they are pooled. Otherwise, they are destroyed on line80. We can see the client emits 'free' here https://github.com/nodejs/node/blob/master/lib/_http_client.js#L551-L568 if req.shouldKeepAlive is false, and the request's shouldKeepAlive mirrors agent.shouldKeepAlive here https://github.com/nodejs/node/blob/master/lib/_http_client.js#L216-L222.

Confusing, but see also @bnoordhuis comment here #10774 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addendum:

I would welcome some recommended wording here, because I have a hard time figuring out how to describe this behavior without leaking internals. The two concepts that you noted are correct. But by simply saying "when there are lots of requests" it's not clear what that means or how lots of simultaneous requests would be created. I admit it's not clear in either case. Ugh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my two suggestions above, together, make this entire paragraph unnecessary.

requests, an `Agent` can be used for pooling sockets across multiple
client requests. When providing `{ keepAlive: true }` among the
[constructor options], the `Agent` will not destroy sockets when
the request queue has been emptied. Rather, it will pool these
sockets for later requests to the same host.

By providing `{ keepAlive: true }` as a constructor option to the the
Copy link
Contributor

@mscdex mscdex Jan 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be merged with the previous paragraph, otherwise it kind of seems like it's talking about a different situation, when in fact it's just providing more description of keepAlive: true.

Also, I think it would be good to note that a socket will be destroyed no matter what if the server responds with a Connection: close.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added text to this effect in my suggestions above, I think this paragraph can be removed now.

HTTP `Agent`, sockets will not only be pooled, but the HTTP header
`Connection: keep-alive` will be sent by default for all client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true, are you sure? Without Connection: keep-alive, your first use case above (queued HTTP requests) would not work, because the connection would be closed after every single request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the logic for the Connection headers in OutgoingMessage. A Connection: keep-alive header is sent for all requests by default, even without keepAlive: true as an Agent option. But by the logic above, once the request queue is empty, the socket is destroyed unless keepAlive is true.

So Connection: keep-alive is always sent. But unless the agent has been configured to pool sockets, this only matters for queued requests - despite the fact that it's not clear to me how you would queue a request.

Frankly, all of this code is super confusing and I could be wrong. The fact that keep-alive logic is spread across 3(!!!!) files is crazy, imo.

Copy link
Contributor

@sam-github sam-github Jan 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is true. If Connection: keep-alive is not sent, then it is not possible to send multiple HTTP requests over the same TCP connection. I won't be able to look at this until next week, but at some point reading the code can be less productive than writing a unit test. If you don't get to it, I'll write one that exercises a the Agent in the various keep alive variants, and asserts the value of keep alive (EDIT: of Connection: xxx I meant) on the server. Writing a test is an excellent way to prove the docs are correct, and that the doc and implementation do not diverge. It may even be possible that this is tested, but I suspect that only the observed behaviour from the client is tested, not what the server is seeing in terms of connection reuse and/or pooling.

requests. This also results in periodic TCP `SO_KEEPALIVE` requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think using the socket option name here makes sense. I don't think there is any harm in calling it 'TCP keepalive' here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex imma let you and @sam-github fight this one out. I am ambivalent about which is better, but I made this change based on earlier comments requesting low-level details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean SO_KEEPALIVE is really an OS-specific constant, although most platforms we support do currently use that name, but it's not a necessity.

from the client to the server.

When a connection is closedby the client or the server, it is removed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/closedby/closed by/

from the pool. Any unused sockets in the pool
will be unrefed so as not to keep the Node.js process running
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a better term could be used rather than 'unrefed' because not everyone will necessarily know what that means. How about 'detached from the event loop' or similar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex I like that phrasing. @sam-github you recommended unref here. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, Brian, but I don't think there is any phrasing that will convey what is happening unless the reader understands the concept of refs, and also of what keeps node's event loop alive - which many people admittedly don't. The socket is not really "detached from the loop", its still epolled in the loop, its just not counted when uv is considering whether the loop is complete. Its hard to find a good phrase, so I think using the unref jargon is better, because people who don't know what it means can follow the provided link to socket.unref(). I also think that all the .unref() methods in node should link to a single section somewhere (or a guide @Qard?) that describes handles and requests, and what keeps node alive, but that is out of scope here. I'd like not to see little, inconsistent and incomplete attempts at describing this peppered throughout the docs. That said, maybe the sentence can be completed:

process running when there are no outstanding requests

to make it more clear what the user-visible behaviour is (unref is just mechanism).

(see [socket.unref()]). It is good practice, however, to [`destroy()`][]
Copy link
Contributor

@sam-github sam-github Jan 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the "however", and maybe put this sentence as a new paragraph. Its good practice to destroy resources not in use independent of node's concept of "is the loop still refed"

a client `Agent` when it is no longer in use, because unused sockets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps s/a client `Agent`/an `Agent` instance/ ?

consume OS resources.

Sockets are removed from an agent's pool when the socket emits either
a `'close'` event or an `'agentRemove'` event. This means that if
you intend to keep one HTTP request open for a long time and don't
want it to stay in the pool you can do something along the lines of:

Expand All @@ -79,7 +92,11 @@ http.get(options, (res) => {
});
```

Alternatively, you could just opt out of pooling entirely using
You may also use an agent for an individual request. By providing
`{agent: false}` as an option to the `http.get()` or `http.request()`
functions, a one-time use `Agent` with default options and no connection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really true? An Agent is allocated even if it doesn't need one, as opposed to just making a direct HTTP request with no agent at all? I'm surprised, so if true, its good its documented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

pooling will be used for the client connection.
Copy link
Contributor

@mscdex mscdex Jan 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the phrase 'and no connection pooling will be used for the client connection' is a little confusing when talking about a single socket. Perhaps just remove the phrase altogether, since that is basically what was done in the description for the agent option in http.request().


`agent:false`:

```js
Expand All @@ -100,11 +117,12 @@ added: v0.3.4

* `options` {Object} Set of configurable options to set on the agent.
Can have the following fields:
* `keepAlive` {Boolean} Keep sockets around in a pool to be used by
other requests in the future. Default = `false`
* `keepAliveMsecs` {Integer} When using HTTP KeepAlive, how often
to send TCP KeepAlive packets over sockets being kept alive.
Default = `1000`. Only relevant if `keepAlive` is set to `true`.
* `keepAlive` {Boolean} Keep sockets around even when there are no
outstanding requests, so it can be used for future requests without
having to reestablish an HTTP connection. Default = `false`
Copy link
Contributor

@mscdex mscdex Jan 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think s/an HTTP/a TCP/ is more correct.

* `keepAliveMsecs` {Integer} When using the `keepAlive` option, how
often to send TCP `SO_KEEPALIVE` `ACK` packets. Ignored when the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here about word usage (suggestion: 'TCP keepalive packets').

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also be consistent with the wording ('packets' vs 'requests' for TCP keepalive), since we use different terminology earlier in the document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how often to send TCP keep alives. It is used as the initialDelay option to https://nodejs.org/api/net.html#net_socket_setkeepalive_enable_initialdelay

^--- describe its purpose in general, and then link to the detailed description so you aren't redocumenting what a TCP keep alive is. For one thing, the text above is probably wrong. Its not how often the keep alives are sent, its how long after there has been no network activity until a keep alive is sent. No keep alives will ever be sent as long as there are TCP packets being sent.

`keepAlive` option is false or undefined. Default = `1000`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put backticks around 'false' and 'undefined'.

* `maxSockets` {Number} Maximum number of sockets to allow per
host. Default = `Infinity`.
* `maxFreeSockets` {Number} Maximum number of sockets to leave open
Expand All @@ -114,7 +132,7 @@ added: v0.3.4
The default [`http.globalAgent`][] that is used by [`http.request()`][] has all
of these values set to their respective defaults.

To configure any of them, you must create your own [`http.Agent`][] object.
To configure any of them, you must create your own [`http.Agent`][] instance.

```js
const http = require('http');
Expand All @@ -136,7 +154,7 @@ added: v0.11.4
Produces a socket/stream to be used for HTTP requests.

By default, this function is the same as [`net.createConnection()`][]. However,
custom Agents may override this method in case greater flexibility is desired.
custom agents may override this method in case greater flexibility is desired.

A socket/stream can be supplied in one of two ways: by returning the
socket/stream from this function, or by passing the socket/stream to `callback`.
Expand All @@ -151,7 +169,7 @@ added: v0.11.4
Destroy any sockets that are currently in use by the agent.

It is usually not necessary to do this. However, if you are using an
agent with KeepAlive enabled, then it is best to explicitly shut down
agent with `keepAlive` enabled, then it is best to explicitly shut down
the agent when you know that it will no longer be used. Otherwise,
sockets may hang open for quite a long time before the server
terminates them.
Expand All @@ -164,7 +182,7 @@ added: v0.11.4
* {Object}

An object which contains arrays of sockets currently awaiting use by
the Agent when HTTP KeepAlive is used. Do not modify.
the agent when `keepAlive` is enabled. Do not modify.

### agent.getName(options)
<!-- YAML
Expand All @@ -179,8 +197,8 @@ added: v0.11.4
* Returns: {String}

Get a unique name for a set of request options, to determine whether a
connection can be reused. In the http agent, this returns
`host:port:localAddress`. In the https agent, the name includes the
connection can be reused. For an HTTP agent, this returns
`host:port:localAddress`. For an HTTPS agent, the name includes the
CA, cert, ciphers, and other HTTPS/TLS-specific options that determine
socket reusability.

Expand All @@ -191,7 +209,7 @@ added: v0.11.7

* {Number}

By default set to 256. For Agents supporting HTTP KeepAlive, this
By default set to 256. For agents with `keepAlive` enabled, this
sets the maximum number of sockets that will be left open in the free
state.

Expand Down Expand Up @@ -224,7 +242,7 @@ added: v0.3.6
* {Object}

An object which contains arrays of sockets currently in use by the
Agent. Do not modify.
agent. Do not modify.

## Class: http.ClientRequest
<!-- YAML
Expand Down Expand Up @@ -652,7 +670,7 @@ added: v0.1.0
* `response` {http.ServerResponse}

Emitted each time there is a request. Note that there may be multiple requests
per connection (in the case of keep-alive connections).
per connection (in the case of HTTP Keep-Alive connections).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct usage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're saying this is the correct usage, right? Not that I should correct the usage here as well. I think this has the intended meaning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks correct to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, I am saying this is the correct usage, in contrast to a number of other places where I commented that it wasn't.


### Event: 'upgrade'
<!-- YAML
Expand Down Expand Up @@ -1490,7 +1508,7 @@ added: v0.5.9

* {http.Agent}

Global instance of Agent which is used as the default for all HTTP client
Global instance of `Agent` which is used as the default for all HTTP client
requests.

## http.request(options[, callback])
Expand Down Expand Up @@ -1520,15 +1538,13 @@ added: v0.3.6
* `headers` {Object} An object containing request headers.
* `auth` {String} Basic authentication i.e. `'user:password'` to compute an
Authorization header.
* `agent` {http.Agent|Boolean} Controls [`Agent`][] behavior. When an Agent
is used request will default to `Connection: keep-alive`. Possible values:
* `agent` {http.Agent|Boolean} Controls [`Agent`][] behavior. Possible values:
* `undefined` (default): use [`http.globalAgent`][] for this host and port.
* `Agent` object: explicitly use the passed in `Agent`.
* `false`: opts out of connection pooling with an Agent, defaults request to
`Connection: close`.
* `false`: causes a new `Agent` with default values to be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really, really different behaviour from the original text. Are you absolutely sure that it is impossible to send a one-shot HTTP request with node, that disables HTTP Keep-Alive, aka sends Connection: close? Node.js docs have been wrong before, but this strikes me as a surprising bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my note above, but yes, I think this is the case based on this code: https://github.com/nodejs/node/blob/master/lib/_http_client.js#L63-L75

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also stumbled upon this while working on #10818 and I was like 😲

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't Connection: close the default, since you have to explicitly opt-in via keepAlive: true when creating a new Agent? So if the defaults are used, then that means keepAlive: false is used ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex good point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex @lpinca not really, no. As I read the code, all OutgoingMessages use Connection: keep-alive whether the keepAlive option is specified or not. This is what allows the behavior I noted above - that multiple, queued requests will reuse the same socket. So Connection: keep-alive is always used, and once the queue of requests has been emptied, then only does OutgoingMessage send Connection: close.

You can see how sockets are reused even when keepAlive option is false here.

As I've said before, I could be mistaken and reading the code wrong, but nobody has actually given me another interpretation of the code itself that corresponds to the old documentation. My text changes are based on what I see in the code. I've tried to provide that analysis here and here. If you think it's wrong or mistaken, please let me know how and where. The code is spaghettiish and quite confusing.

Copy link
Contributor

@mscdex mscdex Jan 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lance The socket is only reused if 'free' is emitted as you can see in your link.

Here's the entire client request flow:

  1. http.request() merely creates a new ClientRequest()
  2. ClientRequest() creates a new Agent instance using the default options when agent: false is passed
    a. As we see in the Agent() constructor, the default value for keepAlive is false
    b. We also see that in the Agent() constructor that the default value for maxSockets is Agent.defaultMaxSockets which is Infinity.
  3. Later in ClientRequest(), we check if the agent is using keepAlive and if the maxSockets is set to Infinity. If so, then we disable reuse of sockets by setting req.shouldKeepAlive to false.
    a. As a side note, if the server is sending Connection: close or something similar, then we definitely do not reuse the socket, even if agent.keepAlive is true.
  4. When we go to send the initial HTTP request line and headers, we see that if the user has not explicitly set a Connection header, that the default behavior is to check req.shouldKeepAlive (among other values) to determine whether to send Connection: keep-alive or not. Importantly, if req.shouldKeepAlive is false, then Connection: close is sent.
  5. When the response from the server is complete, req.shouldKeepAlive is checked and responseKeepAlive() is called because req.shouldKeepAlive is false.
  6. Once inside responseKeepAlive(), we immediately check if req.shouldKeepAlive is false. If so, we destroy the socket if it hasn't already been destroyed. You can see in the else block that when req.shouldKeepAlive is true, that we emit 'free', which leads to the execution of the code that you pointed out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex thank you for clarifying. Stay tuned.

* `createConnection` {Function} A function that produces a socket/stream to
use for the request when the `agent` option is not used. This can be used to
avoid creating a custom Agent class just to override the default
avoid creating a custom `Agent` class just to override the default
`createConnection` function. See [`agent.createConnection()`][] for more
details.
* `timeout` {Integer}: A number specifying the socket timeout in milliseconds.
Expand Down Expand Up @@ -1649,3 +1665,4 @@ There are a few special headers that should be noted.
[constructor options]: #http_new_agent_options
[Readable Stream]: stream.html#stream_class_stream_readable
[Writable Stream]: stream.html#stream_class_stream_writable
[socket.unref()]: net.html#net_socket_unref