Skip to content

Commit

Permalink
http/2: add stats and stream flush timeout (#139)
Browse files Browse the repository at this point in the history
This commit adds a new stream flush timeout to guard against a
remote server that does not open window once an entire stream has
been buffered for flushing. Additional stats have also been added
to better understand the codecs view of active streams as well as
amount of data buffered.

Signed-off-by: Matt Klein <mklein@lyft.com>
  • Loading branch information
mattklein123 authored and tonya11en committed Jun 30, 2020
1 parent 7ca28ff commit 0e49a49
Show file tree
Hide file tree
Showing 30 changed files with 392 additions and 53 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -332,6 +332,16 @@ message HttpConnectionManager {
// is terminated with a 408 Request Timeout error code if no upstream response
// header has been received, otherwise a stream reset occurs.
//
// This timeout also specifies the amount of time that Envoy will wait for the peer to open enough
// window to write any remaining stream data once the entirety of stream data (local end stream is
// true) has been buffered pending available window. In other words, this timeout defends against
// a peer that does not release enough window to completely write the stream, even though all
// data has been proxied within available flow control windows. If the timeout is hit in this
// case, the :ref:`tx_flush_timeout <config_http_conn_man_stats_per_codec>` counter will be
// incremented. Note that :ref:`max_stream_duration
// <envoy_api_field_core.HttpProtocolOptions.max_stream_duration>` does not apply to this corner
// case.
//
// Note that it is possible to idle timeout even if the wire traffic for a stream is non-idle, due
// to the granularity of events presented to the connection manager. For example, while receiving
// very large request headers, it may be the case that there is traffic regularly arriving on the
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,16 @@ message HttpConnectionManager {
// is terminated with a 408 Request Timeout error code if no upstream response
// header has been received, otherwise a stream reset occurs.
//
// This timeout also specifies the amount of time that Envoy will wait for the peer to open enough
// window to write any remaining stream data once the entirety of stream data (local end stream is
// true) has been buffered pending available window. In other words, this timeout defends against
// a peer that does not release enough window to completely write the stream, even though all
// data has been proxied within available flow control windows. If the timeout is hit in this
// case, the :ref:`tx_flush_timeout <config_http_conn_man_stats_per_codec>` counter will be
// incremented. Note that :ref:`max_stream_duration
// <envoy_api_field_config.core.v3.HttpProtocolOptions.max_stream_duration>` does not apply to
// this corner case.
//
// Note that it is possible to idle timeout even if the wire traffic for a stream is non-idle, due
// to the granularity of events presented to the connection manager. For example, while receiving
// very large request headers, it may be the case that there is traffic regularly arriving on the
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions docs/root/configuration/http/http_conn_man/stats.rst
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,16 @@ All http2 statistics are rooted at *http2.*
rx_reset, Counter, Total number of reset stream frames received by Envoy
too_many_header_frames, Counter, Total number of times an HTTP2 connection is reset due to receiving too many headers frames. Envoy currently supports proxying at most one header frame for 100-Continue one non-100 response code header frame and one frame with trailers
trailers, Counter, Total number of trailers seen on requests coming from downstream
tx_flush_timeout, Counter, Total number of :ref:`stream idle timeouts <envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.stream_idle_timeout>` waiting for open stream window to flush the remainder of a stream
tx_reset, Counter, Total number of reset stream frames transmitted by Envoy
streams_active, Gauge, Active streams as observed by the codec
pending_send_bytes, Gauge, Currently buffered body data in bytes waiting to be written when stream/connection window is opened.

.. attention::

The HTTP/2 `streams_active` gauge may be greater than the HTTP connection manager
`downstream_rq_active` gauge due to differences in stream accounting between the codec and the
HTTP connection manager.

Tracing statistics
------------------
Expand Down
4 changes: 3 additions & 1 deletion docs/root/faq/configuration/timeouts.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,9 @@ context request/stream is interchangeable.
<envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.stream_idle_timeout>`
is the amount of time that the connection manager will allow a stream to exist with no upstream
or downstream activity. The default stream idle timeout is *5 minutes*. This timeout is strongly
recommended for streaming APIs (requests or responses that never end).
recommended for all requests (not just streaming requests/responses) as it additionally defends
against an HTTP/2 peer that does not open stream window once an entire response has been buffered
to be sent to a downstream client.
* The HTTP protocol :ref:`max_stream_duration <envoy_v3_api_field_config.core.v3.HttpProtocolOptions.max_stream_duration>`
is defined in a generic message used by the HTTP connection manager. The max stream duration is the
maximum time that a stream's lifetime will span. You can use this functionality when you want to reset
Expand Down
3 changes: 3 additions & 0 deletions docs/root/version_history/v1.14.2.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,9 @@ Changes
-------

* http: fixed CVE-2020-11080 by rejecting HTTP/2 SETTINGS frames with too many parameters.
* http: the :ref:`stream_idle_timeout <envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.stream_idle_timeout>`
now also defends against an HTTP/2 peer that does not open stream window once an entire response
has been buffered to be sent to a downstream client.
* listener: Add runtime support for `per-listener limits <config_listeners_runtime>` on
active/accepted connections.
* overload management: Add runtime support for :ref:`global limits <config_overload_manager>`
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions include/envoy/http/codec.h
Original file line number Diff line number Diff line change
Expand Up @@ -337,6 +337,13 @@ class Stream {
* with the stream.
*/
virtual const Network::Address::InstanceConstSharedPtr& connectionLocalAddress() PURE;

/**
* Set the flush timeout for the stream. At the codec level this is used to bound the amount of
* time the codec will wait to flush body data pending open stream window. It does *not* count
* small window updates as satisfying the idle timeout as this is a potential DoS vector.
*/
virtual void setFlushTimeout(std::chrono::milliseconds timeout) PURE;
};

/**
Expand Down
2 changes: 1 addition & 1 deletion source/common/http/codec_client.cc
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ namespace Http {
CodecClient::CodecClient(Type type, Network::ClientConnectionPtr&& connection,
Upstream::HostDescriptionConstSharedPtr host,
Event::Dispatcher& dispatcher)
: type_(type), connection_(std::move(connection)), host_(host),
: type_(type), host_(host), connection_(std::move(connection)),
idle_timeout_(host_->cluster().idleTimeout()) {
if (type_ != Type::HTTP3) {
// Make sure upstream connections process data and then the FIN, rather than processing
Expand Down
6 changes: 4 additions & 2 deletions source/common/http/codec_client.h
Original file line number Diff line number Diff line change
Expand Up @@ -155,9 +155,11 @@ class CodecClient : Logger::Loggable<Logger::Id::client>,
}

const Type type_;
ClientConnectionPtr codec_;
Network::ClientConnectionPtr connection_;
// The order of host_, connection_, and codec_ matter as during destruction each can refer to
// the previous, at least in tests.
Upstream::HostDescriptionConstSharedPtr host_;
Network::ClientConnectionPtr connection_;
ClientConnectionPtr codec_;
Event::TimerPtr idle_timer_;
const absl::optional<std::chrono::milliseconds> idle_timeout_;

Expand Down
4 changes: 4 additions & 0 deletions source/common/http/conn_manager_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,7 @@ RequestDecoder& ConnectionManagerImpl::newStream(ResponseEncoder& response_encod
new_stream->state_.is_internally_created_ = is_internally_created;
new_stream->response_encoder_ = &response_encoder;
new_stream->response_encoder_->getStream().addCallbacks(*new_stream);
new_stream->response_encoder_->getStream().setFlushTimeout(new_stream->idle_timeout_ms_);
new_stream->buffer_limit_ = new_stream->response_encoder_->getStream().bufferLimit();
// If the network connection is backed up, the stream should be made aware of it on creation.
// Both HTTP/1.x and HTTP/2 codecs handle this in StreamCallbackHelper::addCallbacksHelper.
Expand Down Expand Up @@ -970,7 +971,10 @@ void ConnectionManagerImpl::ActiveStream::decodeHeaders(RequestHeaderMapPtr&& he
if (hasCachedRoute()) {
const Router::RouteEntry* route_entry = cached_route_.value()->routeEntry();
if (route_entry != nullptr && route_entry->idleTimeout()) {
// TODO(mattklein123): Technically if the cached route changes, we should also see if the
// route idle timeout has changed and update the value.
idle_timeout_ms_ = route_entry->idleTimeout().value();
response_encoder_->getStream().setFlushTimeout(idle_timeout_ms_);
if (idle_timeout_ms_.count()) {
// If we have a route-level idle timeout but no global stream idle timeout, create a timer.
if (stream_idle_timer_ == nullptr) {
Expand Down
5 changes: 5 additions & 0 deletions source/common/http/http1/codec_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,11 @@ class StreamEncoderImpl : public virtual StreamEncoder,
uint32_t bufferLimit() override;
absl::string_view responseDetails() override { return details_; }
const Network::Address::InstanceConstSharedPtr& connectionLocalAddress() override;
void setFlushTimeout(std::chrono::milliseconds) override {
// HTTP/1 has one stream per connection, thus any data encoded is immediately written to the
// connection, invoking any watermarks as necessary. There is no internal buffering that would
// require a flush timeout not already covered by other timeouts.
}

void setIsResponseToHeadRequest(bool value) { is_response_to_head_request_ = value; }
void setIsResponseToConnectRequest(bool value) { is_response_to_connect_request_ = value; }
Expand Down
49 changes: 48 additions & 1 deletion source/common/http/http2/codec_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
#include "common/http/exception.h"
#include "common/http/header_utility.h"
#include "common/http/headers.h"
#include "common/http/http2/codec_stats.h"
#include "common/http/utility.h"

#include "absl/container/fixed_array.h"
Expand Down Expand Up @@ -95,11 +96,25 @@ ConnectionImpl::StreamImpl::StreamImpl(ConnectionImpl& parent, uint32_t buffer_l
data_deferred_(false), waiting_for_non_informational_headers_(false),
pending_receive_buffer_high_watermark_called_(false),
pending_send_buffer_high_watermark_called_(false), reset_due_to_messaging_error_(false) {
parent_.stats_.streams_active_.inc();
if (buffer_limit > 0) {
setWriteBufferWatermarks(buffer_limit / 2, buffer_limit);
}
}

ConnectionImpl::StreamImpl::~StreamImpl() { ASSERT(stream_idle_timer_ == nullptr); }

void ConnectionImpl::StreamImpl::destroy() {
if (stream_idle_timer_ != nullptr) {
// To ease testing and the destructor assertion.
stream_idle_timer_->disableTimer();
stream_idle_timer_.reset();
}

parent_.stats_.streams_active_.dec();
parent_.stats_.pending_send_bytes_.sub(pending_send_data_.length());
}

static void insertHeader(std::vector<nghttp2_nv>& headers, const HeaderEntry& header) {
uint8_t flags = 0;
if (header.key().isReference()) {
Expand Down Expand Up @@ -206,6 +221,7 @@ void ConnectionImpl::StreamImpl::encodeTrailersBase(const HeaderMap& trailers) {
// waiting on window updates. We need to save the trailers so that we can emit them later.
ASSERT(!pending_trailers_to_encode_);
pending_trailers_to_encode_ = cloneTrailers(trailers);
createPendingFlushTimer();
} else {
submitTrailers(trailers);
parent_.sendPendingFrames();
Expand Down Expand Up @@ -364,6 +380,7 @@ int ConnectionImpl::StreamImpl::onDataSourceSend(const uint8_t* framehd, size_t
return NGHTTP2_ERR_FLOODED;
}

parent_.stats_.pending_send_bytes_.sub(length);
output.move(pending_send_data_, length);
parent_.connection_.write(output, false);
return 0;
Expand All @@ -385,9 +402,30 @@ void ConnectionImpl::ServerStreamImpl::submitHeaders(const std::vector<nghttp2_n
ASSERT(rc == 0);
}

void ConnectionImpl::ServerStreamImpl::createPendingFlushTimer() {
ASSERT(stream_idle_timer_ == nullptr);
if (stream_idle_timeout_.count() > 0) {
stream_idle_timer_ =
parent_.connection_.dispatcher().createTimer([this] { onPendingFlushTimer(); });
stream_idle_timer_->enableTimer(stream_idle_timeout_);
}
}

void ConnectionImpl::StreamImpl::onPendingFlushTimer() {
ENVOY_CONN_LOG(debug, "pending stream flush timeout", parent_.connection_);
stream_idle_timer_.reset();
parent_.stats_.tx_flush_timeout_.inc();
ASSERT(local_end_stream_ && !local_end_stream_sent_);
// This will emit a reset frame for this stream and close the stream locally. No reset callbacks
// will be run because higher layers think the stream is already finished.
resetStreamWorker(StreamResetReason::LocalReset);
parent_.sendPendingFrames();
}

void ConnectionImpl::StreamImpl::encodeData(Buffer::Instance& data, bool end_stream) {
ASSERT(!local_end_stream_);
local_end_stream_ = end_stream;
parent_.stats_.pending_send_bytes_.add(data.length());
pending_send_data_.move(data);
if (data_deferred_) {
int rc = nghttp2_session_resume_data(parent_.session_, stream_id_);
Expand All @@ -397,6 +435,9 @@ void ConnectionImpl::StreamImpl::encodeData(Buffer::Instance& data, bool end_str
}

parent_.sendPendingFrames();
if (local_end_stream_ && pending_send_data_.length() > 0) {
createPendingFlushTimer();
}
}

void ConnectionImpl::StreamImpl::resetStream(StreamResetReason reason) {
Expand Down Expand Up @@ -473,7 +514,12 @@ ConnectionImpl::ConnectionImpl(Network::Connection& connection, CodecStats& stat
http2_options.max_inbound_window_update_frames_per_data_frame_sent().value()),
dispatching_(false), raised_goaway_(false), pending_deferred_reset_(false) {}

ConnectionImpl::~ConnectionImpl() { nghttp2_session_del(session_); }
ConnectionImpl::~ConnectionImpl() {
for (const auto& stream : active_streams_) {
stream->destroy();
}
nghttp2_session_del(session_);
}

Http::Status ConnectionImpl::dispatch(Buffer::Instance& data) {
// TODO(#10878): Remove this wrapper when exception removal is complete. innerDispatch may either
Expand Down Expand Up @@ -839,6 +885,7 @@ int ConnectionImpl::onStreamClose(int32_t stream_id, uint32_t error_code) {
stream->runResetCallbacks(reason);
}

stream->destroy();
connection_.dispatcher().deferredDelete(stream->removeFromList(active_streams_));
// Any unconsumed data must be consumed before the stream is deleted.
// nghttp2 does not appear to track this internally, and any stream deleted
Expand Down
Loading

0 comments on commit 0e49a49

Please sign in to comment.