Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

headers: fixing fast fail of size-validation #4269

Merged
merged 7 commits into from
Aug 30, 2018

Conversation

alyssawilk
Copy link
Contributor

Fixing the validation #4211 to do a better job of fast-failing in case of header overflow.

Risk Level: Medium: Envoy should crash instead of OOM
Testing: new unit test
Docs Changes: n/a
Release Notes: n/a

Includes a unit test for why I filed #4268

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk alyssawilk requested review from ggreenway and htuch August 27, 2018 19:19
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

@@ -88,7 +93,9 @@ void HeaderString::append(const char* data, uint32_t size) {
type_ = Type::Dynamic;
dynamic_capacity_ =
std::max(MinDynamicCapacity, static_cast<size_t>((string_length_ + size) * 2));
validateCapacity(dynamic_capacity_);
if (dynamic_capacity_ != MinDynamicCapacity) {
validateCapacity(newCapacity(string_length_, size));
Copy link
Member

@htuch htuch Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems scary to me. Can't we assign to a size_t with the new target capacity, validate that specific value and then allocate with the same value? Seems hard to reconcile this if block with what is actually allocated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's also bad because if we overflow to exactly MinDynamicCapacity it's broken. Overflow is hard


HeaderParserPtr req_header_parser = HeaderParser::configure(to_add);

Http::TestHeaderMapImpl headerMap{{":method", "POST"}};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: header_map

// This must be an inline header to get the append-in-place semantics.
const char* header_name = "connection";
Protobuf::RepeatedPtrField<envoy::api::v2::core::HeaderValueOption> to_add;
int loop_factor = 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const uint32_t num_header_chunks or something might be clearer.

const char* header_name = "connection";
Protobuf::RepeatedPtrField<envoy::api::v2::core::HeaderValueOption> to_add;
int loop_factor = 10;
uint64_t length = std::numeric_limits<uint32_t>::max() / loop_factor;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const

Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should all of the size_t's be uint64_t's? sizeof(size_t) is platform dependent, which makes it harder to reason about. I think we want a full 64 bit value in all cases to avoid overflow issues.

void validateCapacity(size_t new_capacity) {
// If the resizing will cause buffer overflow due to hitting uint32_t::max, an OOM is likely
// imminent. Fast-fail rather than allow a buffer overflow attack (issue #1421)
RELEASE_ASSERT(new_capacity <= std::numeric_limits<uint32_t>::max(), "");
RELEASE_ASSERT(new_capacity <= std::numeric_limits<uint32_t>::max(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::numeric_limits::max() returns type T; should that part be cast to size_t before comparison?

@@ -19,10 +19,15 @@ constexpr size_t MinDynamicCapacity{32};
// This includes the NULL (StringUtil::itoa technically only needs 21).
constexpr size_t MaxIntegerLength{32};

size_t newCapacity(uint32_t existing_capacity, uint32_t size_to_append) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to make the function args both size_t instead of uint32_t? Then no casts would be needed in the function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on switching everything to explicit uint32_t and uint64_t. Too hard to reason about as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd mildly prefer the cast over changing inputs, because otherwise one could pass in a real uint64_t and overflow numeric_limits<uint64_t>::max()

I'm claiming having a death test validates that we got the types correct, though I can (and maybe should) add one in header_map_impl_test.cc as well as the user defined header test which motivated this change in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annnnnd found another overflow bug.
Sigh.
@htuch we're going to fuzz length limits, yeah?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will put this on my fuzzer roadmap. The issue today is that we fuzz the inputs, but the fuzzer itself doesn't generate massive inputs in terms of GB sized protos. What we need to do is make https://github.com/envoyproxy/envoy/blob/master/test/common/http/header_map_impl_fuzz.proto#L11 a oneof with an integer that could just add std::string(n, 'A').

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one last thing..

const uint64_t new_capacity = newCapacity(string_length_, size);
dynamic_capacity_ = std::max(MinDynamicCapacity, new_capacity);
if (dynamic_capacity_ != MinDynamicCapacity) {
validateCapacity(new_capacity);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, my comment was lost here. I was wondering if we could move validateCapacity up and make it unconditional?

// If the resizing will cause buffer overflow due to hitting uint32_t::max, an OOM is likely
// imminent. Fast-fail rather than allow a buffer overflow attack (issue #1421)
RELEASE_ASSERT(new_capacity <= std::numeric_limits<uint32_t>::max(), "");
RELEASE_ASSERT(new_capacity <= std::numeric_limits<uint32_t>::max(),
"Trying to allocate overly large headers.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a TODO here for a more graceful failure handling? Or is dying the only appropriate action at this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dying before overflow is appropriate but I agree I'd like something before that.

I've had this discussion with Matt before. For some context on how headers get added it's
[incoming headers from client/upstream] which are subject to built in header limits
filters you've chosen to add (i.e. Google's internal "let's add googly trace headers") which should be scoped
Lua (which if you're not sanity checking, well joke's on you)
User defined headers (which I just filed an issue tracker for being too easy an OOM vector)

I think if each header input has at least some scoped / sane limits, having a relesae assert for bugs is Ok-but-not-great.

What I'd prefer is that we have a soft-fail option a lot earlier - a configured max header limit where instead of appending headers we do the logical equivalent of an internal GFE_BUG macro to note the error for both logging and monitoring, but not kill Envoy except on configuration of a build-in bugs-should-be-fatal flag for the benefit of folks on the fail-fast fail-hard side of the fence.

This is on my list of things to do to secure Envoy before widespread use for my project, but not high enough up I've filed an issue tracker for it.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
validateCapacity(new_capacity);
dynamic_capacity_ = new_capacity;
} else {
dynamic_capacity_ = MinDynamicCapacity;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, much cleaner now, thanks.

@dnoe
Copy link
Contributor

dnoe commented Aug 28, 2018

If we can get an approval from both @ggreenway and @htuch on this before merging, I'd appreciate that. Thanks!

htuch
htuch previously approved these changes Aug 28, 2018
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -19,10 +19,15 @@ constexpr size_t MinDynamicCapacity{32};
// This includes the NULL (StringUtil::itoa technically only needs 21).
constexpr size_t MaxIntegerLength{32};

void validateCapacity(size_t new_capacity) {
uint64_t newCapacity(uint32_t existing_capacity, uint32_t size_to_append) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you could make existing_capacity a uint64_t to avoid the cast, but this doesn't really matter.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

Huh. Looks like not all of our builds are powerful enough to allocate a 2^32 header. In retrospect I'm not terribly surprised.
Given that one is a proof of concept test showing we need guards on user defined headers I'll just disable it for now. once we fix the issue and reject insane configs, we can just test the sanity checks.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
htuch
htuch previously approved these changes Aug 28, 2018
ggreenway
ggreenway previously approved these changes Aug 28, 2018
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk alyssawilk dismissed stale reviews from ggreenway and htuch via eefc16f August 29, 2018 13:22
@alyssawilk
Copy link
Contributor Author

Can I get one more LGTM for the merge conflict?

@htuch htuch merged commit 89e0f23 into envoyproxy:master Aug 30, 2018
rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018
Pulling the following changes from github.com/envoyproxy/envoy:

f936fc6 ssl: serialize accesses to SSL socket factory contexts (envoyproxy#4345)
e34dcd6 Fix crash in tcp_proxy (envoyproxy#4323)
ae6a252 router: fix matching when all domains have wildcards (envoyproxy#4326)
aa06142 test: Stop fake_upstream methods from accidentally succeeding (envoyproxy#4232)
5d73187 rbac: update the authenticated.user to a StringMatcher. (envoyproxy#4250)
c6bfc7d time: Event::TimeSystem abstraction to make it feasible to inject time with simulated timers (envoyproxy#4257)
752483e Fixing the fix (envoyproxy#4333)
83487f6 tls: update BoringSSL to ab36a84b (3497). (envoyproxy#4338)
7bc210e test: fixing interactions between waitFor and ignore_spurious_events (envoyproxy#4309)
69474b3 admin: order stats in clusters json admin (envoyproxy#4306)
2d155f9 ppc64le build (envoyproxy#4183)
07efc6d fix static initialization fiasco problem (envoyproxy#4314)
0b7e3b5 test: Remove declared but undefined class methods (envoyproxy#4297)
1485a13 lua: make sure resetting dynamic metadata wrapper when request info is marked dead
d243cd6 test: set to zero when start_time exceeds limit (envoyproxy#4328)
0a1e92a test: fix heap use-after-free in ~IntegrationTestServer. (envoyproxy#4319)
cddc732 CONTRIBUTING: Document 'kick-ci' trick. (envoyproxy#4335)
f13ef24 docs: remove reference to deprecated value field (envoyproxy#4322)
e947a27 router: minor doc fixes in stream idle timeout (envoyproxy#4329)
0c2e998 tcp-proxy: fixing a TCP proxy bug where we attempted to readDisable a closed connection (envoyproxy#4296)
00ffe44 utility: fix strftime overflow handling. (envoyproxy#4321)
af1183c Re-enable TcpProxySslIntegrationTest and make the tests pass again. (envoyproxy#4318)
3553461 fuzz: fix H2 codec fuzzer post envoyproxy#4262. (envoyproxy#4311)
42f6048 Proto string issue fix (envoyproxy#4320)
9c492a0 Support Envoy to fetch secrets using SDS service. (envoyproxy#4256)
a857219 ratelimit: revert `revert rate limit failure mode config` and add tests (envoyproxy#4303)
1d34172 dns: fix exception unsafe behavior in c-ares callbacks. (envoyproxy#4307)
1212423 alts: add gRPC TSI socket (envoyproxy#4153)
f0363ae fuzz: detect client-side resets in H2 codec fuzzer. (envoyproxy#4300)
01aa3f8 test: hopefully deflaking echo integration test (envoyproxy#4304)
1fc0f4b ratelimit: link legacy proto when message is being used (envoyproxy#4308)
aa4481e fix rare List::remove(&target) segfault (envoyproxy#4244)
89e0f23 headers: fixing fast fail of size-validation (envoyproxy#4269)
97eba59 build: bump googletest version. (envoyproxy#4293)
0057e22 fuzz: avoid false positives in HCM fuzzer. (envoyproxy#4262)
9d094e5 Revert ac0bd74 (envoyproxy#4295)
ddb28a4 Add validation context provider (envoyproxy#4264)
3b47cba added histogram latency information to Hystrix dashboard stream (envoyproxy#3986)
cf87d50 docs: update SNI FAQ. (envoyproxy#4285)
f952033 config: fix update empty stat for eds (envoyproxy#4276)
329e591 router: Add ability of custom headers to rely on per-request data (envoyproxy#4219)
68d20b4  thrift: refactor build files and imports (envoyproxy#4271)
5fa8192 access_log: log requested_server_name in tcp proxy (envoyproxy#4144)
fa45bb4 fuzz: libc++ clocks don't like nanos. (envoyproxy#4282)
53f8944 stats: add symbol table for future stat name encoding (envoyproxy#3927)
c987b42 test infra: Remove timeSource() from the ClusterManager api (envoyproxy#4247)
cd171d9 websocket: tunneling websockets (and upgrades in general) over H2 (envoyproxy#4188)
b9dc5d9 router: disallow :path/host rewriting in request_headers_to_add. (envoyproxy#4220)
0c91011 network: skip socket options and source address for UDS client connections (envoyproxy#4252)
da1857d build: fixing a downstream compile error by noting explicit fallthrough (envoyproxy#4265)
9857cfe fuzz: cleanup per-test environment after each fuzz case. (envoyproxy#4253)
52beb06 test: Wrap proto string in std::string before comparison (envoyproxy#4238)
f5e219e extensions/thrift_proxy: Add header matching to thrift router (envoyproxy#4239)
c9ce5d2 fuzz: track read_disable_count bidirectionally in codec_impl_fuzz_test. (envoyproxy#4260)
35103b3 fuzz: use nanoseconds for SystemTime in RequestInfo. (envoyproxy#4255)
ba6ba98 fuzz: make runtime root hermetic in server_fuzz_test. (envoyproxy#4258)
b0a9014 time: Add 'format' test to ensure no one directly instantiates Prod*Time from source. (envoyproxy#4248)
8567460 access_log: support beginning of epoch in START_TIME. (envoyproxy#4254)
28d5f41 proto: unify envoy_proto_library/api_proto_library. (envoyproxy#4233)
f7d3cb6 http: fix allocation bug introduced in envoyproxy#4211. (envoyproxy#4245)

Fixes istio/istio#8310 (once pulled into istio/istio).

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
@alyssawilk alyssawilk deleted the RELEASE_ASSERT branch November 28, 2018 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants