Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: Avoid asserts in fuzz-tests by eliminating arbitrary length limits, using the new MemBlock wrapper for memcpy #8779

Merged
merged 50 commits into from
Dec 20, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
5d40fc4
Avoid crashes in fuzz-tests due to input strings being too long.
jmarantz Oct 27, 2019
8f9ab93
Eliminate 64k limit on # of segments in a real symbol-table stat-name…
jmarantz Oct 28, 2019
beb4684
some cleanup
jmarantz Oct 28, 2019
d143365
Share helper methods for encoding/decoding symbols and sizes.
jmarantz Oct 29, 2019
5ef1a49
cleanup
jmarantz Oct 29, 2019
c425551
typo
jmarantz Oct 29, 2019
f680ccc
fix confusing about whether the encoding-length was the number of byt…
jmarantz Oct 29, 2019
9fee9f5
fix over-aggressive inequality for fake symbol tables
jmarantz Oct 29, 2019
184fbed
don't recompute the number of bytes consumed.
jmarantz Oct 29, 2019
6ac3dee
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Oct 29, 2019
9c62a9e
remove dynamo-stats test change which is not needed
jmarantz Oct 29, 2019
4482ca7
use totalSizeBytes to simplify some code.
jmarantz Oct 29, 2019
14220bb
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Nov 1, 2019
4d345d0
add MemBlock abstraction to isolate risky memory operations.
jmarantz Nov 2, 2019
5b6f0db
reduce need to do byte access to the memory buffers.
jmarantz Nov 2, 2019
70ffe33
remove remainder of memcpy and pointer arithmetic outside MemBlock cl…
jmarantz Nov 2, 2019
9f1eae7
format
jmarantz Nov 2, 2019
cf33e18
some cleanup -- still needs more unit tests.
jmarantz Nov 2, 2019
14b8740
remove superfluous methods.
jmarantz Nov 2, 2019
eb6eb51
add unit test.
jmarantz Nov 2, 2019
6d4b8bc
improve class comment.
jmarantz Nov 2, 2019
cf71f0b
rename MemBlock to MemBlockBuilder.
jmarantz Nov 2, 2019
256a857
Rename MemBlock to MemBlockBuilder to better reflect intended usage m…
jmarantz Nov 2, 2019
cb17d28
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Nov 2, 2019
7b3150e
format.
jmarantz Nov 2, 2019
c4ddd4a
improve comments and function naming.
jmarantz Nov 3, 2019
c728651
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Nov 3, 2019
4824977
guard memcpy against 0-byte calls, which show up as asan failures.
jmarantz Nov 3, 2019
3c9b184
Merge branch 'stats-fuzzer-not-too-long' of github.com:jmarantz/envoy…
jmarantz Nov 14, 2019
5f4ea86
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Nov 14, 2019
64c0af4
RELEASE_ASSERT rather than ASSERT, use non-constructing allocation.
jmarantz Nov 17, 2019
06e3fbc
add test to make sure we don't construct anything.
jmarantz Nov 18, 2019
b2131df
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Nov 18, 2019
a281e28
update gold values
jmarantz Nov 18, 2019
cd98dd2
Go back to new-ing an array and clean up.
jmarantz Nov 19, 2019
cf7b451
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Nov 19, 2019
8069e89
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Nov 20, 2019
9ae8144
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Nov 22, 2019
922a860
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Dec 10, 2019
6dcc218
typo and gold values update.
jmarantz Dec 10, 2019
006b5c7
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Dec 14, 2019
aa70e5e
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Dec 15, 2019
9033ac6
post-merge cleanup
jmarantz Dec 15, 2019
d19d722
more cleanups
jmarantz Dec 15, 2019
ae1da1b
Cleanup lifetime issues in test infrastructure.
jmarantz Dec 15, 2019
bc9871e
update MemBlockBuilder interface change
jmarantz Dec 16, 2019
ae0b1dd
Merge branch 'master' into stats-fuzzer-not-too-long
jmarantz Dec 16, 2019
f4ef7c6
popuplate -> setCacacity
jmarantz Dec 16, 2019
3edfac7
add direct fuzzing of the encoder.
jmarantz Dec 18, 2019
4f5be6c
Share testing infrastructure between fuzz-tests and unit-tests.
jmarantz Dec 18, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions source/common/stats/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,7 @@ envoy_cc_library(
"//include/envoy/stats:symbol_table_interface",
"//source/common/common:assert_lib",
"//source/common/common:logger_lib",
"//source/common/common:mem_block_builder_lib",
"//source/common/common:stack_array",
"//source/common/common:thread_lib",
"//source/common/common:utility_lib",
Expand Down
37 changes: 17 additions & 20 deletions source/common/stats/fake_symbol_table_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -64,36 +64,32 @@ class FakeSymbolTableImpl : public SymbolTable {
// than 256 of them.
RELEASE_ASSERT(num_names < 256, "Maximum number elements in a StatNameList exceeded");

// First encode all the names. The '1' here represents the number of
// names. The num_names * StatNameSizeEncodingBytes reserves space for the
// lengths of each name.
size_t total_size_bytes = 1 + num_names * StatNameSizeEncodingBytes;

size_t total_size_bytes = 1; /* one byte for holding the number of names */
for (uint32_t i = 0; i < num_names; ++i) {
total_size_bytes += names[i].size();
size_t name_size = names[i].size();
total_size_bytes += SymbolTableImpl::Encoding::totalSizeBytes(name_size);
}

// Now allocate the exact number of bytes required and move the encodings
// into storage.
auto storage = std::make_unique<Storage>(total_size_bytes);
uint8_t* p = &storage[0];
*p++ = num_names;
MemBlockBuilder<uint8_t> mem_block(total_size_bytes);
mem_block.appendOne(num_names);
for (uint32_t i = 0; i < num_names; ++i) {
auto& name = names[i];
size_t sz = name.size();
p = SymbolTableImpl::writeLengthReturningNext(sz, p);
SymbolTableImpl::Encoding::appendEncoding(sz, mem_block);
if (!name.empty()) {
memcpy(p, name.data(), sz * sizeof(uint8_t));
p += sz;
mem_block.appendData(
absl::MakeConstSpan(reinterpret_cast<const uint8_t*>(name.data()), sz));
}
}

// This assertion double-checks the arithmetic where we computed
// total_size_bytes. After appending all the encoded data into the
// allocated byte array, we should wind up with a pointer difference of
// total_size_bytes from the beginning of the allocation.
ASSERT(p == &storage[0] + total_size_bytes);
list.moveStorageIntoList(std::move(storage));
// allocated byte array, we should have exhausted all the memory
// we though we needed.
ASSERT(mem_block.capacityRemaining() == 0);
list.moveStorageIntoList(mem_block.release());
}

std::string toString(const StatName& stat_name) const override {
Expand Down Expand Up @@ -143,10 +139,11 @@ class FakeSymbolTableImpl : public SymbolTable {

StoragePtr encodeHelper(absl::string_view name) const {
name = StringUtil::removeTrailingCharacters(name, '.');
auto bytes = std::make_unique<Storage>(name.size() + StatNameSizeEncodingBytes);
uint8_t* buffer = SymbolTableImpl::writeLengthReturningNext(name.size(), bytes.get());
memcpy(buffer, name.data(), name.size());
return bytes;
MemBlockBuilder<uint8_t> mem_block(SymbolTableImpl::Encoding::totalSizeBytes(name.size()));
SymbolTableImpl::Encoding::appendEncoding(name.size(), mem_block);
mem_block.appendData(
absl::MakeConstSpan(reinterpret_cast<const uint8_t*>(name.data()), name.size()));
return mem_block.release();
}
};

Expand Down
130 changes: 76 additions & 54 deletions source/common/stats/symbol_table_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,13 @@ namespace Stats {
static const uint32_t SpilloverMask = 0x80;
static const uint32_t Low7Bits = 0x7f;

uint64_t StatName::dataSize() const {
if (size_and_data_ == nullptr) {
return 0;
}
return SymbolTableImpl::Encoding::decodeNumber(size_and_data_).first;
}

#ifndef ENVOY_CONFIG_COVERAGE
void StatName::debugPrint() {
if (size_and_data_ == nullptr) {
Expand All @@ -38,12 +45,22 @@ void StatName::debugPrint() {
#endif

SymbolTableImpl::Encoding::~Encoding() {
// Verifies that moveToStorage() was called on this encoding. Failure
// to call moveToStorage() will result in leaks symbols.
ASSERT(vec_.empty());
// Verifies that moveToMemBlock() was called on this encoding. Failure
// to call moveToMemBlock() will result in leaks symbols.
ASSERT(mem_block_.capacity() == 0);
}

void SymbolTableImpl::Encoding::addSymbol(Symbol symbol) {
uint64_t SymbolTableImpl::Encoding::encodingSizeBytes(uint64_t number) {
uint64_t num_bytes = 0;
do {
++num_bytes;
number >>= 7;
} while (number != 0);
return num_bytes;
}

void SymbolTableImpl::Encoding::appendEncoding(uint64_t number,
MemBlockBuilder<uint8_t>& mem_block) {
// UTF-8-like encoding where a value 127 or less gets written as a single
// byte. For higher values we write the low-order 7 bits with a 1 in
// the high-order bit. Then we right-shift 7 bits and keep adding more bytes
Expand All @@ -52,45 +69,54 @@ void SymbolTableImpl::Encoding::addSymbol(Symbol symbol) {
// When decoding, we stop consuming uint8_t when we see a uint8_t with
// high-order bit 0.
do {
if (symbol < (1 << 7)) {
vec_.push_back(symbol); // symbols <= 127 get encoded in one byte.
if (number < (1 << 7)) {
mem_block.appendOne(number); // number <= 127 gets encoded in one byte.
} else {
vec_.push_back((symbol & Low7Bits) | SpilloverMask); // symbols >= 128 need spillover bytes.
mem_block.appendOne((number & Low7Bits) | SpilloverMask); // >= 128 need spillover bytes.
}
symbol >>= 7;
} while (symbol != 0);
number >>= 7;
} while (number != 0);
}

void SymbolTableImpl::Encoding::addSymbols(const std::vector<Symbol>& symbols) {
ASSERT(data_bytes_required_ == 0);
for (Symbol symbol : symbols) {
data_bytes_required_ += encodingSizeBytes(symbol);
}
mem_block_.setCapacity(data_bytes_required_);
for (Symbol symbol : symbols) {
appendEncoding(symbol, mem_block_);
}
}

std::pair<uint64_t, uint64_t> SymbolTableImpl::Encoding::decodeNumber(const uint8_t* encoding) {
uint64_t number = 0;
uint64_t uc = SpilloverMask;
const uint8_t* start = encoding;
for (uint32_t shift = 0; (uc & SpilloverMask) != 0; ++encoding, shift += 7) {
uc = static_cast<uint32_t>(*encoding);
number |= (uc & Low7Bits) << shift;
}
return std::make_pair(number, encoding - start);
}

SymbolVec SymbolTableImpl::Encoding::decodeSymbols(const SymbolTable::Storage array,
uint64_t size) {
SymbolVec symbol_vec;
Symbol symbol = 0;
for (uint32_t shift = 0; size > 0; --size, ++array) {
uint32_t uc = static_cast<uint32_t>(*array);

// Inverse addSymbol encoding, walking down the bytes, shifting them into
// symbol, until a byte with a zero high order bit indicates this symbol is
// complete and we can move to the next one.
symbol |= (uc & Low7Bits) << shift;
if ((uc & SpilloverMask) == 0) {
symbol_vec.push_back(symbol);
shift = 0;
symbol = 0;
} else {
shift += 7;
}
symbol_vec.reserve(size);
while (size > 0) {
std::pair<uint64_t, uint64_t> symbol_consumed = decodeNumber(array);
symbol_vec.push_back(symbol_consumed.first);
size -= symbol_consumed.second;
array += symbol_consumed.second;
}
return symbol_vec;
}

uint64_t SymbolTableImpl::Encoding::moveToStorage(SymbolTable::Storage symbol_array) {
const uint64_t sz = dataBytesRequired();
symbol_array = writeLengthReturningNext(sz, symbol_array);
if (sz != 0) {
memcpy(symbol_array, vec_.data(), sz * sizeof(uint8_t));
}
vec_.clear(); // Logically transfer ownership, enabling empty assert on destruct.
return sz + StatNameSizeEncodingBytes;
void SymbolTableImpl::Encoding::moveToMemBlock(MemBlockBuilder<uint8_t>& mem_block) {
appendEncoding(data_bytes_required_, mem_block);
mem_block.appendBlock(mem_block_);
mem_block_.reset(); // Logically transfer ownership, enabling empty assert on destruct.
}

SymbolTableImpl::SymbolTableImpl()
Expand Down Expand Up @@ -130,9 +156,7 @@ void SymbolTableImpl::addTokensToEncoding(const absl::string_view name, Encoding
}

// Now efficiently encode the array of 32-bit symbols into a uint8_t array.
for (Symbol symbol : symbols) {
encoding.addSymbol(symbol);
}
encoding.addSymbols(symbols);
}

uint64_t SymbolTableImpl::numSymbols() const {
Expand Down Expand Up @@ -380,18 +404,19 @@ SymbolTable::StoragePtr SymbolTableImpl::encode(absl::string_view name) {
name = StringUtil::removeTrailingCharacters(name, '.');
Encoding encoding;
addTokensToEncoding(name, encoding);
auto bytes = std::make_unique<Storage>(encoding.bytesRequired());
encoding.moveToStorage(bytes.get());
return bytes;
MemBlockBuilder<uint8_t> mem_block(Encoding::totalSizeBytes(encoding.bytesRequired()));
encoding.moveToMemBlock(mem_block);
return mem_block.release();
}

StatNameStorage::StatNameStorage(absl::string_view name, SymbolTable& table)
: bytes_(table.encode(name)) {}

StatNameStorage::StatNameStorage(StatName src, SymbolTable& table) {
const uint64_t size = src.size();
bytes_ = std::make_unique<SymbolTable::Storage>(size);
src.copyToStorage(bytes_.get());
MemBlockBuilder<uint8_t> storage(size);
src.copyToMemBlock(storage);
bytes_ = storage.release();
table.incRefCount(statName());
}

Expand Down Expand Up @@ -457,14 +482,12 @@ SymbolTable::StoragePtr SymbolTableImpl::join(const StatNameVec& stat_names) con
for (StatName stat_name : stat_names) {
num_bytes += stat_name.dataSize();
}
auto bytes = std::make_unique<Storage>(num_bytes + StatNameSizeEncodingBytes);
uint8_t* p = writeLengthReturningNext(num_bytes, bytes.get());
MemBlockBuilder<uint8_t> mem_block(Encoding::totalSizeBytes(num_bytes));
Encoding::appendEncoding(num_bytes, mem_block);
for (StatName stat_name : stat_names) {
const uint64_t stat_name_bytes = stat_name.dataSize();
memcpy(p, stat_name.data(), stat_name_bytes);
p += stat_name_bytes;
stat_name.appendDataToMemBlock(mem_block);
}
return bytes;
return mem_block.release();
}

void SymbolTableImpl::populateList(const absl::string_view* names, uint32_t num_names,
Expand All @@ -483,19 +506,18 @@ void SymbolTableImpl::populateList(const absl::string_view* names, uint32_t num_

// Now allocate the exact number of bytes required and move the encodings
// into storage.
auto storage = std::make_unique<Storage>(total_size_bytes);
uint8_t* p = &storage[0];
*p++ = num_names;
MemBlockBuilder<uint8_t> mem_block(total_size_bytes);
mem_block.appendOne(num_names);
for (auto& encoding : encodings) {
p += encoding.moveToStorage(p);
encoding.moveToMemBlock(mem_block);
}

// This assertion double-checks the arithmetic where we computed
// total_size_bytes. After appending all the encoded data into the
// allocated byte array, we should wind up with a pointer difference of
// total_size_bytes from the beginning of the allocation.
ASSERT(p == &storage[0] + total_size_bytes);
list.moveStorageIntoList(std::move(storage));
// allocated byte array, we should have exhausted all the memory
// we though we needed.
ASSERT(mem_block.capacityRemaining() == 0);
list.moveStorageIntoList(mem_block.release());
}

StatNameList::~StatNameList() { ASSERT(!populated()); }
Expand Down
Loading