stats: Add fake symbol table as an intermediate state to move to SymbolTable API without taking locks. #5414

Merged
merged 23 commits into from
Jan 30, 2019
Changes from 10 commits
Commits
b1fcd49
catch up with symtab-read-lock and to-string-on-symtab.
jmarantz Dec 18, 2018
121ec75
refactor toString
jmarantz Dec 22, 2018
a54a2bc
Merge branch 'master' into fake-symbol-table
jmarantz Dec 22, 2018
4b9570c
virtualize symbol-table.
jmarantz Dec 23, 2018
0e9303b
use virtual interface in tests.
jmarantz Dec 23, 2018
4fa1eb2
all tests working.
jmarantz Dec 24, 2018
8a6aec1
Fix asan failures, add comments, cleanup.
jmarantz Dec 25, 2018
2209115
clang-tidy fixes.
jmarantz Dec 25, 2018
adf956e
Merge branch 'master' into fake-symbol-table
jmarantz Jan 14, 2019
a92121d
Merge branch 'master' into fake-symbol-table
jmarantz Jan 17, 2019
c5e25e1
Merge branch 'master' into fake-symbol-table
jmarantz Jan 22, 2019
a5a112e
Sink Storage type nicknames into SymbolTable class.
jmarantz Jan 22, 2019
b37dc32
comment cleanup.
jmarantz Jan 23, 2019
9140665
Merge branch 'master' into fake-symbol-table
jmarantz Jan 26, 2019
11fd3c0
Privatize SymbolTable::free and incRefCount, friending helper classes…
jmarantz Jan 27, 2019
675b9d6
Improve comments, fix nits, typos, etc.
jmarantz Jan 28, 2019
35709b3
Remove 2-arg form of join().
jmarantz Jan 28, 2019
392a0be
Merge branch 'master' into fake-symbol-table
jmarantz Jan 29, 2019
e9f2b50
Review style nits and actually test for zero contentions in fake symb…
jmarantz Jan 29, 2019
71df963
Only start tracking the contentions right before doing the accesses.
jmarantz Jan 30, 2019
ecb1b88
Add missing include for vector.
jmarantz Jan 30, 2019
586103c
Merge branch 'master' into fake-symbol-table
jmarantz Jan 30, 2019
d804664
Merge branch 'master' into fake-symbol-table
jmarantz Jan 30, 2019
150 changes: 147 additions & 3 deletions include/envoy/stats/symbol_table.h
@@ -1,13 +1,157 @@
#pragma once

#include <cstdint>
#include <memory>
#include <string>
#include <vector>

#include "envoy/common/pure.h"

#include "absl/strings/string_view.h"

namespace Envoy {
namespace Stats {

// Interface for referencing a stat name.
class StatName;
class SymbolEncoding;

/**
* Efficient byte-encoded storage of an array of tokens. The most common tokens
* are typically < 128, and are represented directly. Tokens >= 128 spill into
* the next byte, allowing for tokens of arbitrary numeric value to be stored.
* As long as the most common tokens are low-valued, the representation is
* space-efficient. This scheme is similar to UTF-8.
*/
using SymbolStorage = uint8_t[];
using SymbolStoragePtr = std::unique_ptr<SymbolStorage>;

/**
* SymbolTable manages a namespace optimized for stats, which are typically
* composed of arrays of "."-separated tokens, with a significant overlap
* between the tokens. Each token is mapped to a Symbol (uint32_t) and
* reference-counted so that no-longer-used symbols can be reclaimed.
*
* We use a uint8_t array to encode arrays of symbols in order to conserve
* space, as in practice the majority of token instances in stat names draw from
* a fairly small set of common names, typically less than 100. The format is
* somewhat similar to UTF-8, with a variable-length array of uint8_t. See the
* implementation for details.
*
* StatNameStorage can be used to manage memory for the byte-encoding. Not all
* StatNames are backed by StatNameStorage -- the storage may be inlined into
* another object such as HeapStatData. StatNameStorage is not fully RAII --
* instead the owner must call free(SymbolTable&) explicitly before
* StatNameStorage is destructed. This saves 8 bytes of storage per stat,
* relative to holding a SymbolTable& in each StatNameStorage object.
*
* A StatName is a copyable and assignable reference to this storage. It does
* not own the storage or keep it alive via reference counts; the owner must
* ensure the backing store lives as long as the StatName.
*
* The underlying Symbol / SymbolVec data structures are private to the
* impl. One side effect of the non-monotonically-increasing symbol counter is
* that if a string is encoded, the resulting stat is destroyed, and then that
* same string is re-encoded, it may or may not encode to the same underlying
* symbol.
*/
class SymbolTable {
public:
virtual ~SymbolTable() = default;

/**
* Encodes a stat name using the symbol table, returning a SymbolEncoding. The
* SymbolEncoding is not intended for long-term storage, but is used to help
* allocate a StatName with the correct amount of storage.
*
* When a name is encoded, it bumps reference counts held in the table for
* each symbol. The caller is responsible for creating a StatName using this
* SymbolEncoding and ultimately disposing of it by calling
* StatName::free(). Otherwise the symbols will leak for the lifetime of the
* table, though they won't show up as C++ leaks as the memory is still
* reachable from the SymbolTable.
*
* @param name The name to encode.
* @return SymbolEncoding the encoded symbols.
*/
virtual SymbolEncoding encode(absl::string_view name) PURE;

/**
* @return uint64_t the number of symbols in the symbol table.
*/
virtual uint64_t numSymbols() const PURE;

/**
* Decodes a vector of symbols back into its period-delimited stat name. If
* decoding fails on any part of the stat_name, we release_assert and crash
* hard, since this should never happen, and we don't want to continue running
* with a corrupt stats set.
*
* @param stat_name the stat name.
* @return std::string stringified stat_name.
*/
virtual std::string toString(const StatName& stat_name) const PURE;

/**
* Determines whether one StatName lexically precedes another. Note that
* the lexical order may not exactly match the lexical order of the
* elaborated strings. For example, a stat-name of "-.-" would lexically
* sort after "---" but when encoded as a StatName would come lexically
* earlier. In practice this is unlikely to matter as those are not
* reasonable names for Envoy stats.
*
* Note that this operation has to be performed with the context of the
* SymbolTable so that the individual Symbol objects can be converted
* into strings for lexical comparison.
*
* @param a the first stat name
* @param b the second stat name
* @return bool true if a lexically precedes b.
*/
virtual bool lessThan(const StatName& a, const StatName& b) const PURE;

/**
* Since SymbolTable does manual reference counting, a client of SymbolTable
* must manually call free(stat_name) when it is freeing the backing store
* for a StatName. This way, the symbol table will grow and shrink
* dynamically, instead of being write-only.
*
* @param stat_name the stat name whose symbols are to be freed.
*/
virtual void free(const StatName& stat_name) PURE;

/**
* StatName backing-store can be managed by callers in a variety of ways
* to minimize overhead. But any persistent reference to a StatName needs
* to hold onto its own reference-counts for all symbols. This method
* helps callers ensure the symbol-storage is maintained for the lifetime
* of a reference.
*
* @param stat_name the stat name whose symbol reference counts are to be incremented.
*/
virtual void incRefCount(const StatName& stat_name) PURE;

/**
* Joins two or more StatNames. For example if we have StatNames for {"a.b",
* "c.d", "e.f"} then the joined stat-name matches "a.b.c.d.e.f". The
* advantage of using this representation is that it avoids having to
* decode/encode into the elaborated form, and does not require locking the
* SymbolTable.
*
* The caveat is that this representation does not bump reference counts on
* the referenced Symbols in the SymbolTable, so it's only valid for the
* lifetime of the joined StatNames.
*
* This is intended for use doing cached name lookups of scoped stats, where
* the scope prefix and the names to combine it with are already in StatName
* form. Using join(), they can be combined without accessing the
* SymbolTable or, in particular, taking its lock.
*/
virtual SymbolStoragePtr join(const StatName& a, const StatName& b) const PURE;
virtual SymbolStoragePtr join(const std::vector<StatName>& stat_names) const PURE;

#ifndef ENVOY_CONFIG_COVERAGE
virtual void debugPrint() const PURE;
#endif
};

// Interface for managing symbol tables.
class SymbolTable;
using SharedSymbolTable = std::shared_ptr<SymbolTable>;

} // namespace Stats
} // namespace Envoy
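
The SymbolStorage comment in this header describes the byte encoding only loosely and defers to the implementation for details. As an illustration, here is a minimal sketch of one UTF-8-like scheme that matches the description: values below 128 occupy one byte, and larger values spill into following bytes. It assumes 7 payload bits per byte, low-order bits first, with the high bit marking that more bytes follow. The names `appendSymbol` and `decodeSymbols` are hypothetical, and the exact byte layout is an assumption that may differ from what SymbolTableImpl actually does.

```cpp
#include <cstdint>
#include <vector>

// Symbol is described in the header comment as a uint32_t.
using Symbol = uint32_t;

// Appends one symbol to `out` using 7 payload bits per byte, low-order bits
// first. The high bit of a byte is set when more bytes follow (assumed layout).
void appendSymbol(Symbol symbol, std::vector<uint8_t>& out) {
  do {
    uint8_t byte = symbol & 0x7f;
    symbol >>= 7;
    if (symbol != 0) {
      byte |= 0x80; // More bytes follow for this symbol.
    }
    out.push_back(byte);
  } while (symbol != 0);
}

// Decodes every symbol stored in `bytes`, reversing appendSymbol().
std::vector<Symbol> decodeSymbols(const std::vector<uint8_t>& bytes) {
  std::vector<Symbol> symbols;
  Symbol current = 0;
  int shift = 0;
  for (uint8_t byte : bytes) {
    current |= static_cast<Symbol>(byte & 0x7f) << shift;
    if ((byte & 0x80) != 0) {
      shift += 7; // Continuation bit set: keep accumulating.
    } else {
      symbols.push_back(current);
      current = 0;
      shift = 0;
    }
  }
  return symbols;
}
```

Under this layout a stat name whose tokens all map to symbols below 128 costs one byte per token, which is the space saving the comment is after.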
6 changes: 6 additions & 0 deletions source/common/stats/BUILD
@@ -135,6 +135,12 @@ envoy_cc_library(
],
)

envoy_cc_library(
name = "fake_symbol_table_lib",
hdrs = ["fake_symbol_table_impl.h"],
deps = [":symbol_table_lib"],
)

envoy_cc_library(
name = "stats_options_lib",
hdrs = ["stats_options_impl.h"],
101 changes: 101 additions & 0 deletions source/common/stats/fake_symbol_table_impl.h
@@ -0,0 +1,101 @@
#pragma once

#include <algorithm>
#include <cstring>
#include <memory>
#include <stack>
#include <string>
#include <unordered_map>
#include <vector>

#include "envoy/common/exception.h"
#include "envoy/stats/symbol_table.h"

#include "common/common/assert.h"
#include "common/common/hash.h"
#include "common/common/lock_guard.h"
#include "common/common/non_copyable.h"
#include "common/common/thread.h"
#include "common/common/utility.h"
#include "common/stats/symbol_table_impl.h"

#include "absl/strings/str_join.h"
#include "absl/strings/str_split.h"

namespace Envoy {
namespace Stats {

/**
* Implements the SymbolTable interface without taking locks or saving memory.
* This implementation is intended as a transient state for the Envoy codebase
* to allow incremental conversion of Envoy stats call-sites to use the
* SymbolTable interface, pre-allocating symbols during construction time for
* all stats tokens.
*
* Once all stat tokens are symbolized at construction time, this
* FakeSymbolTable implementation can be deleted, and real-symbol tables can be
* used, thereby reducing memory and improving stat construction time.
*
* Note that it is not necessary to pre-allocate all elaborated stat names
* because multiple StatNames can be joined together without taking locks,
* even in SymbolTableImpl.
*
* This implementation simply stores the characters directly in the uint8_t[]
* that backs each StatName, so there is no sharing or memory savings, but also
* no state associated with the SymbolTable, and thus no locks needed.
*
* TODO(jmarantz): delete this class once SymbolTable is fully deployed in the
* Envoy codebase.
*/
class FakeSymbolTableImpl : public SymbolTable {
public:
SymbolEncoding encode(absl::string_view name) override { return encodeHelper(name); }

std::string toString(const StatName& stat_name) const override {
return std::string(toStringView(stat_name));
}
uint64_t numSymbols() const override { return 0; }
bool lessThan(const StatName& a, const StatName& b) const override {
return toStringView(a) < toStringView(b);
}
void free(const StatName&) override {}
void incRefCount(const StatName&) override {}
SymbolStoragePtr join(const StatName& a, const StatName& b) const override {
return join({a, b});
}
SymbolStoragePtr join(const std::vector<StatName>& names) const override {
std::vector<absl::string_view> strings;
for (StatName name : names) {
absl::string_view str = toStringView(name);
if (!str.empty()) {
strings.push_back(str);
}
}
return stringToStorage(absl::StrJoin(strings, "."));
}

#ifndef ENVOY_CONFIG_COVERAGE
void debugPrint() const override {}
#endif

private:
SymbolEncoding encodeHelper(absl::string_view name) const {
SymbolEncoding encoding;
encoding.addStringForFakeSymbolTable(name);
return encoding;
}

absl::string_view toStringView(const StatName& stat_name) const {
return {reinterpret_cast<const char*>(stat_name.data()), stat_name.dataSize()};
}

SymbolStoragePtr stringToStorage(absl::string_view name) const {
SymbolEncoding encoding = encodeHelper(name);
auto bytes = std::make_unique<uint8_t[]>(encoding.bytesRequired());
encoding.moveToStorage(bytes.get());
return bytes;
}
};

} // namespace Stats
} // namespace Envoy
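
For readers who want to see the join behavior of FakeSymbolTableImpl in isolation, the sketch below mirrors its logic using only the standard library: empty names are skipped and the rest are joined with ".". The function name `joinNames` is hypothetical, and it returns a std::string rather than a SymbolStoragePtr, so treat it as an approximation of the behavior and not as the code in this PR.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for FakeSymbolTableImpl::join(): since the fake table
// stores the raw characters of each name, joining reduces to concatenating
// the non-empty names with "." between them.
std::string joinNames(const std::vector<std::string_view>& names) {
  std::string joined;
  for (std::string_view name : names) {
    if (name.empty()) {
      continue; // Skip empty names so no stray "." is emitted.
    }
    if (!joined.empty()) {
      joined += '.';
    }
    joined.append(name.data(), name.size());
  }
  return joined;
}
```

For example, `joinNames({"a.b", "c.d", "e.f"})` yields `a.b.c.d.e.f`, matching the example in the join() comment of symbol_table.h. No shared state is touched, which is why the fake table can do this without a lock.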