Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: reduce the memory usage of LC Trie construction #4117

Merged
merged 6 commits into from
Aug 14, 2018
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 9 additions & 17 deletions source/common/network/lc_trie.h
Original file line number Diff line number Diff line change
Expand Up @@ -445,19 +445,13 @@ template <class T> class LcTrie {
ip_prefixes_ = data;
std::sort(ip_prefixes_.begin(), ip_prefixes_.end());

// In theory, the trie_ vector can have at most twice the number of ip_prefixes entries - 1.
// However, due to the fill factor a buffer is added to the size of the
// trie_. The buffer value(2000000) is reused from the reference implementation in
// http://www.csc.kth.se/~snilsson/software/router/C/trie.c.
// TODO(ccaraman): Define a better buffer value when resizing the trie_.
maximum_trie_node_size = 2 * ip_prefixes_.size() + 2000000;
trie_.resize(maximum_trie_node_size);

// Build the trie_.
trie_.reserve(size_t(ip_prefixes_.size() / fill_factor_));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size_t constructor looks bizarre, but I'm not C++ expert. @ccaraman / @mattklein123 could you take a look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine, could also use a static cast.

uint32_t next_free_index = 1;
buildRecursive(0u, 0u, ip_prefixes_.size(), 0u, next_free_index);

// The value of next_free_index is the final size of the trie_.
ASSERT(next_free_index <= trie_.size());
trie_.resize(next_free_index);
trie_.shrink_to_fit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think that I agree with removing those 2 lines... We still need to shrink it to release the memory, don't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the default case where fill_ratio is 0.5, it arguably doesn't make sense to release the memory here: the capacity of the buffer at this point can potentially be up to 2x the used size, but that's true of most vectors if the std::vector implementation uses a typical power-of-two allocator internally. For smaller values of fill_ratio, though, reclaiming space here might be a win. I'll try reenabling the shrink-to-fit operation here to find out if it negatively impacts performance in the common case (by requiring a reallocation + memcpy).

}
Expand Down Expand Up @@ -575,21 +569,23 @@ template <class T> class LcTrie {
*/
void buildRecursive(uint32_t prefix, uint32_t first, uint32_t n, uint32_t position,
uint32_t& next_free_index) {
// Setting a leaf, the branch and skip are 0.
if (n == 1) {
if (position >= trie_.size()) {
// There is no way to predictably determine the number of trie nodes required to build a
// LC-Trie. If while building the trie the position that is being set exceeds the maximum
// number of supported trie_ entries, throw an Envoy Exception.
if (position >= maximum_trie_node_size) {
if (position >= MaxLcTrieNodes) {
// Adding 1 to the position to count how many nodes are trying to be set.
throw EnvoyException(
fmt::format("The number of internal nodes required for the LC-Trie "
"exceeded the maximum number of "
"supported nodes. Minimum number of internal nodes required: "
"'{0}'. Maximum number of supported nodes: '{1}'.",
(position + 1), maximum_trie_node_size));
(position + 1), MaxLcTrieNodes));
}

trie_.resize(position + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reserve() gives additional 5% performance boost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll switch to reserve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you didn't...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I realized it would cause a buffer write overflow...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come? If trie_.reserve(position + 1) doesn't increase capacity, then how come trie_.reserve(size_t(ip_prefixes_.size() / fill_factor_)) does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They both have the potential to increase capacity. But what's needed in order for trie_[position] to be valid isn't an increase in trie_'s capacity, but rather an increase in its size. The reserve method doesn't increase the size, whereas resize does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's >= size() and not >= capacity() check. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are really trying to go for micro/max perf you could probably switch to a C style array here, but probably not worth it right now.

}
// Setting a leaf, the branch and skip are 0.
if (n == 1) {
trie_[position].address_ = first;
return;
}
Expand Down Expand Up @@ -682,10 +678,6 @@ template <class T> class LcTrie {
uint32_t address_ : 20; // If this 20-bit size changes, please change MaxLcTrieNodes too.
};

// During build(), an estimate of the number of nodes required will be made and set this
// value. This is used to ensure no out_of_range exception is thrown.
uint32_t maximum_trie_node_size;

// The CIDR range and data needs to be maintained separately from the LC-Trie. A LC-Trie skips
// chunks of data while searching for a match. This means that the node found in the LC-Trie
// is not guaranteed to have the IP address in range. The last step prior to returning
Expand Down
72 changes: 68 additions & 4 deletions test/common/network/lc_trie_speed_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -7,14 +7,55 @@ namespace {

std::vector<Envoy::Network::Address::InstanceConstSharedPtr> addresses;

std::vector<std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>> tag_data;

std::vector<std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>>
tag_data_nested_prefixes;

std::vector<std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>>
tag_data_minimal;

std::unique_ptr<Envoy::Network::LcTrie::LcTrie<std::string>> lc_trie;

std::unique_ptr<Envoy::Network::LcTrie::LcTrie<std::string>> lc_trie_nested_prefixes;

std::unique_ptr<Envoy::Network::LcTrie::LcTrie<std::string>> lc_trie_minimal;

} // namespace

namespace Envoy {

static void BM_LcTrieConstruct(benchmark::State& state) {
Copy link
Contributor

@PiotrSikora PiotrSikora Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add benchmarks for construct with:

  • nested prefixes (1420μs before vs 980μs after, i.e. ~30% faster),
  • 0.0.0.0/0 (390μs before vs 500ns after, i.e. ~800x faster).

std::unique_ptr<Envoy::Network::LcTrie::LcTrie<std::string>> trie;
for (auto _ : state) {
trie = std::make_unique<Envoy::Network::LcTrie::LcTrie<std::string>>(tag_data);
}
benchmark::DoNotOptimize(trie);
}

BENCHMARK(BM_LcTrieConstruct);

static void BM_LcTrieConstructNested(benchmark::State& state) {
std::unique_ptr<Envoy::Network::LcTrie::LcTrie<std::string>> trie;
for (auto _ : state) {
trie = std::make_unique<Envoy::Network::LcTrie::LcTrie<std::string>>(tag_data_nested_prefixes);
}
benchmark::DoNotOptimize(trie);
}

BENCHMARK(BM_LcTrieConstructNested);

static void BM_LcTrieConstructMinimal(benchmark::State& state) {

std::unique_ptr<Envoy::Network::LcTrie::LcTrie<std::string>> trie;
for (auto _ : state) {
trie = std::make_unique<Envoy::Network::LcTrie::LcTrie<std::string>>(tag_data_minimal);
}
benchmark::DoNotOptimize(trie);
}

BENCHMARK(BM_LcTrieConstructMinimal);

static void BM_LcTrieLookup(benchmark::State& state) {
static size_t i = 0;
size_t output_tags = 0;
Expand All @@ -41,6 +82,19 @@ static void BM_LcTrieLookupWithNestedPrefixes(benchmark::State& state) {

BENCHMARK(BM_LcTrieLookupWithNestedPrefixes);

static void BM_LcTrieLookupMinimal(benchmark::State& state) {
static size_t i = 0;
size_t output_tags = 0;
for (auto _ : state) {
i++;
i %= addresses.size();
output_tags += lc_trie_minimal->getData(addresses[i]).size();
}
benchmark::DoNotOptimize(output_tags);
}

BENCHMARK(BM_LcTrieLookupMinimal);

} // namespace Envoy

// Boilerplate main(), which discovers benchmarks in the same file and runs them.
Expand All @@ -54,19 +108,29 @@ int main(int argc, char** argv) {
addresses.push_back(Envoy::Network::Utility::parseInternetAddress(address));
}

std::vector<std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>> tag_data;
// Construct three sets of prefixes: one consisting of 1,024 addresses in an
// RFC 5737 netblock, another consisting of those same addresses plus
// 0.0.0.0/0 (to exercise the LC Trie's support for nested prefixes),
// and finally a set containing only 0.0.0.0/0.
for (int i = 0; i < 32; i++) {
for (int j = 0; j < 32; j++) {
tag_data.emplace_back(std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>(
{"tag_1",
{Envoy::Network::Address::CidrRange::create(fmt::format("192.0.{}.{}/32", i, j))}}));
}
}
tag_data_nested_prefixes = tag_data;
tag_data_nested_prefixes.emplace_back(
std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>(
{"tag_0", {Envoy::Network::Address::CidrRange::create("0.0.0.0/0")}}));
tag_data_minimal.emplace_back(
std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>(
{"tag_1", {Envoy::Network::Address::CidrRange::create("0.0.0.0/0")}}));

lc_trie = std::make_unique<Envoy::Network::LcTrie::LcTrie<std::string>>(tag_data);
tag_data.emplace_back(std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>(
{"tag_0", {Envoy::Network::Address::CidrRange::create("0.0.0.0/0")}}));
lc_trie_nested_prefixes = std::make_unique<Envoy::Network::LcTrie::LcTrie<std::string>>(tag_data);
lc_trie_nested_prefixes =
std::make_unique<Envoy::Network::LcTrie::LcTrie<std::string>>(tag_data_nested_prefixes);
lc_trie_minimal = std::make_unique<Envoy::Network::LcTrie::LcTrie<std::string>>(tag_data_minimal);

benchmark::Initialize(&argc, argv);
if (benchmark::ReportUnrecognizedArguments(argc, argv)) {
Expand Down