perf: reduce the memory usage of LC Trie construction #4117

Merged (6 commits, Aug 14, 2018)

Conversation

brian-pane
Contributor

Description:
Rather than preallocating a very large buffer to hold the LC Trie nodes, as the reference implementation from the original Nilsson & Karlsson paper did, trie construction now starts with a size based on the number of prefixes and lets `std::vector` handle any reallocation needed along the way.
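A minimal sketch of the sizing strategy described above (the `LcNode`/`Prefix` types, the free function, and the parameter names are placeholders rather than Envoy's actual `LcTrie` internals):

```
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder types standing in for the real LC Trie internals.
struct LcNode { uint32_t data = 0; };
struct Prefix {};

// Sketch: reserve based on the number of prefixes (scaled by the fill factor)
// instead of preallocating a huge fixed-size buffer, grow with resize() only
// when a node slot is actually written, and trim the slack at the end.
std::vector<LcNode> buildTrieSketch(const std::vector<Prefix>& ip_prefixes, double fill_factor) {
  std::vector<LcNode> trie;
  trie.reserve(static_cast<size_t>(ip_prefixes.size() / fill_factor));

  size_t next_free_index = 1;
  // ... the recursive build would run here; before writing node `position`
  // it makes sure the vector is large enough:
  size_t position = 0;
  if (position >= trie.size()) {
    trie.resize(position + 1); // std::vector handles any reallocation.
  }

  trie.resize(next_free_index); // Drop the unused tail...
  trie.shrink_to_fit();         // ...and release the extra capacity.
  return trie;
}
```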

In addition to using less memory during trie construction, this new approach
ended up being faster:

```
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
Before:
BM_LcTrieConstruct                  26631971 ns   26522462 ns         26

After:
BM_LcTrieConstruct                   3654633 ns    3646942 ns        189
```

Risk Level: medium
Testing: unit tests plus new speed benchmark test
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Brian Pane <bpane@pinterest.com>

@brian-pane
Contributor Author

@PiotrSikora @rshriram @ccaraman this PR is a follow-up to #3980 that aims to further reduce the temporary memory footprint during trie construction.

Brian Pane added 2 commits August 13, 2018 15:32
Signed-off-by: Brian Pane <bpane@pinterest.com>
Signed-off-by: Brian Pane <bpane@pinterest.com>
// Build the trie_.
trie_.reserve(ip_prefixes_.size());
Contributor

From my experiments, it seems that the majority of the CPU savings here comes from using reserve() rather than from reducing the number of items.

Since you've also added protection against overflow, is there any reason not to use 2 * ip_prefixes_.size() - 1 to avoid reallocations? That should still be small enough.

Contributor Author

The number of elements needed in the vector can be bigger than 2 * ip_prefixes_.size(), depending on the fill_factor. I think size_t(ip_prefixes_.size() / fill_factor) will work.
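A quick numeric check of that point, using hypothetical values (1,000 prefixes and a fill factor of 0.25, i.e. below the default of 0.5):

```
#include <cstddef>
#include <iostream>

int main() {
  const size_t num_prefixes = 1000; // hypothetical prefix count
  const double fill_factor = 0.25;  // smaller than the default of 0.5

  const size_t fixed_guess = 2 * num_prefixes - 1;                             // 1999
  const size_t scaled_guess = static_cast<size_t>(num_prefixes / fill_factor); // 4000

  // With a fill factor below 0.5 the scaled estimate exceeds 2n - 1, which is
  // why the reservation is derived from ip_prefixes_.size() / fill_factor_.
  std::cout << fixed_guess << " vs " << scaled_guess << "\n";
  return 0;
}
```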

}

trie_.resize(position + 1);
Contributor

reserve() gives an additional 5% performance boost.

Contributor Author

Thanks! I'll switch to reserve.

Contributor

But you didn't...

Contributor Author

Because I realized it would cause a buffer write overflow...

Contributor

How come? If trie_.reserve(position + 1) doesn't increase capacity, then how come trie_.reserve(size_t(ip_prefixes_.size() / fill_factor_)) does?

Contributor Author

They both have the potential to increase capacity. But what's needed in order for trie_[position] to be valid isn't an increase in trie_'s capacity, but rather an increase in its size. The reserve method doesn't increase the size, whereas resize does.
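A small self-contained illustration of the size-versus-capacity distinction being made here:

```
#include <cassert>
#include <vector>

int main() {
  std::vector<int> v;

  v.reserve(8);          // Capacity grows, but size() stays 0,
  assert(v.size() == 0); // so indexing v[5] would still be out of bounds.

  v.resize(8);           // Size (and capacity, if needed) grows,
  v[5] = 42;             // making this element a valid write target.
  assert(v.size() == 8);
  return 0;
}
```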

Contributor

Right, it's >= size() and not >= capacity() check. Thanks!

Member

If you are really trying to go for micro/max perf, you could probably switch to a C-style array here, but it's probably not worth it right now.

@@ -15,6 +17,16 @@ std::unique_ptr<Envoy::Network::LcTrie::LcTrie<std::string>> lc_trie_nested_pref

namespace Envoy {

static void BM_LcTrieConstruct(benchmark::State& state) {
Contributor

@PiotrSikora Aug 13, 2018

Can you add benchmarks for construct with:

- nested prefixes (1420μs before vs 980μs after, i.e. ~30% faster),
- 0.0.0.0/0 (390μs before vs 500ns after, i.e. ~800x faster).

Before:
```
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_LcTrieConstruct                  26580552 ns   26462148 ns         27
BM_LcTrieConstructNested            27428018 ns   27263654 ns         26
BM_LcTrieConstructMinimal           21800259 ns   21758937 ns         32
```

After:
```
-------------------------------------------------------------------------
BM_LcTrieConstruct                   3616441 ns    3613198 ns        192 (7x speedup)
BM_LcTrieConstructNested             5053845 ns    5046373 ns        134 (5x)
BM_LcTrieConstructMinimal               4400 ns       4393 ns     159035 (4,900x)
```
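A hypothetical sketch of the requested 0.0.0.0/0 construction benchmark, modeled on the snippets quoted elsewhere in this thread; the exact `CidrRange::create` overload, the tag name, and the surrounding includes are assumptions:

```
// Assumes the headers already pulled in by the existing benchmark file.
static void BM_LcTrieConstructMinimal(benchmark::State& state) {
  // A single catch-all prefix, so the trie is as small as it can get.
  std::vector<std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>> tags;
  tags.emplace_back("tag_1", std::vector<Envoy::Network::Address::CidrRange>{
                                 Envoy::Network::Address::CidrRange::create("0.0.0.0/0")});

  for (auto _ : state) {
    Envoy::Network::LcTrie::LcTrie<std::string> trie(tags);
    benchmark::DoNotOptimize(trie);
  }
}
BENCHMARK(BM_LcTrieConstructMinimal);
```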

Signed-off-by: Brian Pane <bpane@pinterest.com>

uint32_t next_free_index = 1;
buildRecursive(0u, 0u, ip_prefixes_.size(), 0u, next_free_index);

// The value of next_free_index is the final size of the trie_.
trie_.resize(next_free_index);
trie_.shrink_to_fit();
Contributor

I don't think I agree with removing those 2 lines... We still need to shrink it to release the memory, don't we?

Contributor Author

In the default case where the fill factor is 0.5, it arguably doesn't make sense to release the memory here: the capacity of the buffer at this point can be up to 2x the used size, but that's true of most vectors anyway if the std::vector implementation uses a typical power-of-two allocator internally. For smaller values of the fill factor, though, reclaiming space here might be a win. I'll try re-enabling the shrink-to-fit operation here to find out whether it negatively impacts performance in the common case (by requiring a reallocation + memcpy).
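For reference, a standalone example of the trade-off being weighed here: `shrink_to_fit()` releases the slack, but typically does so by reallocating and copying the elements:

```
#include <iostream>
#include <vector>

int main() {
  std::vector<int> v;
  v.reserve(1000); // Generous up-front reservation.
  v.resize(600);   // Only 600 slots ended up being used.

  std::cout << "before shrink: size=" << v.size() << " capacity=" << v.capacity() << "\n";
  v.shrink_to_fit(); // Non-binding request; most implementations reallocate
                     // to ~size() and copy/move the elements over.
  std::cout << "after shrink:  size=" << v.size() << " capacity=" << v.capacity() << "\n";
  return 0;
}
```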

// Build the trie_.
trie_.reserve(size_t(ip_prefixes_.size() / fill_factor_));
Contributor

The size_t constructor looks bizarre, but I'm not a C++ expert. @ccaraman / @mattklein123 could you take a look?

Member

It's fine; you could also use a static_cast.
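A minimal, self-contained spelling of that suggestion (the function and parameter names below are placeholders):

```
#include <cstddef>
#include <vector>

struct LcNode {};

// Same reservation, written with static_cast instead of a functional-style cast.
void reserveNodes(std::vector<LcNode>& trie, size_t num_prefixes, double fill_factor) {
  trie.reserve(static_cast<size_t>(num_prefixes / fill_factor));
}
```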


static void BM_LcTrieConstructMinimal(benchmark::State& state) {
std::vector<std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>> tags;
tags.emplace_back(std::pair<std::string, std::vector<Envoy::Network::Address::CidrRange>>(
Contributor

Could you move it to a global variable to match the others and to avoid recreating it for every run?

Also, could you add BM_LcTrieLookupMinimal?
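A hypothetical sketch of both suggestions, following the file-scope pattern the benchmark file already uses; the lookup and address-parsing helpers assumed here (`getData`, `parseInternetAddress`) may not match the real API exactly:

```
// Constructed once before the benchmarks run (e.g. alongside the other
// file-scope tries), so the setup cost is not repeated on every iteration.
static std::unique_ptr<Envoy::Network::LcTrie::LcTrie<std::string>> lc_trie_minimal;

static void BM_LcTrieLookupMinimal(benchmark::State& state) {
  // Assumed lookup shape: resolve one address against the single 0.0.0.0/0 tag.
  auto address = Envoy::Network::Utility::parseInternetAddress("10.0.0.1");
  for (auto _ : state) {
    benchmark::DoNotOptimize(lc_trie_minimal->getData(address));
  }
}
BENCHMARK(BM_LcTrieLookupMinimal);
```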

…pace after construction

Latest results
```
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_LcTrieConstruct                   3623593 ns    3621521 ns        188
BM_LcTrieConstructNested             5187460 ns    5182913 ns        103
BM_LcTrieConstructMinimal               5328 ns       5191 ns     153615
```

Based on these benchmark results, reclaiming space after construction _does_ hurt construction speed, although the trie construction is still at least 4x faster and uses many megabytes less memory than it did prior to this PR.

Signed-off-by: Brian Pane <bpane@pinterest.com>
PiotrSikora previously approved these changes Aug 14, 2018
mattklein123 previously approved these changes Aug 14, 2018

Signed-off-by: Brian Pane <bpane@pinterest.com>
brian-pane dismissed stale reviews from mattklein123 and PiotrSikora via 9dfc9fd August 14, 2018 04:23
mattklein123 merged commit 3527f77 into envoyproxy:master Aug 14, 2018
snowp added a commit to snowp/envoy that referenced this pull request Aug 14, 2018
* origin/master: (38 commits)
  test: add tests for corner-cases around sending requests before run() starts or after run() ends. (envoyproxy#4114)
  perf: reduce the memory usage of LC Trie construction (envoyproxy#4117)
  test: moving redundant code in websocket_integration_test to utilities (envoyproxy#4127)
  test: make YamlLoadFromStringFail less picky about error msg. (envoyproxy#4141)
  rbac: add rbac network filter. (envoyproxy#4083)
  fuzz: route lookup and header finalization fuzzer. (envoyproxy#4116)
  Set content-type and content-length (envoyproxy#4113)
  fault: use FractionalPercent for percent (envoyproxy#3978)
  test: Fix inverted exact match logic in IntegrationTcpClient::waitForData() (envoyproxy#4134)
  Added cluster_name to load assignment config for static cluster (envoyproxy#4123)
  ssl: refactor ContextConfig to use TlsCertificateConfig (envoyproxy#4115)
  syscall: refactor OsSysCalls for deeper errno latching (envoyproxy#4111)
  thrift_proxy: fix oneway bugs (envoyproxy#4025)
  Do not crash when converting YAML to JSON fails (envoyproxy#4110)
  config: allow unknown fields flag (take 2) (envoyproxy#4096)
  Use a jittered backoff strategy for handling HdsDelegate stream/connection failures (envoyproxy#4108)
  bazel: use GCS remote cache (envoyproxy#4050)
  Add thread local cache of overload action states (envoyproxy#4090)
  Added TCP healthcheck capabilities to the HdsDelegate (envoyproxy#4079)
  secret: add secret provider interface and use it for TlsCertificates (envoyproxy#4086)
  ...

Signed-off-by: Snow Pettersen <snowp@squareup.com>
brian-pane deleted the lc_trie_memory branch May 25, 2019 21:19