runtime: codifying runtime guarded features #6134

alyssawilk · 2019-02-28T22:03:08Z

Documenting best practices and adding some tooling for guarding high risk features

Risk Level: Low (mostly new code)
Testing: Added new unit tests
Docs Changes: updated CONTRIBUTING.md
Release Notes: n/a
Part of #5693 (TODO: handling deprecating and adding tooling scripts)

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

dnoe · 2019-03-01T16:00:43Z

CONTRIBUTING.md

+guarded due to performance concerns) and a full deprecation cycle (if it is a high risk behavioral
+change).
+
+The canonican way to runtime guard a feature is


Unless canonicans are some sci-fi aliens that always speak canonically, I assume this is typo :)

dnoe · 2019-03-01T16:01:43Z

CONTRIBUTING.md

+```
+bool use_new_code_path_ = Runtime::LoaderSingleton::getExisting() &&
+  Runtime::LoaderSingleton::getExisting()->snapshot()->runtimeFeatureEnabled(
+      "envoy.reloadable_features.my_feature_name"));


Is it worth trying to make this a bit less verbose at the sites where it is used? There's a lot of boilerplate to get to runtimeFeatureEnabled.

Yeah we could likely have a wrapper that internally loads the singleton and does the right thing if it doesn't exist.

Definitely the right way to go.

Less clear, should we use this in tests? I was a bit bummed at the heavyweight work in TestEnvironment to be able to flag override - setting up all those fake components. With the helper if we want to have a friend test-only class in ConstSingleton we can skip all the static state and just muck with the const singleton directly. I think that might also work for integration tests, which would be pretty awesome.

That said, it's making our const-singleton non-const and I know how much the larger Envoy community loves hacks like that ;-)

What about an approach like #6139 with some test class mixin that changes the singleton behavior? That seems like a really cleanly structured way to have a "mostly const" singleton that alters behavior only within a particular test.

dnoe · 2019-03-01T16:03:11Z

CONTRIBUTING.md

+2. Set up integration tests with custom runtime defaults as documented in the
+   [integration test README](https://github.com/envoyproxy/envoy/blob/master/test/integration/README.md)
+3. Run a given unit test with the new runtime value explicitly set true as done
+   for [runtime_flag_override_test](https://github.com/envoyproxy/envoy/blob/master/test/common/runtime/BUILD) 


Is it also worth specifically calling out parameterized tests as a useful tool here? I find that particularly useful when the logical behavior is supposed to be the same regardless of flag value (ie, goal is pure refactoring or performance improvement).

dnoe · 2019-03-01T16:06:56Z

test/common/runtime/runtime_flag_override_test.cc

+// In the envoy_cc_test declaration, the flag is set
+// "--runtime-feature-override-for-tests=envoy.reloadable_features.test_feature_false"
+TEST(RuntimeFlagOverrideTest, OverridesWork) {
+  Snapshot& snapshot = Runtime::LoaderSingleton::getExisting()->snapshot();


const Snapshot& ?

mattklein123

Very nice, just took a doc pass will look again once other reviewers are finished.

mattklein123 · 2019-03-01T18:04:02Z

CONTRIBUTING.md

+```
+bool use_new_code_path_ = Runtime::LoaderSingleton::getExisting() &&
+  Runtime::LoaderSingleton::getExisting()->snapshot()->runtimeFeatureEnabled(
+      "envoy.reloadable_features.my_feature_name"));


Yeah we could likely have a wrapper that internally loads the singleton and does the right thing if it doesn't exist.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

lizan · 2019-03-04T21:18:33Z

include/envoy/runtime/runtime.h

+  // Runtime features are used to easily allow switching between old and new code paths for high
+  // risk changes. The intent is for the old code path to be short lived - the old code path is
+  // deprecated as the feature is defaulted true, and removed with the following Envoy release.
+  virtual bool runtimeFeatureEnabled(const std::string& key) const PURE;


use absl::string_view? It will avoid constructing std::string when caller call this with literals. but might require underlying methods take string_view.

As written I think someone could
static std::string feature = "envoy.reloadable_features.my_feature_name"));
if (Runtime::runtimeFeatureEnabled(feature) { ...} and avoid the string conversion on hash map lookup each time.

If they do the inefficient
if (Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name")
they're creating a string every time.

If I switch to string_view I think the map lookup will convert the string piece to the string on every single call, and there's no way to avoid it, so I think that's strictly less performant?

I could update the instructions to encourage static string on the canonical non-latch path if we think that makes more sense

As long as it is an absl::flat_hash_{map,set} I think heterogenous lookup means that there is no string construction in the map lookup, even though the type in the map is std::string: https://abseil.io/tips/144

TIL! Ok, got that to work for the flat hash, but unfortunately to string_view all the way up the API we need to also change Runtime::EntryMap in include/envoy/runtime/runtime.h to have a comparator. I'd prefer to land this as-is and do a separate PR converting to string piece, especially as none of the call sites will need to change. What do you think of a TODO and a follow-up?

Yeah fine to leave as TODO. static std::string feature = "envoy.reloadable_features.my_feature_name"; is a static non-POD object initialization so there might be static initialization fiasco. string_view constructor is safe as it is constexpr.

That sounds fine to me.

lizan · 2019-03-04T21:23:56Z

source/common/runtime/runtime_features.cc

+// These features should not be overridden with run-time guards without a bug
+// being filed on github as once high risk features are true by default, the
+// old code path will be removed with the next release.
+const char* runtime_features[] = {


nit: constexpr

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

dnoe

LGTM, thanks!

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

alyssawilk · 2019-03-06T20:09:17Z

@mattklein123 I think this is ready for your pass

mattklein123

Thanks this is super awesome. Generally looks good but a few questions/comments.

/wait

mattklein123 · 2019-03-06T22:30:41Z

CONTRIBUTING.md

+
+```
+bool use_new_code_path_ =
+    Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name")


Should we have a different namespace for this type of thing? envoy.startup_features. or similar? It's a little confusing that the docs point out that everything named like this is reloadable at runtime and then we talk about how it's not always true. :)

Yeah, I need to remove the buffer mentions on this one. My thought for latching is if we latched at the hcm, or at a network::connection or some such, where the lifetime were transient. As called out on the buffer PR that's truly restart-only not reloadable.

I do want to support envoy.restart_features but I also need to talk to you about how to do that. In my priors, restart flags were done on the command line (only when the binary was brought up) and it was illegal to change then when pushing config (would cause that config to get rejected). Otherwise you do a config push with a new-but-buggy restart flag, the config "passes", you hot restart, and your fleet is down. Only with Envoy and hot restarts I'm less clear on how we push config which is restart only. My default plan would be to that test-only runtime tweaking flag to be a actual production runtime tweaking flag, so folks can change restart values at start-up and we can hard-fail attempts to reload them at runtime, but I recall you saying that most people don't do flags.

OK yeah let's chat about that offline and maybe figure out in a follow up? I think for now maybe just clarify somehow that this should only be used for truly reloadable features and if there are features that should be startup configured it should be discussed? (Up to you how to phrase)

mattklein123 · 2019-03-06T22:33:45Z

source/common/runtime/runtime_features.cc

+    // Enabled
+    "envoy.reloadable_features.test_feature_true",
+    // Disabled
+    // "envoy.reloadable_features.test_feature_false",


Do we mean for this to be commented out here? If this is a documented way of defaulting to disabled, can you add more comments?

I was thinking at some point it'd be nice to have all flags documented in that file, so if folks want to preemptively flip they can do so.

I'll remove it and wait until we have utils and see if it still makes sense

mattklein123 · 2019-03-06T22:36:11Z

source/common/runtime/runtime_impl.cc

+bool SnapshotImpl::runtimeFeatureEnabled(const std::string& key) const {
+  bool enabled = false;
+  // See if this value is explicitly set as a runtime boolean.
+  bool stored = getBoolean(key, enabled);


nit: this can just be inlined in the if statement below?

mattklein123 · 2019-03-06T22:40:53Z

source/common/runtime/runtime_features.cc

+
+// Add additional features here to enable the new code paths by default.
+//
+// These features should not be overridden with run-time guards without a bug


I think you are talking to end users here, right? Can you clarify slightly and say something like, "If issues are found that that require a feature to be disabled, it should be reported upstream ASAP, ..."

mattklein123 · 2019-03-06T22:42:54Z

test/common/runtime/BUILD

+    name = "runtime_flag_override_test",
+    srcs = ["runtime_flag_override_test.cc"],
+    args = [
+        "--runtime-feature-override-for-tests=envoy.reloadable_features.test_feature_false",


FWIW, this seems like a generally useful feature for runtime configuration where people don't want to use the filesystem (CLI/gflags implementation like we discussed). Is it worth adding a TODO or opening a help wanted issue on that?

Also, if possible can you rename envoy.reloadable_features.test_feature_false to something else? I found it pretty confusing to try to sort out the logic with the built in envoy.reloadable_features.test_feature_true. Maybe more comments would help also or just a different name like envoy.reloadable_features.injected_test_feature and envoy.reloadable_features.built_in_test_feature or something?

Do you think other folks would use this? I thought from prior discussions you thought google was flag happy but most folks wanted to use the existing runtime.
I think if we're going to use it out of test we have to allow for feature=true,feature2=false which I can do pretty easily.

Not sure if removing test_feature_false helps - I've clarified the language in both tests to hopefully clarify things.

I do think that Google is an outlier in using flags for this type of stuff, but I was just suggesting that this code might be generally useful to someone, so probably worth tracking somewhere in case it comes up again. I think it's totally fine to leave as test only code for now.

Thanks for all the comments, it's much easier to understand for me now.

mattklein123 · 2019-03-06T22:45:46Z

test/test_common/environment.cc

+  // As each test ends, clean up the RuntimeFeaturesDefaults state.
+  void OnTestEnd(const ::testing::TestInfo&) override {
+    if (!runtime_override_.empty()) {
+      Runtime::RuntimeFeaturesPeer::addFeature(runtime_override_);


Don't you want to remove here?

mattklein123 · 2019-03-06T22:51:11Z

test/test_common/environment.cc

+  // Before latching argv and argc, remove any runtime override flag.
+  // This allows doing test overrides of Envoy runtime features without adding
+  // test flags to the Envoy production command line.
+  const std::regex PATTERN{"--runtime-feature-override-for-tests=(.*)", std::regex::optimize};


I have a probably stupid question: Is there any reason we couldn't just hit this directly from a test given that we already have static support and also have some work that @jmarantz did to clean things up at the end of tests? I'm just wondering if we could avoid some of the regex/listener/etc. magic here.

Hm, now that I did the const-singleton refactor this is much less tied to environment.cc than it was.

I'd be inclined to forklift all this code over to TestRunner::RunTests - we still need the regex magic and I'd prefer a separate listener just to avoid the set up and teardown checks for the vast majority of tests which don't need them.

I'm going to hold off until we've decided what to do with command line flags - if we want to make it general purpose I'd be inclined to leave it here with the TODO, and the move it into the main server TCLAP in the follow-up.

I think my feeling is to not making it general purpose for now, and just track somewhere in a TODO/issue/etc. that we might want to think about this in the future. So in that case, I guess I would opt for making the test code as simple as possible so maybe do the move you suggested? Though if we do a move, why do we need all the regex stuff? Couldn't we just hit a static method with a string to alter runtime state for a test?

Oh, we can do this statically from test, but in-house when introducing a high risk flag we often just dup a full test (or tests) with no edits, so we have one variant running with true and the other running with the flag false. We've found it pretty handy especially for tests which are already parameterized so hard to muck with.

It's not a lot of code so I'd be inclined to check it in and we can always remove it if no one uses it and it turns out to be overkill. WDYT?

I don't have a strong opinion on this one way or the other, but couldn't we accomplish roughly the same thing with a TEST_P and some type of param helper that just toggles a specific string? I think @dnoe mentioned this? I'm mainly just wondering if the same behavior can be achieved with less code?

We can, but if a test is already a TEST_P it can mean updating all the test params from GetParam() to GetParam(0), adding GetParam(1) for every test fixture in the whole class, creating or editing the constructur etc, then tearing it all back down in a few months. Given many of our files have 3-5 fixtures this can end up being a lot of throwaway work per test you want to dual-dun, which is why we bother with the flag based solution internally.

OK SGTM if you think this is the best option.

Again it's really handy to have, but TDB if folks use it.
I've added a calendar reminder 6 months out to revisit - if no one uses it I'll tear these changes out then.

Any other comments you're waiting on?

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

mattklein123

Thanks for the clarifications, much easier to understand, I followed up with a few more questions.

/wait

mattklein123 · 2019-03-08T16:51:42Z

CONTRIBUTING.md

+
+```
+bool use_new_code_path_ =
+    Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name")


OK yeah let's chat about that offline and maybe figure out in a follow up? I think for now maybe just clarify somehow that this should only be used for truly reloadable features and if there are features that should be startup configured it should be discussed? (Up to you how to phrase)

mattklein123 · 2019-03-08T16:53:19Z

source/common/runtime/runtime_features.cc

+namespace Envoy {
+namespace Runtime {
+
+// Add additional features here to enable the new code paths by default.


Sorry to harp on this, but I think what would help me/readers here is a short description of the behavior of what an entry in runtime_features and disallowed_features do. I know this is described elsewhere, but it's hard enough to keep track of (and important enough) that IMO we should replicate a short version here?

mattklein123 · 2019-03-08T16:55:45Z

test/common/runtime/BUILD

+    name = "runtime_flag_override_test",
+    srcs = ["runtime_flag_override_test.cc"],
+    args = [
+        "--runtime-feature-override-for-tests=envoy.reloadable_features.test_feature_false",


I do think that Google is an outlier in using flags for this type of stuff, but I was just suggesting that this code might be generally useful to someone, so probably worth tracking somewhere in case it comes up again. I think it's totally fine to leave as test only code for now.

Thanks for all the comments, it's much easier to understand for me now.

mattklein123 · 2019-03-08T16:56:05Z

test/common/runtime/runtime_flag_override_test.cc

+namespace Envoy {
+namespace Runtime {
+
+// Fl not in runtime_features.cc are false by default (and this particular one is verified to be


typo "fl" ?

mattklein123 · 2019-03-08T16:57:55Z

test/test_common/environment.cc

+  // Before latching argv and argc, remove any runtime override flag.
+  // This allows doing test overrides of Envoy runtime features without adding
+  // test flags to the Envoy production command line.
+  const std::regex PATTERN{"--runtime-feature-override-for-tests=(.*)", std::regex::optimize};


I think my feeling is to not making it general purpose for now, and just track somewhere in a TODO/issue/etc. that we might want to think about this in the future. So in that case, I guess I would opt for making the test code as simple as possible so maybe do the move you suggested? Though if we do a move, why do we need all the regex stuff? Couldn't we just hit a static method with a string to alter runtime state for a test?

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

mattklein123

Thanks this is super awesome, just some typos I found. If you want to fix them in a follow up that's fine.

mattklein123 · 2019-03-12T15:52:12Z

CONTRIBUTING.md

+
+This should only be done if the lifetime of the object in question is relatively short compared to
+the lifetime of most Envoy instances, i.e. latching state on creation of the
+Http::ConnectionManagerImpl or all Network::ConnectionImpl classes, to ensure that the new behavio


typo behavio

mattklein123 · 2019-03-12T15:53:31Z

source/common/runtime/runtime_features.cc

+// by runtime feature guards, i.e
+//
+// if (Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name")) {
+//     [new code path]


nit: indent since there is the other typo change

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

* master: (59 commits) http fault: add response rate limit injection (envoyproxy#6267) xds: introduce initial_fetch_timeout option to limit initialization time (envoyproxy#6048) test: fix cpuset-threads tests (envoyproxy#6278) server: add an API for registering for notifications for server instance life… (envoyproxy#6254) remove remains of TestBase (envoyproxy#6286) dubbo_proxy: Implement the routing of Dubbo requests (envoyproxy#5973) Revert "stats: add new BoolIndicator stat type (envoyproxy#5813)" (envoyproxy#6280) runtime: codifying runtime guarded features (envoyproxy#6134) mysql_filter: fix integration test flakes (envoyproxy#6272) tls: update BoringSSL to debed9a4 (3683). (envoyproxy#6273) rewrite buffer implementation to eliminate evbuffer dependency (envoyproxy#5441) Remove the dependency from TimeSystem to libevent by using the Event::Scheduler abstraction as a delegate. (envoyproxy#6240) fuzz: fix use of literal in default initialization. (envoyproxy#6268) http: add HCM functionality required for rate limiting (envoyproxy#6242) Disable mysql_integration_test until it is deflaked. (envoyproxy#6250) test: use ipv6_only IPv6 addresses in custom cluster integration tests. (envoyproxy#6260) tracing: If parent span is propagated with empty string, it causes th… (envoyproxy#6263) upstream: fix oss-fuzz issue envoyproxy#11095. (envoyproxy#6220) Wire up panic mode subset to receive updates (envoyproxy#6221) docs: clarify xds docs with warming information (envoyproxy#6236) ...

runtime: codifying runtime guarded features

5f368a3

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

alyssawilk assigned lizan Feb 28, 2019

mattklein123 self-assigned this Mar 1, 2019

dnoe reviewed Mar 1, 2019

View reviewed changes

mattklein123 reviewed Mar 1, 2019

View reviewed changes

alyssawilk added 4 commits March 4, 2019 09:16

reviewer comments

0147b76

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

reviewer comments

7f3fc25

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

updating test tools

3ac22ff

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

moving test utils to their own file

33bbf5e

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

lizan reviewed Mar 4, 2019

View reviewed changes

constexpr

ad5d657

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

alyssawilk force-pushed the flags branch from d11f371 to ad5d657 Compare March 5, 2019 18:53

TODO

5b838e8

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

dnoe previously approved these changes Mar 6, 2019

View reviewed changes

Merge branch 'refs/heads/master' into flags

26bdbd6

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

alyssawilk dismissed dnoe’s stale review via 26bdbd6 March 6, 2019 20:08

alyssawilk mentioned this pull request Mar 6, 2019

rewrite buffer implementation to eliminate evbuffer dependency #5441

Merged

mattklein123 requested changes Mar 6, 2019

View reviewed changes

repokitteh-read-only bot added waiting and removed waiting labels Mar 6, 2019

reviewer comments

1cc4738

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

alyssawilk force-pushed the flags branch from 7ee3327 to 1cc4738 Compare March 7, 2019 18:59

mattklein123 requested changes Mar 8, 2019

View reviewed changes

repokitteh-read-only bot added the waiting label Mar 8, 2019

reviewer comments

b5cf76c

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

repokitteh-read-only bot removed the waiting label Mar 11, 2019

mattklein123 previously approved these changes Mar 12, 2019

View reviewed changes

tyops

e063208

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

alyssawilk dismissed mattklein123’s stale review via e063208 March 12, 2019 16:09

mattklein123 approved these changes Mar 12, 2019

View reviewed changes

alyssawilk merged commit 23da112 into envoyproxy:master Mar 13, 2019

alyssawilk mentioned this pull request Jul 11, 2019

envoy.reloadable_features.test_feature_true deprecation #7547

Closed

alyssawilk deleted the flags branch July 31, 2019 20:32

junr03 mentioned this pull request Sep 26, 2020

dns: add dns resolver implementation based on Apple APIs #13074

Merged

runtime: codifying runtime guarded features #6134

runtime: codifying runtime guarded features #6134

Conversation

alyssawilk commented Feb 28, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnoe left a comment

Choose a reason for hiding this comment

alyssawilk commented Mar 6, 2019

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment