-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: codifying runtime guarded features #6134
Conversation
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
CONTRIBUTING.md
Outdated
guarded due to performance concerns) and a full deprecation cycle (if it is a high risk behavioral | ||
change). | ||
|
||
The canonican way to runtime guard a feature is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless canonicans are some sci-fi aliens that always speak canonically, I assume this is typo :)
CONTRIBUTING.md
Outdated
``` | ||
bool use_new_code_path_ = Runtime::LoaderSingleton::getExisting() && | ||
Runtime::LoaderSingleton::getExisting()->snapshot()->runtimeFeatureEnabled( | ||
"envoy.reloadable_features.my_feature_name")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth trying to make this a bit less verbose at the sites where it is used? There's a lot of boilerplate to get to runtimeFeatureEnabled
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we could likely have a wrapper that internally loads the singleton and does the right thing if it doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely the right way to go.
Less clear, should we use this in tests? I was a bit bummed at the heavyweight work in TestEnvironment to be able to flag override - setting up all those fake components. With the helper if we want to have a friend test-only class in ConstSingleton we can skip all the static state and just muck with the const singleton directly. I think that might also work for integration tests, which would be pretty awesome.
That said, it's making our const-singleton non-const and I know how much the larger Envoy community loves hacks like that ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about an approach like #6139 with some test class mixin that changes the singleton behavior? That seems like a really cleanly structured way to have a "mostly const" singleton that alters behavior only within a particular test.
2. Set up integration tests with custom runtime defaults as documented in the | ||
[integration test README](https://github.com/envoyproxy/envoy/blob/master/test/integration/README.md) | ||
3. Run a given unit test with the new runtime value explicitly set true as done | ||
for [runtime_flag_override_test](https://github.com/envoyproxy/envoy/blob/master/test/common/runtime/BUILD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it also worth specifically calling out parameterized tests as a useful tool here? I find that particularly useful when the logical behavior is supposed to be the same regardless of flag value (ie, goal is pure refactoring or performance improvement).
// In the envoy_cc_test declaration, the flag is set | ||
// "--runtime-feature-override-for-tests=envoy.reloadable_features.test_feature_false" | ||
TEST(RuntimeFlagOverrideTest, OverridesWork) { | ||
Snapshot& snapshot = Runtime::LoaderSingleton::getExisting()->snapshot(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const Snapshot&
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, just took a doc pass will look again once other reviewers are finished.
CONTRIBUTING.md
Outdated
``` | ||
bool use_new_code_path_ = Runtime::LoaderSingleton::getExisting() && | ||
Runtime::LoaderSingleton::getExisting()->snapshot()->runtimeFeatureEnabled( | ||
"envoy.reloadable_features.my_feature_name")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we could likely have a wrapper that internally loads the singleton and does the right thing if it doesn't exist.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
// Runtime features are used to easily allow switching between old and new code paths for high | ||
// risk changes. The intent is for the old code path to be short lived - the old code path is | ||
// deprecated as the feature is defaulted true, and removed with the following Envoy release. | ||
virtual bool runtimeFeatureEnabled(const std::string& key) const PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use absl::string_view
? It will avoid constructing std::string when caller call this with literals. but might require underlying methods take string_view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written I think someone could
static std::string feature = "envoy.reloadable_features.my_feature_name"));
if (Runtime::runtimeFeatureEnabled(feature) { ...} and avoid the string conversion on hash map lookup each time.
If they do the inefficient
if (Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name")
they're creating a string every time.
If I switch to string_view I think the map lookup will convert the string piece to the string on every single call, and there's no way to avoid it, so I think that's strictly less performant?
I could update the instructions to encourage static string on the canonical non-latch path if we think that makes more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as it is an absl::flat_hash_{map,set}
I think heterogenous lookup means that there is no string construction in the map lookup, even though the type in the map is std::string
: https://abseil.io/tips/144
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL! Ok, got that to work for the flat hash, but unfortunately to string_view all the way up the API we need to also change Runtime::EntryMap in include/envoy/runtime/runtime.h to have a comparator. I'd prefer to land this as-is and do a separate PR converting to string piece, especially as none of the call sites will need to change. What do you think of a TODO and a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah fine to leave as TODO. static std::string feature = "envoy.reloadable_features.my_feature_name";
is a static non-POD object initialization so there might be static initialization fiasco. string_view constructor is safe as it is constexpr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds fine to me.
// These features should not be overridden with run-time guards without a bug | ||
// being filed on github as once high risk features are true by default, the | ||
// old code path will be removed with the next release. | ||
const char* runtime_features[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: constexpr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@mattklein123 I think this is ready for your pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is super awesome. Generally looks good but a few questions/comments.
/wait
|
||
``` | ||
bool use_new_code_path_ = | ||
Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a different namespace for this type of thing? envoy.startup_features.
or similar? It's a little confusing that the docs point out that everything named like this is reloadable at runtime and then we talk about how it's not always true. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I need to remove the buffer mentions on this one. My thought for latching is if we latched at the hcm, or at a network::connection or some such, where the lifetime were transient. As called out on the buffer PR that's truly restart-only not reloadable.
I do want to support envoy.restart_features but I also need to talk to you about how to do that. In my priors, restart flags were done on the command line (only when the binary was brought up) and it was illegal to change then when pushing config (would cause that config to get rejected). Otherwise you do a config push with a new-but-buggy restart flag, the config "passes", you hot restart, and your fleet is down. Only with Envoy and hot restarts I'm less clear on how we push config which is restart only. My default plan would be to that test-only runtime tweaking flag to be a actual production runtime tweaking flag, so folks can change restart values at start-up and we can hard-fail attempts to reload them at runtime, but I recall you saying that most people don't do flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah let's chat about that offline and maybe figure out in a follow up? I think for now maybe just clarify somehow that this should only be used for truly reloadable features and if there are features that should be startup configured it should be discussed? (Up to you how to phrase)
// Enabled | ||
"envoy.reloadable_features.test_feature_true", | ||
// Disabled | ||
// "envoy.reloadable_features.test_feature_false", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we mean for this to be commented out here? If this is a documented way of defaulting to disabled, can you add more comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking at some point it'd be nice to have all flags documented in that file, so if folks want to preemptively flip they can do so.
I'll remove it and wait until we have utils and see if it still makes sense
bool SnapshotImpl::runtimeFeatureEnabled(const std::string& key) const { | ||
bool enabled = false; | ||
// See if this value is explicitly set as a runtime boolean. | ||
bool stored = getBoolean(key, enabled); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can just be inlined in the if statement below?
|
||
// Add additional features here to enable the new code paths by default. | ||
// | ||
// These features should not be overridden with run-time guards without a bug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are talking to end users here, right? Can you clarify slightly and say something like, "If issues are found that that require a feature to be disabled, it should be reported upstream ASAP, ..."
name = "runtime_flag_override_test", | ||
srcs = ["runtime_flag_override_test.cc"], | ||
args = [ | ||
"--runtime-feature-override-for-tests=envoy.reloadable_features.test_feature_false", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this seems like a generally useful feature for runtime configuration where people don't want to use the filesystem (CLI/gflags implementation like we discussed). Is it worth adding a TODO or opening a help wanted issue on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if possible can you rename envoy.reloadable_features.test_feature_false
to something else? I found it pretty confusing to try to sort out the logic with the built in envoy.reloadable_features.test_feature_true
. Maybe more comments would help also or just a different name like envoy.reloadable_features.injected_test_feature
and envoy.reloadable_features.built_in_test_feature
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think other folks would use this? I thought from prior discussions you thought google was flag happy but most folks wanted to use the existing runtime.
I think if we're going to use it out of test we have to allow for feature=true,feature2=false which I can do pretty easily.
Not sure if removing test_feature_false helps - I've clarified the language in both tests to hopefully clarify things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think that Google is an outlier in using flags for this type of stuff, but I was just suggesting that this code might be generally useful to someone, so probably worth tracking somewhere in case it comes up again. I think it's totally fine to leave as test only code for now.
Thanks for all the comments, it's much easier to understand for me now.
test/test_common/environment.cc
Outdated
// As each test ends, clean up the RuntimeFeaturesDefaults state. | ||
void OnTestEnd(const ::testing::TestInfo&) override { | ||
if (!runtime_override_.empty()) { | ||
Runtime::RuntimeFeaturesPeer::addFeature(runtime_override_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to remove here?
test/test_common/environment.cc
Outdated
// Before latching argv and argc, remove any runtime override flag. | ||
// This allows doing test overrides of Envoy runtime features without adding | ||
// test flags to the Envoy production command line. | ||
const std::regex PATTERN{"--runtime-feature-override-for-tests=(.*)", std::regex::optimize}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a probably stupid question: Is there any reason we couldn't just hit this directly from a test given that we already have static support and also have some work that @jmarantz did to clean things up at the end of tests? I'm just wondering if we could avoid some of the regex/listener/etc. magic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, now that I did the const-singleton refactor this is much less tied to environment.cc than it was.
I'd be inclined to forklift all this code over to TestRunner::RunTests - we still need the regex magic and I'd prefer a separate listener just to avoid the set up and teardown checks for the vast majority of tests which don't need them.
I'm going to hold off until we've decided what to do with command line flags - if we want to make it general purpose I'd be inclined to leave it here with the TODO, and the move it into the main server TCLAP in the follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my feeling is to not making it general purpose for now, and just track somewhere in a TODO/issue/etc. that we might want to think about this in the future. So in that case, I guess I would opt for making the test code as simple as possible so maybe do the move you suggested? Though if we do a move, why do we need all the regex stuff? Couldn't we just hit a static method with a string to alter runtime state for a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we can do this statically from test, but in-house when introducing a high risk flag we often just dup a full test (or tests) with no edits, so we have one variant running with true and the other running with the flag false. We've found it pretty handy especially for tests which are already parameterized so hard to muck with.
It's not a lot of code so I'd be inclined to check it in and we can always remove it if no one uses it and it turns out to be overkill. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on this one way or the other, but couldn't we accomplish roughly the same thing with a TEST_P and some type of param helper that just toggles a specific string? I think @dnoe mentioned this? I'm mainly just wondering if the same behavior can be achieved with less code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but if a test is already a TEST_P it can mean updating all the test params from GetParam() to GetParam(0), adding GetParam(1) for every test fixture in the whole class, creating or editing the constructur etc, then tearing it all back down in a few months. Given many of our files have 3-5 fixtures this can end up being a lot of throwaway work per test you want to dual-dun, which is why we bother with the flag based solution internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK SGTM if you think this is the best option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again it's really handy to have, but TDB if folks use it.
I've added a calendar reminder 6 months out to revisit - if no one uses it I'll tear these changes out then.
Any other comments you're waiting on?
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarifications, much easier to understand, I followed up with a few more questions.
/wait
|
||
``` | ||
bool use_new_code_path_ = | ||
Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah let's chat about that offline and maybe figure out in a follow up? I think for now maybe just clarify somehow that this should only be used for truly reloadable features and if there are features that should be startup configured it should be discussed? (Up to you how to phrase)
namespace Envoy { | ||
namespace Runtime { | ||
|
||
// Add additional features here to enable the new code paths by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to harp on this, but I think what would help me/readers here is a short description of the behavior of what an entry in runtime_features and disallowed_features do. I know this is described elsewhere, but it's hard enough to keep track of (and important enough) that IMO we should replicate a short version here?
name = "runtime_flag_override_test", | ||
srcs = ["runtime_flag_override_test.cc"], | ||
args = [ | ||
"--runtime-feature-override-for-tests=envoy.reloadable_features.test_feature_false", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think that Google is an outlier in using flags for this type of stuff, but I was just suggesting that this code might be generally useful to someone, so probably worth tracking somewhere in case it comes up again. I think it's totally fine to leave as test only code for now.
Thanks for all the comments, it's much easier to understand for me now.
namespace Envoy { | ||
namespace Runtime { | ||
|
||
// Fl not in runtime_features.cc are false by default (and this particular one is verified to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo "fl" ?
test/test_common/environment.cc
Outdated
// Before latching argv and argc, remove any runtime override flag. | ||
// This allows doing test overrides of Envoy runtime features without adding | ||
// test flags to the Envoy production command line. | ||
const std::regex PATTERN{"--runtime-feature-override-for-tests=(.*)", std::regex::optimize}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my feeling is to not making it general purpose for now, and just track somewhere in a TODO/issue/etc. that we might want to think about this in the future. So in that case, I guess I would opt for making the test code as simple as possible so maybe do the move you suggested? Though if we do a move, why do we need all the regex stuff? Couldn't we just hit a static method with a string to alter runtime state for a test?
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is super awesome, just some typos I found. If you want to fix them in a follow up that's fine.
CONTRIBUTING.md
Outdated
|
||
This should only be done if the lifetime of the object in question is relatively short compared to | ||
the lifetime of most Envoy instances, i.e. latching state on creation of the | ||
Http::ConnectionManagerImpl or all Network::ConnectionImpl classes, to ensure that the new behavio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo behavio
// by runtime feature guards, i.e | ||
// | ||
// if (Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name")) { | ||
// [new code path] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent since there is the other typo change
* master: (59 commits) http fault: add response rate limit injection (envoyproxy#6267) xds: introduce initial_fetch_timeout option to limit initialization time (envoyproxy#6048) test: fix cpuset-threads tests (envoyproxy#6278) server: add an API for registering for notifications for server instance life… (envoyproxy#6254) remove remains of TestBase (envoyproxy#6286) dubbo_proxy: Implement the routing of Dubbo requests (envoyproxy#5973) Revert "stats: add new BoolIndicator stat type (envoyproxy#5813)" (envoyproxy#6280) runtime: codifying runtime guarded features (envoyproxy#6134) mysql_filter: fix integration test flakes (envoyproxy#6272) tls: update BoringSSL to debed9a4 (3683). (envoyproxy#6273) rewrite buffer implementation to eliminate evbuffer dependency (envoyproxy#5441) Remove the dependency from TimeSystem to libevent by using the Event::Scheduler abstraction as a delegate. (envoyproxy#6240) fuzz: fix use of literal in default initialization. (envoyproxy#6268) http: add HCM functionality required for rate limiting (envoyproxy#6242) Disable mysql_integration_test until it is deflaked. (envoyproxy#6250) test: use ipv6_only IPv6 addresses in custom cluster integration tests. (envoyproxy#6260) tracing: If parent span is propagated with empty string, it causes th… (envoyproxy#6263) upstream: fix oss-fuzz issue envoyproxy#11095. (envoyproxy#6220) Wire up panic mode subset to receive updates (envoyproxy#6221) docs: clarify xds docs with warming information (envoyproxy#6236) ...
Documenting best practices and adding some tooling for guarding high risk features
Risk Level: Low (mostly new code)
Testing: Added new unit tests
Docs Changes: updated CONTRIBUTING.md
Release Notes: n/a
Part of #5693 (TODO: handling deprecating and adding tooling scripts)