Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config switch to turn on/off Quic processing #9679

Merged
merged 67 commits into from
May 15, 2020

Conversation

nezdolik
Copy link
Member

@nezdolik nezdolik commented Jan 14, 2020

Signed-off-by: Kateryna Nezdolii nezdolik@spotify.com

Description:
Extending quic listener configuration with runtime feature flag for enabling/disabling quic processing. Flag controls onReadReady() and onWriteReady() callback methods and is enabled by default.
Risk Level: Low/Medium
Testing: Unit tests+Integration tests.
Docs Changes: NA?
Release Notes:
Fixes ##9230

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #9679 was opened by nezdolik.

see: more, trace.

@@ -30,6 +30,17 @@ message UdpListenerConfig {

google.protobuf.Any typed_config = 3;
}

enum UdpShutDownMode {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @alyssawilk @danzh2010 @mattklein123 this is initial draft for config layout.

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Some part of me wants to encourage one big enum (since when we disable QUIC we make the shutdown mode call based on the reason we're shutting down) but honestly this is more clear and will hopefully avoid any decision paralysis when dealing with production fires. And folks can still push a different shut down mode with their disable if their default doesn't match their disasater :-)

@mattklein123 mattklein123 self-assigned this Jan 16, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Flushing out a few comments.

/wait

api/envoy/api/v2/listener/udp_listener_config.proto Outdated Show resolved Hide resolved
Comment on lines 41 to 43
UdpShutDownMode shutdown_mode = 4 [(validate.rules).enum = {defined_only: true}];

bool udp_enabled = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm a little confused about what is being proposed here. Do you mind adding some more comments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 @htuch for turning off/on udp/quick processing, my proposal is to have boolean switch on udp listener config + shutdown mode field (for shutdown type). Submitted this draft config to get community agreement on the approach, code will follow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just remove the listener and drain as normal? Trying to grok what is distinct here vs. any other listener that comes to life at some point, starts accepting, and then is removed and drained.

Copy link
Member Author

@nezdolik nezdolik Jan 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch thanks for suggestion, i did not work much with LDS, so missed that opportunity. Agree that this is cleaner way, however currently listener manager supports only draining (graceful shutdown) of active listeners:
https://github.com/envoyproxy/envoy/blob/master/source/server/listener_manager_impl.cc#L668
In case we go with LDS approach, it will require code change to support multiple draining modes, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends how we structure shared listeners.
I thought if we had QUIC and QUIC-proxy listening on UDP/443 they'd share a listener, and some higher level filter would determine which packets went where. In that case as said it'd be nice to able to disable them one at a time. We can deal without but it's a nice to have.
If we think we'd have a QUIC/443 listener and a QUIC-proxy/443 listener just sharing incoming traffic via some combination of SO_REUSEPORT and BPF this would be fine.
I'd assumed we were going to do more of the former, at which point a non-listener kill switch has value IMO (though again we can live without it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My actual assumption (potentially naive) was that QUIC and QUIC-proxy would probably be different deployments of the binary in different auto scaling groups, etc. What is the use case in which you envision the same process doing both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic example is for edge proxies with low-value certs that do transparent proxying for latency gains. QUIC-terminate locally served content, UDP proxy where you'd TCP proxy because DNS SRV doesn't work.
We have two different frontline deployments which do TLS / TCP proxy based on VIP and one of them did QUIC+Udp-proxy for years.
We haven't had an issue where we needed to turn off one and not the other yet, but as tunneling over QUIC picks up steam (and that team has agreed to OSS once we land core QUIC) I think it gets more useful to turn different modes down separately.
We can always start simple and add more later though!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always start simple and add more later though!

That would be my feeling. Right now we don't have any filter chain concept for UDP (I would like to fix that as part of a general listener API cleanup), and without that I'm not sure how we would do something like ^ today. IMO right now I think we could just have a per-listener runtime kill switch for UDP listeners? Or even just for the QUIC listener config itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for feedback, it's valuable information to get familiar with. So we agreed to introduce separate config switches for upd and quic listeners.

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@@ -24,7 +24,9 @@ ProtobufTypes::MessagePtr ActiveRawUdpListenerConfigFactory::createEmptyConfigPr

Network::ActiveUdpListenerFactoryPtr
ActiveRawUdpListenerConfigFactory::createActiveUdpListenerFactory(
const Protobuf::Message& /*message*/) {
const Protobuf::Message& message) {
auto& config =
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now this cast will not work, need to reorg udp listener proto config to have one top level message

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Flushing out a few more comments.

/wait

@@ -31,3 +32,20 @@ message QuicProtocolOptions {
// 20000ms if not specified.
google.protobuf.Duration crypto_handshake_timeout = 3;
}

message ActiveQuicListenerConfig {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just name this QuicListenerConfig. I think we can drop the Active (having it in the other place is a mistake IMO).

In terms of the shutoff, what I'm proposing is just having a RuntimeFeatureFlag which says whether to process packets or not. I think this is ultimately the kill switch we are looking for? @alyssawilk @danzh2010 WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I think it's not the switch I'm looking for in the long term (because not granular enough) but I'm fine landing whatever controls we think are useful today and we can add the finicky ones once we're running in prod and care more :-)

@@ -33,4 +34,17 @@ message UdpListenerConfig {
}

message ActiveRawUdpListenerConfig {
enum UdpShutDownMode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely positive we want to bother with this kill switch for raw UDP right now. Maybe defer that for later? If we do it here, we should de-dup the messages and have a common configuration message of some type.

Kateryna Nezdolii added 3 commits January 31, 2020 11:11
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@stale
Copy link

stale bot commented Feb 3, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Feb 3, 2020
Kateryna Nezdolii added 3 commits February 3, 2020 17:07
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Feb 4, 2020
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik
Copy link
Member Author

nezdolik commented Feb 4, 2020

Spent most of the time on task, which may be out of scope for this change, basically trying to propagate Envoy primitives (like Runtime) all the way down from server instance->worker->connection handler->active quic listener factory->active quic listener, which introduced cyclic dependency in bazel modules. As current code is not production ready (and ActiveQuicListenerFactory not being instantiated from conn_handler), change may be fine as it is (adding Runtime field to ActiveQuicListenerFactory), but in the long run code needs to be refactored, possibly introducing UdpListenerFactoryContext that would provide to resources like Dispatcher, Runtime::Loader etc.
Still need to complete tests.
Would appreciate opinions if refactoring change should be part of this task or not.

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@mattklein123
Copy link
Member

Check out Runtime::LoaderSingleton::getExisting() which will probably make this task easier. This is one case where we break our rule about no statics.

Kateryna Nezdolii added 2 commits May 6, 2020 23:39
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
listen_socket_ =
std::make_shared<Network::UdpListenSocket>(local_address_, nullptr, /*bind*/ true);
listen_socket_->addOptions(Network::SocketOptionFactory::buildIpPacketInfoOptions());
listen_socket_->addOptions(Network::SocketOptionFactory::buildRxQueueOverFlowOptions());

quic_listener_ = std::make_unique<ActiveQuicListener>(
*dispatcher_, connection_handler_, listen_socket_, listener_config_, quic_config_, nullptr);
EXPECT_CALL(listener_config_, listenSocketFactory())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is needed to wire up mocks, that are used inside listener factory when creating quic listener

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this type of thing it's better to use ON_CALL(...).WillByDefault()

Kateryna Nezdolii added 10 commits May 7, 2020 00:06
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik
Copy link
Member Author

nezdolik commented May 8, 2020

There is failing coverage: https://343614-65214191-gh.circle-artifacts.com/0/coverage/index.html but none if it is related to this PR. Going to merge master and see if it helps

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than small nits. @danzh2010 can you take a final pass? Thank you!

/wait

@@ -25,12 +27,14 @@ class ActiveQuicListener : public Network::UdpListenerCallbacks,

ActiveQuicListener(Event::Dispatcher& dispatcher, Network::ConnectionHandler& parent,
Network::ListenerConfig& listener_config, const quic::QuicConfig& quic_config,
Network::Socket::OptionsSharedPtr options);
Network::Socket::OptionsSharedPtr options,
const envoy::config::core::v3::RuntimeFeatureFlag enabled);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: pass by const ref here and below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

listen_socket_ =
std::make_shared<Network::UdpListenSocket>(local_address_, nullptr, /*bind*/ true);
listen_socket_->addOptions(Network::SocketOptionFactory::buildIpPacketInfoOptions());
listen_socket_->addOptions(Network::SocketOptionFactory::buildRxQueueOverFlowOptions());

quic_listener_ = std::make_unique<ActiveQuicListener>(
*dispatcher_, connection_handler_, listen_socket_, listener_config_, quic_config_, nullptr);
EXPECT_CALL(listener_config_, listenSocketFactory())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this type of thing it's better to use ON_CALL(...).WillByDefault()

danzh2010
danzh2010 previously approved these changes May 13, 2020
Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just a few nits in test.

@@ -112,7 +118,7 @@ void ActiveQuicListener::shutdownListener() {

ActiveQuicListenerFactory::ActiveQuicListenerFactory(
const envoy::config::listener::v3::QuicProtocolOptions& config, uint32_t concurrency)
: concurrency_(concurrency) {
: concurrency_(concurrency), enabled_(config.enabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this!

@@ -270,5 +348,55 @@ TEST_P(ActiveQuicListenerTest, ProcessBufferedChlos) {
ReadFromClientSockets();
}

TEST_P(ActiveQuicListenerTest, QuicProcessingDisabled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is already tested in QuicProcessingDisabledAndEnabled, right? Maybe remove it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Kateryna Nezdolii added 2 commits May 14, 2020 18:02
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik
Copy link
Member Author

FAILED_PRECONDITION: there are no bots capable of executing.... in ci, need to wait for fix...

Kateryna Nezdolii added 2 commits May 15, 2020 10:26
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik
Copy link
Member Author

@danzh2010 @mattklein123 applied review comments.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 853ecaa into envoyproxy:master May 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants