On-demand VHDS implementation #8617

Merged
merged 73 commits into from
Jan 11, 2020

Conversation

dmitri-d
Contributor

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Description: Implements on-demand resolution of VirtualHosts via VHDS
Risk Level:
Testing:
Docs Changes:
Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]

@dmitri-d
Contributor Author

I decided to open a new PR instead of re-opening the original one (can be found here: https://github.com/envoyproxy/envoy/pull/6552/files).

Ping @htuch, @fredlas.

@mattklein123
Member

Drive by: please merge #7415 into this PR so we make sure we have docs for this feature. Thank you!

Contributor

@fredlas fredlas left a comment

Mainly reviewing the parts that touch code I'm familiar with. It doesn't look like #8478 should dramatically change anything out from under you, so that's good.

source/common/http/conn_manager_impl.h Outdated Show resolved Hide resolved
source/common/config/delta_subscription_state.cc Outdated Show resolved Hide resolved
source/common/config/delta_subscription_state.cc Outdated Show resolved Hide resolved
@dmitri-d
Contributor Author

  • rebased
  • fixed several formatting issues
  • responded to feedback

@dmitri-d
Contributor Author

  • rebased

@dmitri-d dmitri-d closed this Oct 17, 2019
@dmitri-d dmitri-d reopened this Oct 17, 2019
Contributor

@fredlas fredlas left a comment

Partial review because github has started not accepting my comments? Let's see what's going on with that...

include/envoy/http/filter.h Outdated Show resolved Hide resolved
include/envoy/router/rds.h Outdated Show resolved Hide resolved
include/envoy/http/filter.h Outdated Show resolved Hide resolved
source/common/config/delta_subscription_impl.cc Outdated Show resolved Hide resolved
source/common/config/delta_subscription_state.h Outdated Show resolved Hide resolved
source/common/router/rds_impl.cc Outdated Show resolved Hide resolved
// response has been propagated to the worker thread that was the request origin.
bool RdsRouteConfigProviderImpl::requestVirtualHostsUpdate(const std::string& for_domain,
                                                           std::function<void()> cb) {
  if (!config()->usesVhds()) {
Contributor

This if seems a little suspect. It seems to be checking for a condition that is not really an error but not really right either: the sort of thing where things can muddle along mostly working, and it's hard to track down what's going wrong. Going by the names of the functions involved, I think it might be safer and more solid to guard all uses of requestVirtualHostsUpdate with this check rather than doing it in here. I think that would also let you drop the bool return type (if all of that is actually possible).

Contributor Author

I pulled the check for whether requestVirtualHostsUpdate is available to the very top (see extensions/filters/http/on_demand/on_demand_update.
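
A minimal sketch of the suggestion discussed in this thread: the usesVhds() check moves to the call site, so requestVirtualHostsUpdate no longer needs a bool return. FakeRouteConfigProvider, FilterStatus, onRequestHeaders and deliverUpdate are hypothetical stand-ins for illustration, not Envoy's actual classes or the code that was merged.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the route config provider; not Envoy's real class.
class FakeRouteConfigProvider {
public:
  explicit FakeRouteConfigProvider(bool uses_vhds) : uses_vhds_(uses_vhds) {}

  bool usesVhds() const { return uses_vhds_; }

  // Returns void: callers are expected to have checked usesVhds() first.
  void requestVirtualHostsUpdate(const std::string& for_domain, std::function<void()> cb) {
    assert(uses_vhds_ && "caller must guard with usesVhds()");
    requested_domains_.push_back(for_domain);
    pending_.push_back(std::move(cb));
  }

  // Simulates the arrival of a VHDS response by running the pending callbacks.
  void deliverUpdate() {
    std::vector<std::function<void()>> pending;
    pending.swap(pending_);
    for (auto& cb : pending) {
      cb();
    }
  }

  const std::vector<std::string>& requestedDomains() const { return requested_domains_; }

private:
  const bool uses_vhds_;
  std::vector<std::string> requested_domains_;
  std::vector<std::function<void()>> pending_;
};

enum class FilterStatus { Continue, StopIteration };

// Hypothetical call site: the guard lives here, at the top of the filter.
FilterStatus onRequestHeaders(FakeRouteConfigProvider& provider, const std::string& host,
                              std::function<void()> resume) {
  if (!provider.usesVhds()) {
    return FilterStatus::Continue; // Nothing to resolve on demand.
  }
  provider.requestVirtualHostsUpdate(host, std::move(resume));
  return FilterStatus::StopIteration; // Pause until the callback fires.
}
```

With the guard at the call site, the "VHDS not configured" case becomes an explicit branch in the caller rather than a silent false return from the provider.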

source/common/router/rds_impl.h Outdated Show resolved Hide resolved
source/common/router/route_config_update_receiver_impl.cc Outdated Show resolved Hide resolved
source/common/router/route_config_update_receiver_impl.cc Outdated Show resolved Hide resolved
Contributor

@fredlas fredlas left a comment

ok, now I am done with this round of review.

source/common/router/route_config_update_receiver_impl.cc Outdated Show resolved Hide resolved
source/common/router/route_config_update_receiver_impl.cc Outdated Show resolved Hide resolved
test/common/router/vhds_test.cc Outdated Show resolved Hide resolved
test/extensions/filters/http/router/config_test.cc Outdated Show resolved Hide resolved
@dmitri-d
Contributor Author

  • rebased

@mattklein123 mattklein123 mentioned this pull request Oct 19, 2019
Member

@mattklein123 mattklein123 left a comment

I haven't actually reviewed this change but I'm marking this requested changes to make sure we get VHDS fully documented for whatever exists today in the code base. This should include:

  1. Merging Vhds readme #7415 into this PR as proper RST docs with arch overview, etc.
  2. Getting VHDS up to parity with the fixes that I did in docs: misc doc debt #8678.

Thank you!

@fredlas
Contributor

fredlas commented Oct 21, 2019

LGTM

@dmitri-d
Contributor Author

  • rebased

@dmitri-d
Contributor Author

@lambdai: the PR I was talking about during the last xDS call, could you take a look when you get a chance?

@dmitri-d
Contributor Author

  • fixed formatting.

@htuch htuch self-assigned this Oct 22, 2019
Member

@htuch htuch left a comment

Thanks @dmitri-d. This is epic. It's a pretty big PR, so I've started my review with a few comments. I think that this is going to need a fair bit of review to get through. Also, as a general principle, PRs to the Envoy code base should either be large and mechanical or small and complicated (or even simple :) ). Please consider what can be done to simplify or reduce the amount of change in this PR.

My main initial set of questions is around the additional plumbing added for aliases. I find it somewhat surprising; I had envisaged aliases being far simpler to handle.

Tagging @lambdai and @stevenzzzz for implications of the update callbacks to the ownership and lifecycle of route config provider resources.
/wait

CODEOWNERS Outdated Show resolved Hide resolved
source/common/http/conn_manager_impl.h Outdated Show resolved Hide resolved
source/common/config/delta_subscription_state.h Outdated Show resolved Hide resolved
include/envoy/http/filter.h Outdated Show resolved Hide resolved
include/envoy/http/filter.h Outdated Show resolved Hide resolved
source/common/config/delta_subscription_state.cc Outdated Show resolved Hide resolved
Contributor

@lambdai lambdai left a comment

I don't understand the design of putting aliasResolution into mux and subscription.
Since it is not implemented yet: if you can extract it to another PR, we can focus on the rds and ondemand vhds.

@htuch WDYT?

include/envoy/config/subscription.h Outdated Show resolved Hide resolved
source/common/config/delta_subscription_state.cc Outdated Show resolved Hide resolved
@@ -37,6 +37,8 @@ class NewGrpcMuxImpl : public GrpcMux,
  void pause(const std::string& type_url) override;
  void resume(const std::string& type_url) override;
  bool paused(const std::string& type_url) const override;
  void requestAliasesResolution(const std::string& type_url,
Contributor

Is it reasonable to promote alias req/resp to first class in mux and subscription?

Contributor Author

Removed the method (pls. see the conversation above re: delta_subscription_state's requested_aliases_ field).

Contributor

@lambdai lambdai left a comment

I need to digest this PR with design doc since ondemand involves close interaction with worker threads and main thread. Is the design doc named "on demand RDS"? I need to make sure I am reading the correct doc. Thanks!

class RouteConfigUpdateRequester {
public:
  virtual ~RouteConfigUpdateRequester() = default;
  virtual void requestRouteConfigUpdate(const HeaderString&, std::function<void()>) {
Contributor

I think requestRouteConfigUpdate() will be PURE when the PR is merged

@@ -211,6 +222,37 @@ void RdsRouteConfigProviderImpl::onConfigUpdate() {
prev_config->config_ = new_config;
return previous;
});

const auto aliases = config_update_info_->aliasesInLastVhdsUpdate();
Contributor

There are way too many copies of aliases. IMHO this one should be a shared_ptr, and the runOnAllThreads call below should capture the shared_ptr.

Contributor Author

A counter-point: I don't think there is ever going to be a large number of aliases in the set at one time (I expect a couple at most). Only two copies are being made: one from the original protobuf, and the other is put in the lambda context. Do you think shared_ptr is still worth it, considering the small size of the aliases being copied and the overhead of accessing a shared_ptr?
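
For reference, a minimal sketch of the shared_ptr variant proposed in this thread: the alias set is built once, wrapped in a shared_ptr to const, and the callback posted to every worker captures the pointer instead of a copy of the set. FakeWorkers, notifyWorkers and the seen vector are hypothetical stand-ins; in particular, this runOnAllThreads only queues callbacks on one thread and is not Envoy's thread-local API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

using AliasSet = std::set<std::string>;
using AliasSetConstSharedPtr = std::shared_ptr<const AliasSet>;

// Hypothetical stand-in for posting a callback to every worker. The callbacks
// are queued once per worker and run later by drain().
class FakeWorkers {
public:
  explicit FakeWorkers(std::size_t count) : count_(count) { assert(count > 0); }

  void runOnAllThreads(const std::function<void()>& cb) {
    for (std::size_t i = 0; i < count_; ++i) {
      queued_.push_back(cb); // Copies the callback, not the alias set.
    }
  }

  void drain() {
    for (auto& cb : queued_) {
      cb();
    }
    queued_.clear();
  }

private:
  const std::size_t count_;
  std::vector<std::function<void()>> queued_;
};

// Moves the aliases into one immutable, shared set and hands every worker a
// pointer to it. Each worker records the pointer it saw in `seen`.
void notifyWorkers(FakeWorkers& workers, AliasSet aliases,
                   std::vector<AliasSetConstSharedPtr>& seen) {
  AliasSetConstSharedPtr shared = std::make_shared<AliasSet>(std::move(aliases));
  workers.runOnAllThreads([shared, &seen] { seen.push_back(shared); });
}
```

Whether this beats copying a set of one or two strings is exactly the trade-off raised above: the shared_ptr saves per-worker copies of the set but adds an atomic reference count and an indirection.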

source/common/router/rds_impl.cc Outdated Show resolved Hide resolved
source/common/router/rds_impl.cc Outdated Show resolved Hide resolved
source/common/router/rds_impl.cc Outdated Show resolved Hide resolved
source/common/http/conn_manager_impl.cc Outdated Show resolved Hide resolved
source/common/http/conn_manager_impl.cc Outdated Show resolved Hide resolved
@@ -164,6 +169,15 @@ class RdsRouteConfigSubscription : Envoy::Config::SubscriptionCallbacks,

using RdsRouteConfigSubscriptionSharedPtr = std::shared_ptr<RdsRouteConfigSubscription>;

struct UpdateOnDemandCallback {
  const std::set<std::string> aliases_;
Contributor

The const gives me the hint that the set should be a shared_ptr shared among all threads.

source/common/router/rds_impl.h Outdated Show resolved Hide resolved
source/common/router/route_config_update_receiver_impl.cc Outdated Show resolved Hide resolved
@stevenzzzz
Contributor

@dmitri-d could you pls link the issue or the design doc?

Contributor

@stevenzzzz stevenzzzz left a comment

It'd be nice if we can split this PR into a control plane side PR + a data-plane side PR.
Please link a design doc or add some more doc about how on-demand rds works.
In my previous understanding, on-demand xDS is an xDS-horizontal framework change.

include/envoy/config/grpc_mux.h Outdated Show resolved Hide resolved
include/envoy/router/rds.h Outdated Show resolved Hide resolved
source/common/router/rds_impl.cc Outdated Show resolved Hide resolved
source/common/router/rds_impl.cc Outdated Show resolved Hide resolved
}

Http::FilterHeadersStatus OnDemandRouteUpdate::decodeHeaders(Http::HeaderMap&, bool) {
  requestRouteConfigUpdate();
Contributor

Shall we also check whether the host is already in the route table and continue from there? IIUC, if the RDS request doesn't return, this essentially hangs the processing of every request?
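
A minimal sketch of the check suggested in this comment, under the assumption that the filter can ask whether a host already resolves to a virtual host. FakeOnDemandFilter, HeadersStatus, onVirtualHostResolved and updatesRequested are hypothetical names for illustration; they do not reflect how the merged filter performs the lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

enum class HeadersStatus { Continue, StopIteration };

// Hypothetical filter: only pauses the request when the host is unknown.
class FakeOnDemandFilter {
public:
  explicit FakeOnDemandFilter(std::set<std::string> known_hosts)
      : known_hosts_(std::move(known_hosts)) {}

  HeadersStatus decodeHeaders(const std::string& host) {
    assert(!host.empty());
    if (known_hosts_.count(host) > 0) {
      return HeadersStatus::Continue; // Route already present, no update needed.
    }
    ++updates_requested_; // Stand-in for sending an on-demand update request.
    return HeadersStatus::StopIteration;
  }

  // Stand-in for the update callback: the host is now part of the route table.
  void onVirtualHostResolved(const std::string& host) { known_hosts_.insert(host); }

  std::size_t updatesRequested() const { return updates_requested_; }

private:
  std::set<std::string> known_hosts_;
  std::size_t updates_requested_ = 0;
};
```

Requests for hosts that are already routable never wait on the control plane in this sketch; only the first request for an unknown host pauses.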

Contributor Author

source/common/http/conn_manager_impl.h Outdated Show resolved Hide resolved
source/common/http/conn_manager_impl.cc Outdated Show resolved Hide resolved
aliases.end(), std::back_inserter(aliases_not_in_update));
if (aliases_not_in_update.empty()) {
  callbacks.pop();
  update_on_demand_callback.cb_();
Contributor

I have a lifecycle concern here: a stream may not live as long as a subscription, in which case this callback ends in undefined behavior.

Contributor

Do you mean the gRPC stream carrying the subscription? If so, it is a safe assumption; the subscription is a shared owner of a GrpcMux, which is the owner of the GrpcStream.

Contributor

No, I mean the per-HTTP-request ActiveStream.

Contributor Author

I pause the filter chain until a response arrives and the callback resumes its execution. Wouldn't that mean that the ActiveConnection sticks around too?

Contributor

No, I am saying a subscription may outlive an ActiveConnection. For example, with lambdai@'s recent change, an RDS subscription will outlive a listener. When that happens, update_on_demand_callback.cb_() ends in undefined behavior.
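
One common way to guard against the lifetime problem described in this thread is to have the long-lived side hold only a weak reference to the short-lived side. The sketch below illustrates that general technique; it is not necessarily what this PR ended up doing. FakeStream, FakeSubscription, requestUpdate and the skipped counter are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for a per-request stream that waits for an update.
struct FakeStream {
  bool resumed = false;
  void continueDecoding() { resumed = true; }
};

// Hypothetical stand-in for the long-lived subscription holding callbacks.
class FakeSubscription {
public:
  void addCallback(std::function<void()> cb) { callbacks_.push(std::move(cb)); }

  // Simulates an update arriving: runs and discards every pending callback.
  void onUpdate() {
    while (!callbacks_.empty()) {
      std::function<void()> cb = std::move(callbacks_.front());
      callbacks_.pop();
      cb();
    }
  }

private:
  std::queue<std::function<void()>> callbacks_;
};

// Registers a callback that resumes the stream only if it is still alive.
// `skipped` counts callbacks whose stream had already been destroyed.
void requestUpdate(FakeSubscription& subscription, const std::shared_ptr<FakeStream>& stream,
                   int& skipped) {
  assert(stream != nullptr);
  std::weak_ptr<FakeStream> weak_stream = stream;
  subscription.addCallback([weak_stream, &skipped] {
    if (auto locked = weak_stream.lock()) {
      locked->continueDecoding();
    } else {
      ++skipped; // The stream went away first; touching it would be a use-after-free.
    }
  });
}
```

The subscription never extends the stream's lifetime here; it only checks, at callback time, whether there is still something to resume.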

Contributor

@lambdai lambdai left a comment

Flush my thoughts

include/envoy/config/subscription.h Outdated Show resolved Hide resolved
source/common/http/conn_manager_impl.h Outdated Show resolved Hide resolved
ENVOY_LOG(
    debug,
    "rds: vhds configuration present/changed, (re)starting vhds: config_name={} hash={}",
    route_config_name_, config_update_info_->configHash());
maybeCreateInitManager(version_info, noop_init_manager, resume_rds);
Contributor

A reminder of #9254

route_config_proto_ = rc;
last_config_hash_ = new_hash;
const uint64_t new_vhds_config_hash = rc.has_vhds() ? MessageUtil::hash(rc.vhds()) : 0ul;
vhds_configuration_changed_ = new_vhds_config_hash != last_vhds_config_hash_;
Contributor

Same as the other last_ series members. These members represent the ongoing update but are used by the dump function. I am able to reason about the lifetime now.

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
@dmitri-d
Contributor Author

dmitri-d commented Jan 9, 2020

  • added WatchMap::convertAliasWatchesToNameWatches tests

Dmitri Dolguikh added 3 commits January 9, 2020 10:53
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
…started

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
@dmitri-d
Contributor Author

dmitri-d commented Jan 9, 2020

ping @lambdai: fixed RouteConfigUpdatedCallback lifecycle issues that you pointed out

@dmitri-d
Contributor Author

dmitri-d commented Jan 9, 2020

  • merged in latest changes

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
source/common/router/route_config_update_receiver_impl.cc Outdated Show resolved Hide resolved
void RouteConfigUpdateReceiverImpl::collectResourceIdsInUpdate(
    const Protobuf::RepeatedPtrField<envoy::service::discovery::v3alpha::Resource>&
        added_resources) {
  resource_ids_in_last_update_.clear();
Member

I'm not sure how actionable this is, but I'm not a huge fan of building these mutable stateful objects; I would have preferred to just deliver a LastUpdateInfo object along with the update to the receivers and then discard it. But it's probably not a deal breaker.
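
A minimal sketch of the alternative described in this comment: per-update details are computed into a short-lived value object and passed along with the update, rather than stored in a member that is cleared and refilled each time. Only the name LastUpdateInfo comes from the comment; its field, collectResourceIds and FakeReceiver are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value object describing a single update.
struct LastUpdateInfo {
  std::set<std::string> resource_ids;
};

// Pure function: builds the info from the added resource names, no member state.
LastUpdateInfo collectResourceIds(const std::vector<std::string>& added_resource_names) {
  LastUpdateInfo info;
  info.resource_ids.insert(added_resource_names.begin(), added_resource_names.end());
  return info;
}

// Hypothetical receiver: reads what it needs during the call and keeps only
// its own derived results, never the info object itself.
class FakeReceiver {
public:
  explicit FakeReceiver(std::string watched) : watched_(std::move(watched)) {
    assert(!watched_.empty());
  }

  void onUpdate(const LastUpdateInfo& info) {
    matched_ = info.resource_ids.count(watched_) > 0;
    last_update_size_ = info.resource_ids.size();
  }

  bool matched() const { return matched_; }
  std::size_t lastUpdateSize() const { return last_update_size_; }

private:
  const std::string watched_;
  bool matched_ = false;
  std::size_t last_update_size_ = 0;
};
```

Because the info object lives only for the duration of the delivery, there is no window in which a reader can observe state from a half-applied or stale update.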

Contributor Author

test/common/config/new_grpc_mux_impl_test.cc Outdated Show resolved Hide resolved
@htuch
Member

htuch commented Jan 9, 2020

LGTM modulo nits (and I'd like to verify the coverage report, I can't see it for some reason on CircleCI). Thanks for the fantastic work @dmitri-d!

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
@dmitri-d
Contributor Author

dmitri-d commented Jan 9, 2020

lambdai previously approved these changes Jan 9, 2020
Contributor

@lambdai lambdai left a comment

The DS code path LGTM
I am not very familiar with the http filter operation.
Throw the ball to @htuch @stevenzzzz

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
@dmitri-d
Contributor Author

dmitri-d commented Jan 9, 2020

  • fixed weird formatting in source/common/http/conn_manager_impl.h

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
@dmitri-d
Contributor Author

  • added on_demand filter tests

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Member

@htuch htuch left a comment

Fantastic, this is really a huge contribution to xDS, our first on-demand API and a big enabler of scalability work. Thanks for all the work on review iterations as well.

@htuch htuch dismissed mattklein123’s stale review January 11, 2020 23:32

Docs seem solid now.

@htuch htuch merged commit 8e2d909 into envoyproxy:master Jan 11, 2020
@dmitri-d
Contributor Author

Thank you everyone for reviews and feedback!

@dmitri-d dmitri-d deleted the vhds-on-demand-restarted branch January 12, 2020 07:07