-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
On-demand VHDS implementation #8617
Conversation
I decided to open a new PR instead of re-opening the original one (can be found here: https://github.com/envoyproxy/envoy/pull/6552/files). |
Drive by: please merge #7415 into this PR so we make sure we have docs for this feature. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly reviewing the parts that touch code I'm familiar with. It doesn't look like #8478 should dramatically change anything out from under you, so that's good.
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review because github has started not accepting my comments? Let's see what's going on with that...
source/common/router/rds_impl.cc
Outdated
// response has been propagated to the worker thread that was the request origin. | ||
bool RdsRouteConfigProviderImpl::requestVirtualHostsUpdate(const std::string& for_domain, | ||
std::function<void()> cb) { | ||
if (!config()->usesVhds()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
seems a little suspect. It seems like it is checking for a not-really-error but not-really-right condition. The sort of thing where things can muddle along mostly working, and it's hard to track down what's going wrong. Going by the names of the functions involved, I think it might be more safe and solid to, rather than doing this if
in here, instead guard all uses of requestVirtualHostsUpdate
with it. I think that would also let you drop the bool
return type. (If all of that is actually possible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pulled the check for whether requestVirtualHostsUpdate is available to the very top (see extensions/filters/http/on_demand/on_demand_update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, now I am done with this round of review.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't actually reviewed this change but I'm marking this requested changes to make sure we get VHDS fully documented for whatever exists today in the code base. This should include:
- Merging Vhds readme #7415 into this PR as proper RST docs with arch overview, etc.
- Getting VHDS up to parity with the fixes that I did in docs: misc doc debt #8678.
Thank you!
LGTM |
|
@lambdai: the PR I was talking about during the last xDS call, could you take a look when you get a chance? |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dmitri-d. This is epic. It's a pretty big PR, so I've started my review with a few comments. I think that this is going to need a fair bit of review to get through. Also, as a general principle, PRs to the Envoy code base should either be large and mechanical or small and complicated (or even simple :) ). Please consider what can be done to simplify or reduce the amount of change in this PR.
My main initial set of questions are around the additional plumbing added for aliases. I find it somewhat surprising, I had envisaged aliases being far simpler to handle.
Tagging @lambdai and @stevenzzzz for implications of the update callbacks to the ownership and lifecycle of route config provider resources.
/wait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the design of putting aliasResolution into mux and subscription.
Since it is not implemented yet: if you can extract it to anther PR, we can focus on the rds and ondemand vhds.
@htuch WDYT?
@@ -37,6 +37,8 @@ class NewGrpcMuxImpl : public GrpcMux, | |||
void pause(const std::string& type_url) override; | |||
void resume(const std::string& type_url) override; | |||
bool paused(const std::string& type_url) const override; | |||
void requestAliasesResolution(const std::string& type_url, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it reasonable to promote alias req/resp to first class in mux and subscription?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the method (pls. see the conversation above re: delta_subscription_state's requested_aliases_ field).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to digest this PR with design doc since ondemand involves close interaction with worker threads and main thread. Is the design doc named "on demand RDS"? I need to make sure I am reading the correct doc. Thanks!
class RouteConfigUpdateRequester { | ||
public: | ||
virtual ~RouteConfigUpdateRequester() = default; | ||
virtual void requestRouteConfigUpdate(const HeaderString&, std::function<void()>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think requestRouteConfigUpdate() will be PURE when the PR is merged
source/common/router/rds_impl.cc
Outdated
@@ -211,6 +222,37 @@ void RdsRouteConfigProviderImpl::onConfigUpdate() { | |||
prev_config->config_ = new_config; | |||
return previous; | |||
}); | |||
|
|||
const auto aliases = config_update_info_->aliasesInLastVhdsUpdate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are way too many copies of aliases
. IMHO this one should be shared_ptr and the below runOnAllThreads will capture the shared_ptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a counter-point: I don't think it's ever going to be large number of aliases in the set at one time (I expect a couple at most). Only two copies are being made: one from the original protobuf, the other one is put in the lambda context. Do you think shared_ptr is still worth it considering small size of aliases being copied and the overhead of accessing a shared_ptr?
source/common/router/rds_impl.h
Outdated
@@ -164,6 +169,15 @@ class RdsRouteConfigSubscription : Envoy::Config::SubscriptionCallbacks, | |||
|
|||
using RdsRouteConfigSubscriptionSharedPtr = std::shared_ptr<RdsRouteConfigSubscription>; | |||
|
|||
struct UpdateOnDemandCallback { | |||
const std::set<std::string> aliases_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The const give me the hint that the set should be a shared_ptr among all threads
@dmitri-d could you pls link the issue or the design doc ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if we can split this PR into a control plane side PR + a data-plane side PR.
Please link a design doc or add some more doc about how on-demand rds works.
In my previous understanding, on-demand xDS is a xDS-horizontal framework change.
} | ||
|
||
Http::FilterHeadersStatus OnDemandRouteUpdate::decodeHeaders(Http::HeaderMap&, bool) { | ||
requestRouteConfigUpdate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we also check if the host is already in the route-table and continue from there. IIUC If RDS request doesn't return, this essentially hangs processing of every requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is made inside the method: https://github.com/envoyproxy/envoy/pull/8617/files#diff-53e919b583d3290e6fddf324e7e69885R13
source/common/router/rds_impl.cc
Outdated
aliases.end(), std::back_inserter(aliases_not_in_update)); | ||
if (aliases_not_in_update.empty()) { | ||
callbacks.pop(); | ||
update_on_demand_callback.cb_(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some life-circle concern here, a stream may not live as long as a subscription, in which case this callback ends in unknown behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the gRPC stream carrying the subscription? If so, it is a safe assumption; the subscription is a shared owner of a GrpcMux, which is the owner of the GrpcStream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I mean the per HTTP request ActiveStream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pause the filter chain until a response arrives and the callback resumes its execution. Wouldn't that mean that the ActiveConnection sticks around too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I am saying a subscription may outlive an activeConnection. For example, with lambdai@'s recent change, a RDS subscription will outlive a listener. when that happens, the update_on_demand_callback.cb_() end in undefined behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flush my thoughts
ENVOY_LOG( | ||
debug, | ||
"rds: vhds configuration present/changed, (re)starting vhds: config_name={} hash={}", | ||
route_config_name_, config_update_info_->configHash()); | ||
maybeCreateInitManager(version_info, noop_init_manager, resume_rds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reminder of #9254
route_config_proto_ = rc; | ||
last_config_hash_ = new_hash; | ||
const uint64_t new_vhds_config_hash = rc.has_vhds() ? MessageUtil::hash(rc.vhds()) : 0ul; | ||
vhds_configuration_changed_ = new_vhds_config_hash != last_vhds_config_hash_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the other last_
series members. These members represents the ongoing update but is used by dump function. I am able to reasonable about the life time now
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
|
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
…started Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
ping @lambdai: fixed RouteConfigUpdatedCallback lifecycle issues that you pointed out |
|
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
void RouteConfigUpdateReceiverImpl::collectResourceIdsInUpdate( | ||
const Protobuf::RepeatedPtrField<envoy::service::discovery::v3alpha::Resource>& | ||
added_resources) { | ||
resource_ids_in_last_update_.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how actionable this is, but not a huge fan of building these mutable stateful objects, would have preferred to just deliver a LastUpdateInfo
object along with the update to the receivers and then discarded. But, it's probably not a deal breaker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, not optimal. There's a TODO to split the state here: https://github.com/envoyproxy/envoy/pull/8617/files/8e7b8f2f43bf00042d538501a7a076da76a42c6e#diff-dcbe1eb34bb1b29e349d7c87c0f20337R61
LGTM modulo nits (and I'd like to verify the coverage report, I can't see it for some reason on CircleCI). Thanks for the fantastic work @dmitri-d! |
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DS code path LGTM
I am not very familiar with the http filter operation.
Throw the ball to @htuch @stevenzzzz
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
|
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
|
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic, this is really a huge contribution to xDS, our first on-demand API and a big enabler of scalability work. Thanks for all the work on review iterations as well.
Thank you everyone for reviews and feedback! |
Signed-off-by: Dmitri Dolguikh ddolguik@redhat.com
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Description: Implements on-demand resolution of VirtualHosts via VHDS
Risk Level:
Testing:
Docs Changes:
Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]