-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dns resolvers: add All lookup mode #18464
Conversation
Signed-off-by: Jose Nino <jnino@lyft.com>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
source/common/network/dns_impl.cc
Outdated
@@ -104,16 +104,24 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i | |||
// ARES_ECONNREFUSED. If the PendingResolution has not been cancelled that means that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The c-ares based resolver changed a bit more because it was not originally written to accumulate responses from multiple c-ares callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored for ease of understanding but open to comments to simplify further!
source/common/network/dns_impl.cc
Outdated
dns_name_); | ||
|
||
if (!pending_response_.address_list_.empty()) { | ||
ASSERT(dns_lookup_family_ == DnsLookupFamily::All && !dual_resolution_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below I considered a failure in the second resolution an overall success if the first resolution was successful, and hence fired the callback with the existing addresses and a success state. Up to discussing if this should be the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah IMO a resolve success of either v4 or v6 should be considered success.
cc @DavidSchinazi @RyanTheOptimist for informed opinions :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, success of v4 OR v6 should mean overall success since we have usable addresses
Signed-off-by: Jose Nino <jnino@lyft.com>
2344225
to
386c2cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
source/common/network/dns_impl.cc
Outdated
dns_name_); | ||
|
||
if (!pending_response_.address_list_.empty()) { | ||
ASSERT(dns_lookup_family_ == DnsLookupFamily::All && !dual_resolution_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah IMO a resolve success of either v4 or v6 should be considered success.
cc @DavidSchinazi @RyanTheOptimist for informed opinions :-)
source/common/network/dns_impl.cc
Outdated
return; | ||
} | ||
if (!fallback_if_failed_) { | ||
|
||
if (!dual_resolution_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, are we holding all forward progress on the dns resolver returning v4 and v6 addreses? That might work for a first pass but I believe that the chrome folks have told me it's a bad idea in general for latency / Quality of Experience reasons. If they confirm, I think we need a TODO to handle ipv4 and ipv6 asynchronously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was thinking about this as well from two different angles:
- Should both c-ares function calls happen concurrently?
- This applies both for c-ares and for apple: should we stream results as they come in to the callback target.
(1) is easy and I could do that in this PR. (2) is a bit harder, and I wouldn't want to do it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would do (1) and not (2) for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alyssawilk @mattklein123 re:(1) do we have a preference for all three modes (All
,Auto
,V4Preferred
) to do the calls concurrently, vs only All
doing it concurrently and Auto
/V4Preferred
only do the second call if the first one fails/returns no addresses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, less sure on that - I'll defer to Matt and David.
Totally fine with not doing 2 in this PR but I think we should either comment or TODO that it has latency implications and should probably be fixed at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alyssawilk @mattklein123 I settled on doing both lookups concurrrently for All
and leaving Auto
and V4Preferred
with sequential lookups. I did not want to block this PR on making the necessary changes to enable concurrent lookups for Auto/V4Preferred given that the existing behavior is already sequential, and concurrent lookup for All
did not need bigger changes.
I am already part of the way on the TODO, so will put that up as a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that waiting for both means users are waiting unnecessarily. But tackling in a separate PR with TODO here sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. Do we have a TODO on not waiting for both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decided to create an issue #18572
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely seems like a good way to handle this for now. While suboptimal because it block progres until both return, Chrome does exactly this today. Solving #18572 (later) is a great chance to make envoy mobile better than Chrome!
oh,I had one other question which is if we're doing the scoping to 5 as disussed on slack, if we should do it here, or where we get the resolution? I have a TODO in my code for where we scope it, but no strong opinion I'll send mine out in draft form today so you can TAL |
My preference is for the callback target to decide what it does with all the addresses that the resolver gives it. That was static dns cluster takes everything, logical takes one, dns cache with happy eyeballs does it thing, etc. |
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
@alyssawilk @mattklein123 updated! |
return pending_response_.v4_responses_; | ||
case DnsLookupFamily::All: | ||
ASSERT(pending_response_.all_responses_.empty()); | ||
pending_response_.all_responses_.insert(pending_response_.all_responses_.end(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding this correctly, it's adding v4 addresses before v6 addresses. In general I'd suggest v6 first but you can get fancier as per the happy eyeballs spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, that's the case. But the callback receiver can do what they want with the list, and is where I would expect the ordering complexity to happen; so I am not too opinionated with order here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should follow the RFC 8305 (the happy eyeballs spec) as David suggest, and I agree with Jose that we should probably do that in HappyEyeballsConnectionImpl. I'd be happy to take a whack at that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to do this outside.
source/common/network/dns_impl.cc
Outdated
dns_name_); | ||
|
||
if (!pending_response_.address_list_.empty()) { | ||
ASSERT(dns_lookup_family_ == DnsLookupFamily::All && !dual_resolution_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, success of v4 OR v6 should mean overall success since we have usable addresses
source/common/network/dns_impl.cc
Outdated
return; | ||
} | ||
if (!fallback_if_failed_) { | ||
|
||
if (!dual_resolution_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that waiting for both means users are waiting unnecessarily. But tackling in a separate PR with TODO here sounds good
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
I see the coverage failure. Writing a test for the 1 uncovered line |
Signed-off-by: Jose Nino <jnino@lyft.com>
(needs main merge) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks for working on this. A few comments from a read through, thanks.
/wait
return pending_response_.v4_responses_; | ||
case DnsLookupFamily::All: | ||
ASSERT(pending_response_.all_responses_.empty()); | ||
pending_response_.all_responses_.insert(pending_response_.all_responses_.end(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to do this outside.
source/common/network/dns_impl.h
Outdated
// or Auto, perform a second resolution if the first one fails. If dns_lookup_family_ is All | ||
// perform the second resolution whether the first one fails or succeeds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment out of date? It conflicts with the one above about concurrent? The behavior is that both are concurrent, but we don't raise the final callback until both are done? Or is that not correct? (From further reading I'm pretty sure it's correct but I would do a 2nd comment pass as the logic is actually pretty tricky and I would be concerned we might be missing some corner cases.
source/common/network/dns_impl.cc
Outdated
@@ -97,23 +97,35 @@ void DnsResolverImpl::initializeChannel(ares_options* options, int optmask) { | |||
|
|||
void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, int timeouts, | |||
ares_addrinfo* addrinfo) { | |||
// Purely for checking for invalid state. This will change is Auto and V4Preferred are changed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Purely for checking for invalid state. This will change is Auto and V4Preferred are changed to | |
// Purely for checking for invalid state. This will change if Auto and V4Preferred are changed to |
?
source/common/network/dns_impl.cc
Outdated
// The only way the resolver would have addresses here is if it is configured to resolve All | ||
// families, and the first resolution was successful. In that case, we might as well return | ||
// the addresses that resolved. | ||
ASSERT(dns_lookup_family_ == DnsLookupFamily::All && !dual_resolution_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this seems a bit fragile. Should you change status to ARES_SUCCESS in this case? Do we have test coverage for this case?
source/common/network/dns_impl.cc
Outdated
if (pending_response_.address_list_.empty()) { | ||
pending_response_.status_ = ResolutionStatus::Failure; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? Couldn't the list actually be empty? Or is this due to the new concurrent behavior and the 2nd one failing? Maybe this is related to my comment above about switching status? Can you add more comments?
52d1a88
@mattklein123 Thanks for the comments! I needed to rearrange a piece of code because there was not way to get coverage i.e., I couldn't force c-ares to behave in a way to get us on that path. At first glance I think I covered the new arrangement, but will make sure to double check. At the very least I will add more comments in order to leave everything a bit more clear. I will update when I am back 10/26. cc @alyssawilk |
/wait |
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
Alright, @mattklein123 when I came back to this with fresh eyes I realized that constructing the pending response as a failure and only changing to success when the c-ares callback gave us an ARES_SUCCESS made the code simpler. All the cases I can think about are covered with the existing and added test cases. Let me know if this is better! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback
is a lot more complicated than I would like and pretty hard to follow, so if you see any further opportunities to make that easier to understand that would be a nice follow up. Otherwise LGTM after staring at this for a while. Thanks for adding!
Yeah it has never been the easiest to follow. I think this PR is a good first step by taking out |
* main: (221 commits) deps: Bump `protobuf` -> 3.19.0 (envoyproxy#18471) tooling: auto-assign dependency shephards (envoyproxy#18794) clang-tidy: Return from diff fun if empty diff (envoyproxy#18815) repokitteh: Block PRs pending deps approval (envoyproxy#18814) deps: Bump `org_llvm_llvm` -> 12.0.1, `com_github_wavm_wavm` -> 9ffd3e2 (envoyproxy#18747) dns resolvers: add All lookup mode (envoyproxy#18464) doc: fix link formatting for TLS session_timeout (envoyproxy#18790) ext_authz: Set response flag and code details to UAEX when denied (envoyproxy#18740) socket options: add support for directly creating ipv4/ipv6 pairs (envoyproxy#18769) ecds: make onConfigUpdate generic over filter type (envoyproxy#18061) bazel: update CMake instructions in EXTERNAL_DEPS.md (envoyproxy#18799) upstream: fix typo in comment (envoyproxy#18798) runtime: removing envoy.reloadable_features.grpc_json_transcoder_adhere_to_buffer_limits (envoyproxy#18696) bazel: Add CC=clang to clang configuration (envoyproxy#18732) fix error request id in the dubbbo local reply (envoyproxy#18741) event: assert the case of both read and closed event registered (envoyproxy#18265) tcp proxy connect tunneling: improved testing (envoyproxy#18784) deps: Bump `protoc-gen-validate` -> 0.6.2 (envoyproxy#18742) deps: Bump `rules_pkg` -> ad57589 (envoyproxy#18746) bazel: copy .bazelversion for envoy filter examples (envoyproxy#18730) ... Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Commit Message: dns resolvers - add All lookup mode
Additional Description: this change will enable callback targets to access both address families for a given hostname.
Risk Level: low - new behavior guarded by config option
Testing: added UT
Docs Changes: updated
Release Notes: updated