-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce configurable DNS resolution timeout #1113
base: main
Are you sure you want to change the base?
Conversation
|
/// Changes DNS hostname resolution timeout. | ||
/// The default is 5 seconds. | ||
/// | ||
/// # Example | ||
/// ``` | ||
/// # use scylla::{Session, SessionBuilder}; | ||
/// # use std::time::Duration; | ||
/// # async fn example() -> Result<(), Box<dyn std::error::Error>> { | ||
/// let session: Session = SessionBuilder::new() | ||
/// .known_node("127.0.0.1:9042") | ||
/// .hostname_resolution_timeout(Duration::from_secs(10)) | ||
/// .build() // Turns SessionBuilder into Session | ||
/// .await?; | ||
/// # Ok(()) | ||
/// # } | ||
/// ``` | ||
pub fn hostname_resolution_timeout(mut self, duration: Duration) -> Self { | ||
self.config.hostname_resolution_timeout = Some(duration); | ||
self | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should accept Option<Duration>
, especially that the default is Some
. Now it's impossible to set no timeout with SessionBuilder
's API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to be consistent with other optional timeout options. See for example SessionBuilder::keepalive_interval()
It accepts Duration
, and sets the corresponding option to Some(duration)
as well.
If user wants to set any of these to None
, he can do so by direct access to the field (fields are public):
sb.config.hostname_resolution_timeout = None;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I dislike the idea of SessionBuilder
exposing only subset of possible configuration options using methods, with some being available only through fields.
WDYT @Lorak-mmk ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could expose another methods (disable_hostname_resolution_timeout
, disable_keepalive
) which would be a bit more descriptive.
/// Performs a DNS lookup with provided optional timeout. | ||
async fn lookup_host_with_timeout<T: ToSocketAddrs>( | ||
host: T, | ||
hostname_resolution_timeout: Option<Duration>, | ||
) -> Result<impl Iterator<Item = SocketAddr>, DnsLookupError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use impl ToSocketAddrs
, as it's more concise and we get nothing with explicit type parameter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I'm suprised that such a trait exists
let addrs = match lookup_host_with_timeout(hostname, hostname_resolution_timeout).await { | ||
Ok(addrs) => itertools::Either::Left(addrs), | ||
// Use a default port in case of error, but propagate the original error on failure | ||
Err(e) => { | ||
let addrs = lookup_host((hostname, 9042)).await.or(Err(e))?; | ||
let addrs = lookup_host_with_timeout((hostname, 9042), hostname_resolution_timeout) | ||
.await | ||
.or(Err(e))?; | ||
itertools::Either::Right(addrs) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that the timeout is effectively twice the value provided in the config...
I'm not convinced it's correct. This must be at least noted in documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... you are right. I wonder whether we should leave it as is, or rather recompute the timeout for the second try in case first failed.
let deadline = hostname_resolution_timeout.map(|dur| tokio::time::Instant::now() + dur);
let addrs = match lookup_host_with_timeout(hostname, hostname_resolution_timeout).await {
Ok(addrs) => itertools::Either::Left(addrs),
// Use a default port in case of error, but propagate the original error on failure
Err(e) => {
let new_timeout =
deadline.map(|deadline| deadline.duration_since(tokio::time::Instant::now()));
let addrs = lookup_host_with_timeout((hostname, 9042), new_timeout)
.await
.or(Err(e))?;
itertools::Either::Right(addrs)
}
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the first attempt failed from reasons different that a timeout, then such recomputation makes perfect sense. However, if it failed due to a timeout (because the wrong port somehow caused a timeout - idk how likely this scenario is), then we won't be able to do the second attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas @Lorak-mmk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also execute resolve_hostname
function with a given timeout (i.e. tokio::time::timeout(t, resolve_hostname)
). It's much easier and cleaner than recomputing the timeout. However, in such case it's still true that if first attempt takes too long, we won't be able to do a second attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When can it happen that the port is not specified and what is the expected behavior of lookup_host
in this case?
If it always fails immediately with a specific error, then we can check this error and perform lookup with default port with the same timeout.
If it can timeout, or take some time in general, then I'd perform another lookup with the same timeout and document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my investigation, it looks like port is not used in the actual DNS lookup (which makes sense).
What's the difference between String
and (String, u16)
, then?
When String
is provided, both tokio
and std::net
expect that it is of form "host:port"
. If it's not the case, the error is returned immidiately. Otherwise, host
part is used in DNS lookup which may timeout.
When (String, u16)
is provided, the DNS lookup is performed based on the string - port is ignored during lookup (same as above).
This means that:
- if hostname is of form
"addr:port"
, then the first lookup timeouts iff the second lookup (with default port) timeouts - if hostname is not of this form, then the first lookup will fail immidiately. We can try with default port.
In that case, I think that we can match the error returned from lookup_host_with_timeout
If it's something other than DnsLookupError::Timeout(_)
, then we can try again with the same timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Let's describe this behavior in the docs.
This option is responsible for setting the DNS hostname resolution timeout. The default timeout is set to 5 seconds.
…nfig These structs need this context to pass it forward to functions responsible for DNS lookup.
Passed the hostname_resolution_timeout down to the functions responsible for DNS resolution logic. Created a pub(crate) error type to distinguish between errors that can occur during hostname resolution. Notice: it's pub(crate) since the users of this API only emit logs in case of error. The errors are not passed to public API. Created a `lookup_host_with_timeout` function, and extracted some logic here. The purpose is not to introduce complex branching in original function.
a01ba38
to
86e69d8
Compare
Rebased on main |
fix: #1033
Pre-review checklist
[ ] I have adjusted the documentation in./docs/source/
.Fixes:
annotations to PR description.