Introduce configurable DNS resolution timeout #1113
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
use itertools::Itertools; | ||
use tokio::net::lookup_host; | ||
use thiserror::Error; | ||
use tokio::net::{lookup_host, ToSocketAddrs}; | ||
use tracing::warn; | ||
use uuid::Uuid; | ||
|
||
|
@@ -13,6 +14,7 @@ use crate::transport::errors::{ConnectionPoolError, QueryError}; | |
use std::fmt::Display; | ||
use std::io; | ||
use std::net::IpAddr; | ||
use std::time::Duration; | ||
use std::{ | ||
hash::{Hash, Hasher}, | ||
net::SocketAddr, | ||
|
@@ -267,27 +269,53 @@ pub(crate) struct ResolvedContactPoint { | |
pub(crate) datacenter: Option<String>, | ||
} | ||
|
||
#[derive(Error, Debug)] | ||
pub(crate) enum DnsLookupError { | ||
#[error("Failed to perform DNS lookup within {0}ms")] | ||
Timeout(u128), | ||
#[error("Empty address list returned by DNS for {0}")] | ||
EmptyAddressListForHost(String), | ||
#[error(transparent)] | ||
IoError(#[from] io::Error), | ||
} | ||
|
||
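For readers unfamiliar with `thiserror`, the sketch below shows roughly what the derive on `DnsLookupError` provides, written by hand with the standard library only. It is an illustration, not the macro's actual expansion. The variant names and messages are copied from the diff; the hand-written impls are an assumption about equivalent behavior.

```rust
use std::fmt;
use std::io;

/// Hand-written stand-in for the PR's `DnsLookupError` (illustration only).
#[derive(Debug)]
pub enum DnsLookupError {
    /// Lookup did not finish within the given number of milliseconds.
    Timeout(u128),
    /// The resolver returned no addresses for this host.
    EmptyAddressListForHost(String),
    /// Any other I/O failure reported by the resolver.
    IoError(io::Error),
}

impl fmt::Display for DnsLookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Timeout(ms) => write!(f, "Failed to perform DNS lookup within {ms}ms"),
            Self::EmptyAddressListForHost(host) => {
                write!(f, "Empty address list returned by DNS for {host}")
            }
            // `#[error(transparent)]` forwards Display to the wrapped error.
            Self::IoError(e) => fmt::Display::fmt(e, f),
        }
    }
}

impl std::error::Error for DnsLookupError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            // `transparent` also forwards `source()` to the wrapped error.
            Self::IoError(e) => e.source(),
            _ => None,
        }
    }
}

// `#[from]` generates this conversion, which is what lets the diff
// write `map_err(Into::into)` on an `io::Result`.
impl From<io::Error> for DnsLookupError {
    fn from(e: io::Error) -> Self {
        Self::IoError(e)
    }
}
```

The `From<io::Error>` impl is the piece that `lookup_host_with_timeout` relies on when it converts the resolver's `io::Error` into the new error type.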
/// Performs a DNS lookup with provided optional timeout. | ||
async fn lookup_host_with_timeout<T: ToSocketAddrs>( | ||
host: T, | ||
hostname_resolution_timeout: Option<Duration>, | ||
) -> Result<impl Iterator<Item = SocketAddr>, DnsLookupError> { | ||
if let Some(timeout) = hostname_resolution_timeout { | ||
match tokio::time::timeout(timeout, lookup_host(host)).await { | ||
Ok(res) => res.map_err(Into::into), | ||
// Elapsed error from tokio library does not provide any context. | ||
Err(_) => Err(DnsLookupError::Timeout(timeout.as_millis())), | ||
} | ||
} else { | ||
lookup_host(host).await.map_err(Into::into) | ||
} | ||
} | ||
|
||
// Resolve the given hostname using a DNS lookup if necessary. | ||
// The resolution may return multiple IPs and the function returns one of them. | ||
// It prefers to return IPv4s first, and only if there are none, IPv6s. | ||
pub(crate) async fn resolve_hostname(hostname: &str) -> Result<SocketAddr, io::Error> { | ||
let addrs = match lookup_host(hostname).await { | ||
pub(crate) async fn resolve_hostname( | ||
hostname: &str, | ||
hostname_resolution_timeout: Option<Duration>, | ||
) -> Result<SocketAddr, DnsLookupError> { | ||
let addrs = match lookup_host_with_timeout(hostname, hostname_resolution_timeout).await { | ||
Ok(addrs) => itertools::Either::Left(addrs), | ||
// Use a default port in case of error, but propagate the original error on failure | ||
Err(e) => { | ||
let addrs = lookup_host((hostname, 9042)).await.or(Err(e))?; | ||
let addrs = lookup_host_with_timeout((hostname, 9042), hostname_resolution_timeout) | ||
.await | ||
.or(Err(e))?; | ||
itertools::Either::Right(addrs) | ||
} | ||
Comment on lines +305 to 313

This means that the timeout is effectively twice the value provided in the config...

Hmm... you are right. I wonder whether we should leave it as is, or rather recompute the timeout for the second try in case the first one failed.

let deadline = hostname_resolution_timeout.map(|dur| tokio::time::Instant::now() + dur);
let addrs = match lookup_host_with_timeout(hostname, hostname_resolution_timeout).await {
    Ok(addrs) => itertools::Either::Left(addrs),
    // Use a default port in case of error, but propagate the original error on failure
    Err(e) => {
        let new_timeout =
            deadline.map(|deadline| deadline.duration_since(tokio::time::Instant::now()));
        let addrs = lookup_host_with_timeout((hostname, 9042), new_timeout)
            .await
            .or(Err(e))?;
        itertools::Either::Right(addrs)
    }
};

If the first attempt failed for reasons other than a timeout, then such a recomputation makes perfect sense. However, if it failed due to a timeout (because the wrong port somehow caused a timeout; I don't know how likely this scenario is), then we won't be able to make the second attempt.

Any ideas @Lorak-mmk?

We could also execute

When can it happen that the port is not specified and what is the expected behavior of
If it always fails immediately with a specific error, then we can check this error and perform the lookup with the default port with the same timeout. If it can time out, or take some time in general, then I'd perform another lookup with the same timeout and document it.

From my investigation, it looks like the port is not used in the actual DNS lookup (which makes sense). What's the difference between
When
This means that:
In that case, I think that we can match the error returned from

Ok. Let's describe this behavior in the docs.
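The shared-deadline idea from this thread can be sketched without an async runtime. The helper below is hypothetical (the name `remaining_budget` is not part of the PR) and uses `std::time::Instant` where the quoted snippet uses `tokio::time::Instant`. It only computes how much of the configured budget is left for a second attempt.

```rust
use std::time::{Duration, Instant};

/// Time left until `deadline`, measured at `now`.
/// `None` means no timeout was configured, so there is no limit.
/// (Hypothetical helper for illustration; not part of the PR.)
pub fn remaining_budget(deadline: Option<Instant>, now: Instant) -> Option<Duration> {
    // Saturate at zero instead of panicking or going negative
    // when `now` is already past the deadline.
    deadline.map(|d| d.saturating_duration_since(now))
}
```

This makes the objection raised in the thread concrete: if the first lookup fails quickly, most of the budget remains for the retry with the default port, but if the first lookup itself timed out, the remaining budget is zero and the retry cannot succeed.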
||
}; | ||
|
||
addrs | ||
.find_or_last(|addr| matches!(addr, SocketAddr::V4(_))) | ||
.ok_or_else(|| { | ||
io::Error::new( | ||
io::ErrorKind::Other, | ||
format!("Empty address list returned by DNS for {}", hostname), | ||
) | ||
}) | ||
.ok_or_else(|| DnsLookupError::EmptyAddressListForHost(hostname.to_owned())) | ||
} | ||
|
||
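The address selection at the end of `resolve_hostname` relies on `find_or_last` from `itertools`: it returns the first element matching the predicate, or the last element if none matches. A minimal standard-library sketch of the same selection rule follows; `prefer_ipv4` is a hypothetical name used only for illustration.

```rust
use std::net::SocketAddr;

/// Returns the first IPv4 address in `addrs`, or the last address seen
/// if there is no IPv4 address. Returns `None` for an empty iterator.
/// (Hypothetical helper mirroring `find_or_last`; not part of the PR.)
pub fn prefer_ipv4(addrs: impl Iterator<Item = SocketAddr>) -> Option<SocketAddr> {
    let mut last = None;
    for addr in addrs {
        if addr.is_ipv4() {
            // First IPv4 address wins immediately.
            return Some(addr);
        }
        // Remember the most recent non-IPv4 address as the fallback.
        last = Some(addr);
    }
    last
}
```

Note that when only IPv6 addresses are returned, the fallback is the last one in resolver order, not the first. An empty result maps to `None`, which the diff turns into `DnsLookupError::EmptyAddressListForHost`.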
/// Transforms the given [`InternalKnownNode`]s into [`ContactPoint`]s. | ||
|
@@ -296,6 +324,7 @@ pub(crate) async fn resolve_hostname(hostname: &str) -> Result<SocketAddr, io::E | |
/// In case of a plain IP address, parses it and uses straight. | ||
pub(crate) async fn resolve_contact_points( | ||
known_nodes: &[InternalKnownNode], | ||
hostname_resolution_timeout: Option<Duration>, | ||
) -> (Vec<ResolvedContactPoint>, Vec<String>) { | ||
// Find IP addresses of all known nodes passed in the config | ||
let mut initial_peers: Vec<ResolvedContactPoint> = Vec::with_capacity(known_nodes.len()); | ||
|
@@ -323,7 +352,7 @@ pub(crate) async fn resolve_contact_points( | |
let resolve_futures = to_resolve | ||
.into_iter() | ||
.map(|(hostname, datacenter)| async move { | ||
match resolve_hostname(hostname).await { | ||
match resolve_hostname(hostname, hostname_resolution_timeout).await { | ||
Ok(address) => Some(ResolvedContactPoint { | ||
address, | ||
datacenter, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -806,6 +806,27 @@ impl<K: SessionBuilderKind> GenericSessionBuilder<K> { | |
self | ||
} | ||
|
||
/// Changes DNS hostname resolution timeout. | ||
/// The default is 5 seconds. | ||
/// | ||
/// # Example | ||
/// ``` | ||
/// # use scylla::{Session, SessionBuilder}; | ||
/// # use std::time::Duration; | ||
/// # async fn example() -> Result<(), Box<dyn std::error::Error>> { | ||
/// let session: Session = SessionBuilder::new() | ||
/// .known_node("127.0.0.1:9042") | ||
/// .hostname_resolution_timeout(Duration::from_secs(10)) | ||
/// .build() // Turns SessionBuilder into Session | ||
/// .await?; | ||
/// # Ok(()) | ||
/// # } | ||
/// ``` | ||
pub fn hostname_resolution_timeout(mut self, duration: Duration) -> Self { | ||
self.config.hostname_resolution_timeout = Some(duration); | ||
self | ||
} | ||
|
||
Comment on lines +809 to +829

I believe this should accept

I decided to be consistent with other optional timeout options. See for example
If the user wants to set any of these to
sb.config.hostname_resolution_timeout = None;

Hmm, I dislike the idea of

We could expose other methods (
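The API shape debated in this thread can be illustrated with a toy builder. Everything below is hypothetical (`ToyConfig` and `ToyBuilder` are stand-ins, not the driver's types). It assumes the documented 5 second default is stored as `Some(Duration::from_secs(5))`, which the diff does not show.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the driver's session config.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyConfig {
    /// `None` disables the timeout entirely.
    pub hostname_resolution_timeout: Option<Duration>,
}

impl Default for ToyConfig {
    fn default() -> Self {
        // Assumption: the documented 5 second default is stored as `Some`.
        Self {
            hostname_resolution_timeout: Some(Duration::from_secs(5)),
        }
    }
}

/// Hypothetical stand-in for the session builder.
#[derive(Debug, Default)]
pub struct ToyBuilder {
    pub config: ToyConfig,
}

impl ToyBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Same shape as the setter in the diff: takes a plain `Duration`
    /// and always stores `Some`, so it cannot disable the timeout.
    pub fn hostname_resolution_timeout(mut self, duration: Duration) -> Self {
        self.config.hostname_resolution_timeout = Some(duration);
        self
    }
}
```

With this shape, the only way to turn the timeout off is to assign `None` to the public config field, as in the `sb.config.hostname_resolution_timeout = None;` line quoted in the thread. That is the ergonomic cost the reviewers are weighing.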
||
/// Sets the host filter. The host filter decides whether any connections | ||
/// should be opened to the node or not. The driver will also avoid | ||
/// those nodes when re-establishing the control connection. | ||
|
I'd use `impl ToSocketAddrs`, as it's more concise and we get nothing with an explicit type parameter here.

Wow, I'm surprised that such a trait exists
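As a rough illustration of the `impl Trait` suggestion, here is a sketch using the standard library's `std::net::ToSocketAddrs`. This is a different trait from `tokio::net::ToSocketAddrs` used in the diff, though it accepts similar input types. The function name `first_addr` is hypothetical.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolves `host` and returns the first address, if any.
/// `impl ToSocketAddrs` in argument position replaces an explicit
/// `<T: ToSocketAddrs>` type parameter.
/// (Hypothetical helper for illustration; not part of the PR.)
pub fn first_addr(host: impl ToSocketAddrs) -> io::Result<Option<SocketAddr>> {
    Ok(host.to_socket_addrs()?.next())
}
```

The same function accepts `"127.0.0.1:9042"`, `("127.0.0.1", 9042)`, or a `SocketAddr` value. With the standard library trait, a string without a port such as `"127.0.0.1"` is rejected with `io::ErrorKind::InvalidInput`. Whether the tokio lookup behaves the same way is not confirmed by this sketch.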