-
Notifications
You must be signed in to change notification settings - Fork 950
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
protocols/kad: Fix right shift overflow panic in record_received #1492
Conversation
Within `Behaviour::record_received` the exponentially decreasing expiration based on the distance to the target for a record is calculated as following: 1. Calculate the amount of nodes between us and the record key beyond the k replication constant as `n`. 2. Shift the configured record time-to-live `n` times to the right to calculate an exponentially decreasing expiration. The configured record time-to-live is a u64. If `n` is larger or equal to 64 the right shift will lead to an overflow which panics in debug mode. This patch uses a checked right shift instead, defaulting to 0 (`now + 0`) for the expiration on overflow.
protocols/kad/src/handler.rs
Outdated
@@ -364,6 +364,15 @@ pub struct KademliaRequestId { | |||
connec_unique_id: UniqueConnecId, | |||
} | |||
|
|||
#[cfg(test)] | |||
impl Default for KademliaRequestId { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as KademliaRequestId
implements Clone
, this Default
impl might as well be available all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an opaque implementation detail. Implementing Default
on it doesn't make any sense. Maybe it could contain a channel or an Arc
in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood KademliaRequestId
to be some sort of unique one-time token and Clone
defeats that purpose but I did not look much into it so feel free to ignore the criticism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine either way. I can't come up with a use-case for Default
outside of testing though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood
KademliaRequestId
to be some sort of unique one-time token andClone
defeats that purpose but I did not look much into it so feel free to ignore the criticism.
Just to give some context: KademliaRequestId
did not use to be Clone
but it had to be as a result of #1440 requiring Clone
for the TInEvent
of any handler in libp2p-swarm
, since it is now possible to "broadcast" a single such event to multiple handlers (which in turn was a requirement discovered during the substrate integration in paritytech/substrate#5066) and KademliaRequestId
is part of these events in libp2p-kad
. It would of course be better to not have that Clone
, but I didn't come up with a way to do so.
protocols/kad/src/behaviour/test.rs
Outdated
vec![], | ||
); | ||
|
||
kad.record_received(target, connection_id, request_id, record); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be testing private implementation details. The parameters of this method are totally arbitrary. The fact that it's a separate method at all is also totally arbitrary and subject to change. All we're doing with this test is locking the current implementation as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your concern. What are you advocating for instead: (1) remove the test or (2) write a larger test that reproduces the same behaviour?
All we're doing with this test is locking the current implementation as it is.
I don't think a test should ever lock an implementation. While I am in favor of having a test for each regression, I am also in favor of removing outdated tests if necessary.
Thanks for fixing this, I wasn't even aware bit shifts could overflow in this way. A regression test is a good idea but I don't see why it needs to be so complicated. Why not extract the code that calculates the expiry into a standalone function and writing a small property and or / regression test for that? Even if you want the function to actually take a |
Extract right shift into isolated function and replace complex regression test with small isolated one.
@romanb thanks for the input. Can you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few simplifying suggestions, let me know if you disagree with any of them.
protocols/kad/src/behaviour.rs
Outdated
let expiration = self.record_ttl.map(|ttl| | ||
now + Duration::from_secs(ttl.as_secs() >> num_beyond_k) | ||
); | ||
let expiration = exp_decr_expiration(self.record_ttl, num_beyond_k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let expiration = exp_decr_expiration(self.record_ttl, num_beyond_k); | |
let expiration = self.record_ttl.map(|ttl| now + exp_decrease(ttl, num_beyond_k)); |
protocols/kad/src/behaviour.rs
Outdated
@@ -1030,6 +1026,15 @@ where | |||
} | |||
} | |||
|
|||
/// Calculate exponentially decreasing expiration from a default time-to-live by a factor. | |||
fn exp_decr_expiration(default_ttl: Option<Duration>, factor: u32) -> Option<Instant> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn exp_decr_expiration(default_ttl: Option<Duration>, factor: u32) -> Option<Instant> { | |
fn exp_decrease(ttl: Duration, exp: u32) -> Instant { |
protocols/kad/src/behaviour.rs
Outdated
@@ -1030,6 +1026,15 @@ where | |||
} | |||
} | |||
|
|||
/// Calculate exponentially decreasing expiration from a default time-to-live by a factor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Calculate exponentially decreasing expiration from a default time-to-live by a factor. | |
/// Exponentially decrease the given duration (base 2). |
protocols/kad/src/behaviour.rs
Outdated
@@ -1030,6 +1026,15 @@ where | |||
} | |||
} | |||
|
|||
/// Calculate exponentially decreasing expiration from a default time-to-live by a factor. | |||
fn exp_decr_expiration(default_ttl: Option<Duration>, factor: u32) -> Option<Instant> { | |||
default_ttl.map(|ttl| Instant::now() + Duration::from_secs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_ttl.map(|ttl| Instant::now() + Duration::from_secs( | |
Duration::from_secs(ttl.as_secs().checked_shr(exp).unwrap_or(0)) |
protocols/kad/src/behaviour.rs
Outdated
@@ -946,8 +946,6 @@ where | |||
return | |||
} | |||
|
|||
let now = Instant::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is preferable to leave this here as it can be reused multiple times (cf. #1496). And since Instant::now()
is a side-effecting function for obtaining some notion of system time, usually both testability and performance is improved by using it mostly in top-level functions and passing the instant down to others where needed (see also the other comments).
Suggested simplifications look good to me. @romanb would you mind taking another look? |
Within
Behaviour::record_received
the exponentially decreasingexpiration based on the distance to the target for a record is
calculated as following:
Calculate the amount of nodes between us and the record key beyond
the k replication constant as
n
.Shift the configured record time-to-live
n
times to the right tocalculate an exponentially decreasing expiration.
The configured record time-to-live is a u64. If
n
is larger or equalto 64 the right shift will lead to an overflow which panics in debug
mode.
This patch uses a checked right shift instead, defaulting to 0 (
now + 0
) for the expiration on overflow.Fixes #1491.