Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle missing case in reestablish local commitment number checks #2721

Merged
merged 3 commits into from
Nov 29, 2023

Conversation

TheBlueMatt
Copy link
Collaborator

If we're behind exactly one commitment (which we've revoked), we'd
previously force-close the channel, guaranteeing we'll lose funds
as the counterparty has our latest local commitment state's
revocation secret.

While this shouldn't happen because users should never lose data,
sometimes issues happen, and we should ensure we always panic.

Further, test_data_loss_protect is updated to test this case.

The data loss protect test was panicking in a message assertion
which should be passing, but because the test was marked only
`#[should_panic]` it was being treated as a successful outcome.
Instead, we use `catch_unwind` on exactly the line we expect to
panic to ensure we are hitting the right one.
@codecov-commenter
Copy link

codecov-commenter commented Nov 9, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (1e25979) 88.80% compared to head (ac3fd98) 90.09%.
Report is 68 commits behind head on main.

Files Patch % Lines
lightning/src/ln/reload_tests.rs 93.57% 7 Missing ⚠️
lightning/src/ln/channel.rs 88.88% 3 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2721      +/-   ##
==========================================
+ Coverage   88.80%   90.09%   +1.28%     
==========================================
  Files         113      114       +1     
  Lines       89170   101278   +12108     
  Branches    89170   101278   +12108     
==========================================
+ Hits        79191    91243   +12052     
+ Misses       7735     7647      -88     
- Partials     2244     2388     +144     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TonyGiorgio
Copy link
Contributor

we should ensure we always panic

I dont think that the PR is doing that though, it just logs and closes the channel with a different case but it would still result in loss of funds from the looks of it.

@benthecarman
Copy link
Contributor

I dont think that the PR is doing that though, it just logs and closes the channel with a different case but it would still result in loss of funds from the looks of it.

the off by one is fixed in the final commit

} else {
Err(ChannelError::Close("Peer attempted to reestablish channel with a very old remote commitment transaction".to_owned()))
Err(ChannelError::Close(format!(
"Peer attempted to reestablish channel with a future local commitment transaction: {} (received) vs {} (expected)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they have a future commitment shouldn't we just send them an an error and have them close, so we don't close with a revoked state

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we won't be closing with a revoked state, it implies we have lost some state, but only a new commitment transaction, we haven't revoked the latest state we have, so its still safe to broadcast.

@@ -4295,8 +4302,18 @@ impl<SP: Deref> Channel<SP> where
order: self.context.resend_order.clone(),
})
}
} else if msg.next_local_commitment_number < next_counterparty_commitment_number {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the case before this is else if msg.next_local_commitment_number == next_counterparty_commitment_number - 1 which would qualify for this as well. Should this be msg.next_local_commitment_number < next_counterparty_commitment_number - 1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its an else if and and the intent is to capture any "they're behind us" case (that we cannot recover from, which is handled in the above conditionals).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think it'd be more clear if this was msg.next_local_commitment_number < next_counterparty_commitment_number - 1 since the statement before is else if msg.next_local_commitment_number == next_counterparty_commitment_number - 1, shouldn't have to rely on the -1 in the previous statement to make this one correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, oh, okay so that's where the disagreement is. So the statement in the error message (even if not closing in response, or the "very" part of the comment) is correct with or without the - 1. Given the else clause is "future", it seems better to use "very old" in the "one step old" case.

Copy link
Contributor

@shaavan shaavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK!

This pull request effectively addresses the scenario where the next_remote_commitment_number is equal to our_commitment_transaction + 1. Previously, this specific case was not accounted for, leading to the triggering of the else statement and resulting in a final close and hence loss of funds. Now, with the recent changes, this situation is properly handled under the 'We have fallen behind case…'

I'm planning to review the test soon and will share my complete review shortly!

@TonyGiorgio
Copy link
Contributor

FYI, since we started panic-ing here instead of allowing it to force close when running behind, we have had a lot less force closes and none that have resulted in justice transactions. And no reports of errors on the wallet side yet.

Copy link
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after squash

lightning/src/ln/channel.rs Show resolved Hide resolved
When we reestablish there are generally always 4 conditions for
both local and remote commitment transactions:
 * we're stale and have possibly lost data
 * we're ahead and the peer has lost data
 * we're caught up
 * we're nearly caught up and need to retransmit one update.

In especially the local commitment case we had a mess of different
comparisons, which is improved here. Further, the error messages
are clarified and include more information.
If we're behind exactly one commitment (which we've revoked), we'd
previously force-close the channel, guaranteeing we'll lose funds
as the counterparty has our latest local commitment state's
revocation secret.

While this shouldn't happen because users should never lose data,
sometimes issues happen, and we should ensure we always panic.

Further, `test_data_loss_protect` is updated to test this case.
@TheBlueMatt
Copy link
Collaborator Author

Squashed without any changes -

$ git diff-tree -U1 1e4c520a ac3fd98e
$ 

@TheBlueMatt TheBlueMatt added this to the 0.0.119 milestone Nov 28, 2023
@@ -4152,14 +4152,15 @@ impl<SP: Deref> Channel<SP> where
return Err(ChannelError::Close("Peer sent an invalid channel_reestablish to force close in a non-standard way".to_owned()));
}

let our_commitment_transaction = INITIAL_COMMITMENT_NUMBER - self.context.cur_holder_commitment_transaction_number - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somewhat poor choice for a variable name that's a number/index, and not an actual tx

@wpaulino wpaulino merged commit c2bbfff into lightningdevkit:main Nov 29, 2023
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants