Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

htlcswitch: pipeline settles to switch #3143

Merged

Conversation

Crypt-iQ
Copy link
Collaborator

@Crypt-iQ Crypt-iQ commented May 30, 2019

This PR makes the outgoing link pipeline the settle to the
switch as soon as it receives it. Previously, it would wait for a
revocation before sending it, which caused increased latency on
payments as well as possibly never settling on the incoming link.
A duplicate settle is still sent to the switch, but it is handled
gracefully. A new AckEventTicker was added to the switch which
acknowledges any pending settle / fail entries in an outgoing
link's fwd pkgs in batch. This was needed in order to reduce the
number of db txn's which would have been incurred by acking whenever
we receive a duplicate settle without batching.

Fixes #3069

@Crypt-iQ Crypt-iQ changed the title [WIP] htlcswitch: pipeline settles to switch htlcswitch: pipeline settles to switch May 31, 2019
@Roasbeef Roasbeef added htlcswitch incomplete WIP PR, not fully finalized, but light review possible optimization P3 might get fixed, nice to have labels Jun 18, 2019
@wpaulino wpaulino added this to the 0.7.1 milestone Jul 2, 2019
@wpaulino wpaulino added the v0.7.1 label Jul 2, 2019
Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Crypt-iQ this looks great, only one minor comment. looking forward to benchmarking the performance improvement!

htlcswitch/link.go Outdated Show resolved Hide resolved
@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jul 3, 2019

I have been running flakehunter unit tester on the failed test in travis and it hasn't failed... so I think it's unrelated. I have seen that error on my mac though (unrelated to this branch).

Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Crypt-iQ a couple other small comments, fixups can be squashed as well!

htlcswitch/switch.go Outdated Show resolved Hide resolved
htlcswitch/switch.go Show resolved Hide resolved
@Roasbeef Roasbeef requested a review from wpaulino July 9, 2019 23:27
Copy link
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but since there'll be duplicate settles we'll see some error logs that aren't necessary and might lead to users thinking there's an issue. Below are the trace logs for a completed payment:

2019-07-09 18:55:57.835 [DBG] HSWC: ChannelLink(668de44edf52c85c7ec6e54e604dcfe88d6a312cb08afb7d25bf49eb11ecedb4:0): sampled fee rate for 3 block conf: 12500 sat/kw
2019-07-09 18:56:02.893 [TRC] HSWC: Committing fresh circuits: ([]channeldb.CircuitKey) (len=1 cap=1) {
 (channeldb.CircuitKey) (Chan ID=0:0:0, HTLC ID=3)
}
2019-07-09 18:56:02.919 [TRC] HSWC: ChannelLink(401:1:0) received switch packet inkey=(Chan ID=0:0:0, HTLC ID=3), outkey=(Chan ID=401:1:0, HTLC ID=0)
2019-07-09 18:56:02.919 [TRC] HSWC: ChannelLink(401:1:0) Received downstream htlc: payment_hash=71b691fe313daa0a8ff7b18824f755240561a11512a18fe12c7c745dee7ed532, local_log_index=2, batch_size=1
2019-07-09 18:56:02.919 [DBG] HSWC: ChannelLink(401:1:0) Queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=3)->(Chan ID=401:1:0, HTLC ID=2)
2019-07-09 18:56:02.972 [TRC] HSWC: Opening finalized circuits: ([]htlcswitch.Keystone) (len=1 cap=1) {
 (htlcswitch.Keystone) (Chan ID=0:0:0, HTLC ID=3) --> (Chan ID=401:1:0, HTLC ID=2)
}
2019-07-09 18:56:03.008 [DBG] HSWC: ChannelLink(401:1:0) removing Add packet (Chan ID=0:0:0, HTLC ID=3) from mailbox
2019-07-09 18:56:03.008 [TRC] HSWC: Deleting resolved circuits: ([]channeldb.CircuitKey) <nil>
2019-07-09 18:56:03.067 [TRC] HSWC: ChannelLink(401:1:0) processing 0 remote adds for height 5
2019-07-09 18:56:03.211 [DBG] HSWC: Closed completed SETTLE circuit for 71b691fe313daa0a8ff7b18824f755240561a11512a18fe12c7c745dee7ed532: (0:0:0, 3) <-> (401:1:0, 2)
2019-07-09 18:56:03.211 [DBG] HSWC: Tearing down open circuit with SETTLE pkt, removing circuit=(Chan ID=0:0:0, HTLC ID=3) with keystone=(Chan ID=401:1:0, HTLC ID=2)
2019-07-09 18:56:03.218 [TRC] HSWC: Deleting resolved circuits: ([]channeldb.CircuitKey) (len=1 cap=1) {
 (channeldb.CircuitKey) (Chan ID=0:0:0, HTLC ID=3)
}
2019-07-09 18:56:03.257 [DBG] HSWC: Closed completed SETTLE circuit for 71b691fe313daa0a8ff7b18824f755240561a11512a18fe12c7c745dee7ed532: (0:0:0, 3) <-> (401:1:0, 2)
2019-07-09 18:56:03.364 [TRC] HSWC: Deleting resolved circuits: ([]channeldb.CircuitKey) <nil>
2019-07-09 18:56:03.440 [DBG] HSWC: ChannelLink(401:1:0): settle-fail-filter &{1 [0]}
2019-07-09 18:56:03.440 [TRC] HSWC: ChannelLink(401:1:0) processing 0 remote adds for height 6
2019-07-09 18:56:03.440 [ERR] HSWC: Unable to find target channel for HTLC settle/fail: channel ID = 401:1:0, HTLC ID = 2
2019-07-09 18:56:03.441 [ERR] HSWC: ChannelLink(401:1:0) unhandled error while forwarding htlc packet over htlcswitch: Unable to find target channel for HTLC settle/fail: channel ID = 401:1:0, HTLC ID = 2
2019-07-09 18:56:12.673 [DBG] HSWC: Sent 100 satoshis and received 0 satoshis in the last 10 seconds (0.200000 tx/sec)
2019-07-09 18:56:18.475 [DBG] HSWC: Acked 1 settle fails: ([]channeldb.SettleFailRef) (len=1 cap=1) {
 (channeldb.SettleFailRef) {
  Source: (lnwire.ShortChannelID) 401:1:0,
  Height: (uint64) 6,
  Index: (uint16) 0
 }
}

htlcswitch/switch.go Show resolved Hide resolved
htlcswitch/switch.go Outdated Show resolved Hide resolved
server.go Outdated
@@ -430,6 +430,8 @@ func newServer(listenAddrs []net.Addr, chanDB *channeldb.DB, cc *chainControl,
htlcswitch.DefaultFwdEventInterval),
LogEventTicker: ticker.New(
htlcswitch.DefaultLogInterval),
AckEventTicker: ticker.New(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: wrapping should be (if it doesn't fit in 80 chars):

ticker.New(
    htlcswitch.DefaultAckInterval,
)

// because the settles are pipelined to the switch and otherwise
// the bandwidth won't be updated by the time Alice receives a
// response here.
time.Sleep(2 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a mock ticker to send a force tick and avoid the sleep.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about how this would work. We can feed ticks to the AckEventTicker, but that won't help since we need to actually wait for the revocation from Bob (the duplicate settle doesn't matter here) and there doesn't seem to be a straightforward way to be notified upon receiving a revocation. We could loop and timeout after some timeout interval, but that's not so clean either...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems TestChannelLinkSingleHopPayment fails for a similar reason and can be updated with this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, you're right. I see why the sleep is indeed needed, seems fine to me as is for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the TestChannelLinkSingleHopPayment no longer fails on my machine with flake-unit, so maybe it's just my Mac.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, it flakes!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use lntest.WaitPredicate instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get an import cycle then, but we could just copy the function?

@Crypt-iQ
Copy link
Collaborator Author

@wpaulino I wasn't entirely sure what to do about the extra error logs - I could remove it if the packet has a destRef field? What was once erroneous behavior is the norm so I think that'd make sense. What do you think?

@Crypt-iQ Crypt-iQ force-pushed the pipelining_settle_0525 branch from 09f980e to 9f065dc Compare July 11, 2019 03:36
@Crypt-iQ Crypt-iQ requested a review from Roasbeef as a code owner July 11, 2019 03:36
@wpaulino
Copy link
Contributor

@wpaulino I wasn't entirely sure what to do about the extra error logs - I could remove it if the packet has a destRef field? What was once erroneous behavior is the norm so I think that'd make sense. What do you think?

Sounds good to me. I think @cfromknecht had some thoughts on possibly refactoring that portion of code to improve clarity of the new behavior, though I'd say it's not a blocker for this PR.

@Crypt-iQ
Copy link
Collaborator Author

@wpaulino I wasn't entirely sure what to do about the extra error logs - I could remove it if the packet has a destRef field? What was once erroneous behavior is the norm so I think that'd make sense. What do you think?

Sounds good to me. I think @cfromknecht had some thoughts on possibly refactoring that portion of code to improve clarity of the new behavior, though I'd say it's not a blocker for this PR.

If we don't log the error, we should still return the error and it will still get reported in the link though. So idk if the change is worth it?

htlcswitch/switch.go Outdated Show resolved Hide resolved
htlcswitch/switch.go Show resolved Hide resolved
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff turned out to be much smaller than I had originally anticipated, me gusta!


// DefaultAckInterval is the duration between attempts to ack any settle
// fails in a forwarding package.
DefaultAckInterval = 15 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would this we want this value to be much lower, like in the ms? The two above are for non-critical operations like logging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't too critical, it just needs to happen at some point so we stop resending the htlc internally. if it doesn't happen before shutdown it will be done when on the next connection

lntest/itest/lnd_test.go Outdated Show resolved Hide resolved
@Crypt-iQ Crypt-iQ force-pushed the pipelining_settle_0525 branch 3 times, most recently from 6f1c1c3 to 029956e Compare July 17, 2019 11:05
Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice work on this optimization @Crypt-iQ !! 💯

@Roasbeef
Copy link
Member

Needs a rebase @Crypt-iQ.

This commit makes the outgoing link pipeline the settle to the
switch as soon as it receives it. Previously, it would wait for a
revocation before sending it, which caused increased latency on
payments as well as possibly never settling on the incoming link.
A duplicate settle is still sent to the switch, but it is handled
gracefully. A new AckEventTicker was added to the switch which
acknowledges any pending settle / fail entries in an outgoing
link's fwd pkgs in batch. This was needed in order to reduce the
number of db txn's which would have been incurred by acking whenever
we receive a duplicate settle without batching.
@Crypt-iQ Crypt-iQ force-pushed the pipelining_settle_0525 branch from 029956e to 00814dc Compare July 18, 2019 21:20
@Crypt-iQ
Copy link
Collaborator Author

Rebased and squashed fixups @Roasbeef

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ⛩

@cfromknecht cfromknecht merged commit 807012f into lightningnetwork:master Jul 20, 2019
@cfromknecht
Copy link
Contributor

@Crypt-iQ Cheers to lower latency for anyone routing through LND!! 🍾🎉🔥

@joostjager
Copy link
Contributor

joostjager commented Oct 31, 2019

Created issue for unjustified error: #3656

Not ideal that there is always an error added to the log during the very basic operation of a successful payment.

@Crypt-iQ Crypt-iQ mentioned this pull request Mar 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
htlcswitch incomplete WIP PR, not fully finalized, but light review possible optimization P3 might get fixed, nice to have
Projects
None yet
Development

Successfully merging this pull request may close these issues.

htlcswitch: pipeline pre-images to the incoming link as soon as it's obtained
6 participants