Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPP: Enable MultiPathPayments for payment lifecycle #3970

Merged
merged 34 commits into from
Apr 3, 2020

Conversation

halseth
Copy link
Contributor

@halseth halseth commented Jan 29, 2020

This PR makes the changes needed to support multi path payments from the payment lifecycle's perspective, by sending and collecting multiple payment shards simultaneously.

There parts that see the most significant change in this PR:

PaymentSession

The RequestRoute call now takes maxAmt and feelimit as arguments. The idea here is that it will use this this create a route that carry an amount up to this maximum. The actual splitting is handled in #3967, so note that in this PR the PaymentSession will always create a single shard with the full amount always.

ControlTower (channeldb.payment_control)

To allow each payment to have multiple HTLCs in flight at the same time, and handle all the cases of the shards failing/settling individually, we redefine (and loosen up the strictness a bit) the ControlTower.

We now allow registering new attempts for a payment as long as the payment is in flight and has not reached a terminal condition. A terminal condition is either a settled shard, or one of the shards encountering a failure that should fail the payment. In these cases we want to wait for all outstanding shards to finish, but we don't want to register any new ones.

Settling/failing HTLC attempts now is handled independently of the payment level status.

Failing the payment is now allowed as long as the payment is known. The idea here is that any shard that encounter a terminal error can fail the payment, but the payment status is not actually failed before all shards have finished, and none of the shards settled. This is to simplify the interaction between shards, as they don't have to coordinate to record the ultimate payment outcome in the database. This also handles the case where the payment is in the process of being failed, but then one of the late shards settle, overriding the recorded failure.

tl;dr: We record shard settles/fail individually for each HTLC, any shard can fail the payment, but when we read the payment status we do it in this order:

  1. If there are in flight HTLC: PaymentStatus=InFlight
  2. If there are any settled shards: PaymentStatus=Succeeded
  3. If the payment level failure is set: PaymentStatus=Failed

SendPayment/PaymentLifecycle

The result returned from SendPayment is changed slightly. In cases of terminal payment failures, earlier we would return a string depicting the last error we encountered during path finding. This was mainly done to be able to get the failure outcome for a SendToRoute attempt, since for a regular SendPayment many many attempts can be made before an error is returned, and in that case the last error isn't worth much more than all other encountered errors.

With this insight we now remove this lastError "hack" (see SendToRoute section below), and return a generic no_route error from SendPayment when path finding ultimately fails. With the introduction of HTLC attempt tracking in #3989, we can instead get the individual HTLC errors from doing a ListPayments call.

The main reason we wanted to remove this lastError tracking is that it makes even less sense in a multi-shard scenario. When we have several concurrent shards that fails, there is really no such thing as a "last error", and it makes more sense to look up the HTLC outcomes individually.

To enable multi-shard sending, the payment lifecycle will now request shards up to a given amount from the PaymentSession, and use the resulting route to create a shard. Shards will be launched as long as there is remaining value to be sent. A payment is considered successful the moment a preimage comes back (and failed the moment at least one of the shards encounter a terminal error) and all outstanding shards have failed.

SendToRoute

With the abstractions done in the payment lifecycle, SendToRoute no longer needs to be launched using a payment lifecycle. Instead we call launchShard and collectResult directly, avoiding having to use a prebuilt payment session. Note that for SendToRoute payments, we always record a failed shard as a "terminal error" in the database, since we can never know whether it is the last SendToRoute call that will be made to this payment hash. This works out nicely with the new ControlTower setup.

TODO:

  • unit tests
  • MPP integration test

Builds on #4000

@halseth halseth force-pushed the amp-router-mvp-2020 branch 6 times, most recently from dd1b592 to 389e145 Compare February 11, 2020 18:57
@halseth halseth force-pushed the amp-router-mvp-2020 branch from 389e145 to 97268ac Compare March 6, 2020 14:42
@halseth halseth changed the title [WIP] Router MPP changes MPP: Enable MultiPathPayments for payment lifecycle Mar 6, 2020
@halseth halseth marked this pull request as ready for review March 6, 2020 14:49
@halseth halseth requested review from joostjager and removed request for Roasbeef and cfromknecht March 6, 2020 14:50
Copy link
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't reviewed the pr in total, but wanted to drop this high level comment right away.

routing/payment_lifecycle.go Outdated Show resolved Hide resolved
@halseth
Copy link
Contributor Author

halseth commented Mar 9, 2020

Because the payment loop goroutine remains active and uses up resources, it always needs to end. This necessitates a somewhat artificial timeout on the sendtoroute-paymentsession. An idea could be to not have an active payment loop, but let the per-htlc goroutines control the payment process. Whenever an htlc comes back, the current state of the payment is evaluated. The amount and fee budget remaining is calculated directly from the payment in the database and based on that zero, one or multiple additional htlcs are launched.

This would remove the need for the payment session timeout and possibly also simplify the code. Separate tracking of shards in-flight, amount remaining etc is no longer required, reducing the potential for bugs and desynchronization.

If we want this, this is the time to do it. We won't overhaul the core of payment sending again anytime soon I expect, given the size of the change set in this pr.

I think this is a good idea. Please hold off review until I've pushed a version doing it this way (or found that it is not feasible).

@Roasbeef
Copy link
Member

Can be rebased now (on top of the changes for the modification referenced above).

@halseth halseth force-pushed the amp-router-mvp-2020 branch 2 times, most recently from e276580 to 02f9757 Compare March 12, 2020 15:37
@halseth
Copy link
Contributor Author

halseth commented Mar 12, 2020

New version pushed, review can be resumed.

(unit test changes still WIP)

@halseth halseth force-pushed the amp-router-mvp-2020 branch from 2d68fb3 to a426f1e Compare April 2, 2020 08:25
@joostjager
Copy link
Contributor

Got this on shutdown:
2020-04-02 10:35:44.218 [ERR] CRTR: Error collecting result for shard 6012 for payment b0974a2d153c3ceae1258e910a3c16941e7a1e064b4f10e60c6f2b3c9eb208c7: router shutting down

Maybe exempt logging for router shut down

Copy link
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ready to roll with this monster PR. Only functional outstanding issue is the inconsistency of the state reporting between TrackPayment and ListPayments. Discussed offline to address in a follow-up.

There is a bit more unification work to be done probably to bring those two rpc calls closer. Also the response of routerrpc.SendToRoute could be a message of type lnrpc.HTLCAttempt.

Overall great work to make the switch-over to multi-part sends possible. Definitely worth a mention in the Lightning history books 📖

remainingAmt: p.totalAmount - sentAmt,
remainingFees: feeBudget,
terminate: terminate,
}, nil
}

// resumePayment resumes the paymentLifecycle from the current state.
func (p *paymentLifecycle) resumePayment() ([32]byte, *route.Route, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably still needed for the legacy sendpayment response that returns the success route on the payment level.

halseth and others added 14 commits April 2, 2020 19:29
This commit redefines how the control tower handles shard and payment
level settles and failures. We now consider the payment in flight as
long it has active shards, or it has no active shards but has not
reached a terminal condition (settle of one of the shards, or a payment
level failure has been encountered).

We also make it possible to settle/fail shards regardless of the payment
level status (since we must allow late shards recording their status
even though we have already settled/failed the payment).

Finally, we make it possible to Fail the payment when it is already
failed. This is to allow multiple concurrent shards that reach terminal
errors to mark the payment failed, without havinng to synchronize.
(almost) PURE CODE MOVE
The only code change is to change a few select cases from

case _ <- channel:
to
case <- channel:

to please the linter.

The test is testing the payment lifecycle, so move it to
payment_lifecycle_test.go
This also fixes a test bug that the manually created route didn't match
the actual payment amount in the test cases, and adds some fees to the
route.
Also rename Success to SettleAttempt in the tests.
And modify the MissionControl mock to return a non-nil failure reason in
this case.
This commit finally enables MP payments within the payment lifecycle
(used for SendPayment). This is done by letting the loop launch shards
as long as there is value remaining to send, inspecting the outcomes for
the sent shards when the full payment amount has been filled.

The method channeldb.MPPayment.SentAmt() is added to easily look up how
much value we have sent for the payment.
This commit enables MPP sends for SendToRoute, by allowing launching
another payment attempt if the hash is already registered with the
ControlTower.

We also set the total payment amount of of the payment from mpp record,
to indicate that the shard value might be different from the total
payment value.

We only mark non-MPP payments as failed in the database after
encountering a failure, since we might want to try more shards for MPP.
For now this means that MPP sendToRoute payments will be failed
only after a restart has happened.
We add validation making sure we are not trying to register MPP shards
for non-MPP payments, and vice versa. We also add validtion of total
sent amount against payment value, and matching MPP options.

We also add methods for copying Route/Hop, since it is useful to use
for modifying the route amount in the test.
testSendToRouteMultiPath tests that we are able to successfully route a
payment using multiple shards across different paths, by using SendToRoute.

Co-authored-by: Joost Jager <joost.jager@gmail.com>
We whitelist a set of "expected" errors that can be returned from
RequestRoute, by converting them into a new type noRouteError. For any
other error returned by RequestRoute, we'll now exit immediately.
@halseth
Copy link
Contributor Author

halseth commented Apr 2, 2020

Got this on shutdown:
2020-04-02 10:35:44.218 [ERR] CRTR: Error collecting result for shard 6012 for payment b0974a2d153c3ceae1258e910a3c16941e7a1e064b4f10e60c6f2b3c9eb208c7: router shutting down

Maybe exempt logging for router shut down

Fixed

@halseth
Copy link
Contributor Author

halseth commented Apr 2, 2020

lncli is still using the main rpc SendRoute call. But when using routerrpc.SendToRoute, the same thing happens. First htlc has a proper error, second one shows:

[lncli] rpc error: code = Unknown desc = payment has already failed

Fixed.

Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest version LGTM, congrats @halseth on topping the leaderboard with a new most-commented PR to lnd! 🏆Super excited to get this deployed and get rolling with the follow ups to complete the MPP picture.

@cfromknecht cfromknecht requested review from Roasbeef and removed request for Roasbeef April 3, 2020 14:23
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ⚡️

@Roasbeef Roasbeef merged commit 92ee497 into lightningnetwork:master Apr 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants