Trampoline onion format (Feature 56/57) #836

@t-bast (Collaborator) commented Jan 25, 2021

Trampoline routing uses layered onions to trustlessly and privately offload the calculation of parts of a payment route to remote trampoline nodes.

A normal onion contains a smaller onion for the last hop of the route, and that smaller onion contains routing information about the next trampoline hop.

Intermediate trampoline nodes "fill the gap" by finding a route to the next trampoline node, and sending it the peeled trampoline onion, until that reaches the final destination.

This PR details the onion construction and requirements for supporting nodes. I advise readers to also have a look at #829 which gives a more high-level view of the different components, how they interact, and provides nice diagrams that help understand the low-level details.

@rustyrussell (Collaborator) left a comment

I like it, but I'm not sure I understand the purpose of the payment_secret except inside the internal onion for the final node?

- MUST use a different `session_key` for the `trampoline_onion_packet` and the `onion_packet`
- MUST include the `trampoline_onion_packet` tlv in the _last_ hop's payload of the `onion_packet`
- MUST include the invoice's `payment_secret` in the _last_ hop's payload of the `trampoline_onion_packet`
- MUST generate a different `payment_secret` to use in the outer onion

Collaborator

Why include payment_secret at all in the outer onion?

Collaborator Author

It's for the case where trampoline nodes aggregate incoming MPP and then re-split differently to reach the next trampoline node.
This ensures they use the normal MPP validation code and no intermediate nodes can cheat. They could just rely on the total_amount to verify they receive everything, but I like the fact that it works just like normal payments.

Collaborator

Ah, you need total_msat from payment_data. But payment_secret here doesn't help much: the aggregating node can't know what the correct value is (and I can't see where in the spec you say it should be checked...).

An intermediate can probe by sending its own partial payment with some random payment_secret, so you can't really even say "they must all be the same...".

Collaborator Author

It's true that in the invoice-based flow the recipient generated the payment_secret and can check the correctness of the value. In the trampoline flow, it's the trampoline sender who generates one, and the recipient blindly accepts and just uses it to bundle together HTLCs from the same set.

But I think it still has some value because it does ensure that HTLCs from two different nodes will not end up being bundled together (and mess up the payment). If we have the following payment (with trampoline route Alice -> T1 -> T2 -> Bob):

Alice -----> ... -----> T1 -----> I1 -----> I2 -----> T2 -----> ... -----> Bob

T1 generated a payment_secret to send HTLCs to T2. I2 is trying to interfere by generating his own (trampoline) payment to T2. They will have a different payment_secret, so they won't interfere with the ones T1 sent.
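
To make the bundling argument concrete, here is a minimal sketch of how T2 could group incoming HTLC parts. It assumes that parts are keyed by the pair (payment_hash, payment_secret) and that a set is complete once the received amounts reach `total_msat`; the `MppAggregator` class and its method names are hypothetical and not taken from the spec or from any implementation.

```python
from collections import defaultdict


class MppAggregator:
    """Bundles incoming HTLC parts into sets keyed by (payment_hash, payment_secret)."""

    def __init__(self):
        self._sets = defaultdict(list)

    def add_part(self, payment_hash, payment_secret, amount_msat, total_msat):
        """Record one HTLC part and return 'pending', 'complete' or 'reject'."""
        parts = self._sets[(payment_hash, payment_secret)]
        parts.append((amount_msat, total_msat))
        # Every part of a set must announce the same total_msat.
        if any(total != total_msat for _, total in parts):
            return "reject"
        received = sum(amount for amount, _ in parts)
        return "complete" if received >= total_msat else "pending"


t2 = MppAggregator()
payment_hash = b"\x11" * 32
secret_t1 = b"\xaa" * 32  # generated by T1 (placeholder value)
secret_i2 = b"\xbb" * 32  # generated by I2 for its own payment (placeholder value)

first = t2.add_part(payment_hash, secret_t1, 60_000, 100_000)
injected = t2.add_part(payment_hash, secret_i2, 50_000, 100_000)
second = t2.add_part(payment_hash, secret_t1, 40_000, 100_000)
```

In this sketch I2's part lands in its own set, so T1's set only completes once T1's second part arrives. If the parts were keyed by payment_hash alone, the injected 50,000 msat would have pushed the set over `total_msat` too early.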

@rustyrussell changed the title from "Trampoline onion format" to "Trampoline onion format (Feature 24/25)" on Mar 1, 2021

@lightning-developer (Contributor) left a comment

I believe the spec should make a recommendation for CLTV and fee budgets that senders of trampoline onions should use for each trampoline to reduce the usage of temporary_trampoline_failure.

Also it seems reasonable to just reply with a single `fee_budget_msat` instead of `[u32:fee_base_msat]` and `[u32:fee_proportional_millionths]`, as those fields do not seem to make sense in the context of trampolines.

Trampolines could announce the fee_budget_msat and cltv_budget (in case you like the new name) that they believe to be sufficient in their node announcements?

This error usually indicates that routes were found but failed because of
temporary failures at intermediate hops.

1. type: NODE|25 (`trampoline_fee_expiry_insufficient`)

Contributor

Do we have a recommendation that should be used initially by the sender for CLTV and fee budgets?

I understand from your reply to @ecdsa at #829 (comment) that you suggest a trial-and-error mechanism with feedback from the trampoline node. I am worried that if more than one trampoline node is involved, this might create a lot of unnecessary round trips, as the sender would have to learn the fee and cltv budget for the first trampoline, then for the second one, and so on. In the meantime, the trampoline onion would be cancelled on each such round trip.

Maybe a recommendation for the sender to start with the median CLTV for paths of up to n hops, and a fee budget for (multi)paths accordingly, might be helpful? I understand that the fee estimation especially might be tricky, but I would prefer to start somewhere.

I also wonder if it makes sense for trampoline nodes to replicate the base_fee_msat and fee_proportional_millionths mechanism, as trampolines have a total fee budget that they will allocate to deliver the payment, and they earn whatever they have not used. Or did you plan that if a trampoline pays too much total base_fee_msat but stays far below the fee_proportional_millionths, you would send temporary_trampoline_failure?

Collaborator Author

> I believe the spec should make a recommendation for CLTV and fee budgets that senders of trampoline onions should use for each trampoline to reduce the usage of temporary_trampoline_failure.

The spec cannot do that: this is entirely up to node operators and implementations, and we must let the market decide on good values here (which will likely change more often than the specification does).

> Also it seems reasonable to just reply with a single fee_budget_msat instead of [u32:fee_base_msat] and [u32:fee_proportional_millionths] as those fields seem not to make sense in the context of trampolines.

I don't see why they don't make sense? Base and proportional fees do apply to trampoline nodes, and it's more consistent that way. Trampoline nodes derive values for base and proportional fees by walking the graph outwards and adding the values of individual edges.

Overall your whole comment is addressed by what will come in a second step. My first proposal for trampoline (more than 2 years ago, see #654) did contain a mechanism to gossip trampoline fees and cltv. I decided to drop this part for now, because the most common way of using trampoline doesn't really need it, so it can be added later.

The simplest path to trampoline is to have the recipient include trampoline nodes in their invoices. It's the recipient's responsibility to compute the fees that each trampoline node will need to reach them (which is easy to do by just looking at a small subset of the graph) and include those fees in the invoice. Then the sender will either:

  • directly send to those trampoline nodes (if the sender is able to compute a route to them): in that case there should never be a trampoline fee / cltv failure
  • pick a first trampoline node close to them: in that case they do need to guess what fees and cltv they should give that first trampoline node to allow it to reach the second trampoline node. This may require a retry, but it requires only one, which is efficient enough (the second trampoline node should never require a fee / cltv update because the recipient computed the correct values)

My goal is to work on the gossip part once this first PR is accepted, and after getting some real-world feedback on what works well and what doesn't work in practice.

Contributor

> The spec cannot do that, this is entirely up to node operators and implementations and we must let the market decide on good values here (which will likely change more often than the specification does).

What I meant was this: if, for example, the cheapest path between two trampolines has 3 hops, the sender using these two trampoline nodes should include at least the fees that would be charged on such a path. Similarly, the cltv delay MUST at least meet that of the 3 channels. While this is a perfectly reasonable recommendation, I just realized that it kind of defeats the purpose of trampolines, as the sender wants to outsource the route computation. Also, trampolines might use unannounced channels. So yes, I agree it does not make sense to add a recommendation to the spec.

> Also it seems reasonable to just reply with a single fee_budget_msat instead of [u32:fee_base_msat] and [u32:fee_proportional_millionths] as those fields seem not to make sense in the context of trampolines.

> I don't see why they don't make sense? Base and proportional fees do apply to trampoline nodes, and it's more consistent that way. Trampoline nodes derive values for base and proportional fees by walking the graph outwards and adding the values of individual edges.

Let's say I want to pay 200k sats and I add 50 sats base_fee and a feerate of 1000. This means that the trampoline node can charge a total of 250 sats. What happens if the trampoline node delivers the payment on channels charging a total of 220 sats for the fee rate and 20 sats for their base_fee? I think the trampoline node would have to pay a total of 240 sats and stay below the total fee_budget. It did however spend more on the fee_rate. I think in such a situation the node should process the payment and be happy it earned 10 sats. This is similar to forwarding of regular onions, where the difference between the outgoing HTLC and the incoming HTLC can be considered a fee budget for the routing node and needs to be smaller or equal to the result of the fee formula with fee_rate and base_fee.
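
The arithmetic in this example can be checked with a short sketch. It assumes the usual Lightning fee formula (base fee plus amount times proportional fee divided by one million) and works in millisatoshis; the function name `fee_budget_msat` is only illustrative.

```python
def fee_budget_msat(amount_msat, fee_base_msat, fee_proportional_millionths):
    """Total fee allowed for a given amount under a base + proportional fee formula."""
    return fee_base_msat + amount_msat * fee_proportional_millionths // 1_000_000


amount_msat = 200_000 * 1000  # 200k sats
budget = fee_budget_msat(amount_msat, fee_base_msat=50_000, fee_proportional_millionths=1000)

# Fees actually charged by the channels on the route chosen by the trampoline node:
# 220 sats of proportional fees and 20 sats of base fees.
route_cost_msat = 220_000 + 20_000

can_relay = route_cost_msat <= budget
earned_msat = budget - route_cost_msat
```

Here `budget` is 250,000 msat (250 sats), the route costs 240,000 msat, and the trampoline node keeps 10,000 msat (10 sats), even though the proportional part of the route (220 sats) exceeds the 200 sats implied by the feerate alone.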

If the goal above is to forward the payment, then why require two separate fees? It seems the final question is only whether the trampoline was able to deliver to the next trampoline while staying within the total fee budget. Also, the base fee might complicate things when the payment amount is larger and the trampoline node has to use MPP. If there was just a single fee rate for the trampoline and the underlying channels, the allocation seems much more straightforward, as one would effectively have a unit cost, but I guess this is more of a technical detail.

> Overall your whole comment is addressed by what will come in a second step. My first proposal for trampoline (more than 2 years ago, see #654) did contain a mechanism to gossip trampoline fees and cltv. I decided to drop this part for now, because the most common way of using trampoline doesn't really need it, so it can be added later.
> [...]
> My goal is to work on the gossip part once this first PR is accepted, and after getting some real-world feedback on what works well and what doesn't work in practice.

It seems to me that we will end up having gossip for trampolines in the end anyway, which is why I elaborated above on the fee and cltv budget. I understand why you want to do the upgrade in smaller steps. I will check out #654 and come back to the gossip questions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the base and proportional fee, I agree with @lightning-developer that it doesn't make sense to provide both. It makes sense to have both when we don't know the amount that will be paid and want to provide a formula that works for any amount, but here we're in the context of a specific payment, we know the amount and we can compute the exact fee.

Collaborator Author

@t-bast Dec 13, 2021

The reason I'm currently providing both is explicitly for that: because it should be used for future payments of unknown amounts.

We don't have a gossip broadcast mechanism yet, but since we need to have a failure message, I believe it makes sense to make it compatible with future gossip. Whenever a payer receives such an error, they should store fee-base and fee-proportional, and use those values for their future payments through that trampoline node. It may be outdated by then (because we don't have yet a mechanism to receive new fees, but we will in the future), but it may still be accurate and win us a round-trip.

You should think of it as complementary to gossip (and in our case, a first, incomplete version of trampoline gossip). That's why I really believe it should be a formula and not a value that works for only one payment, because the formula is a superset of the single amount, so it's always superior, isn't it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would only work if the trampoline node uses the same fee structure regardless of the path used. I don't think it makes sense to advertise trampoline fees before we know what route to use, what if the best route is more expensive than the fee we've advertised? Do we refuse to route the payment? Do we increase the fees for everyone to make this case less frequent?

Collaborator Author

My initial, imperfect idea was that trampoline nodes would do a BFS to compute how much it would cost them to reach any node that is at most N hops away (in terms of fee-base and fee-proportional). Then they would choose some statistical model to gossip values that would work most of the time while staying competitive. Sometimes the result would be that they receive more fees than they need, sometimes less; then it's an implementation choice whether you sometimes route for less than your budget and make up for it with other payments, or send back an error asking for more fees (at the risk of not being chosen to route the retry).
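
A rough sketch of that exploration, under simplifying assumptions: the graph is a plain adjacency dict, fees along a path are summed per edge (ignoring the compounding of proportional fees), and candidate paths are compared for one reference amount. The function names and the graph layout are hypothetical; this is a hop-limited breadth-first relaxation, not the algorithm of any implementation.

```python
def fee_for(amount_msat, fee_base_msat, fee_proportional_millionths):
    """Fee for one amount under a base + proportional formula."""
    return fee_base_msat + amount_msat * fee_proportional_millionths // 1_000_000


def cumulative_fees_within(graph, source, max_hops, reference_amount_msat):
    """Cheapest summed (fee_base, fee_proportional) to each node within max_hops.

    graph maps a node to a list of (neighbor, fee_base_msat, fee_proportional_millionths).
    """
    best = {source: (0, 0)}
    frontier = {source: (0, 0)}
    for _ in range(max_hops):
        next_frontier = {}
        for node, (base, ppm) in frontier.items():
            for neighbor, edge_base, edge_ppm in graph.get(node, []):
                candidate = (base + edge_base, ppm + edge_ppm)
                known = best.get(neighbor)
                # Keep the candidate only if it is cheaper for the reference amount.
                if known is None or (
                    fee_for(reference_amount_msat, *candidate)
                    < fee_for(reference_amount_msat, *known)
                ):
                    best[neighbor] = candidate
                    next_frontier[neighbor] = candidate
        frontier = next_frontier
    del best[source]
    return best


graph = {
    "T": [("A", 1000, 100), ("B", 0, 500)],
    "A": [("C", 1000, 100)],
    "B": [("C", 0, 10)],
}
one_hop = cumulative_fees_within(graph, "T", 1, 1_000_000)
two_hops = cumulative_fees_within(graph, "T", 2, 1_000_000)
```

The resulting per-node values are the raw input that a statistical model could then condense into a single advertised fee-base and fee-proportional pair.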

Note that even returning a fee_budget isn't guaranteed to work: that is only an estimation run by the trampoline node, but until the payment is actually tried, we can't know whether liquidity issues down the path will force us to raise the fees.

This is clearly an area where a lot more research would be useful, this is not an easy problem to solve as we work on very incomplete information. I assumed that it would provide the least friction to stick with the existing fee model, but that could be wrong.

I have a proposal to make this more future-proof. What about putting a tlv stream inside this error, that could contain either a fee_budget tlv (that contains the exact fee that should be used for this payment) or a fees tlv (that contains fee_base and fee_proportional)? Then it's up to implementations to choose whether they include only one of these or both of them, and we can very easily deprecate one field (or both) later when we have better real-world feedback on what fee estimation works for trampoline nodes?
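
To illustrate what such a tlv stream could look like on the wire, here is a sketch using the BigSize-prefixed TLV record layout from Bolt 1. The type numbers (1 and 3), the field widths, and the function names are assumptions made for this example only; nothing here is specified in the PR.

```python
import struct

FEE_BUDGET_TLV_TYPE = 1  # hypothetical type number
FEES_TLV_TYPE = 3        # hypothetical type number


def bigsize(n):
    """Encode an unsigned integer as a Bolt 1 BigSize (big-endian)."""
    if n < 0xFD:
        return bytes([n])
    if n < 0x10000:
        return b"\xfd" + struct.pack(">H", n)
    if n < 0x100000000:
        return b"\xfe" + struct.pack(">I", n)
    return b"\xff" + struct.pack(">Q", n)


def tlv_record(record_type, value):
    """Serialize one TLV record as type, length, value."""
    return bigsize(record_type) + bigsize(len(value)) + value


def encode_failure_tlvs(fee_budget_msat=None, fees=None):
    """Build a tlv stream carrying a fee budget, a (base, proportional) pair, or both."""
    records = []
    if fee_budget_msat is not None:
        records.append((FEE_BUDGET_TLV_TYPE, struct.pack(">Q", fee_budget_msat)))
    if fees is not None:
        fee_base_msat, fee_proportional_millionths = fees
        records.append(
            (FEES_TLV_TYPE, struct.pack(">II", fee_base_msat, fee_proportional_millionths))
        )
    # TLV streams must list records in increasing type order.
    return b"".join(tlv_record(t, v) for t, v in sorted(records))


budget_only = encode_failure_tlvs(fee_budget_msat=250_000)
both = encode_failure_tlvs(fee_budget_msat=250_000, fees=(50_000, 1000))
```

Because each field is its own record, an implementation could send either one, and a reader that does not know one of the (odd) types could skip it.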

t-bast added a commit to ACINQ/lightning-kmp that referenced this pull request Jul 15, 2024
We update the trampoline feature to match the official specification
from lightning/bolts#836.

We remove support for the previous version of trampoline, which means
that when paying nodes that use the experimental version, we will use
the trampoline-to-non-trampoline flow instead. Similarly, when older
nodes pay updated nodes, they won't understand the new trampoline
feature bit and will use the trampoline-to-non-trampoline flow.

We update the trampoline-to-non-trampoline flow to remove the unused
trampoline payload in the onion, which saves some space. Note that we
don't want to officially specify this scenario, as it leaks some data
about the recipient to the trampoline node. We rather wait for nodes
to either support trampoline or blinded paths, which fixes this issue.
@t-bast (Collaborator, Author) commented Jul 29, 2024

@arik-so @valentinewallace I have added a draft spec of trampoline payments to blinded paths in 296ce21. It is probably a bit confusing because it depends on Bolt 12 types that are defined in the offers PR, and I should more explicitly spell out the requirement of which tlvs are included where, but combined with the discussions we had in this comment, we should be able to understand each other.

I'm particularly looking for feedback on the shared secret extension I'm using for the recipient trampoline payload to include an ECDH with the invoice_request's payer_id, let me know what you think (see @TheBlueMatt's comment for more context).

Once we have a good enough rough consensus, I'll spend some time rebasing this PR, I'll re-write the requirements in a way that is similar to what was done in #1181 and more precise than what I've currently done, and I'll finalize the test vector.

@arik-so commented Jul 29, 2024

Awesome, thank you so much! Will update corresponding test vectors ASAP!

t-bast added a commit to ACINQ/lightning-kmp that referenced this pull request Aug 7, 2024
We previously supported having multiple channels with our peer, because
we didn't yet support splicing. Now that we support splicing, we always
have at most one active channel with our peer. This lets us simplify
greatly the outgoing payment state machine: payments are always made
with a single outgoing HTLC instead of potentially multiple HTLCs (MPP).

We don't need any kind of path-finding: we simply need to check the
balance of our active channel, if any.

We may introduce support for connecting to multiple peers in the future.
When that happens, we will still have a single active channel per peer,
but we may allow splitting outgoing payments across our peers. We will
need to re-work the outgoing payment state machine when this happens,
but it is too early to support this now anyway.

This refactoring makes it easier to create payment onion, by creating
the trampoline onion *and* the outer onion in the same function call.
This will make it simpler to migrate to the version of trampoline
that is currently specified in lightning/bolts#836
where some fields will be included in the payment onion instead of the
trampoline onion.
When paying a Bolt 12 invoice, the payer may use a trampoline node to
relay that payment. The payer simply includes some of the blinded paths
in the onion payload for the trampoline node, who will relay to those
blinded paths. The trampoline node doesn't learn anything about the
final recipient.

We only support using a single trampoline node, because we must provide
the blinded paths in the outer onion, instead of the trampoline onion.
If we included them in the trampoline onion, the trampoline node would
not have enough space in the outer onion to correctly relay the payment.

If the recipient supports trampoline and the `invoice_request` contains
the trampoline feature bit, the recipient may set it in its invoice. In
that case, the sender can include a trampoline onion to provide custom
TLVs to the recipient. We prevent the trampoline node from replacing
that onion with one that it created by using a shared secret created
from the `invoice_request` to authenticate that onion.

Note that this commit depends on Bolt 12: it references a few types that
are introduced in various commits related to Bolt 12 (e.g. `blinded_path`
and `blinded_payinfo`), which can be confusing since Bolt 12 spec is
still in progress. I will clean this up once Bolt 12 is finalized.
@t-bast (Collaborator, Author) commented Aug 9, 2024

@arik-so I slightly changed the last commit:

  • I'm now providing complete test vectors for trampoline to blinded paths (with and without a trampoline onion for the recipient)
  • I changed the invoice_request shared secret mechanism: instead of modifying the derivation of rho and mu, I use this shared secret in the HMAC's associated data for the final hop: this is simpler and avoids changing the internals of Sphinx
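
A sketch of the idea in the second bullet, under stated assumptions: the final hop's HMAC is an HMAC-SHA256 keyed with `mu` over the hop payloads followed by the associated data, and the `invoice_request` shared secret is simply appended to that associated data. The exact byte layout is an assumption for illustration (the PR's test vectors are authoritative), and all keys and payloads below are placeholder values.

```python
import hashlib
import hmac


def final_hop_hmac(mu_key, hop_payloads, payment_hash, invoice_request_secret=None):
    """HMAC for the final hop, optionally binding an extra shared secret.

    The secret is appended to the associated data, so the Sphinx key
    derivation (rho, mu) is left untouched.
    """
    associated_data = payment_hash
    if invoice_request_secret is not None:
        associated_data += invoice_request_secret
    return hmac.new(mu_key, hop_payloads + associated_data, hashlib.sha256).digest()


mu_key = b"\x01" * 32                  # placeholder mu key
hop_payloads = b"\x02" * 64            # placeholder encrypted payloads
payment_hash = b"\x03" * 32            # placeholder payment hash
invoice_request_secret = b"\x04" * 32  # placeholder for the ECDH with payer_id

expected = final_hop_hmac(mu_key, hop_payloads, payment_hash, invoice_request_secret)
forged = final_hop_hmac(mu_key, hop_payloads, payment_hash)

# The recipient recomputes the HMAC with the secret; an onion built without it fails.
accepts_sender_onion = hmac.compare_digest(
    expected,
    final_hop_hmac(mu_key, hop_payloads, payment_hash, invoice_request_secret),
)
accepts_replaced_onion = hmac.compare_digest(expected, forged)
```

In this sketch, a trampoline node that does not know the shared secret cannot produce a final-hop HMAC that the recipient accepts, which is the property the bullet describes.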

@t-bast (Collaborator, Author) commented Aug 12, 2024

There is one point that is worth discussing here: do we want to include an extension for Bolt 11 to introduce trampoline routing hints? If we want to be able to make non-blinded trampoline payments between wallets that have only private channels (e.g. Alice -> LSPA -> ... -> LSPB -> Bob), I see two possible options.

The first one is that we don't introduce any new routing hint, but if the invoice contains the (optional) trampoline feature bit, Alice assumes that nodes included in Bob's routing hints support trampoline and uses them as such. In the example above, Alice would use LSPA as her trampoline node and LSPB as Bob's trampoline node, and assume that LSPB is able to route to Bob. This option is nice because it doesn't increase QR code size, but when receiving wallets provide multi-hop routing hints, Alice may get that heuristic wrong. In practice I don't think this is an issue, in the worst case Alice could make several attempts where she assumes that different nodes in the routing hints support trampoline, until one works.

The second option is that we introduce a trampoline routing hint field to Bolt 11 invoices, which is very similar to routing hints but doesn't include a short_channel_id. This will be ignored by existing nodes that don't support trampoline. The issue with that option is that it increases QR code size, because the invoice must work for non-trampoline senders as well, so it will include both the normal routing hints and the trampoline hints.

Thoughts?

@TheBlueMatt (Collaborator)

On our end we only care about trampoline with blinded path destinations, really. I don't see much reason to care about trying to add a second trampoline hop of the recipient's LSP when they aren't using blinding: if the sender wants privacy from their trampoline hop(s), they should just add more trampoline hops!

t-bast added a commit to ACINQ/lightning-kmp that referenced this pull request Sep 18, 2024
We update our trampoline payments to blinded paths to match the official
specification from lightning/bolts#836.

The blinded paths and recipient features are included in the trampoline
onion, which potentially allows using multiple trampoline hops.

That was already what we were doing with experimental TLVs, so we simply
update the TLV values to match the spec values.
t-bast added a commit to ACINQ/lightning-kmp that referenced this pull request Sep 18, 2024
We add the ability to pay recipients that support trampoline *and*
blinded paths. We include the blinded path data in the trampoline
payloads for each node inside the blinded path. This doesn't reveal
unnecessary information to the trampoline node: this is specified in
details in lightning/bolts#836.