draft: HTLC Endorsement to Mitigate Channel Jamming #1071
base: master
Branch updated from 591e524 to db043e9.
Branch updated from ec6eb65 to a7075f7.
04-onion-routing.md
### Rationale
If a HTLC is endorsed by a peer they have signaled that they expect the HTLC
to resolve honestly, so will be held accountable for the manner in which they
Reading this again now, it makes me think of https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-February/003842.html. But that was of course for the sender.
04-onion-routing.md
@@ -1407,6 +1438,88 @@ The _origin node_:
  - MAY use the data specified in the various failure types for debugging
    purposes.

## Recommendations for Reputation
This section is helpful, makes it much more concrete what to think of when talking about a reputation system in the context of a routing node.
Branch updated from a7075f7 to 6e221f8.
04-onion-routing.md
  for `resolution_time` incurred.
- `fees`: the fees paid by a forwarded HTLC (as described in [BOLT #7](07-routing-gossip.md#htlc-fees),
  equal to 0 if the HTLC was not fulfilled).
- `opportunity_cost`: `ceil ( (resolution_time - resolution_period) / resolution_period) * fees`
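Here is a minimal Python sketch of the quoted definitions. It assumes times are whole seconds and that an HTLC's contribution to reputation is `fees - opportunity_cost`, which is the form the review comment below uses. The function names are hypothetical.

```python
def opportunity_cost(fees_msat: int, resolution_time_s: int, resolution_period_s: int) -> int:
    """ceil((resolution_time - resolution_period) / resolution_period) * fees."""
    # Integer ceiling division, ceil(a / b) == -(-a // b) for b > 0, avoids float rounding.
    # For 0 < resolution_time <= resolution_period the multiplier is 0.
    periods_over = -(-(resolution_time_s - resolution_period_s) // resolution_period_s)
    return periods_over * fees_msat


def effective_fees(fees_msat: int, resolution_time_s: int, resolution_period_s: int) -> int:
    # Net contribution of one forwarded HTLC: fees earned minus the cost of holding it.
    return fees_msat - opportunity_cost(fees_msat, resolution_time_s, resolution_period_s)


# Resolved within the period: no opportunity cost.
print(effective_fees(1_000, 30, 90))     # 1000
# Held for an hour: ceil(3510 / 90) = 39 multiples of the fee are subtracted.
print(effective_fees(1_000, 3_600, 90))  # -38000
```

Because of the ceiling, any resolution time in `(0, resolution_period]` gives a zero opportunity cost. Each additional started period beyond that subtracts one more multiple of `fees`.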
For a worst case where `resolution_time` is far larger than `resolution_period`, I can create a big enough `opportunity_cost` (and therefore a large negative reputation change) along a route.

Given that LN payments unfold over a long route, if I control both the source and the destination of a payment, I can decide how long to hold the payment for.

Given this layout:

A -- B -- C -- D

- I control A & D
- B and C are some very big routing nodes
- We consider the past relationship of B and C to be very good (i.e. when B forwards to C, the HTLCs will be endorsed because they have accumulated a lot of `effective_fees` over the window of interest)

I can follow these steps:
- I make a few good payments from A to D until I notice that `endorsed` is turned on (effectively meaning that B now endorses me).
- Given access to the endorsed slots, I now create one or multiple payments from A to D that upon success would pay a large amount of `fees`.
- After I receive the `endorsed` HTLCs (that correspond to the above payments) on D, I hold on to them for as long as possible by not releasing the preimage.
- Just before the CLTV timeout, I fail the payment.

`A -- B`, `B -- C` and `C -- D` all damage their local reputation by `fees - opportunity_cost`, where `opportunity_cost` can be a big enough multiple of `fees`.

If the above flow is feasible, then a node that has just earned the `endorsed` flag from its peer can cause that peer reputation damage far greater than what it cost (in fees) to earn the flag.
To mitigate this possible attack, perhaps the `opportunity_cost` formula can have an upper limit?
Something that still allows it to damage `effective_fees` (in order to maintain the earn slow / lose fast attribute) but not enough to cause significant damage to other links further along the route, links with much stronger reputation relationships.
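The upper limit suggested above could look like the following minimal sketch. `max_periods` is a hypothetical parameter and is not part of the draft. The ceiling term of the quoted `opportunity_cost` formula is clamped to at most `max_periods` multiples of the fee. A held HTLC therefore still costs more than it earns (lose fast), but the damage per link is bounded. The numbers follow the shape of the attack in the comment above, with illustrative values: the HTLC would have paid a 5,000 msat fee and is held for a day.

```python
def capped_opportunity_cost(fees_msat: int, resolution_time_s: int,
                            resolution_period_s: int, max_periods: int) -> int:
    # Same ceiling as the draft formula...
    periods_over = -(-(resolution_time_s - resolution_period_s) // resolution_period_s)
    # ...but never charge more than max_periods multiples of the fee (hypothetical cap).
    return min(periods_over, max_periods) * fees_msat


fees = 5_000          # msat the endorsed HTLC would have paid
held = 24 * 60 * 60   # held for a day before being failed
print(capped_opportunity_cost(fees, held, 90, max_periods=10**9))  # effectively uncapped: 4795000
print(capped_opportunity_cost(fees, held, 90, max_periods=10))     # capped: 50000
```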
Branch updated from 2600b6f to 9b97e28.
I think this can be reduced to two variables per (outgoing) channel: reputation and exhaustion cost.

So, if incoming `endorsed` and outgoing reputation is greater than the exhaustion cost of the channel*:
- outgoing `endorsed` = 1

Otherwise:
- outgoing `endorsed` = 0

If outgoing `endorsed` is 0:
- reduce the effective `max_htlc_value_in_flight_msat` and `max_accepted_htlcs` of the outgoing channel by 50% for purposes of this HTLC.

Otherwise:
- reduce outgoing reputation by `fee` * (`cltv_expiry` - current block height) * 600
- record the start time of the HTLC.

When the HTLC is resolved:
- Record the end time of the HTLC.
- If outgoing `endorsed` was 0:
  - If the HTLC was successful, and the end time - start time was less than 60 seconds:
    - Increase the outgoing reputation by 50% of the HTLC fee
- Otherwise (`endorsed` = 1):
  - Increase the reputation by `fee` * (`cltv_expiry` - current block height) * 600
  - If the end time - start time < 60 seconds and the HTLC was successful:
    - Increase reputation by ((end time - start time - 60) / 60) * fee

Every 1 day (or X blocks?):
- Reduce all outgoing reputations by 1% (? depends on how long you're aiming for, see below)

\* I didn't see this clearly spelled out in the draft, but on the call you used these terms. Ideally, it's "how much money did this channel make in the last max-cltv-delta-allowed blocks", but practically it's probably a decaying average:
- If HTLC is successful:
  - Add fee to exhaustion cost of outgoing channel
- Every block:
  - Multiply the exhaustion cost of the outgoing channel by `1 - (1 - n)^(1/n)`, where `n` is the max cltv-delta you allow. (ChatGPT tells me that's how you calculate the exp decay factor, but I haven't run tests to check...)
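Below is a minimal Python sketch of the endorsement decision and the exhaustion-cost bookkeeping described in this comment. `OutgoingChannel` and the function names are hypothetical. It makes one deliberate substitution: instead of the untested `1 - (1 - n)^(1/n)` expression, each block multiplies the exhaustion cost by `1 - 1/n`. That is a common decay factor for an exponentially weighted average whose time constant is roughly `n` blocks. The reputation debits and credits tied to `cltv_expiry` are left out.

```python
from dataclasses import dataclass


@dataclass
class OutgoingChannel:
    # Hypothetical per-outgoing-channel state for the two-variable scheme.
    reputation_msat: float = 0.0
    exhaustion_cost_msat: float = 0.0


def should_endorse(incoming_endorsed: bool, chan: OutgoingChannel) -> bool:
    # Forward as endorsed only if the incoming HTLC was endorsed AND this
    # channel's reputation exceeds the cost of exhausting it.
    return incoming_endorsed and chan.reputation_msat > chan.exhaustion_cost_msat


def on_htlc_success(chan: OutgoingChannel, fee_msat: int) -> None:
    # Every successful forward adds its fee to the channel's exhaustion cost.
    chan.exhaustion_cost_msat += fee_msat


def on_new_block(chan: OutgoingChannel, max_cltv_delta: int) -> None:
    # Exponential decay with per-block factor (1 - 1/n), n = max CLTV delta.
    # Substituted for the untested expression in the comment (assumption).
    chan.exhaustion_cost_msat *= 1 - 1 / max_cltv_delta


chan = OutgoingChannel(reputation_msat=10_000)
on_htlc_success(chan, 4_000)
print(should_endorse(True, chan))   # True: 10000 > 4000
print(should_endorse(False, chan))  # False: the incoming HTLC was not endorsed
```

After `n` blocks without new fees, the exhaustion cost has shrunk to roughly 1/e (about 37%) of its value. That approximates "how much this channel earned over the last max-cltv-delta blocks".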
04-onion-routing.md
  `max_accepted_htlcs`.
- MUST choose `unknown_allocation_liquidity` <= the remote channel peer's
  `max_htlc_value_in_flight_msat`.
- If `endorsed` is set to 1 in the incoming `update_add_htlc` AND the HTLC
We should probably explicitly allow (and ignore!) the other bits for future use. So this should be:
"If `endorsed` is non-zero in the incoming..."
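The sketch below shows how the current wording and the suggested wording differ. It assumes the `endorsed` TLV carries a single unsigned byte; the data layout is not shown in this excerpt. The function names are hypothetical.

```python
def is_endorsed_strict(endorsed: int) -> bool:
    # Current draft wording: only the exact value 1 counts as endorsed.
    return endorsed == 1


def is_endorsed(endorsed: int) -> bool:
    # Suggested wording: any non-zero value counts; other bits are ignored
    # so they stay free for future use.
    return endorsed != 0


for value in (0, 1, 0b0000_0011):
    print(value, is_endorsed_strict(value), is_endorsed(value))
```

With the suggested rule, a future sender that sets an extra bit is still read as endorsing by older nodes. The strict rule would silently treat it as unendorsed.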
Is there a particular reason the endorsement is defined as a binary value, rather than a real number in [0, 1], which may be more aptly called a `reputation_weight` or `confidence_score`?

While nodes may choose to use simpler implementations to begin with (i.e. binary signals and bucketed HTLCs), encoding more information in this field would allow more precise reputation algorithms to develop independently over time without requiring a protocol change. For example, consider a long path constructed by a malicious actor:

A -> B -> … -> Y -> Z

Using only a binary signal, HTLC endorsement might get propagated through every hop. But if this is any real number, the confidence score can naturally decline along the path, and the attacker stands to lose the most reputation.
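Here is a toy comparison of the two signals along the attacker-built path above, with one entry per forwarding hop from B to Y (24 hops). The multiplicative rule and the 0.5 threshold are hypothetical choices for illustration. The comment does not specify how a real-valued score would be combined at each hop.

```python
def propagate_confidence(per_hop_trust: list[float], start: float = 1.0) -> list[float]:
    # Hypothetical rule: each hop scales the incoming score by its own trust in
    # the upstream peer, so the score can only shrink as the path gets longer.
    scores, score = [], start
    for trust in per_hop_trust:
        score *= trust
        scores.append(score)
    return scores


def propagate_binary(per_hop_trust: list[float], threshold: float = 0.5) -> list[int]:
    # Binary signal: a hop endorses iff the incoming HTLC was endorsed and its
    # trust in the upstream peer clears a local threshold.
    signals, endorsed = [], 1
    for trust in per_hop_trust:
        endorsed = 1 if endorsed and trust >= threshold else 0
        signals.append(endorsed)
    return signals


trust = [0.9] * 24  # every hop on the long path is fairly trusting
print(propagate_binary(trust)[-1])                # 1: endorsement survives to the end
print(round(propagate_confidence(trust)[-1], 3))  # 0.08: the confidence has decayed
```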
04-onion-routing.md
Peers build reputation by forwarding successful HTLCs that resolve quickly, and
lose reputation if they endorse failing or slow-resolving HTLCs. Reputation is
only _negatively_ affected if an endorsed HTLC resolves undesirably, to hold
nodes accountable for their endorsement signal while still allowing them to
forward unendorsed HTLCs that they are not certain about.
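A minimal sketch of the asymmetry described in this paragraph follows. It assumes the per-HTLC contribution is `fees - opportunity_cost` (`effective_fees`), as used elsewhere in this review. The function name is hypothetical.

```python
def update_reputation(reputation_msat: int, endorsed: bool, effective_fees_msat: int) -> int:
    if endorsed:
        # The peer vouched for this HTLC, so a bad outcome (negative
        # effective fees) is counted in full.
        return reputation_msat + effective_fees_msat
    # Unendorsed HTLCs can only build reputation, never erode it.
    return reputation_msat + max(0, effective_fees_msat)


print(update_reputation(5_000, True, -2_000))   # 3000
print(update_reputation(5_000, False, -2_000))  # 5000
print(update_reputation(5_000, False, 1_000))   # 6000
```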
I'm not sure this statement is true, because of the following scenario (let me know if I'm missing the point entirely):
- A wants to send a payment to D: A -> B -> C -> D
- the A -> B and B -> C channels are empty: A and B both endorse the HTLC
- the C -> D channel is full (because of payments coming from unrelated nodes, e.g. E -> C -> D)
- when C receives the HTLC, even though it's fully endorsed and reputable, C has to fail the payment
- when B receives the failure, B will then decrease A's reputation
Is the negative impact on A an issue? Is it something that can be abused? Can we do something about it (should we)?
A quick failure will not harm your reputation.
Only slow resolving endorsed HTLC can harm your reputation.
Ok, I believe that's one of the main differences between @thomash-acinq's proposal and this one, it's hard to evaluate which is the right choice.
Some comments/questions:
- if I'm an honest new node and the network is jammed, I think you can have a trust-based pay-for-endorsement scheme so that new nodes can still send payments. I don't think it needs to be described here, though.
- if I have the topology A---B---C, it seems like C can grief A's reputation with B:
  - A and B are both honest, C is malicious
  - A sends an endorsed payment through B to C
  - C holds the payment for as long as possible and then fails it back
  - B punishes A for sending this endorsed payment that turned out to be jam-like

Hi, I find this method of mitigating HTLC jamming quite interesting; however, I have one question. Previously a node could achieve a higher payment success rate if it had more channels to more nodes in the network, and it would possibly achieve more privacy if it utilized different nodes to route its payments through.
04-onion-routing.md
which capture the fees that it paid and the opportunity cost that holding it
for `resolution_time` incurred.
- `fees`: the fees paid by a forwarded HTLC (as described in [BOLT #7](07-routing-gossip.md#htlc-fees),
  equal to 0 if the HTLC was not fulfilled).
I think I'm not understanding something. If `fees` are equal to zero for unfulfilled HTLCs, then `opportunity_cost` is also zero. Does this mean failed HTLCs won't result in losing reputation?
For zero fees, a default ppm will be assumed (100ppm?) in order to bypass this.
Also, if we're talking about a fast-failing HTLC, then even if we have fees the loss is going to be zero (as can be seen from the `opportunity_cost` formula).
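Here is a sketch of the zero-fee fallback floated above. It assumes the default applies whenever no fee was actually earned; the follow-up question below asks whether that covers failed HTLCs, zero-fee successes, or both. It uses the tentative 100 ppm figure and reuses the quoted `opportunity_cost` ceiling with a 90 second `resolution_period`. All names are hypothetical.

```python
DEFAULT_FEE_PPM = 100  # the tentative "100ppm?" from the comment, not a spec value


def reference_fee_msat(amount_msat: int, fees_paid_msat: int) -> int:
    # Use the real fee when there is one, otherwise assume DEFAULT_FEE_PPM of the amount.
    if fees_paid_msat > 0:
        return fees_paid_msat
    return amount_msat * DEFAULT_FEE_PPM // 1_000_000


def failed_htlc_reputation_delta(amount_msat: int, resolution_time_s: int,
                                 resolution_period_s: int = 90) -> int:
    # A failed HTLC earns no fees, so only the opportunity cost remains,
    # computed on the assumed default fee.
    fee = reference_fee_msat(amount_msat, 0)
    periods_over = -(-(resolution_time_s - resolution_period_s) // resolution_period_s)
    return 0 - periods_over * fee


print(failed_htlc_reputation_delta(1_000_000, 5))    # 0: fast failure costs nothing
print(failed_htlc_reputation_delta(1_000_000, 900))  # -900: slow failure is penalized
```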
Is that for when the payment is fulfilled but the fees sent by the sender are zero? Or for the case mentioned here, "equal to 0 if the HTLC was not fulfilled"?
A lot of the discussion revolves around the specific reputation scheme proposed here; however, I don't think that this should be part of the BOLTs, which only describe rules for communication between peers. While it is crucial to find a good way to compute reputation, this topic is already discussed elsewhere (mailing list, meetings); we should focus here on the actual spec change: a way to signal to the next node how confident we are that this HTLC will succeed. The questions that need answering here are:
I personally think that it is useful to transmit our confidence to the next peer, and that the more precision we give, the more useful it is. However, too much precision could be a privacy leak (if you receive two HTLCs with the same confidence, it probably means that they followed the same path and came from the same sender), so I think that having 8 confidence buckets (3 bits of information) would be a good compromise.
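Below is a minimal sketch of the 8-bucket (3-bit) confidence signal suggested here. Equal-width buckets and midpoint decoding are assumptions for illustration; the comment only fixes the number of buckets. Nearby local scores such as 0.3 and 0.31 collapse into the same bucket, which is the privacy property the comment is after.

```python
BUCKETS = 8  # 3 bits of information


def encode_confidence(confidence: float) -> int:
    # Map a local confidence in [0, 1] onto one of 8 equal-width buckets (0..7).
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return min(int(confidence * BUCKETS), BUCKETS - 1)


def decode_confidence(bucket: int) -> float:
    # The receiving node only learns the bucket; use its midpoint as the estimate.
    if not 0 <= bucket < BUCKETS:
        raise ValueError("bucket must fit in 3 bits")
    return (bucket + 0.5) / BUCKETS


for c in (0.0, 0.3, 0.31, 0.99, 1.0):
    print(c, encode_confidence(c))
```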
While I think that resource bucketing can make sense as an MVP for how to interpret the endorsement mechanic laid out in BOLT2, I find myself resistant to this being in the main BOLT sections. Even with the designation of "MAY", I think this is better suited to be an extension BOLT or perhaps even a BLIP.
The reason for this is that you state in the proposal that reputation is a local phenomenon. Each node not only gets to make a decision for how to measure reputation and how to update the priors based on activity, but also probably ought to be free to select among a nigh infinite number of slot/liquidity allocation strategies between endorsed and unendorsed HTLCs.
In a prior conversation you had explained that the endorsement mechanic requires a strategy for how that endorsement can be used to mitigate jamming to demonstrate the utility of endorsements at all. I agree with this assessment, but I still find the particular strategy in the proposal to be lacking (more on that below). However, there's nothing intrinsically broken about the resource bucketing strategy you present, it just is probably far more rudimentary than a mature deployment of this would look like.
Further, because this decision is ultimately a local one and the notion of reputation is also local, no matter what strategy you present here, even if it's one that all of us are delighted by, we should expect nodes on the network to experiment and deploy their own solutions. Because they can, and because better solutions to this problem will yield better risk-adjusted returns, we can also expect that a portion of these strategies will be proprietary as well.
Considering all of these factors, I think it is more appropriate to consider the resource allocation strategy as a recommendation that should probably be placed into an appendix of some kind, be it an extension BOLT, a BLIP or otherwise. I could be misunderstanding the scope and responsibilities of the main BOLTs, but if I were trying to bootstrap another implementation, I would need to understand the endorsement specification to be compatible with the rest of the network, whereas I would have no need whatsoever to implement the exact reputation and resource allocation strategies to remain compatible with the network.
With the more organizational critiques out of the way, I will motivate where I'm coming from with my exact concerns with resource bucketing in general. All resources in economics have marginal value, meaning that each additional unit of that resource you consume costs more than the last one (in real terms). The resources you've identified here (slots and liquidity) are no different. As a result, the "real cost" of allocating the last slot or sat of liquidity is greater than that of the first. There is no way to set the parameters described in this section to accurately model this phenomenon. What if I want to have the required reputation for forwarding increase as my available resources decrease?
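The question above asks how the required reputation could rise as resources run out. One hypothetical answer is to scale a base threshold by `total / free` slots, so each remaining slot is priced higher than the last. The rule and the parameter names are illustrative assumptions, not part of the draft.

```python
import math


def required_reputation(base_threshold_msat: int, slots_total: int, slots_free: int) -> float:
    # Hypothetical pricing rule: the bar scales with total/free, so the last
    # free slot is far "dearer" than the first (a crude marginal-cost model).
    if slots_free <= 0:
        return math.inf  # nothing left to allocate
    return base_threshold_msat * slots_total / slots_free


for free in (100, 50, 10, 1, 0):
    print(free, required_reputation(1_000, 100, free))
```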
Ok, that's the cost/risk side of the equation, but what about the benefit/reward?
As a routing node, every decision to accept an HTLC is essentially granting an option on a liquidity trade that trades liquidity on the downstream link for liquidity on the upstream link, with a probable increase in the total liquidity (taken as fees). The jamming problem represents a risk in being able to execute that trade to completion. In any scenario where we are taking risks for potential benefits, it makes little sense to analyze the risks (which this proposal does with the endorsement mechanic and reputation recommendations) without also considering the potential benefits. This proposal ignores the potential benefits of forwarding an HTLC (chiefly the fees).
It can make sense for me as a node operator to let a node with lower reputation offer an HTLC forward with a large fee, when I'd be hesitant to do so at a lower fee. Similar to the way that higher interest rates are charged for borrowers with lower credit scores, we need not deny a forwarding request simply because the upstream link doesn't have the reputation we'd want.
So to summarize my criticisms of the resource bucketing strategy, it comes down to two things: 1. It does not account for the continuously variable nature of the costs of offering the slots/sats, 2. It does not account for the potential benefits of forwarding the HTLC. That said, I don't think it's reasonable to require an airtight algorithm that takes these things into account for the endorsement mechanic to be a useful improvement to the status quo. I also don't have any issue with large swaths of the network deploying this strategy and seeing if it improves jamming incidence rates. Despite its incompleteness in modeling the incentives of the operator, it may be a dramatic improvement over today, I don't know. However, because of this incompleteness, I don't think it should be in the part of the spec that I view to be required for interop with the rest of the network.
Agreed.
That's very dangerous as an attacker can trivially exploit this: they just need to offer very high fees to compensate for their bad reputation (it doesn't cost them anything because they don't intend to actually pay the fees, they will just fail the HTLC).
That's only a limitation of this specific algorithm to assign reputation, which as you said should not be part of the spec. However, even when using a continuous reputation scheme, the binary endorsement forces you to discretize to 0 or 1. That's why I'm suggesting replacing the binary endorsement with a confidence value in 3 bits. A fully continuous value could be a privacy leak, but I think that 3 bits is a good balance between the 1 bit of this proposal and a fully continuous value.
This is far from a trivial exploit. It is already the case that the attacker has no way to know what their reputation is with respect to their peers. For them to be able to exploit it, they would need to know what your threshold for endorsement is, which isn't a publicly knowable thing. Additionally, even though offering high fees for offered HTLCs does not guarantee the loss of those sats, it is still a capital outlay requirement that can reduce the reach of these attacks as well as the attacker's bandwidth to accomplish them. That said, I'd imagine the reduction in effectiveness of the attack as a result of this increased cost is probably marginal at best, but this was also not suggested as a security scheme; I was simply pointing out that we cannot ignore the reward side of the incentive scheme when considering a node operator's interests.
I actually think that this is a good thing. By forcing nodes to make a decision between 0 or 1 at the protocol level, you force the inputs to that decision to be a private matter, which ultimately it is. The node operator can either choose to tie its reputation to an HTLC or not.
I think that this convolutes things in a way that conceals the real dynamic in play. It is not the role of the endorser to "proxy" the reputation of its peers. The role of the endorser is to tie its own reputation to the HTLC it is offering. It is hard to understand how else to interpret the endorsement mechanic if it is allowed to have any more than 1 bit of signaling. Let's say we have 3 bits as you suggest: what happens if I endorse it to a level of 001 (000 being lowest and 111 being highest), and then the HTLC fails? What if the HTLC succeeds? What is my peer even trying to tell me when it gives a "partial endorsement"? The other issue with a continuous value is that it can basically be used as a measurement of how close to the payment source you are. Why would I endorse someone else's HTLC at a higher level than the upstream link did? Why wouldn't I always endorse my own HTLC as 111? Ultimately I believe the forced discretization of the endorsement is a good thing. In fact, I believe that simply specifying that, and having some discussion and recommendations around possible ways of interpreting endorsement (or non-endorsement), is enough for this proposal to be self-justifying and complete. I believe that the specifics of how to measure reputation and how to allocate HTLC slots/sats based on reputation are beyond the scope of what this specification should offer. Very often when we provide libraries we may also provide code examples to demonstrate how to use them, and I believe the resource bucketing scheme and ideas on how to measure and update reputation should not be viewed as anything more significant than a spec-level code example. Compliance with these suggested schemes is not enforceable, nor can we expect nodes to adopt the same behaviors, so it really ought to be considered a demo use of that endorsement bit.
Concept ACK 9b97e28
The problem is so big that a deep evaluation of the solution is very time-consuming for people not thinking about this problem daily (at least for me), so I like the approach of starting with the peer's reputation and seeing how the network reacts to it.
So, I will ack the approach, start to write some code for this in cln, and then evaluate some real data. I guess the more difficult part here is the reputation algorithm.
P.S': Some small nits found while reading have been reported.
P.S'': I agree that the reputation should be separate from the BOLTs. I have started the lnmetrics.rfc for this particular reason.
04-onion-routing.md
HTLC resolution time is assessed relative to a threshold that the node
considers to be a reasonable amount of time for a HTLC to resolve:
- `resolution_period`: the amount of time a HTLC is allowed to resolve in that
  is classified as "good" behavior, expressed in seconds (default: 60 seconds).
We should mention why 60 seconds; IIRC this is the default MPP timeout! We should note that there.
04-onion-routing.md
successful, fast resolving HTLCs during the `resolution_time` the HTLC was
locked in the channel.

For every resolved incoming HLTC a peer has forwarded through a node, its
Suggested change:
- For every resolved incoming HLTC a peer has forwarded through a node, its
+ For every resolved incoming HTLC a peer has forwarded through a node, its

Nit: there are a few others around the doc.
@@ -995,7 +995,10 @@ is destined, is described in [BOLT #4](04-onion-routing.md).
1. type: 0 (`blinding_point`)
2. data:
    * [`point`:`blinding`]

1. type: 1 (`endorsed`)
Suggested change:
- 1. type: 1 (`endorsed`)
+ 1. type: 3 (`endorsed`)

Can we move this to a new optional/required pair (type 2/3)?
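The suggestion rests on the TLV parity rule from BOLT #1 ("it's ok to be odd"). A node that does not understand an even type must fail, while an unknown odd type may be ignored. The sketch below shows that rule; the function name is hypothetical. Under one reading of the suggestion, a 2/3 pair would keep the signal optional today (type 3) and reserve an even type (2) for a later switch to a required field.

```python
def on_unknown_tlv(tlv_type: int) -> str:
    # BOLT #1 "it's ok to be odd": an unknown even type is required and must
    # cause a failure; an unknown odd type is optional and may be skipped.
    return "fail" if tlv_type % 2 == 0 else "ignore"


for t in (1, 2, 3):
    print(t, on_unknown_tlv(t))
```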
This is good work, and clearly a lot of thought has been put into the reputation algorithm.
I've found a couple weaknesses in the current design, which hopefully we can fix to make the mitigation more robust.
Hodling for fun and profit
By design, reputation is costly to build but easily lost. In particular, HTLCs that take longer than 90s to resolve will decrease reputations of any nodes that endorsed them. And for every additional 90s it takes to resolve the HTLC, reputations are decreased further.
So any node on the network that receives regular HTLC traffic can hodl HTLCs to destroy reputations of the upstream nodes.
Attack scenarios
Routing nodes, merchants, and LSPs on the network can exploit this weakness to destroy reputations of their competitors, essentially for free. Once reputations have been sufficiently destroyed, the competitors' channels can then also be jammed for ~0 cost.
A simple attack scenario could look like this:
- EviLSP and HonestLSP are LSPs competing with each other. Both LSPs run lightning nodes that are well connected with the rest of the network, and the LSPs also have a direct channel with each other.
- EviLSP starts to hodl all high-value HTLCs coming from HonestLSP. Just before the HTLCs approach their expiry, EviLSP forwards them on to the next node.
- As HTLCs that have been in flight for hours start to settle, HonestLSP rapidly slashes the reputation scores of all its upstream channel peers.
- EviLSP uses another lightning node to jam all of HonestLSP's channels.
- In follow-up PR, EviLSP claims their node had a temporary glitch causing delayed processing but that everything is fine now and at least their service is working better than HonestLSP's. HonestLSP's users start switching to EviLSP.
Alternatively, EviLSP could be sneakier:
- EviLSP occasionally hodls HTLCs forwarded to them from HonestLSP. The hodl frequency and duration are set high enough to have a negative influence in the reputation algorithm but low enough to not raise HonestLSP's suspicion.
- After a few days or weeks, HonestLSP has slowly decreased the reputation scores of its upstream channel peers and no longer allows those peers to access its privileged slots.
- EviLSP uses another lightning node to jam all of HonestLSP's channels.
Mitigation
I haven't come up with any great ideas to mitigate this weakness. Hopefully we can get more people thinking about this problem and potential solutions.
Reputation multiplier effect
Because in-flight risk is calculated separately for each pair of incoming and outgoing channels, an attacker can exploit network topology to cause more jamming damage than they paid for while gaining reputation. See inline comment for more details.
Mitigation
If in-flight risk is calculated per incoming channel only (ignoring the outgoing channel), or simply per upstream node (which makes sense when multiple channels exist between two nodes), then the multiplier effect disappears.
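The following is a toy model of the multiplier effect and the suggested mitigation. `InFlightRisk` is hypothetical. For simplicity, the attacker is assumed to have the same 1,000 msat reputation budget for every channel pair. The only point is that keying in-flight risk by channel pair lets one reputation back five times as much risk as keying it by upstream node.

```python
from collections import defaultdict


class InFlightRisk:
    """Tracks outstanding in-flight risk, keyed per channel pair or per upstream node."""

    def __init__(self, per_upstream_node: bool):
        self.per_upstream_node = per_upstream_node
        self.outstanding = defaultdict(int)

    def try_add(self, upstream_node: str, in_chan: str, out_chan: str,
                risk_msat: int, reputation_msat: int) -> bool:
        key = upstream_node if self.per_upstream_node else (in_chan, out_chan)
        # Accept only while the outstanding risk for this key stays within reputation.
        if self.outstanding[key] + risk_msat > reputation_msat:
            return False
        self.outstanding[key] += risk_msat
        return True


def accepted(per_upstream_node: bool, out_chans: int = 5) -> int:
    tracker = InFlightRisk(per_upstream_node)
    # The attacker reuses the same reputation against every outgoing channel.
    return sum(tracker.try_add("attacker", "in0", f"out{i}", 1_000, 1_000)
               for i in range(out_chans))


print(accepted(per_upstream_node=False))  # 5: each channel pair has its own budget
print(accepted(per_upstream_node=True))   # 1: one budget for the whole upstream node
```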
Add an endorsement field to allow nodes to signal whether a HTLC is expected to resolve quickly or is unknown to the forwarder. The addition of this field allows for the introduction of local reputation tracking that still allows new and unknown entrants access to resources.
Branch updated from 39f3c99 to 80dba2c.
FYI, I believe this vector of attack for a 3-party topology of lightning nodes (HonestLSP <-> EvilLSP <-> upstream peers) and [...]. See the email thread "Hold fee rates as DoS protection (channel spamming and jamming)", where long-delay applications such as atomic onchain / offchain swaps (e.g. lightning loops) are mentioned, and how a time-independent hold fee rate has already been suggested as a mitigation.
I believe the downsides of aggregating reputation across N incoming channels, whether or not they belong to a single lightning node, have already been considered in the past, in both controlled and uncontrolled scenarios. See the digest post "Channel Jamming" on bitcoin-problems by one of the authors of the "Unjamming Lightning" paper, from which I believe this draft is partially inspired. I think an even sneakier "hodling for fun and profit" style of exploitation is leveraging multi-path payments and the fact that there are gossiped
We define the following parameters:
* `resolution_period`: the amount of time a HTLC is allowed to resolve in that
  classifies as "good" behavior, expressed in seconds. The recommended default
  is 90 seconds (given that the protocol allows for a 60 second MPP timeout).
I think it would be useful to have an additional indication of the clock used to tick the `resolution_period` seconds, on which all the `opportunity_cost` computations are scaled.
E.g., the Epoch, which is defined on Unix systems as "1970-01-01 00:00:00 +0000 (UTC)"; it's not a perfect clock, but it's better than nothing.
Comment to be re-evaluated in light of finding here: #1071 (comment)
Thanks for the detailed review + writeup @morehouse 🙏
This is possible because our reputation algorithm only accounts for the risk of an incoming peer jamming our outgoing channel, which is clearly insufficient to cover downstream attacks like this. Pretty nasty when the attacker doesn't need to send any payments themselves. It seems reasonable to reverse this logic to consider reputation in both directions when we receive a HTLC:
We're currently looking into this and running some experiments, aiming to give some more meaningful analysis of how bi-directional reputation works for attacks like this (be)for(e) the summit. A few things that came up while discussing this attack which are tempting but probably not helpful:
Addressed in latest push (spec was outdated), thanks for flagging!
Reviewed at 80dba2c, including lrc's implem at e87ae62.
It's a dense proposal, and overall quite solid.
even if we lived in a world where senders penalize nodes that hold HTLCs for too long, the attacker can still be successful if they can fool many senders just once.
+1, No global clock in lightning to get reliable attributable errors / latency aware routing.
My instinct is that any monetary solution would price out honest users far before we compensate the node targeted.
On-chain fees paid by honest users to open channels could be used as part of the target node's compensation. But it's a more complicated story...
by high reputation nodes.

Sequence:
* The `update_add_htlc` is sent by an upstream peer.
"The `update_add_htlc` is sent by an upstream peer."

Minor: if the receiving peer is also the HTLC recipient, the reputation algorithm could halt here, unless the HTLC preimage is unknown to the recipient? Normally, only `keysend` payments are such types of HTLCs.
I don't know if there should be a mention in the "Local Reputation" subsection that the reputation algorithm MAY NOT be run if the payment is final and there is no slow jamming risk. It could still be worthwhile to score up the reputation of the sending peer.
* The corresponding outgoing HTLC (if present) will be forwarded with
  `endorsed` set to `1`.
* Otherwise:
  * The HTLC will be limited to the remaining "general" slots and liquidity,
"* The HTLC will be limited to the remaining "general" slots and liquidity,
and will be failed if there are no resources remaining in this bucket."
While immediate failure is an option for the local routing node, an economically
rational one could instead pause processing, wait for the local resources
allocated to downstream channels to free up, and then process the non-endorsed
HTLC.
No interactivity with the upstream peer is needed after the `commitment_signed`
messages have been exchanged. The "halting time", as measured by the local routing
node, can be subtracted from the CLTV delta between the `outgoing_cltv_value`
and the current chain height.
In a world where there is economic competition among routing nodes, why
would the receiving peer reject an HTLC for free? It can be more rational to
take the bet and pocket the `fee_base_msat` and `fee_proportional_millionths`. You
might offer high-quality economic traffic back to the other routing peer that can
fulfill this HTLC forwarding request.
In particular, if the peer does not have sufficient local reputation but not
all of the `protected_slot_count` slots are occupied, it makes no economic sense
to mark an endorsed or non-endorsed HTLC as "general".
One suggestion could be to document, as an implementation policy, that a receiving
peer can support some clawback where an HTLC is upgraded from "general" to
`protected_slot_count`. After reading the "Resource Bucketing" subsection and lrc's
`addHTLC`, I don't see that mentioned or implemented, if it's a worthwhile concern.
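A minimal Python sketch of the forwarding decision quoted above, as I read it. The names `forward_decision`, `Outcome`, and `GeneralBucket` are hypothetical; the condition for the protected path (incoming HTLC endorsed and upstream peer has sufficient reputation) is inferred from the PR description, because the first branch of the sequence is not shown in this excerpt; and checks against the channel's overall limits are left out.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    FORWARD_ENDORSED = "forward_endorsed"      # may use protected resources, endorsed=1
    FORWARD_UNENDORSED = "forward_unendorsed"  # limited to general resources
    FAIL = "fail"                              # general bucket exhausted


@dataclass
class GeneralBucket:
    """Hypothetical view of the remaining "general" resources on an outgoing channel."""
    slots_left: int
    liquidity_left_msat: int

    def can_fit(self, amount_msat: int) -> bool:
        return self.slots_left > 0 and amount_msat <= self.liquidity_left_msat


def forward_decision(incoming_endorsed: bool, peer_has_reputation: bool,
                     amount_msat: int, general: GeneralBucket) -> Outcome:
    # Assumed protected path: an endorsed HTLC from a peer with sufficient
    # reputation is forwarded with `endorsed` set to 1.
    if incoming_endorsed and peer_has_reputation:
        return Outcome.FORWARD_ENDORSED
    # Otherwise the HTLC is limited to the general slots and liquidity, and
    # is failed if there are no resources remaining in that bucket.
    if general.can_fit(amount_msat):
        return Outcome.FORWARD_UNENDORSED
    return Outcome.FAIL
```

Under the alternative raised in the comment above, the `Outcome.FAIL` branch would instead hold the HTLC until general resources free up, within its remaining CLTV budget.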
no need for use of protected resources as channels are not saturated during
regular operation. Should the network come under attack, honest nodes that
have built up reputation over time will still be able to utilize protected
resources to process payments in the network.
Should the network come under attack, honest nodes that
have built up reputation over time will still be able to utilize protected
resources to process payments in the network.
Minor - There could be a succinct mention of how the local resource conservation
system behaves for honest routing nodes bootstrapping their HTLC forwarding in the
network. Such nodes might not have accumulated enough reputation during a steady
state to leverage it in the face of a slow-jamming attack. It could be worthwhile to
make sure the system works smoothly for marginal peers added to the network topology.
This matters especially if negative events happen deeper in the routing nodes'
stack (e.g. a data center being taken down by a tsunami): some segments of the
channel topology might have to be replaced within a short period of time. The
graph obtained by combining `node_announcement` and `channel_announcement`
messages may not be in bijection with the underlying infrastructure.
resources to and signal endorsement of a HTLC on the outgoing channel. Nodes MAY
use any metric of their choosing to classify a peer as having sufficient
reputation, though a poor choice of reputation scoring metric may affect their
reputation with their downstream peers.
Nodes MAY use any metric of their choosing to classify a peer as having sufficient
reputation, though a poor choice of reputation scoring metric may affect their
reputation with their downstream peers.
For implementers, maybe a few examples of scoring metrics could be given:
- reliability: the uptime of the sending peer in the channel
- success rate: number of successes versus number of failures
- profit / volume: based on earned fees and total amount moved through the channel
- utility: this one is blurrier... opportunity cost? e.g. if off-chain fees have been paid to open the channel
See the presentation "Lightning network topology, its creation and maintenance",
which inspired the above metrics.
Maybe they could be added to this document or to a bLIP. I know the idea of what
is "observable within the protocol" is described later in this subsection, yet even
for forwarding fees this is a hard problem: if onion encryption holds, the upstream
peer has no visibility into the local routing node's difference between `amount_msat`
and `amt_to_forward`.
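To make the suggested metrics concrete, here is a small Python sketch of the first three. `PeerStats` and its counters are hypothetical; "utility" is omitted since, as noted, it is the least well-defined, and nothing here addresses the fee-observability problem raised in the last paragraph.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    """Hypothetical counters a node might keep per upstream peer."""
    seconds_online: int
    seconds_observed: int
    htlcs_succeeded: int
    htlcs_failed: int
    fees_earned_msat: int
    volume_forwarded_msat: int


def reliability(s: PeerStats) -> float:
    # Fraction of the observation period the peer was online in the channel.
    return s.seconds_online / s.seconds_observed if s.seconds_observed else 0.0


def success_rate(s: PeerStats) -> float:
    # Successes as a fraction of all resolved HTLCs.
    total = s.htlcs_succeeded + s.htlcs_failed
    return s.htlcs_succeeded / total if total else 0.0


def profit_per_volume(s: PeerStats) -> float:
    # Fees earned per msat moved through the channel.
    if not s.volume_forwarded_msat:
        return 0.0
    return s.fees_earned_msat / s.volume_forwarded_msat
```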
is granted. This algorithm uses forwarding fees to measure damage, as this value
is observable within the protocol. It is reasonable to expect an adversary to
"about turn" - to behave perfectly to build up reputation, then alter their
behavior to abuse it. For this reason, in-flight HTLCs have a temporary
negative impact on reputation until they are resolved.
For this reason, in-flight HTLCs have a temporary negative impact on reputation until
they are resolved.
Minor - I don't understand this sentence, or the idea behind it.
Consider a simple slow-jamming attack, where an attacker forwards a bunch of spammy
HTLCs that are never resolved successfully, starting from a blank state where no
reputation has been accumulated in the `reputationTracker`.
When the `update_add_htlc` is received, the receiving peer has no information on
how the HTLC will be resolved, whether by success or failure.
Marking that in-flight HTLC as having a negative impact on the channel or peer
reputation could lead to rejecting another concurrent in-flight HTLC, and losing its
fees. The final state of both in-flight HTLCs could be a success. However, while
processing between upstream and downstream, the target node has no way to predict
the HTLC resolution deterministically.
Unless the suggestion is that a target node should limit the maximum number of
in-flight HTLCs originating from a single upstream peer? I don't see that idea
mentioned in either the "Resource Bucketing" subsection or the "Local Reputation"
subsection. It could be interesting as an implementation policy to limit the
worst-case damage from a single peer, e.g. one that has built up a high
`IncomingReputation` and then suddenly engages in slow jamming against the target node.
Maybe this could be described in a bLIP or another document as an implementation policy.
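One way to read the "temporary negative impact" sentence, sketched under simplifying assumptions: when checking a peer's reputation, each of its unresolved HTLCs is charged as if it will be held for its worst-case time, and that charge disappears once it resolves. The cost reuses the `opportunity_cost` formula quoted from the spec earlier in this PR (`ceil((resolution_time - resolution_period) / resolution_period) * fees`), clamped at zero here as an assumption. The revenue-versus-threshold comparison, the `worst_case_resolution_time` input, and the optional `max_in_flight_per_peer` cap (the policy suggested in the comment) are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


def opportunity_cost(resolution_time: float, resolution_period: float, fees_msat: int) -> int:
    # ceil((resolution_time - resolution_period) / resolution_period) * fees,
    # clamped at zero (an assumption) so fast resolutions never add credit.
    periods = math.ceil((resolution_time - resolution_period) / resolution_period)
    return max(0, periods) * fees_msat


@dataclass
class InFlightHtlc:
    fees_msat: int                     # fees the HTLC would pay if fulfilled
    worst_case_resolution_time: float  # hypothetical, e.g. time until CLTV expiry


def has_sufficient_reputation(incoming_revenue_msat: int,
                              outgoing_revenue_threshold_msat: int,
                              in_flight: List[InFlightHtlc],
                              resolution_period: float,
                              max_in_flight_per_peer: Optional[int] = None) -> bool:
    # Optional cap: the proposed HTLC would exceed the per-peer in-flight limit.
    if max_in_flight_per_peer is not None and len(in_flight) >= max_in_flight_per_peer:
        return False
    # Unresolved HTLCs temporarily count against the peer at their worst case.
    risk = sum(opportunity_cost(h.worst_case_resolution_time, resolution_period, h.fees_msat)
               for h in in_flight)
    return incoming_revenue_msat - risk > outgoing_revenue_threshold_msat
```

With a period of 90, one unresolved HTLC offering 100 msat with a worst case of 540 contributes 500 msat of risk, so a peer with 1,000 msat of incoming revenue against a 500 msat threshold fails the check until that HTLC resolves, even if it ultimately succeeds. That is the trade-off raised in the comment.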
remote peer's `max_accepted_htlcs`).
* `protected_liquidity_portion`: defines the portion of liquidity that is
  reserved for endorsed HTLCs from peers with sufficient reputation (default:
  0.5).
`protected_liquidity_portion`: defines the portion of liquidity that is
reserved for endorsed HTLCs from peers with sufficient reputation (default: 0.5).
Minor minor - It could be made precise what the 0.5 is based on: whether it's the
funding UTXO amount, starting from `htlc_minimum_msat`, with or without the
channel reserve.
* SHOULD reduce the remote peer's `max_accepted_htlcs` by
  `protected_slot_count` for the purposes of the proposed HTLC.
* SHOULD reduce the `max_htlc_value_in_flight` by
  `protected_liquidity_portion` * `max_htlc_value_in_flight`.
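A minimal Python sketch of the two SHOULD rules above: the limits a non-protected HTLC is checked against once the protected share is set aside. Only `max_accepted_htlcs`, `max_htlc_value_in_flight`, `protected_slot_count`, and `protected_liquidity_portion` come from the text; the function names, the msat units, and rounding the reserved liquidity down to whole msat are assumptions.

```python
from typing import Tuple


def general_limits(max_accepted_htlcs: int, max_htlc_value_in_flight_msat: int,
                   protected_slot_count: int,
                   protected_liquidity_portion: float) -> Tuple[int, int]:
    """Return (slot_limit, liquidity_limit_msat) for the general bucket."""
    # Reduce the remote peer's max_accepted_htlcs by protected_slot_count.
    slot_limit = max(0, max_accepted_htlcs - protected_slot_count)
    # Reduce max_htlc_value_in_flight by
    # protected_liquidity_portion * max_htlc_value_in_flight (rounded down).
    reserved_msat = int(protected_liquidity_portion * max_htlc_value_in_flight_msat)
    return slot_limit, max_htlc_value_in_flight_msat - reserved_msat


def fits_general(in_flight_count: int, in_flight_msat: int, amount_msat: int,
                 limits: Tuple[int, int]) -> bool:
    # The proposed HTLC must fit within both the general slot and liquidity limits.
    slot_limit, liquidity_limit_msat = limits
    return (in_flight_count + 1 <= slot_limit
            and in_flight_msat + amount_msat <= liquidity_limit_msat)
```

With example values of 483 slots, 100,000,000 msat, 241 protected slots and a 0.5 portion, the general bucket gets 242 slots and 50,000,000 msat, and an HTLC that would push general in-flight liquidity even 1 msat past that is refused. This is the edge case raised in the review comment on these lines.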
- SHOULD reduce the `max_htlc_value_in_flight` by
`protected_liquidity_portion` * `max_htlc_value_in_flight`.
See the comment above about upgrading HTLCs on the fly from "general" to "protected".
This could be the kind of situation where a high-value HTLC paying good routing fees
is rejected for forwarding on the outgoing channel because its `outgoing_htlc_value`
falls just above `protected_liquidity_portion` * `max_htlc_value_in_flight`.
Of course, it all depends on whether there is a high volume of traffic going through
the target node that will probably soon occupy the `protected_slot_liquidity`, or
whether it's more economically interesting to take the risk of making an exemption.
This part might be better described in another document or a bLIP, with
implementations experimenting with it. The parameters could be too "rigid" for
low-volume forwarding nodes and too "flexible" for high-volume, topologically
well-connected forwarding nodes.
Rolling windows specified in this write-up may be implemented as a decaying
average to minimize the amount of data that needs to be stored per-channel.
In-flight HTLCs can be accounted for separately from this calculation, as the node
will already have data for these HTLCs available.
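One way to implement the decaying average mentioned above is an exponentially decaying total that stores just a value and a timestamp per channel instead of every data point in the window. The `DecayingTotal` class and its half-life parameterization are assumptions for illustration, not taken from the write-up or from lrc.

```python
from dataclasses import dataclass


@dataclass
class DecayingTotal:
    """Hypothetical single-value approximation of a rolling window."""
    half_life: float           # seconds for a past contribution to lose half its weight
    value: float = 0.0
    last_updated: float = 0.0  # timestamp of the last decay step

    def _decay_to(self, now: float) -> None:
        # Exponentially decay the stored value for the time elapsed since the last update.
        elapsed = max(0.0, now - self.last_updated)
        self.value *= 0.5 ** (elapsed / self.half_life)
        self.last_updated = now

    def add(self, amount: float, now: float) -> None:
        # Bring the stored value up to `now`, then add the new contribution.
        self._decay_to(now)
        self.value += amount

    def get(self, now: float) -> float:
        self._decay_to(now)
        return self.value
```

As the quoted text suggests, in-flight HTLCs would stay out of this accumulator and only be added (e.g. as fees earned) once they resolve.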
Rolling windows specified in this write up may be implemented as a decaying
average to minimize the amount of data that needs to be stored per-channel. In
flight HTLCs can be accounted for separately to this calculation, as the node
will already have data for these HTLCs available.
Minor - I believe the overall "Local Resource Conservation" proposal would benefit from
having the implementation notes split out into their own document or bLIP, including
some magic values that are referenced in other subsections (e.g. the 10 for the
`incoming_channel_multiplier` defining the rolling window).
Without having really thought through the decaying average, there could be implementation
alternatives, such as taking all HTLC data points since the channel opening and
periodically re-evaluating their score according to on-chain fees, as observed in
blocks, or the total HTLC forwarding traffic that has gone through the target node.
These are just ideas, mostly to note the range of rolling-window algorithms that
could be experimented with.
### Bootstrapping Outgoing Channel Revenue
New channels with no revenue history:
* MAY choose not to endorse any HTLCs in their first two weeks of operation
  to establish baseline revenue.
New channels with no revenue history:
- MAY choose not to endorse any HTLCs in their first two weeks of operation
to establish baseline revenue.
Minor - This deserves its own bLIP, especially if some nodes try alternative
bootstrapping ideas, e.g. modulating the no-endorsement period as a function of
the peer's number of `channel_announcement`s.
when assessing reputation.
* MAY consider `outgoing_channel_revenue` for all channels with the outgoing
  peer, but SHOULD take care to [bootstrap](#bootstrapping-outgoing-channel-revenue)
  new channels so they do not lower the reputation threshold for existing ones.
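A possible reading of the aggregation and bootstrapping recommendations, sketched under stated assumptions. The quoted text does not say how revenue from several channels is combined; the sketch assumes an average per channel, which is the case where a fresh channel with no history would drag the threshold down. Channels still inside the two-week bootstrap period are therefore skipped when computing the outgoing threshold and are not endorsed on. `Channel`, `BOOTSTRAP_PERIOD_SECS`, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

BOOTSTRAP_PERIOD_SECS = 14 * 24 * 60 * 60  # "first two weeks of operation"


@dataclass
class Channel:
    """Hypothetical per-channel record kept by the local node."""
    opened_at: int     # unix seconds
    revenue_msat: int  # e.g. a decayed revenue total for this channel


def is_bootstrapping(ch: Channel, now: int) -> bool:
    return now - ch.opened_at < BOOTSTRAP_PERIOD_SECS


def may_endorse_on(ch: Channel, now: int) -> bool:
    # MAY choose not to endorse any HTLCs on a channel in its first two weeks.
    return not is_bootstrapping(ch, now)


def outgoing_threshold_msat(channels: Iterable[Channel], now: int) -> float:
    # Average over established channels only (assumed aggregation), so a new
    # channel with no history does not pull the threshold down. With no
    # established channels the threshold is 0, but may_endorse_on() is then
    # False for every channel anyway.
    established: List[int] = [ch.revenue_msat for ch in channels
                              if not is_bootstrapping(ch, now)]
    return sum(established) / len(established) if established else 0.0
```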
- MAY consider `incoming_channel_revenue` across all channels with the peer
when assessing reputation.
- MAY consider `outgoing_channel_revenue` for all channels with the outgoing
peer, but SHOULD take care to bootstrap
Minor - Hmmm, it could be interesting to apply this aggregated reputation assessment
across all channels with an upstream peer to HTLC timing. Since HTLCs are lower- and
upper-bounded by `height_added` and `resolution_window`, a set of slow-jamming
HTLCs might have to fit within the same window.
In particular, it could help prevent sudden spikes of slow-jamming HTLCs from
occupying liquidity / slots when the slow jamming is triggered only a few hops deep
in the graph, while not downgrading forwarding from the upstream peers the rest of
the time. It could be a thing.
This PR introduces an `endorsed` TLV to `update_add_htlc` as a way for nodes to indicate whether they expect a HTLC to resolve "honestly". Nodes are advised to allocate a limited portion of their outbound liquidity and slots to HTLCs that are not endorsed by peers that they consider to have high reputation.

Opening early for discussion on structure, not ready for review - discussions around recommendations for local reputation scoring are ongoing.
Slides for the visually-minded here