BOLT 1: introduce warning messages, reduce requirements to send (hard) errors #834

rustyrussell · 2021-01-20T05:36:43Z

If you really want to have a hard error, set the new fatal field. We can still get upset if this happens during testing, and we can still log when they happen, but not break everything quite so easily!

This follows the suggestion by @t-bast to create a new message (1) "warning". I've implemented this for c-lightning already as a PR.

rustyrussell · 2021-01-20T06:22:01Z

I've started refactoring clightning internally so all our error routines are flagged soft or hard. I've got a hack where soft errors send out msg number 1 (undefined, currently ignored) instead of 17, which makes it more effective with older clightning which treats all error (17) messages as "hard".

Not sure if we should endorse that as a separate "warning" message?

The other approach would be to redefine 17 (error) as a soft error and create a new message (2?) as "hard error"?

t-bast · 2021-01-20T09:49:45Z

Interesting, eclair also defaults to fatal errors and closes channels when they happen, so old eclair will keep treating all errors as fatal regardless of the tlv flag. When the fatal TLV is missing, updated nodes will still want to consider these errors fatal and close the channel.

I think a separate (odd) message would be more appropriate, as it would work out of the box for non-upgraded nodes (they would simply ignore it so won't close a channel). For backwards-compatibility we need to keep the existing message for fatal errors, the new ones should be the non-fatal errors.

rustyrussell · 2021-01-21T00:22:58Z

Yes, though we've already had to except various errors from failing the channel, since LND sends them (e.g. if you are too slow in reestablishing a channel). Probably because LND didn't close on errors, so it didn't treat them as fatal itself.

Let me try a "warning" message instead then...

rustyrussell · 2021-01-21T04:43:09Z

OK, now it's a separate warning message, and I went through and revised and made consistent all the errors/closing/failing language.

t-bast

I think it's a good idea, it will be much simpler to notice issues and have users take unilateral actions without needing to contact their peer to understand what went wrong.

01-messaging.md

03-transactions.md

carlaKC · 2021-01-21T17:24:23Z

Hi all, I started thinking about possible extensions to how the spec deals with errors towards the end of last year, and wanted to drop some thoughts in here.

A string based error is nice for humans to read, but doesn’t give code much of a chance of elegantly reacting. If we’re going to make a change now, I think it would be worth considering the following so that we don’t have to come back again to make further updates:

Error codes: an enum which describes why the error is being sent
Error-specific fields: based on the error code set, could be TLV or a fixed set of fields
Fatal bool: similar to the idea introduced here, indicating whether we need to close the channel.

I don’t want to derail conversation here , so will make my own ML post with details if this sounds reasonable.

t-bast · 2021-01-21T17:52:51Z

I think it could be reasonable for this new warning message (I'm not sure it makes sense for the error message though).

Do you have some concrete examples of what warnings could trigger an automatic action? TBH it feels like the protocols in the spec should have all cases covered to avoid needing it, but there's always a difference between theory and practice.

rustyrussell · 2021-01-25T00:22:37Z

Hi all, I started thinking about possible extensions to how the spec deals with errors towards the end of last year, and wanted to drop some thoughts in here.

A string based error is nice for humans to read, but doesn’t give code much of a chance of elegantly reacting. If we’re going to make a change now, I think it would be worth considering the following so that we don’t have to come back again to make further updates:
* Error codes: an enum which describes why the error is being sent

* Error-specific fields: based on the error code set, could be TLV or a fixed set of fields

* Fatal bool: similar to the idea introduced here, indicating whether we need to close the channel.

In general, these errors should never occur. It means one side has a bug. Unfortunately, that happens in practice, leading to workarounds where errors are ignored. Hence distinguishing between "Warning: you sent crap" and "Fatal: no point talking any more, I've killed the channel".

As we cannot define all possible bugs, the best you can do is log something and retry, and eventually give up (just as you would if there were no progress without any error).

There is a strong case of other kinds of errors: in the offers spec I included such things, as they can genuinely happen, and we already have onion errors. We've also talked about soft-errors in the onion context (e.g. "I don't want to worry you, but there's a delay...")

carlaKC · 2021-01-26T07:12:55Z

In general, these errors should never occur. It means one side has a bug... As we cannot define all possible bugs, the best you can do is log something and retry

Right, I think I misunderstood the original reasoning for this warning message - thanks for clarifying! Error codes make less in the infinite range of bugs that one could run into :')

Do you have some concrete examples of what warnings could trigger an automatic action?

@t-bast I'll make a ML post with my case for these error codes, rather than clogging up this issue more since they are related, but not directly relevant to a soft warning error.

This takes from the draft spec at lightning/bolts#834 Note that if this draft does not get included, the peer will simply ignore the warning message (we always close the connection afterwards anyway). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Added: Protocol: we now report the new (draft) warning message.

niftynei · 2021-02-08T18:55:57Z

As we cannot define all possible bugs, the best you can do is log something and retry, and eventually give up (just as you would if there were no progress without any error).

I'm working on the error pathways for the dual-funding RBF right now. This disambiguation between "warning" and "error" is very useful in that context. Particularly how 'warning/close connection' allows a retry from initial state conditions, in other words signals to abandon the failed RBF attempt (with an explanation as to why.)

Error codes are indeed useful for specific, narrowly defined interactions and recovery pathways, like payment attempts. In c-lightning we also encode them for our RPC errors, to help identify common cases. I can see a use case where we'd want error codes specifically for RBF failure cases, however these can be added ad-hoc for specific and well-understood cases in the future.

In the general case, returning a warning to the caller with an explanation and re-starting the attempt is a robust solution; error coded response handling logic has a tendency to degrade robustness over time ime.

This takes from the draft spec at lightning/bolts#834 Note that if this draft does not get included, the peer will simply ignore the warning message (we always close the connection afterwards anyway). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Added: Protocol: we now report the new (draft) warning message.

Add support for logging warning messages as introduced in lightning/bolts#834 Support for sending warning messages instead of current errors will be added in a later PR.

lightning/bolts#834 recommends sending warning messages instead of connection-level errors in some cases, which avoids unnecessary channel closure.

t-bast

LGTM, I finally implemented this in eclair and tested with c-lightning, and we exchange warning messages correctly.

Just needs a rebase and it's an ACK for me!

t-bast · 2021-09-13T20:01:05Z

Needs rebase, otherwise ACK 4f2ec06

Roasbeef · 2021-09-13T20:09:39Z

@niftynei

In the general case, returning a warning to the caller with an explanation and re-starting the attempt is a robust solution; error coded response handling logic has a tendency to degrade robustness over time ime.

If the warning message has no context at all, then how is the caller meant to parse the message to restart the attempt? Totally disagree that error code response handling logic degrades over time, when was the last time you needed to redefine how you handle the HTTP 302 error code?

t-bast · 2021-09-13T20:26:07Z

If the warning message has no context at all, then how is the caller meant to parse the message to restart the attempt? Totally disagree that error code response handling logic degrades over time, when was the last time you needed to redefine how you handle the HTTP 302 error code?

I really don't understand what you mean.
Node operators have context about what they're doing, they know when they're opening or closing channels.
So it's really helpful for them to get something when it's rejected by the other side, and a string message is actually a very flexible way of doing this.

We've already found it very useful many times in the wild compared to "the remote peer just disconnected on me, so something must be wrong with what I sent them but I have no idea what".

TheBlueMatt · 2021-09-27T19:20:40Z

Wen rebase, lets land this :)

t-bast

ACK 5c745fd, let's land this!

TheBlueMatt

Basically there, want to come to come conclusion on all-0 errors.

TheBlueMatt · 2021-09-30T22:26:23Z

01-messaging.md


 #### Requirements

-The channel is referred to by `channel_id`, unless `channel_id` is 0 (i.e. all bytes are 0), in which case it refers to all channels.


I think this is the last thing we need to resolve, I'm still highly unconvinced here - yes, some of the historical places all-0 channel ids have been used was nonsense, but the idea that a peer will have a channel with me and I don't want to talk to them anymore and I'm not able to tell them that seems really wrong? Implementing all the logic to stay connected to them, parse their messages for channel IDs, and then fail those channels seems substantially complicated, and probably not relevant in cases where you just want a peer to "go away". We can change how we communicate this if backwards compatibility is the concern, but it just seems very strange to me to not be able to communicate this at all (when we currently can).

TheBlueMatt · 2021-09-30T22:31:53Z

02-peer-protocol.md

  - if it hasn't sent a `funding_locked` yet:
    - MAY reply to a `shutdown` message with a `shutdown`
  - once there are no outstanding updates on the peer, UNLESS it has already sent a `shutdown`:
    - MUST reply to a `shutdown` message with a `shutdown`
  - if both nodes advertised the `option_upfront_shutdown_script` feature, and the receiving node received a non-zero-length `shutdown_scriptpubkey` in `open_channel` or `accept_channel`, and that `shutdown_scriptpubkey` is not equal to `scriptpubkey`:
+    - MAY send a `warning`.
    - MUST fail the connection.


Aren't we getting rid of the "fail the connection" terminology in this pr :)

TheBlueMatt · 2021-09-30T22:33:49Z

02-peer-protocol.md

  - if the `scriptpubkey` is not in one of the above forms:
-    - SHOULD fail the connection.
+    - SHOULD send a `warning`.


In general (including here and the below warning conversion, but basically everywhere), we should probablyld probably have something that says "and ignore the message" or so. That would imply that the sender can re-send the shutdown with a different script (or whatever message in most contexts), which seems to be your goal here. Maybe it could be a general comment at the top of bolt 0 or 1 whatever.

agree with extending the definition

TheBlueMatt · 2021-10-01T16:58:30Z

Successfully used this to debug an issue in a C-Lightning unmerged PR from LDK, so I'd call that cross-implementation-more-than-tested :)

rustyrussell · 2021-12-14T00:02:45Z

Restored all-zero errors, and trivial rebase.

t-bast

ACK 570d356, thanks! Let's finally land this!

lightning-developer

Concept ACK.

I pointed out a few stylistic things and asked a few clarification questions.

One thing that I realized towards the end there is often the phrase:

MUST send a warning and close the connection, or send an error and fail the channel.

I believe this is defined as things a node MUST do after sending warning or error so it seems very redundant to repeat this throughout the document.

lightning-developer · 2021-12-17T02:05:29Z

00-introduction.md

@@ -132,6 +132,20 @@ See [BOLT #11: Invoice Protocol for Lightning Payments](11-payment-encoding.md)
   * _See related: [closing transaction](#closing-transaction), [funding transaction](#funding-transaction), [penalty transaction](#penalty-transaction)_
   * _See types: [revoked commitment transaction](#revoked-commitment-transaction)_

+* #### *Fail the channel*:
+  * This is a forced close of the channel. Very early on (before


Might be a contradiction to #942 ?

lightning-developer · 2021-12-17T02:11:49Z

01-messaging.md

-The 2-byte `len` field indicates the number of bytes in the immediately following field.
+1. type: 1 (`warning`)
+2. data:
+   * [`channel_id`:`channel_id`]


Clarification: This assumes / includes temporary_channel_id during funding of the channel?

lightning-developer · 2021-12-17T02:16:47Z

01-messaging.md

  - SHOULD send `error` for protocol violations or internal errors that make channels unusable or that make further communication unusable.
  - SHOULD send `error` with the unknown `channel_id` in reply to messages of type `32`-`255` related to unknown channels.
+  - when sending `error`:
+    - MUST fail the channel(s) referred to by the error message.


nit: referred to by the error ~~message~~ (pending on: #941 )

the message has only one channel_id field. BOLT 1 says if the id is zero it refers to all channels between the peers. We ignore the situation where we have n channels and want to refer to m channels with 1 error?

That'd likely require multiple warning messages, one for each channel. Though it's hard to imagine a scenario where m < n and m > 1 channels need to close due to the same issue. The all-channel error was mostly intended to "there's something wrong but I can't identify the channel it's affecting" afaik, so "there's something wrong with one of these m channels" isn't much more useful IMHO.

lightning-developer · 2021-12-17T02:18:46Z

01-messaging.md

+  - when sending `error`:
+    - MUST fail the channel(s) referred to by the error message.
+  - when sending `warning`:
+	- MAY set `channel_id` to all zero if the warning is not related to a specific channel.


I guess you mean the channel_id in warning and not the channel_id of the channels between the peers?

lightning-developer · 2021-12-17T02:21:05Z

01-messaging.md

+    - MUST fail the channel(s) referred to by the error message.
+  - when sending `warning`:
+	- MAY set `channel_id` to all zero if the warning is not related to a specific channel.
+  - MAY close the connection after sending.


Why would that not hold for error? I understood error to be the case that that a force close should trigger. I see good reasons that after publishing the commitment transaction a peer might want to close the connection.

lightning-developer · 2021-12-17T02:24:15Z

01-messaging.md

-  - if no existing channel is referred to by the message:
+    - MUST fail the channel referred to by `channel_id`, if that channel is with the sending node.
+  - upon receiving `warning`:
+    - SHOULD log the message for later diagnosis.


What does "log" mean? Keep in memory? Write to a log file? (IIRC we don't sepc logging in the bolts)

lightning-developer · 2021-12-17T02:28:37Z

02-peer-protocol.md

@@ -484,7 +486,7 @@ A non-funding node (fundee):
  transaction after a timeout of 2016 blocks.

 From the point of waiting for `funding_locked` onward, either node MAY
-fail the channel if it does not receive a required response from the
+send an `error` and fail the channel if it does not receive a required response from the


seems like a contradiction here to add that it MAY fail the channel as for what a node MUST do after sending error is to fail the channel (BOLT 01 line 341)

lightning-developer · 2021-12-17T02:32:12Z

02-peer-protocol.md

  - if the `scriptpubkey` is not in one of the above forms:
-    - SHOULD fail the connection.
+    - SHOULD send a `warning`.


agree with extending the definition

lightning-developer · 2021-12-17T02:33:00Z

02-peer-protocol.md

@@ -647,7 +650,8 @@ The sending node:
 The receiving node:
  - if the `signature` is not valid for either variant of closing transaction
  specified in [BOLT #3](03-transactions.md#closing-transaction) OR non-compliant with LOW-S-standard rule<sup>[LOWS](https://github.com/bitcoin/bitcoin/pull/6769)</sup>:
-    - MUST fail the connection.
+    - MUST send a `warning` and close the connection, or send an
+      `error` and fail the channel.


redundant "fail the channel"

lightning-developer · 2021-12-17T02:33:04Z

02-peer-protocol.md

@@ -673,7 +677,8 @@ The receiving node:
          - MUST propose a `fee_satoshis` in the overlap between received and (about-to-be) sent `fee_range`.
  - otherwise, if `fee_satoshis` is not strictly between its last-sent `fee_satoshis`
  and its previously-received `fee_satoshis`, UNLESS it has since reconnected:
-    - SHOULD fail the connection.
+    - SHOULD send a `warning` and close the connection, or send an
+      `error` and fail the channel.


redundant "fail the channel"

ok that wording comes several times I stop pointing this out

TheBlueMatt · 2021-12-21T00:44:01Z

01-messaging.md

  - SHOULD send `error` for protocol violations or internal errors that make channels unusable or that make further communication unusable.
  - SHOULD send `error` with the unknown `channel_id` in reply to messages of type `32`-`255` related to unknown channels.
+  - when sending `error`:
+    - MUST fail the channel(s) referred to by the error message.
+  - when sending `warning`:


I thought we were restoring all-0s? Looks like its still removed here, and further down.

rustyrussell force-pushed the soft-errors branch from e6b540b to 528805b Compare January 21, 2021 04:40

t-bast reviewed Jan 21, 2021

View reviewed changes

01-messaging.md Outdated Show resolved Hide resolved

03-transactions.md Outdated Show resolved Hide resolved

t-bast mentioned this pull request Jan 27, 2021

Lightning Specification Meeting 2021/02/01 #838

Closed

21 tasks

rustyrussell changed the title ~~BOLT 1: make errors default to "soft" errors.~~ BOLT 1: introduce warning messages, reduce requirements to send (hard) errors, remove "all-channel" errors. Jan 29, 2021

rustyrussell mentioned this pull request Jan 29, 2021

Warning message support ElementsProject/lightning#4364

Merged

t-bast mentioned this pull request Feb 9, 2021

Lightning Specification Meeting 2021/02/15 #840

Closed

23 tasks

ariard mentioned this pull request Feb 22, 2021

Implement BOLT's 834, introduce warning messages lightningdevkit/rust-lightning#804

Closed

niftynei mentioned this pull request May 21, 2021

unknown message from CL to LND ElementsProject/lightning#4530

Closed

pm47 mentioned this pull request Jun 7, 2021

Alternate strategies for outdated commitments ACINQ/eclair#1838

Open

t-bast added a commit to ACINQ/eclair that referenced this pull request Jun 30, 2021

Decode warning messages

19acba9

Add support for logging warning messages as introduced in lightning/bolts#834 Support for sending warning messages instead of current errors will be added in a later PR.

t-bast mentioned this pull request Jun 30, 2021

Decode warning messages ACINQ/eclair#1854

Merged

t-bast added a commit to ACINQ/eclair that referenced this pull request Jul 1, 2021

Decode warning messages (#1854)

f52c3dd

Add support for logging warning messages as introduced in lightning/bolts#834 Support for sending warning messages instead of current errors will be added in a later PR.

t-bast added a commit to ACINQ/eclair that referenced this pull request Jul 6, 2021

Use warning messages for connection issues

9bd0548

lightning/bolts#834 recommends sending warning messages instead of connection-level errors in some cases, which avoids unnecessary channel closure.

t-bast mentioned this pull request Jul 6, 2021

Use warning messages for connection issues ACINQ/eclair#1863

Merged

t-bast added a commit to ACINQ/eclair that referenced this pull request Jul 6, 2021

Use warning messages for connection issues

1a00bd4

lightning/bolts#834 recommends sending warning messages instead of connection-level errors in some cases, which avoids unnecessary channel closure.

t-bast reviewed Jul 6, 2021

View reviewed changes

t-bast mentioned this pull request Aug 31, 2021

Use warning message in quick close #904

Merged

t-bast mentioned this pull request Sep 8, 2021

Lightning Specification Meeting 2021/09/13 #909

Closed

19 tasks

t-bast approved these changes Sep 13, 2021

View reviewed changes

t-bast mentioned this pull request Sep 23, 2021

Lightning Specification Meeting 2021/09/27 #916

Closed

19 tasks

rustyrussell force-pushed the soft-errors branch from 4f2ec06 to 5c745fd Compare September 27, 2021 20:14

t-bast approved these changes Sep 28, 2021

View reviewed changes

TheBlueMatt reviewed Sep 30, 2021

View reviewed changes

t-bast mentioned this pull request Oct 6, 2021

Lightning Specification Meeting 2021/10/11 #920

Closed

19 tasks

t-bast mentioned this pull request Oct 21, 2021

Lightning Specification Meeting 2021/10/25 #929

Closed

21 tasks

t-bast mentioned this pull request Nov 5, 2021

Lightning Specification Meeting 2021/11/08 #933

Closed

21 tasks

t-bast mentioned this pull request Nov 22, 2021

Lightning Specification Meeting 2021/11/22 #936

Closed

21 tasks

lightning-developer mentioned this pull request Dec 5, 2021

Removed requirement to broadcast an outdated commitment transaction #942

Merged

t-bast mentioned this pull request Dec 6, 2021

Lightning Specification Meeting 2021/12/06 #943

Closed

21 tasks

rustyrussell changed the title ~~BOLT 1: introduce warning messages, reduce requirements to send (hard) errors, remove "all-channel" errors.~~ BOLT 1: introduce warning messages, reduce requirements to send (hard) errors Dec 14, 2021

rustyrussell force-pushed the soft-errors branch from 51b8552 to 570d356 Compare December 14, 2021 00:02

t-bast approved these changes Dec 14, 2021

View reviewed changes

t-bast mentioned this pull request Dec 14, 2021

Lightning Specification Meeting 2021/12/20 #945

Closed

19 tasks

lightning-developer suggested changes Dec 17, 2021

View reviewed changes

TheBlueMatt reviewed Dec 21, 2021

View reviewed changes

t-bast mentioned this pull request Jan 3, 2022

Lightning Specification Meeting 2022/01/03 #949

Closed

21 tasks

rustyrussell closed this Jan 3, 2022

rustyrussell force-pushed the soft-errors branch from 570d356 to 58287e2 Compare January 3, 2022 19:12

rustyrussell mentioned this pull request Jan 3, 2022

Really: BOLT 1: introduce warning messages, reduce requirements to send (hard) errors #950

Merged

bitromortac mentioned this pull request Jan 11, 2022

Support LN protocol warning messages spesmilo/electrum#7624

Open


		#### Requirements

		The channel is referred to by `channel_id`, unless `channel_id` is 0 (i.e. all bytes are 0), in which case it refers to all channels.

BOLT 1: introduce warning messages, reduce requirements to send (hard) errors #834

BOLT 1: introduce warning messages, reduce requirements to send (hard) errors #834

Conversation

rustyrussell commented Jan 20, 2021 • edited Loading

rustyrussell commented Jan 20, 2021

t-bast commented Jan 20, 2021 • edited Loading

rustyrussell commented Jan 21, 2021

rustyrussell commented Jan 21, 2021

t-bast left a comment

Choose a reason for hiding this comment

carlaKC commented Jan 21, 2021

t-bast commented Jan 21, 2021

rustyrussell commented Jan 25, 2021

carlaKC commented Jan 26, 2021

niftynei commented Feb 8, 2021

t-bast left a comment

Choose a reason for hiding this comment

t-bast commented Sep 13, 2021

Roasbeef commented Sep 13, 2021

t-bast commented Sep 13, 2021

TheBlueMatt commented Sep 27, 2021

t-bast left a comment

Choose a reason for hiding this comment

TheBlueMatt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheBlueMatt commented Oct 1, 2021

rustyrussell commented Dec 14, 2021

t-bast left a comment • edited Loading

Choose a reason for hiding this comment

lightning-developer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rustyrussell commented Jan 20, 2021 •

edited

Loading

t-bast commented Jan 20, 2021 •

edited

Loading

t-bast left a comment •

edited

Loading