Channel ERROR: "failing link: unable to resolve fwd pkgs: bucket not found with error: internal error" #6593

zerofeerouting · 2022-05-30T08:11:53Z

Background

I run a CLN node and have seen several instances where my node force-closed a channel because the LND peer sent an internal error message.

I finally had this error with a peer that was able to provide the relevant logs (@ZoltanAB)

LND environment

LND: 0.14.2-beta
OS: Linux ipayblue-1 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
Using @C-Otto's rebalance-lnd script (if that's relevant)

Steps to reproduce

Have a channel between LND / CLN that forwards HTLCs.

Expected behaviour

LND should not send an error.

Actual behaviour

LND sends an error.

Logs

LND Logs (peer A)

2022-05-29 21:46:19.294 [ERR] HSWC: ChannelLink(297f43e0ac9a7307f334dc2a38eac05a86943f77e912dba679bc9cda52284a55:0): unable to remove fwd pkg for height=421027: bucket not found
2022-05-29 21:46:19.294 [ERR] HSWC: ChannelLink(297f43e0ac9a7307f334dc2a38eac05a86943f77e912dba679bc9cda52284a55:0): failing link: unable to resolve fwd pkgs: bucket not found with error: internal error

CLN logs (peer B)

2022-05-29T21:46:12.082Z UNUSUAL 032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-channeld-chan#7100: Adding HTLC 2358 too slow: killing connection
2022-05-29T21:46:12.084Z INFO    032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-chan#7100: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (9)
2022-05-29T21:46:20.650Z UNUSUAL 032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-chan#7100: Peer permanent failure in CHANNELD_NORMAL: channeld: received ERROR error channel 554a2852da9cbc79a6db12e9773f94865ac0ea382adc34f307739aace0437f29: internal error
2022-05-29T21:46:20.651Z INFO    032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-chan#7100: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL

Additional info

The LND node was heavily rebalancing and thus running into memory issues about 7 minutes before the event (no log entries up to 2022-05-29 21:40:02.124).

As you can tell from the graph, they stopped their rebalancing script a couple of hours after the crash.


ZoltanAB · 2022-05-30T08:24:18Z

The rebalancing scripts were working fine. I guess the main issue was trying to run too many instances at the same time; this killed my system (out of memory).

indomitorum · 2022-05-30T11:07:00Z

I run LND 0.14.2, and yesterday I opened a channel with Bcash, who I'm told runs CLN. It was force-closed remotely this morning.

I looked at the logs. This stands out:

ChannelLink(2a4829c56e036b97422be0c61d5a6d926a4731786650a4b00ba55f1cc71a98b7:2): failing link: unable to update commitment: cannot add duplicate

ChannelLink(2a4829c56e036b97422be0c61d5a6d926a4731786650a4b00ba55f1cc71a98b7:2): failing link: unable to complete dance with error: remote unresponsive

[ERR] HSWC: ChannelLink(2a4829c56e036b97422be0c61d5a6d926a4731786650a4b00ba55f1cc71a98b7:2): failing link: unable to synchronize channel states: first message sent to sync should be ChannelReestablish, instead received: *lnwire.Error with error: unable to resume channel, recovery required

[ERR] HSWC: ChannelLink(2a4829c56e036b97422be0c61d5a6d926a4731786650a4b00ba55f1cc71a98b7:2): failing link: unable to update commitment: cannot add duplicate keystone with error: internal error

https://pastebin.com/42BqqhbW

zerofeerouting · 2022-05-30T11:25:08Z

@indomitorum Your log looks like it's related to @C-Otto's issue (#6485), which should be resolved.

Roasbeef · 2022-05-30T17:26:38Z

Is this the same issue as #6485? If so, it'll be resolved in 0.15.

zerofeerouting · 2022-05-30T17:40:26Z

The reason for the error message seems to be something different in this case: failing link: unable to resolve fwd pkgs: bucket not found with error: internal error. If the resulting behaviour is fixed either way, we can close the issue.

Crypt-iQ · 2022-05-30T18:11:43Z

This is not the same issue

Crypt-iQ · 2022-06-06T21:53:39Z

@ZoltanAB do you have more logs for this channel for several minutes before and after the above error? When did the node OOM? Relevant log categories would be HSWC, PEER, LNWL, CHDB.
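A minimal sketch of how such an excerpt could be cut from a log file, assuming lines are shaped like the LND excerpts in the report (`timestamp [LEVEL] SUBSYSTEM: message`). The function name `filter_log` and the ten-minute default window are illustrative assumptions, not part of lnd.

```python
import re
from datetime import datetime, timedelta

# Matches lines shaped like the LND excerpts in the report, e.g.
# "2022-05-29 21:46:19.294 [ERR] HSWC: ChannelLink(...): ..."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (\w+): "
)

# The log categories requested in the comment above.
SUBSYSTEMS = {"HSWC", "PEER", "LNWL", "CHDB"}


def filter_log(lines, center, minutes=10, subsystems=SUBSYSTEMS):
    """Yield lines from the wanted subsystems within +/- `minutes` of `center`.

    `filter_log` is a hypothetical helper, not an lnd tool.
    """
    window = timedelta(minutes=minutes)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not start with a timestamp
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if m.group(3) in subsystems and abs(stamp - center) <= window:
            yield line
```

For the event in this report, `center` would be something like `datetime(2022, 5, 29, 21, 46, 19)`, with `lines` read from the node's log file.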

ZoltanAB · 2022-06-07T06:19:17Z

> @ZoltanAB do you have more logs for this channel for several minutes before and after the above error? When did the node OOM? Relevant log categories would be HSWC, PEER, LNWL, CHDB.

Does the log file contain any sensitive information? If not, I could send you the log file around that date and hour. Please advise. Thank you.

zerofeerouting · 2022-06-07T07:47:36Z

Just a note from me regarding the severity of this issue:

I have had 12 force closes due to this issue in the last seven days. That's a little more than 1% of my channels.

Crypt-iQ · 2022-06-07T13:19:06Z

> > @ZoltanAB do you have more logs for this channel for several minutes before and after the above error? When did the node OOM? Relevant log categories would be HSWC, PEER, LNWL, CHDB.
>
> Does the log file contain any sensitive information? If not, I could send you the log file around that date and hour. Please advise. Thank you.

It contains privacy-leaking information (channel points, etc.), which I don't need, so feel free to redact it. I am eugene on the lnd Slack.
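One way to redact channel points while keeping lines about the same channel correlatable is sketched below. It assumes channel points appear as 64 hex characters, a colon, and an output index, as in the `ChannelLink(...)` log lines in this thread. The helper name `redact_channel_points` and the `<chan-N>` placeholder are made up for illustration, and the sketch does not cover other sensitive fields such as node public keys.

```python
import re

# A channel point as it appears in the ChannelLink(...) log lines:
# 64 hex characters, ":", output index.
CHAN_POINT_RE = re.compile(r"\b[0-9a-fA-F]{64}:\d+\b")


def redact_channel_points(lines):
    """Replace each distinct channel point with a stable placeholder.

    Hypothetical helper: the same channel point always maps to the same
    "<chan-N>" alias, so related log lines can still be matched up.
    """
    aliases = {}

    def alias(match):
        point = match.group(0)
        if point not in aliases:
            aliases[point] = f"<chan-{len(aliases) + 1}>"
        return aliases[point]

    return [CHAN_POINT_RE.sub(alias, line) for line in lines]
```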

zerofeerouting · 2022-06-07T13:33:16Z

Thank you for looking into this @Crypt-iQ.

ZoltanAB · 2022-06-07T15:12:40Z

@Crypt-iQ can I use your email address (el.....l@gmail.com) to send you the generated log files?

ZoltanAB · 2022-06-07T15:25:01Z

And FYI, today I had another similar FC around 02:10 AM GMT. Here is a graph of my load on the server for the last 24 hours:
https://gyazo.com/fb06514b54cbdbff5b9784f207667698
I can't see anything unusual, the load seems to be quite constant.

Crypt-iQ · 2022-06-07T15:34:18Z

> @Crypt-iQ can I use your email address (el.....l@gmail.com) to send you the generated log files?

yup

Crypt-iQ · 2022-06-07T15:36:34Z

> And FYI, today I had another similar FC around 02:10 AM GMT. Here is a graph of my load on the server for the last 24 hours: https://gyazo.com/fb06514b54cbdbff5b9784f207667698 I can't see anything unusual, the load seems to be quite constant.

Did you get the same bucket not found log message?

ZoltanAB · 2022-06-07T17:11:28Z

Yes. Sending you now.

ZoltanAB · 2022-06-07T17:15:19Z

Just sent you the logs. Thank you.

Crypt-iQ · 2022-06-08T16:53:23Z

Thanks for the logs, I know why this happens. I'll start working on a fix

ZoltanAB · 2022-06-08T17:16:32Z

Glad I could contribute a little.


ZoltanAB · 2022-06-13T09:50:12Z

@Crypt-iQ any update on this? I just had another FC due to this issue. If needed, I can send you more logs. Thank you.

Crypt-iQ · 2022-06-13T14:19:33Z

> @Crypt-iQ any update on this? Just had another FC due to this issue. Thank you, If needed, can send you more logs.

This won't get into 0.15 since that is right around the corner and I want the fix to receive review w/o being subject to a deadline. I could provide a patch this week that you could apply to your node if you are comfortable, but you'd need to revert it first when upgrading to any other version

Crypt-iQ · 2022-06-14T19:26:20Z

preliminary fix is here #6642 - hopefully it survives review - it did fix my local repro case. I would recommend not patching this on your node until it receives adequate review or 0.15.1 is released

zerofeerouting mentioned this issue May 30, 2022

force close on peer with "internal error" no further explanations ElementsProject/lightning#5102

Closed

Crypt-iQ added the bug (Unintended code behaviour) and crash labels May 30, 2022

Roasbeef added the database (Related to the database/storage of LND) and htlcswitch labels May 30, 2022

Crypt-iQ self-assigned this Jun 6, 2022

Crypt-iQ added the P1 (MUST be fixed or reviewed) label Jun 7, 2022

Crypt-iQ removed the crash label Jun 8, 2022

Crypt-iQ mentioned this issue Jun 14, 2022

htlcswitch: add linkStopIndex to cleanly shutdown ChannelLink #6642

Merged

GordianLN mentioned this issue Jun 15, 2022

ChanStatusBorked, closure with ZFR, "ghost" balance sent to unknown addresses #6639

Closed

Crypt-iQ mentioned this issue Jun 21, 2022

server.go: replace call to removePeer with Disconnect in DisconnectPeer #6655

Merged

Roasbeef closed this as completed in #6642 Jul 1, 2022
