Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant increase in force closures after upgrade to LND 0.15 #6744

Closed
mcsnubbs opened this issue Jul 16, 2022 · 20 comments
Closed

Significant increase in force closures after upgrade to LND 0.15 #6744

mcsnubbs opened this issue Jul 16, 2022 · 20 comments
Labels
bug Unintended code behaviour channel closing Related to the closing of channels cooperatively and uncooperatively channels

Comments

@mcsnubbs
Copy link

mcsnubbs commented Jul 16, 2022

Background

Over the past 3 months I built a node with about 200 channels using 0.14.3. Initially I saw ~1-3 force closures a month, and typically the other peer confirmed that they had a major issue with their node leading to the failure. After upgrading to 0.15, I've seen on average 3 force closures per day. I've been unable to decipher the logs to get a sense of what is happening and what I should attempt to do to rectify the problem. As is, it seems my most active channels are disappearing daily.

I am not aware of any connectivity issues with my node, running hybrid mode, pubkey: 022a03c83e94ab037a64dd71e54f1796db185f21b1d88ceea5486a274ec257e995.

Although Im not sure if it's relevant, I previously watched the HTLC stream with lntop, which would run for weeks in terminal. After upgrading to 0.15, lntop would lock up with many "pending" HTLCs. Now, it fails to launch at all.

Furthermore, lnd is running on a dedicated machine with bitcoind; prior to upgrading to 0.15, my cpu cores were running at an average of 0-5%; after upgrading to 0.15, most cores are at 5-10% utilization, with 1 or two cores spiking to 60-80% at any given time... No other changes have been made to the node, and I am not running rebalancing software.

Your environment

  • version of lnd 0.15
  • which operating system (uname -a on Nix) Linux WhiteBox 5.15.0-41-generic Make routing table persistent #44-Ubuntu SMP Wed Jun 22 14:20:53 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • version of btcd, bitcoind, or other backend 0.23
  • any other relevant environment details

*** My most recent channel with c-otto force closed. I will include log entries for this channel funding outpoint below, followed by what c-otto sent me from his logs:

2022-07-16 11:55:43.342 [ERR] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): failing link: unable to accept revocation: revocation key mismatch with error: invalid revocation

2022-07-16 11:57:14.994 [ERR] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): failing link: ChannelPoint(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): unable to accept new commitment: not enough HTLC signatures with error: invalid commitment

2022-07-16 11:57:34.865 [INF] CNCT: ChannelArbitrator(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): force closing chan

*** Below are the logs sent from c-otto:

2022-07-16 17:55:32.875 [DBG] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): Failed to send 200022399 mSAT

^ Some failed HTLC, i.e. a working connection.

2022-07-16 17:55:39.614 [INF] PEER: Peer(022a03c83e94ab037a64dd71e54f1796db185f21b1d88ceea5486a274ec257e995): unable to read message from peer: EOF
2022-07-16 17:55:39.614 [INF] PEER: Peer(022a03c83e94ab037a64dd71e54f1796db185f21b1d88ceea5486a274ec257e995): disconnecting 022a03c83e94ab037a64dd71e54f1796db185f21b1d88ceea5486a274ec257e995@96.47.237.10:44598, reason: read handler closed
2022-07-16 17:55:39.816 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): stopping
2022-07-16 17:55:39.817 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): exited

^ Disconnected

Then: reconnection

2022-07-16 17:55:39.927 [INF] SRVR: Finalizing connection to 022a03c83e94ab037a64dd71e54f1796db185f21b1d88ceea5486a274ec257e995@96.47.237.10:44602, inbound=true
2022-07-16 17:55:39.934 [INF] PEER: Peer(022a03c83e94ab037a64dd71e54f1796db185f21b1d88ceea5486a274ec257e995): loading ChannelPoint(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1)
2022-07-16 17:55:39.934 [DBG] CNCT: New ChainEventSubscription(id=146) for ChannelPoint(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1)
2022-07-16 17:55:39.934 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): starting
2022-07-16 17:55:39.935 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1)
2022-07-16 17:55:39.935 [INF] PEER: Peer(022a03c83e94ab037a64dd71e54f1796db185f21b1d88ceea5486a274ec257e995): Negotiated chan series queries
2022-07-16 17:55:39.935 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): HTLC manager started, bandwidth=2393706690 mSAT
2022-07-16 17:55:39.935 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): attempting to re-synchronize
2022-07-16 17:55:40.948 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): received re-establishment message from remote side
2022-07-16 17:55:40.959 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): sending 2 updates to synchronize the state
2022-07-16 17:55:40.966 [DBG] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): loaded 1 fwd pks
2022-07-16 17:55:40.966 [DBG] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): removing completed fwd pkg for height=101142
2022-07-16 17:56:44.460 [DBG] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=10466197)->(Chan ID=737690:1073:1, HTLC ID=45951)

2022-07-16 17:57:44.522 [ERR] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): failing link: unable to complete dance with error: remote unresponsive

^ That's the only non-common log entry.

2022-07-16 17:57:44.522 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): exited
2022-07-16 17:57:44.525 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): stopping
2022-07-16 17:59:52.987 [INF] NTFN: Dispatching confirmed spend notification for outpoint=671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1, script=0 f4530c60eab59c020938bc93bbdb6028b758a556b8e53f42023018ae5d9cf767 at current height=745254: f68f0277aba6755ae46c98b532da40da6161444c0216ffca115fe061a99a4702[0] spending 671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1 at height=745254
@C-Otto
Copy link
Contributor

C-Otto commented Jul 16, 2022

Thanks for the issue! Let me know if you need further logs from my node.

@C-Otto
Copy link
Contributor

C-Otto commented Jul 16, 2022

I have a few of those:

2022-07-16 16:31:26.514 [ERR] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): failing link: unable to complete dance with error: remote unresponsive

Previous reconnection:

logs/lnd.log.53511.gz:2022-07-16 17:26:41.212 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): received re-establishment message from remote side
logs/lnd.log.53511.gz:2022-07-16 17:26:41.218 [DBG] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): loaded 1 fwd pks
logs/lnd.log.53511.gz:2022-07-16 17:26:41.218 [DBG] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): removing completed fwd pkg for height=101140
logs/lnd.log.53512.gz:2022-07-16 17:29:51.680 [DBG] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=10464892)->(Chan ID=737690:1073:1, HTLC ID=45951)
logs/lnd.log.53512.gz:2022-07-16 17:30:51.744 [ERR] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): failing link: unable to complete dance with error: remote unresponsive
logs/lnd.log.53512.gz:2022-07-16 17:30:51.744 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): exited
logs/lnd.log.53512.gz:2022-07-16 17:30:51.744 [INF] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): stopping

@hieblmi
Copy link
Collaborator

hieblmi commented Jul 16, 2022

@mcsnubbs on a side note, lntop works for me running v0.15.0-beta. Could check your ~/.lntop/lntop.log

@lucasdcf lucasdcf added bug Unintended code behaviour channel closing Related to the closing of channels cooperatively and uncooperatively channels labels Jul 16, 2022
@mcsnubbs
Copy link
Author

mcsnubbs commented Jul 17, 2022

I had two force closes since posting this issue, both with large reputable nodes. Will include the log info. The second list of log entries has a "fwding package db has been corrupted" error that I have not yet seen.

2022-07-17 05:27:52.213 [INF] HSWC: ChannelLink(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:1): HTLC manager started, bandwidth=3075323126 mSAT
2022-07-17 05:27:52.213 [INF] HSWC: ChannelLink(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:1): attempting to re-synchronize
2022-07-17 05:27:52.213 [INF] HSWC: ChannelLink(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:1): received re-establishment message from remote side
2022-07-17 05:30:40.784 [INF] HSWC: ChannelLink(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): stopping
2022-07-17 05:30:40.784 [WRN] HSWC: ChannelLink(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): error when syncing channel states: link shutting down
2022-07-17 05:30:40.785 [INF] HSWC: ChannelLink(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): exited
2022-07-17 05:30:40.854 [INF] PEER: NodeKey(03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601) loading ChannelPoint(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2)
2022-07-17 05:30:40.854 [INF] HSWC: Removing channel link with ChannelID(4f8707322809a34b2cd464b35517f3dbc4dc22e0ff9395ae8a3724437465f0ce)
2022-07-17 05:30:40.854 [INF] HSWC: ChannelLink(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): starting
2022-07-17 05:30:40.854 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2)
2022-07-17 05:30:40.854 [INF] NTFN: New block epoch subscription
2022-07-17 05:30:40.854 [INF] HSWC: ChannelLink(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): HTLC manager started, bandwidth=4778017444 mSAT
2022-07-17 05:30:40.854 [INF] HSWC: ChannelLink(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): attempting to re-synchronize
2022-07-17 05:33:01.520 [INF] NTFN: Dispatching confirmed spend notification for outpoint=ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2, script=0 a3921047b009617866ad37ca1738d652dd123e5a5bd6ac2475eebd737421ce48 at current height=745344: f2910adaa3dd4faf75a92309020c13634610db16376ca58085b6b53b15d88606[0] spending ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2 at height=745344
2022-07-17 05:33:01.524 [INF] CNCT: Unilateral close of ChannelPoint(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2) detected
2022-07-17 05:33:01.525 [INF] CNCT: ChannelArbitrator(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): remote party has closed channel out on-chain
2022-07-17 05:33:15.614 [WRN] CHBU: Replacing disk backup for ChannelPoint(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:1) w/ newer version
2022-07-17 05:33:33.674 [INF] CNCT: ChannelArbitrator(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): still awaiting contract resolution
2022-07-17 05:33:45.683 [INF] CNCT: *contractcourt.commitSweepResolver(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): Sweeping with witness type: CommitmentToRemoteConfirmed
2022-07-17 05:33:45.683 [INF] CNCT: *contractcourt.commitSweepResolver(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): sweeping commit output
2022-07-17 05:34:44.332 [INF] NANN: Announcing channel(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2) disabled [detected]
2022-07-17 05:34:44.332 [ERR] NANN: Unable to sign update disabling channel(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): edge not found
2022-07-17 05:39:50.397 [INF] CNCT: *contractcourt.commitSweepResolver(ccf065744324378aae9593ffe022dcc4dbf31755b364d42c4ba309283207874f:2): local commitment output fully resolved by sweep tx: 997a314d838c55958f276afe3f00151453ce54922b3dccd3bfa28d8e7c811cc7
2022-07-16 21:56:36.420 [ERR] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): failing link: unable to complete dance with error: remote unresponsive
2022-07-16 21:56:36.420 [INF] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): exited
2022-07-16 21:56:36.949 [INF] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): stopping
2022-07-16 21:58:02.945 [INF] PEER: NodeKey(0262ab2e8c0f8d7062f51847eaf05900b43fe0902ba67858a4d8ec74fec68f4ba3) loading ChannelPoint(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0)
2022-07-16 21:58:02.945 [INF] HSWC: Removing channel link with ChannelID(32da91f3615a77ab6625579b80da3f46ed0fa0f19edaf2888712d6fcb25ac844)
2022-07-16 21:58:02.945 [INF] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): starting
2022-07-16 21:58:02.946 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0)
2022-07-16 21:58:02.946 [INF] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): HTLC manager started, bandwidth=3676794443 mSAT
2022-07-16 21:58:02.946 [INF] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): attempting to re-synchronize
2022-07-16 21:58:03.269 [INF] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): received re-establishment message from remote side
2022-07-16 21:58:05.661 [INF] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): sending 2 updates to synchronize the state
2022-07-16 21:58:27.683 [INF] NTFN: Dispatching confirmed spend notification for outpoint=44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0, script=0 61d99f1ebaa4a483603f65b56015dc50d294c5d2f542d985bc6a8f0ec56f1164 at current height=745303: b6fa986768f6ded26a5e465c8a0dfee1bd65a79de0cab12b20cf45f1c23a33b3[0] spending 44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0 at height=745303
2022-07-16 21:58:27.687 [INF] CNCT: Unilateral close of ChannelPoint(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0) detected
2022-07-16 21:58:27.688 [INF] CNCT: ChannelArbitrator(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): remote party has closed channel out on-chain
**2022-07-16 21:58:41.211 [ERR] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): unable to remove fwd pkg for height=12890: fwding package db has been corrupted
2022-07-16 21:58:41.211 [ERR] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): failing link: unable to resolve fwd pkgs: fwding package db has been corrupted with error: internal error**
2022-07-16 21:58:41.211 [INF] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): exited
2022-07-16 21:58:58.495 [INF] CNCT: ChannelArbitrator(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): still awaiting contract resolution
2022-07-16 21:59:06.077 [INF] CNCT: *contractcourt.commitSweepResolver(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): Sweeping with witness type: CommitmentToRemoteConfirmed
2022-07-16 21:59:06.077 [INF] CNCT: *contractcourt.commitSweepResolver(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): sweeping commit output
2022-07-16 21:59:06.529 [INF] CNCT: ChannelArbitrator(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): a contract has been fully resolved!
2022-07-16 21:59:06.529 [INF] CNCT: ChannelArbitrator(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): still awaiting contract resolution
2022-07-16 21:59:09.020 [INF] HSWC: Removing channel link with ChannelID(32da91f3615a77ab6625579b80da3f46ed0fa0f19edaf2888712d6fcb25ac844)
2022-07-16 21:59:09.020 [INF] HSWC: ChannelLink(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): stopping
2022-07-16 22:02:31.623 [INF] CNCT: *contractcourt.commitSweepResolver(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): local commitment output fully resolved by sweep tx: 64775e68ee277d9321f571f67f25a04313f3b0663d59cd356e61190eb69cb0ea
2022-07-16 22:02:40.153 [INF] CNCT: ChannelArbitrator(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): a contract has been fully resolved!
2022-07-16 22:02:40.153 [INF] CNCT: ChannelArbitrator(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): still awaiting contract resolution
2022-07-16 22:02:40.497 [INF] CNCT: ChannelArbitrator(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): a contract has been fully resolved!
2022-07-16 22:02:40.497 [INF] CNCT: ChannelArbitrator(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): still awaiting contract resolution
2022-07-16 22:17:44.891 [INF] NANN: Announcing channel(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0) disabled [detected]
2022-07-16 22:17:44.894 [ERR] NANN: Unable to sign update disabling channel(44c85ab2fcd6128788f2da9ef1a00fed463fda809b572566ab775a61f391da32:0): edge not found

@mcsnubbs
Copy link
Author

@mcsnubbs on a side note, lntop works for me running v0.15.0-beta. Could check your ~/.lntop/lntop.log

LNTOP log showing a long list of permission denied rpc errors

{"level":"error","ts":1657704389.4633987,"caller":"ui/controller.go:121","msg":"failed","logger":"controller","error":"rpc error: code = Unknown desc = edge not found","errorVerbose":"rpc error: code = Unknown desc = edge not found\ngh.neting.cc/edouardparis/lntop/network/backend/lnd.Backend.GetChannelInfo\n\t/tmp/lntop/network/backend/lnd/lnd.go:346\ngh.neting.cc/edouardparis/lntop/ui/models.(*Models).RefreshChannels\n\t/tmp/lntop/ui/models/models.go:63\ngh.neting.cc/edouardparis/lntop/ui.(*controller).Listen.func1\n\t/tmp/lntop/ui/controller.go:119\ngh.neting.cc/edouardparis/lntop/ui.(*controller).Listen\n\t/tmp/lntop/ui/controller.go:148\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581","stacktrace":"github.com/edouardparis/lntop/ui.(*controller).Listen.func1\n\t/tmp/lntop/ui/controller.go:121\ngh.neting.cc/edouardparis/lntop/ui.(*controller).Listen\n\t/tmp/lntop/ui/controller.go:148"}
{"level":"error","ts":1657788681.9180472,"caller":"pubsub/pubsub.go:49","msg":"SubscribeInvoice returned an error","logger":"pubsub","error":"rpc error: code = Unknown desc = permission denied","stacktrace":"github.com/edouardparis/lntop/pubsub.(*PubSub).invoices.func2\n\t/tmp/lntop/pubsub/pubsub.go:49"}

@C-Otto
Copy link
Contributor

C-Otto commented Jul 17, 2022

Could you please edit your posts and use code block formatting (three backticks - code - three backticks)?

@mcsnubbs
Copy link
Author

Done. I compacted channel db, went from 18 to 2 gb, since then my routing has largely normalized, as has my cpu utilization and ability to run lntop. Would it make sense that the large channel.db caused the issue?

@Crypt-iQ
Copy link
Collaborator

Crypt-iQ commented Jul 18, 2022

I haven't seen this forwarding log error before, that might be an issue

2022-07-16 11:55:43.342 [ERR] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): failing link: unable to accept revocation: revocation key mismatch with error: invalid revocation

Should be fixed in 0.15.1

2022-07-16 11:57:14.994 [ERR] HSWC: ChannelLink(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): failing link: ChannelPoint(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): unable to accept new commitment: not enough HTLC signatures with error: invalid commitment

Should be fixed in 0.15.1

2022-07-16 11:57:34.865 [INF] CNCT: ChannelArbitrator(671cfce1d2f30c93333e4fe472f95c859893343aa6c50dac10b4ff58b6207bdb:1): force closing chan

Regular go-to-chain force close

@Crypt-iQ
Copy link
Collaborator

The forwarding log error appears to occur when an on-chain spend is detected (forwarding packages get wiped) and the link is still up and tries to remove a forwarding pacakge via resolveFwdPkg. This should be fine, but what I am not sure of is why the peer with the 4f8707322809a34b2cd464b35517f3dbc4dc22e0ff9395ae8a3724437465f0ce channel would reestablish the channel if it went to force close. CC @yyforyongyu . I did make note of this a while ago and it is benign, but is probably something that should be changed

@Crypt-iQ
Copy link
Collaborator

Things that could cause buggy force closes in 0.15 and prior from either you or C-Otto:

  • connectivity issues
    • tor had a ddos attack on hidden services recently, not sure if that's still ongoing
  • scripts that call DisconnectPeer and reconnect

@C-Otto
Copy link
Contributor

C-Otto commented Jul 18, 2022

  • scripts that call DisconnectPeer and reconnect

I do that. My node has the "add linkStopIndex to cleanly shutdown ChannelLink" fix, though.

@Crypt-iQ
Copy link
Collaborator

I do that. My node has the "add linkStopIndex to cleanly shutdown ChannelLink" fix, though.

I am pretty sure you'll want the DisconnectPeer patch from here #6655 also. I think that even if one ChannelLink waits for the current one to finish, the new one might have outdated state when it is loaded.

@C-Otto
Copy link
Contributor

C-Otto commented Jul 18, 2022

I already have that, thanks.

@Roasbeef
Copy link
Member

tor had a ddos attack on hidden services recently, not sure if that's still ongoing

Yep it is (though of a slightly different nature): https://status.torproject.org/

@Roasbeef
Copy link
Member

Based on this comment, I think we can close this issue as most of the known force close scenarios have been resolved in 0.15.1? The one wild card still is the Tor network, as it still isn't hack at prior levels of connectivity/reliability.

@mcsnubbs
Copy link
Author

So until 0.15.1 I should be regularly compacting channel.db to avoid similar issues?

RE TOR this node runs on clearnet as well; channel issues werent limited to TOR connected peers.

@Roasbeef
Copy link
Member

So until 0.15.1 I should be regularly compacting channel.db to avoid similar issues?

No, the issues we fixed aren't related to DB compaction.

@guggero
Copy link
Collaborator

guggero commented Jul 27, 2022

@mcsnubbs are you running with sync-freelist=true? It could be that the heavy CPU usage is coming from that which could perhaps also explain the force closes (your node not being able to respond in time). See #6737.

@mcsnubbs
Copy link
Author

@mcsnubbs are you running with sync-freelist=true? It could be that the heavy CPU usage is coming from that which could perhaps also explain the force closes (your node not being able to respond in time). See #6737.

You are correct! This was my issue... both .db bloating and CPU usage returned to normal. Appreciate the assistance!

@lncyberhub
Copy link

lncyberhub commented Jul 27, 2022

@mcsnubbs are you running with sync-freelist=true? It could be that the heavy CPU usage is coming from that which could perhaps also explain the force closes (your node not being able to respond in time). See #6737.

You are correct! This was my issue... both .db bloating and CPU usage returned to normal. Appreciate the assistance!

I had the same thing going on! Crazy db bloating after ~7 days. It looks like sync-freelist=false is helping!

Bitcoinite added a commit to Bitcoinite/RaspiBolt that referenced this issue Aug 25, 2022
The setting sync-freelist=true caused heavy bloating of the channel.db (several GB per day). Disabling this setting fixes this issue.

Related:
lightningnetwork/lnd#6800
lightningnetwork/lnd#6744
lightningnetwork/lnd#6737
lightningnetwork/lnd#6837 (comment)
VajraOfIndra pushed a commit to raspibolt/raspibolt that referenced this issue Aug 25, 2022
* Change sync-freelist to false in the lnd.conf

The setting sync-freelist=true caused heavy bloating of the channel.db (several GB per day). Disabling this setting fixes this issue.

Related:
lightningnetwork/lnd#6800
lightningnetwork/lnd#6744
lightningnetwork/lnd#6737
lightningnetwork/lnd#6837 (comment)

* remove sync-freelist=true from lnd.conf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unintended code behaviour channel closing Related to the closing of channels cooperatively and uncooperatively channels
Projects
None yet
Development

No branches or pull requests

8 participants