postgres backend - LND v0.14.1-beta "lnd compatibility check failed" #78

miketwenty1 · 2021-12-22T02:48:02Z

Need help understanding what's going on with my setup or if this is a bug.

Note: I'm currently running lndmon for many nodes that use the standard bbolt/boltdb backend.
The errors only seem to show up when LND is using postgres.

logs:

2021-12-22 02:39:55.978 [INF] LNDMON: Starting Prometheus exporter...
2021-12-22 02:39:55.978 [INF] HTLC: Starting Htlc Monitor
2021-12-22 02:39:55.979 [INF] LNDMON: Prometheus active!
Lndmon exiting with error: GraphCollector DescribeGraph failed with: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2021-12-22 02:40:35.757 [INF] HTLC: Stopping Htlc Monitor
2021/12/22 02:40:35 Stopping Prometheus Exporter
GraphCollector DescribeGraph failed with: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Sometimes I'll just get this for the error in the logs:

lnd compatibility check failed: unable to get info for lnd node: rpc error: code = DeadlineExceeded desc = context deadline exceeded

guggero · 2021-12-22T08:35:02Z

Sounds like the request is just timing out. lndmon uses the default RPC timeout of 30 seconds. Does it take longer than 30 seconds to call lncli getinfo on the postgres lnd?

miketwenty1 · 2021-12-22T15:18:35Z

@guggero the response is nearly instant when I do a lncli getinfo. Let me know what else I should test.

guggero · 2021-12-22T15:30:55Z

Ah, I looked at the wrong error message. It's DescribeGraph that fails, not GetInfo. Can you check whether the error goes away when you start lnd with --caches.rpc-graph-cache-duration=5m?
You might need to fill the cache initially with lncli describegraph; after that the lndmon calls should be answered almost immediately.

miketwenty1 · 2021-12-22T16:23:18Z

So you're recommending that I set the cache duration to 5m instead of the default of 1m, and run lncli describegraph once LND has booted to fill the cache?

I ran LND with this config and then ran lncli describegraph. If I start lndmon right afterwards, it shows up as a healthy Prometheus target, but after a while it crashes with the same error.

Something to note in terms of latency:

It took 2 minutes and 39 seconds for lncli stop to respond when I was bringing this node down for the cache update.
It took 1 minute and 50 seconds to run lncli describegraph after I booted with the new cache config.

Not sure if this would warrant a ticket in the lightningnetwork/lnd repo?

guggero · 2021-12-23T09:01:24Z

This is the same issue as lightningnetwork/lnd#6107 then. The in-memory graph is exactly the same data as is served in describegraph. If it takes multiple minutes to load it on startup then it will take multiple minutes to scrape from the RPC, unless the RPC graph cache is turned on. But every time the graph cache expires, the first scrape will take that long again.

I see two ways to work around this (the real fix will be to speed up the graph download in postgres):
Set rpc-graph-cache-duration to an effectively infinite time (e.g. 8760h, which is one year), so that the graph data lndmon gets is never refreshed.
Or increase the default RPC timeout (it must be added to this struct: https://github.com/lightninglabs/lndmon/blob/master/lndmon.go#L41) and the scrape interval to something larger than the 1 minute 50 seconds it takes to load the graph.

miketwenty1 · 2021-12-23T23:37:51Z

Why is this only happening with postgres backend?

guggero · 2022-01-03T09:52:12Z

> Why is this only happening with postgres backend?

Not sure what you mean... context deadline exceeded is Golang's way of saying "something timed out". So the error is because the DescribeGraph call takes too long with postgres.

sandipndev · 2022-06-30T06:21:18Z

Looks like this is happening on postgres and not bbolt, can reproduce. getinfo took 2m4s to respond.

wdstorer-bg mentioned this issue Sep 9, 2022

config: add rpctimeout #86

Merged
