multi: add htlc stream subscription and routing dashboard #59

carlaKC · 2020-11-11T14:59:53Z

This PR adds a subscription to HTLC events and adds a routing dashboard. HTLCs are separated with a dropdown variable to display the dashboard for sends/receives/forwards that the node processes.

There's a lot more that can be done with this subscription, so we stick to some basic questions:

How many {sends/receives/forwards} is my node processing?

How successful are my {sends/receives/forwards}?

Which channels are most used for successful {sends/receives/forwards}?

Which channels are most used for failed {sends/receives/forwards}?

How long is my liquidity locked up by {sends/forwards}?

Why are my {sends/receives/forwards} failing?
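As a rough illustration of the first two questions, the sketch below tallies resolved HTLCs per type and derives a success rate. It is stdlib-only and not lndmon's code: `HtlcType`, `HtlcOutcome`, `Tally` and `TallyByType` are hypothetical names introduced here, and the real collector exports Prometheus metrics rather than computing rates itself.

```go
package main

import "fmt"

// HtlcType mirrors the three categories the dashboard separates.
// The names are illustrative, not lndmon identifiers.
type HtlcType string

const (
	Send    HtlcType = "send"
	Receive HtlcType = "receive"
	Forward HtlcType = "forward"
)

// HtlcOutcome is a hypothetical, simplified record of one resolved HTLC.
type HtlcOutcome struct {
	Type    HtlcType
	Settled bool // true if the HTLC settled, false if it failed
}

// Tally holds per-type totals.
type Tally struct {
	Total   int
	Settled int
}

// SuccessRate returns settled/total, or 0 when nothing was processed.
func (t Tally) SuccessRate() float64 {
	if t.Total == 0 {
		return 0
	}
	return float64(t.Settled) / float64(t.Total)
}

// TallyByType groups outcomes by HTLC type.
func TallyByType(outcomes []HtlcOutcome) map[HtlcType]Tally {
	tallies := make(map[HtlcType]Tally)
	for _, o := range outcomes {
		t := tallies[o.Type]
		t.Total++
		if o.Settled {
			t.Settled++
		}
		tallies[o.Type] = t
	}
	return tallies
}

func main() {
	outcomes := []HtlcOutcome{
		{Forward, true}, {Forward, false}, {Forward, true}, {Forward, true},
		{Send, true},
	}
	for typ, t := range TallyByType(outcomes) {
		fmt.Printf("%s: %d processed, %.2f success rate\n",
			typ, t.Total, t.SuccessRate())
	}
}
```

In a Prometheus setup the same split would more naturally come from a label on the metric, with the rate computed in the dashboard query.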

Full dashboard layout:

Open question here is whether we want to duplicate the "pending htlcs" that we already have on the node dash. I can see it being useful to have a graph with only forwards to monitor your traffic, and detect any unusual spikes/spamming.

Depends on #58, don't review the first 2 commits.

Roasbeef

Excellent work! IMO this is one of the last low-hanging fruits, and it has a lot of leverage in giving node operators more insight into what's happening with their node in real time. It can also eventually be used to trigger alerts if it appears that griefing attacks are being launched across the network.

Haven't tried this out yet, but so far just a few comments re redundant counters, and richer use of labels.

collectors/htlcs_collector.go

Roasbeef · 2020-11-12T01:33:56Z

@@ -59,6 +64,24 @@ func newHtlcMonitor(router lndclient.RouterClient,
 			Name:      "failed_forwards",
 			Help:      "count fo failed forwards",
 		}),
+		resolutionTimeHistogram: prometheus.NewHistogram(
+			prometheus.HistogramOpts{


Similar comment here re a vector to add in additional labels. One other useful label for this one in particular (which also applies to the others) would be the payment hash itself. This would let us track MPP usage, possibly probing usage, hash re-use, see how long MPP payments take to resolve on avg, etc.

Added some labels here. We don't have payment hash rn, because it was a bit involved to surface that info in the original lnd PR. Also wondering whether labels would be the best way to track something as variable as payment hash?

So I'm thinking the payment hash would only be used to group items in aggregation operators. So you could do something like count(resolution_ms) by (payment_hash) > 1 (assuming it's a gauge), which would let you track MPP usage over time, since MPP payments are identified by repeated payment hashes.

It could end up being rather unscalable though, since every distinct payment hash becomes a new label value, and one may end up with a very large number of them over time.
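To make the grouping idea and the scaling concern concrete, here is a minimal stdlib-only sketch. It mimics what an aggregation grouped by payment hash would return and counts how many distinct label values (and therefore time series) a `payment_hash` label would produce. `Resolution`, `RepeatedHashes` and `SeriesCount` are hypothetical names, and the sample hashes are made up.

```go
package main

import "fmt"

// Resolution is a hypothetical sample: one resolved HTLC and its payment hash.
type Resolution struct {
	PaymentHash string
	Millis      int64
}

// RepeatedHashes counts samples per payment hash and keeps only hashes seen
// more than once, which is the signal suggested above for spotting MPP
// payments (or hash re-use).
func RepeatedHashes(samples []Resolution) map[string]int {
	counts := make(map[string]int)
	for _, s := range samples {
		counts[s.PaymentHash]++
	}
	repeated := make(map[string]int)
	for hash, n := range counts {
		if n > 1 {
			repeated[hash] = n
		}
	}
	return repeated
}

// SeriesCount returns the number of distinct payment hashes, i.e. how many
// separate label values a payment_hash label would create.
func SeriesCount(samples []Resolution) int {
	seen := make(map[string]struct{})
	for _, s := range samples {
		seen[s.PaymentHash] = struct{}{}
	}
	return len(seen)
}

func main() {
	samples := []Resolution{
		{"aa11", 120}, {"aa11", 340}, {"aa11", 95}, // three shards, one hash
		{"bb22", 80},
		{"cc33", 410},
	}
	fmt.Println("repeated hashes:", RepeatedHashes(samples))
	fmt.Println("distinct series:", SeriesCount(samples))
}
```

The series count grows with every new payment, which is the cardinality problem raised above.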

Switch lndmon to use lndclient, providing the readonly macaroon as our only macaroon, so that no changes to lndmon setup are required. This switch also provides us with version checks, which we set to lnd v0.11, since that is the minimum supported version after recent CVEs.

Update all of our collectors to shut down on failure rather than silently logging. Paired with restarting the lndmon container on exit, this allows easier detection of persistent issues, and a simple restart when lnd is temporarily unavailable.
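The shut-down-on-failure pattern described in this commit message can be sketched roughly as follows. This is a stdlib-only illustration under assumptions, not lndmon's implementation: `Collector`, `WaitForFailure` and `errUnavailable` are hypothetical names, and each collector is modelled as a function that blocks until it fails or is told to quit.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a made-up failure used for the demonstration.
var errUnavailable = errors.New("lnd unavailable")

// Collector is a hypothetical stand-in for a metrics collector.
type Collector struct {
	Name string
	// Run blocks until the collector fails or quit is closed.
	Run func(quit <-chan struct{}) error
}

// WaitForFailure starts every collector, blocks until the first one reports
// an error, then signals the others to stop and returns that error.
// It blocks forever if no collector ever fails.
func WaitForFailure(collectors []Collector) error {
	quit := make(chan struct{})
	// Buffered so late failures never block their goroutine.
	errChan := make(chan error, len(collectors))

	for _, c := range collectors {
		c := c
		go func() {
			if err := c.Run(quit); err != nil {
				errChan <- fmt.Errorf("collector %s: %w", c.Name, err)
			}
		}()
	}

	err := <-errChan
	close(quit)
	return err
}

func main() {
	collectors := []Collector{
		{Name: "channels", Run: func(quit <-chan struct{}) error {
			<-quit // healthy collector: runs until asked to stop
			return nil
		}},
		{Name: "htlcs", Run: func(quit <-chan struct{}) error {
			return errUnavailable
		}},
	}

	err := WaitForFailure(collectors)
	// A real process would exit non-zero here so that the container's
	// restart policy brings it back up.
	fmt.Println("shutting down:", err)
}
```

Surfacing the first error to the caller instead of logging it inside the goroutine is what lets the process exit and the container supervisor handle the restart.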

Roasbeef

LGTM 🗾

Roasbeef · 2020-11-20T03:32:52Z

Also thinking it would be useful to create quantile graphs for the resolution latency as well, but this is a great start!
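For context on what a quantile graph over the resolution histogram would compute: Prometheus estimates quantiles from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below is a simplified stdlib-only version of that idea, assuming all observations fall within finite buckets. `Bucket` and `EstimateQuantile` are hypothetical names, and the bucket bounds are made-up millisecond values, not the ones used by the collector.

```go
package main

import (
	"fmt"
	"math"
)

// Bucket is a cumulative histogram bucket: CumCount observations were
// less than or equal to UpperBound (milliseconds in this example).
type Bucket struct {
	UpperBound float64
	CumCount   uint64
}

// EstimateQuantile estimates the q-quantile (0 <= q <= 1) from buckets sorted
// by ascending UpperBound, interpolating linearly within the matching bucket.
// The first bucket is assumed to start at 0. It returns NaN for invalid input
// or an empty histogram.
func EstimateQuantile(q float64, buckets []Bucket) float64 {
	if q < 0 || q > 1 || len(buckets) == 0 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].CumCount
	if total == 0 {
		return math.NaN()
	}

	rank := q * float64(total)
	lower, prev := 0.0, 0.0
	for _, b := range buckets {
		count := float64(b.CumCount)
		// Skip empty buckets; pick the first one that reaches the rank.
		if count >= rank && count > prev {
			frac := (rank - prev) / (count - prev)
			return lower + (b.UpperBound-lower)*frac
		}
		lower, prev = b.UpperBound, count
	}
	return buckets[len(buckets)-1].UpperBound
}

func main() {
	buckets := []Bucket{
		{UpperBound: 100, CumCount: 5},
		{UpperBound: 500, CumCount: 9},
		{UpperBound: 1000, CumCount: 10},
	}
	for _, q := range []float64{0.5, 0.9, 0.95} {
		fmt.Printf("q=%.2f -> ~%.0f ms\n", q, EstimateQuantile(q, buckets))
	}
}
```

Because the estimate is interpolated, its accuracy depends on how well the bucket boundaries match the actual latency distribution.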

Roasbeef requested changes Nov 12, 2020

carlaKC added 5 commits November 17, 2020 13:41

collectors: add htlc monitor

6c0ee9f

multi: add htlc resolution time to routing dashboard

7bda61d

collectors: add failure reasons vector to htlcmonitor

939973e

carlaKC force-pushed the 57-lndlcientswitchover branch from 709dee0 to 939973e Compare November 17, 2020 11:41

carlaKC requested a review from Roasbeef November 17, 2020 12:55

Roasbeef approved these changes Nov 20, 2020

Roasbeef merged commit bb350b5 into lightninglabs:master Nov 20, 2020

carlaKC mentioned this pull request Nov 25, 2020

metrics: add exported metrics based on the forwarding series information #57

Closed

carlaKC mentioned this pull request Dec 17, 2020

Extended Routing Dashboard #62

Open
