Performance issue with /metrics endpoint #28
Comments
Hm, admittedly lndmon has not been tested on rpi-type hardware.
This is you attempting to hit the /metrics endpoint directly?
@Roasbeef Honestly I haven't spent much time trying to debug this, but neither Prometheus nor I (from the CLI) can hit the metrics endpoint on …
After some time debugging, I found out that what is taking so long is the GraphCollector's …
The GraphCollector is taking more than 30% of the CPU time (understandable, as this is the biggest dataset being ingested).
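For anyone who wants to confirm where the CPU time goes on their own hardware, one option is Go's built-in profiler. The sketch below is only illustrative: it assumes you can rebuild lndmon (or a similar Go binary) with `net/http/pprof` wired in, and the port 6060 is an arbitrary choice, not something this thread establishes that lndmon exposes by default.

```go
// pprof_sketch.go: minimal example of exposing Go's CPU profiler so a
// profile can be captured while Prometheus is scraping /metrics.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on the default mux
)

func main() {
	// Serve the default mux so the pprof handlers are reachable.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```

With something like that running inside the process, `go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30` captures a 30-second CPU profile during a scrape, which is one way to arrive at a per-collector breakdown like the figure above.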
I changed my Prometheus config (slower interval + higher timeout) and this is what I am running:

```diff
diff --git a/prometheus.yml b/prometheus.yml
index 01797c0..81d781c 100755
--- a/prometheus.yml
+++ b/prometheus.yml
@@ -1,6 +1,7 @@
 scrape_configs:
   - job_name: "lndmon"
-    scrape_interval: "20s"
+    scrape_interval: "30s"
+    scrape_timeout: "15s"
     static_configs:
       - targets: ['lndmon:9092']
   - job_name: "lnd"
```

I am not saying this should be merged because it's totally arbitrary. A bigger network and/or a slower hardware device would require even more conservative defaults.
Thanks for the research @xsb, I had the same problem. For me the scrape time was 30-50 seconds.
I am trying to use lnd+lndmon on a rock64 board (similar to rpi, with arm64 and 4GB RAM), but Grafana only shows data points coming directly from lnd (Go Runtime + Performance dashboard). Everything that is supposed to come from lndmon is missing.
I noticed that when running simple queries with PromQL I immediately got the error: "the queries returned no data for a table". Then I went to the Explore section and checked `up`; there I can see that the lndmon process is reported to be down, which is not true. After that I tried to get the metrics directly and realized I was getting slow response times on the metrics endpoint (between 10s and 12s usually).
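One quick way to reproduce that measurement without Prometheus in the loop is to time a single GET of the endpoint. The sketch below does just that; the URL is an assumption taken from the `lndmon:9092` target in the config discussed in this thread, so adjust it to wherever your lndmon actually listens.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	// Assumed endpoint, matching the Prometheus target used in this thread.
	const url = "http://lndmon:9092/metrics"

	client := &http.Client{Timeout: 60 * time.Second}

	start := time.Now()
	resp, err := client.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Drain the body so the measurement covers the full response,
	// not just the headers.
	n, err := io.Copy(io.Discard, resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("status=%s bytes=%d elapsed=%s\n", resp.Status, n, time.Since(start))
}
```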
I haven't investigated this deeply yet, but the instance has more than enough RAM, and the CPU usage and load average don't look that bad.
I will try to spend more time on this later, but wanted to report it soon just in case it's happening to more people.