Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API returns negative values for some reply types #1315

Closed
kura opened this issue Mar 14, 2022 · 34 comments · Fixed by #1317 or #1322
Closed

API returns negative values for some reply types #1315

kura opened this issue Mar 14, 2022 · 34 comments · Fixed by #1317 or #1322

Comments

@kura
Copy link

kura commented Mar 14, 2022

Versions

  • Pi-hole: v5.9
  • AdminLTE: v5.11
  • FTL: v5.14

Platform

  • OS and version: 20.04.4
  • Platform: Raspberry Pi and VPS

Expected behavior

When hitting the API endpoint I would expect the value for replies to be positive numbers but instead I see some negative values.

Actual behavior / bug

When querying the API some of the reply_ values are negative.

$> curl -s "http://localhost/admin/api.php?summaryRaw" | python3 -m json.tool
{
    "domains_being_blocked": 2767826,
    "dns_queries_today": 41970,
    "ads_blocked_today": 4845,
    "ads_percentage_today": 11.54396,
    "unique_domains": 4671,
    "queries_forwarded": 20726,
    "queries_cached": 16248,
    "clients_ever_seen": 8,
    "unique_clients": 8,
    "dns_queries_all_types": 41965,
    "reply_NODATA": 358,
    "reply_NXDOMAIN": -337,
    "reply_CNAME": -190,
    "reply_IP": -3895,
    "privacy_level": 0,
    "status": "enabled",
    "gravity_last_updated": {
        "file_exists": true,
        "absolute": 1647219901,
        "relative": {
            "days": 0,
            "hours": 20,
            "minutes": 5
        }
    }
}

This is also the case when querying the stats directly from pihole-FTL so the issue may be there.

$> echo ">stats" | nc localhost 4711
domains_being_blocked 2767826
dns_queries_today 41572
ads_blocked_today 4808
ads_percentage_today 11.565476
unique_domains 4664
queries_forwarded 20557
queries_cached 16057
clients_ever_seen 6
unique_clients 6
dns_queries_all_types 41567
reply_NODATA 306
reply_NXDOMAIN -341
reply_CNAME -241
reply_IP -4122
privacy_level 0
status enabled
---EOM---

Steps to reproduce

Steps to reproduce the behavior:

I do not really know how to reproduce this behaviour, just that it happens for me on 2 different instances (1 local on a Raspberry Pi, the other on a cloud VPS.)

Debug Token

@rdwebdesign
Copy link
Member

I'm not sure if this is related to your issue, but you have some strange entries in your "Domains" list.

Open your debug log:

  • Look for *** [ DIAGNOSING ]: Domainlist and
  • scroll down until you find entries 311 to 317.

They are invalid.
The data are "inverted".
The domain name is on the type field and the domain field has a zero.

@kura
Copy link
Author

kura commented Mar 14, 2022

I'm not sure if this is related to your issue, but you have some strange entries in your "Domains" list.

Open your debug log:

* Look for `*** [ DIAGNOSING ]: Domainlist` and

* scroll down until you find entries _311 to 317_.

They are invalid. The data are "inverted". The domain name is on the type field and the domain field has a zero.

Good spot, I'll fix those values now.

@kura
Copy link
Author

kura commented Mar 15, 2022

Can confirm that fixing the above noted domainlist issues does not resolve this issue.

New debug token: https://tricorder.pi-hole.net/RsXFGM3V/

@yubiuser yubiuser transferred this issue from pi-hole/web Mar 15, 2022
@yubiuser
Copy link
Member

I've transferred the issue as this happend also when directly querying FTL.

@kura
Copy link
Author

kura commented Mar 15, 2022

As a note, this problem is resolved for a short period by either restarting pihole-FTL or removing pihole-FTL.db but eventually gets back in to a state where some of the reply_ values are negative again.

@DL6ER
Copy link
Member

DL6ER commented Mar 15, 2022

@kura This means restarting pihole-FTL alone (without removing the database) is sufficient to cure it for a short time? If so, this very much sounds like a bug in the garbage collection routines that ensure we can have the rolling time frame of history data. I will allocate some time on Saturday to check the code in depth here. I'm afraid I won't have time before.

@kura
Copy link
Author

kura commented Mar 15, 2022

@DL6ER yes, at least I am pretty sure that's the case. I've restarted it a few times over the last week without removing the DB. This resets the counters to 0 and they generally remain as positive integers for a few hours before switching and going negative from that point on. I have not noticed if the number decrements to 0 before going negative or if the value just flips from positive to negative.

This is also a fairly recent problem, I consume the API values for my Grafana home network dashboard which is why I noticed the issue. It only cropped up after the most recent update a week or two ago.

@DL6ER
Copy link
Member

DL6ER commented Mar 15, 2022

Thanks for the additional info, this is very valuable! I'll pay special attention to changes in the last update.

@kura
Copy link
Author

kura commented Mar 16, 2022

@DL6ER have 100% confirmed that just restarting pihole-FTL resolves the issue for between 1 and 2 hours before the integers randomly become negative. They also seem to just go from a positive to a negative without decrementing or it being an inversion. i.e. +1000 to -300 rather than going +1000 to -1000.

Edit: I just witnessed some of the reply_ values swapping from negative back to positive again, which I think gives credence to this being something like a GC routine issue.

@DL6ER
Copy link
Member

DL6ER commented Mar 18, 2022

Okay, looking at the code and reading ...

This resets the counters to 0 and they generally remain as positive integers for a few hours before switching and going negative from that point on.

... again, I'm pretty sure this has to do with improper query-type restoring from the database. Only recently, we implemented the ability to preserve the entire Query Log across FTL restarts (before, some details like reply times and types were lost) and looking at the code, it seems we forgot to update the counters after loading from the database. This will cause the negative numbers to appear when the garbage collection routine says "I have to subtract 1000 queries of type X" but the counter never actually had these 1000 queries.

The bug perfectly makes sense - so far. Now comes the bit where I'm a bit concerned that this may not be the final solution as I also read

by either restarting pihole-FTL or removing pihole-FTL.db but eventually gets back in to a state where some of the reply_ values are negative again.

Without any database being there initially, the problem shouldn't have appeared in the first place as no history data could have been imported improperly. However, you may have removed the database while FTL was still running, causing FTL to restore some of it when shutting down for a restart.

@kura cutting a long story short, please try my proposed fix by running

pihole checkout ftl fix/negative_replies

and check if the issue is resolved

@kura
Copy link
Author

kura commented Mar 18, 2022

@DL6ER I was looking back through my command history and there is a detail I left out about removing the DB which is that I ran rm pihole-FTL.db and noticed a new one was instantly created and never bothered to restart the pihole-FTL service. Maybe this could be explained by the GC issue?

I have switched my branch to fix/negative_replies and will post a reply later today with whether or not this fixes it. The counters are usually broken within a couple of hours so should be easy enough to tell. Thanks very much for the fast debugging.

@kura
Copy link
Author

kura commented Mar 18, 2022

One difference I have noticed is the branch checkout triggered a pihole-FTL service restart and my counters were not reset to 0, so I manually restarted the service to double check and can confirm they are correctly being transferred between restarts now which is not what was happening before. Which makes sense given the counter increment in the PR.

@yubiuser
Copy link
Member

I'm affected as well... going to run the custom branch.

echo ">stats" | nc localhost 4711
domains_being_blocked 814286
dns_queries_today 11483
ads_blocked_today 4870
ads_percentage_today 42.410519
unique_domains 1730
queries_forwarded 4686
queries_cached 1890
clients_ever_seen 13
unique_clients 11
dns_queries_all_types 11483
reply_NODATA -1752
reply_NXDOMAIN -42
reply_CNAME -317
reply_IP -14
privacy_level 0
status enabled
---EOM---

@kura
Copy link
Author

kura commented Mar 18, 2022

Can confirm I've been running the custom branch for 6 hours now, values have always been positive integers. I'd say you've fixed it with this change @DL6ER.

domains_being_blocked 3354348
dns_queries_today 48265
ads_blocked_today 4956
ads_percentage_today 10.268311
unique_domains 3927
queries_forwarded 24970
queries_cached 18148
clients_ever_seen 6
unique_clients 6
dns_queries_all_types 48265
reply_NODATA 6008
reply_NXDOMAIN 2280
reply_CNAME 9090
reply_IP 23987
privacy_level 0
status enabled
---EOM---

@yubiuser
Copy link
Member

yubiuser commented Mar 21, 2022

@kura Can you confirm the numbers don't turn negative anymore. I do still get negatived ones even with the custom branch sometimes....

domains_being_blocked 815867
dns_queries_today 11054
ads_blocked_today 4663
ads_percentage_today 42.183826
unique_domains 1219
queries_forwarded 4638
queries_cached 1694
clients_ever_seen 12
unique_clients 12
dns_queries_all_types 11054
reply_NODATA -201
reply_NXDOMAIN 87
reply_CNAME -34
reply_IP 375
privacy_level 0
status enabled

@kura
Copy link
Author

kura commented Mar 21, 2022

@yubiuser the numbers have not gone negative for me since Friday. I restarted the Pi that pihole is running on 2 days ago for OS patches and the reply_* values have been positive ever since.

$> sudo systemctl status pihole-FTL
● pihole-FTL.service - Pi-hole FTLDNS
     Loaded: loaded (/etc/systemd/system/pihole-FTL.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-03-18 18:16:18 UTC; 2 days ago
    Process: 2344142 ExecStartPre=/bin/touch /var/log/pihole-FTL.log /run/pihole-FTL.pid /run/pihole-FTL.port /var/log/>
    Process: 2344143 ExecStartPre=/bin/mkdir -p /var/run/pihole /var/log/pihole (code=exited, status=0/SUCCESS)
    Process: 2344144 ExecStartPre=/bin/rm -f /var/run/pihole/FTL.sock (code=exited, status=0/SUCCESS)
    Process: 2344145 ExecStartPre=/bin/chown -R pihole:pihole /var/run/pihole /var/log/pihole-FTL.log /run/pihole-FTL.p>
    Process: 2344146 ExecStartPre=/bin/chmod u=rwX,go=rX /var/run/pihole /var/log/pihole-FTL.log /run/pihole-FTL.pid /r>
    Process: 2344147 ExecStart=/usr/bin/pihole-FTL (code=exited, status=0/SUCCESS)
   Main PID: 2344160 (pihole-FTL)
      Tasks: 7 (limit: 4435)
     CGroup: /system.slice/pihole-FTL.service
             └─2344160 /usr/bin/pihole-FTL

Mar 18 18:16:17 p systemd[1]: Starting Pi-hole FTLDNS...
Mar 18 18:16:18 p pihole-FTL[2344147]: FTL started!
Mar 18 18:16:18 p systemd[1]: Started Pi-hole FTLDNS.
$> echo ">stats" | nc localhost 4711
domains_being_blocked 3456349
dns_queries_today 47499
ads_blocked_today 5627
ads_percentage_today 11.846565
unique_domains 6712
queries_forwarded 21995
queries_cached 19706
clients_ever_seen 6
unique_clients 6
dns_queries_all_types 47499
reply_NODATA 5811
reply_NXDOMAIN 2148
reply_CNAME 8981
reply_IP 26171
privacy_level 0
status enabled
---EOM---

@kura
Copy link
Author

kura commented Mar 21, 2022

I also graphed the last 4 days of data so you can see before switching to the fix branch and after.

2022-03-21_13-43_1

@yubiuser
Copy link
Member

Nice. How did you generate the data and graph. I could use it as well to look for patterns when the switch happen for me

@kura
Copy link
Author

kura commented Mar 21, 2022

@yubiuser I have a TIG stack running in my network that I use to monitor servers/services etc. Telegraf consumes the data from the pi-hole API every 5 seconds, inserts it in to InfluxDB and I have Grafana for displaying the data. So I just made a quick graph with 4 days of data.

@yubiuser
Copy link
Member

I have not setup any of those tools so far - maybe a good time to learn something new.

@kura
Copy link
Author

kura commented Mar 21, 2022

I have not setup any of those tools so far - maybe a good time to learn something new.

It's pretty quick and painless, add a couple of repos to the system and install grafana, telegraf and influxdb or you can do down the Docker route if you're more familiar with that approach. The telegraf config section I use for Pi-hole metrics is as follows;

[[inputs.http]]
  urls = ["http://pi-hole-ip-or-domain/admin/api.php?summaryRaw"]
  method = "GET"
  name_override = "pihole"
  tagexclude = ["host"]
  data_format = "json"
  json_string_fields = ["url", "status"]
  # insecure_skip_verify = true  # uncomment if you use a self signed TLS certificate

Once you have some stuff graphed, if you're like me you'll find yourself wanting to graph as much as possible. I graph all my internal network stuff (router, switch, NAS etc), internet connection (upload, download, packet loss, latency), all my servers including my Raspberry Pis (CPU, RAM, disk, services etc) and containers.

Super nerdy but often helpful for diagnosing problems (graphing this stuff is how I noticed the API returning negative values.)

@DL6ER
Copy link
Member

DL6ER commented Mar 21, 2022

@kura Can you compare the sum of the reply types with the total number of queries on your system?

When we're already talking about grafana: There are plans to add a native Prometheus metrics endpoint to the API. To be expected in Pi-hole v6.x (where x isn't necessarily 0)

@kura
Copy link
Author

kura commented Mar 21, 2022

@DL6ER sure.

A sum of reply_IP, reply_CNAME, reply_NXDOMAIN, reply_NODATA is actually ~4000 off of dns_queries_today and dns_queries_all_types. Sum = 39256, dns_queries_today or dns_queries_all_types=43560.

{
    "dns_queries_today": 43560,
    "dns_queries_all_types": 43560,
    "reply_NODATA": 5224,
    "reply_NXDOMAIN": 1920,
    "reply_CNAME": 8001,
    "reply_IP": 24111
}

I would very much like a native endpoint, although the API gets me by quite nicely anyway.

@DL6ER
Copy link
Member

DL6ER commented Mar 26, 2022

Okay, that's not too surprising to see a difference. In the end, we are only showing the most important 4 out of the 12 reply types FTL supports here.

As I don't recall the reasoning for including only a subset of the reply types, I added the remaining individual reply counters as well as their sum (dns_queries_all_replies) as new quantities. If there is a difference between any of the dns_queries_all_replies, dns_queries_all_types and dns_queries_today this highlights that the bug hasn't been fully fixed yet.

Please update on branch fix/negative_replies and report back.

@kura
Copy link
Author

kura commented Mar 26, 2022

@DL6ER I checked out the fix/negative_replies branch again but it's not showing the new reply types. Am I doing something wrong?

$> pihole checkout ftl fix/negative_replies
  Please note that changing branches severely alters your Pi-hole subsystems
  Features that work on the master branch, may not on a development branch
  This feature is NOT supported unless a Pi-hole developer explicitly asks!
  Have you read and understood this? [y/N] y

  [✓] Branch fix/negative_replies exists
  [i] Switching to branch: "fix/negative_replies" from "master"
  [✓] Downloading and Installing FTL
  [✓] Restarting pihole-FTL service...
  [✓] Enabling pihole-FTL service to start on reboot...
$> echo ">stats" | nc localhost 4711
domains_being_blocked 2968428
dns_queries_today 49523
ads_blocked_today 5179
ads_percentage_today 10.457767
unique_domains 3323
queries_forwarded 23618
queries_cached 20515
clients_ever_seen 6
unique_clients 6
dns_queries_all_types 49520
reply_NODATA 5742
reply_NXDOMAIN 2191
reply_CNAME 9473
reply_IP 26660
privacy_level 0
status enabled
---EOM---

@DL6ER
Copy link
Member

DL6ER commented Mar 26, 2022

Ah, sorry. I was distracted because the baby woke up almost immediately after my message above. I see now that the new FTL binaries have been rejected because our automated tests found that the API response changed in an unexpected way so you received the old version with your checkout. I added the new quantities to the tests now. Please try again.

@kura
Copy link
Author

kura commented Mar 26, 2022

No worries! I have updated it and used a quick bit of Python to confirm that the summed values do indeed match the total.

In [1]: s = """reply_UNKNOWN 2206
   ...: reply_NODATA 5827
   ...: reply_NXDOMAIN 2078
   ...: reply_CNAME 9460
   ...: reply_IP 26458
   ...: reply_DOMAIN 1479
   ...: reply_RRNAME 38
   ...: reply_SERVFAIL 1364
   ...: reply_REFUSED 0
   ...: reply_NOTIMP 0
   ...: reply_OTHER 0
   ...: reply_DNSSEC 0
   ...: reply_NONE 0
   ...: reply_BLOB 22"""=

In [2]: sum(int(b.split(" ")[1]) for b in s.split("\n")) == 48932
Out[2]: True

Any chance that you'll keep all these reply types when this is merged in to master and expose them via the API? It'd be nice to graph more than just the 4 exposed currently. :D

@DL6ER
Copy link
Member

DL6ER commented Mar 26, 2022

Yeah, sure. They'll stay. The only bit left is to confirm that everything does line up now. Also pinging @yubiuser who was still seeing negative numbers occasionally.

@kura
Copy link
Author

kura commented Mar 26, 2022

Scratch that, I just noticed that dns_queries_all_types and dns_queries_today values do not match...

dns_queries_today 49127
dns_queries_all_types 49124

The summed reply_ values match dns_queries_today but are 3 off from dns_queries_all_types.

domains_being_blocked 2968428
dns_queries_today 49127
ads_blocked_today 5154
ads_percentage_today 10.491176
unique_domains 3289
queries_forwarded 23318
queries_cached 20448
clients_ever_seen 6
unique_clients 6
dns_queries_all_types 49124
reply_UNKNOWN 2207
reply_NODATA 5857
reply_NXDOMAIN 2078
reply_CNAME 9504
reply_IP 26568
reply_DOMAIN 1482
reply_RRNAME 38
reply_SERVFAIL 1371
reply_REFUSED 0
reply_NOTIMP 0
reply_OTHER 0
reply_DNSSEC 0
reply_NONE 0
reply_BLOB 22
dns_queries_all_replies 49127
privacy_level 0
status enabled

@DL6ER
Copy link
Member

DL6ER commented Mar 26, 2022

Okay, so this looks like an issue with dns_queries_all_types now. We'll squash them all in the end! Thanks for your patience, a new version of this branch will be available in a few seconds. Please test away!

It'd also be interesting to see if you get any results for

grep "DB warn" /var/log/pihole-FTL.log

after your checkout.

@kura
Copy link
Author

kura commented Mar 26, 2022

No worries! I don't mind being a guinea pig and I'm not doing much but tinkering away on my own coding projects today anyway!

dns_queries_all_types, dns_queries_today, and dns_queries_all_replies have the same value now and a sum() of all reply_* values equals that value.

In [1]: # dns_queries_all_types 48042

In [2]: # dns_queries_today 48042

In [3]: # dns_queries_all_replies 48042

In [4]: s = """reply_UNKNOWN 1863
   ...: reply_NODATA 5810
   ...: reply_NXDOMAIN 2094
   ...: reply_CNAME 9363
   ...: reply_IP 26031
   ...: reply_DOMAIN 1469
   ...: reply_RRNAME 40
   ...: reply_SERVFAIL 1350
   ...: reply_REFUSED 0
   ...: reply_NOTIMP 0
   ...: reply_OTHER 0
   ...: reply_DNSSEC 0
   ...: reply_NONE 0
   ...: reply_BLOB 22"""

In [5]: sum(int(b.split(" ")[1]) for b in s.split("\n")) == 48042
Out[5]: True

As for warnings, grep "DB warn" /var/log/pihole-FTL.log returns nothing and I did a quick less through too to look for any other interesting stuff and it all looks normal, just a bunch of resizing the files in /dev/shm/.

Edit: Updated to reflect the additional new dns_queries_all_replies value that I had not noticed was also added to confirm it too has the expected value.

@DL6ER DL6ER mentioned this issue Mar 26, 2022
5 tasks
@DL6ER DL6ER linked a pull request Mar 26, 2022 that will close this issue
5 tasks
@kura
Copy link
Author

kura commented Mar 26, 2022

I did not expect the API endpoint to automatically get all these new reply_ values! That's cool, I guess it just eats all the data from pihole-FTL and spits it out.

Added to my graphing (I skipped some reply types 'cos I dunno what they are. 😄) I very much look forward to this being merged.

2022-03-26_14-27

@yubiuser
Copy link
Member

Confirm no negative reply types anymore.

@yubiuser
Copy link
Member

Released with https://github.com/pi-hole/FTL/releases/tag/v5.15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants