-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
errors in validation log , and session restarting every 120 seconds #52
Comments
You sure it isn't just a connection issue? It looks like a flapping link. (In fact, that's what the router output says.) My environment is not reproducing it. |
No it was not a connection issue. After some times of restarting every 2 minutes (about 20 times) it remained connected. Now it says What about validation errors?
Is it normal? |
I've seen these in the wild. It just means some RPKI objects are simply invalid. For example, RFC 6493 clearly states that vCard version 3 is not valid, as far as the RPKI is concerned: https://datatracker.ietf.org/doc/html/rfc6493#section-5 I suppose it's possible Fort might be somehow misblaming these faults, in which case it would be helpful to analyze the objects, and see if they actually break the rules or not.
These suggest that the cache is not updating properly. This is why I suspect there might be some connection issue. How about this: Can you compress your cache directory, and somehow send it to me? If something is messing up Fort, I think the cache has a pretty high chance of being the culprit. Then try stopping Fort, deleting the cache, then start Fort again. Fort should regenerate the cache from scratch. What happens then? |
Here's a tgz of all FORT's stuff (I'll delete it in 24h) I also do this to clear cache :
but nothing changed , same errors in the syslog |
Ok, I think there's a misunderstanding here. Fort is an RPKI Validator. The point isn't just to send ROAs to the routers; the whole reason why you want an RPKI validator is because you want someone to also verify said ROAs, before they are sent to the routers. This involves a large laundry list of checks. The reason why Fort has two different types of logs (Operation logs and Validation logs) is because they're intended for different audiences. Operation logs are meant for the operator running Fort. Validation logs are meant for the people maintaining the RPKI repositories. I believe you are the former, but not the latter. When Fort dumps an error or warning in the validation logs, all it means is that it had to drop some ROAs or Router Keys, because the relevant RPKI objects weren't up to snuff. In some cases, you might argue that Fort is being too strict, and that's why we sometimes create incidences. But chances are, you don't care about that. If a ROA doesn't validate, it just doesn't validate. It needs to be thrown away. You don't want your routers to be making decisions off shit ROAs. The way I see it, it does look like the validation errors you're getting are perfectly normal. Here are some examples I found:
This manifest's CRL is
(One way of querying a CRL's valid dates is It seems the validation is correct; according to the log line, the CRL expired two hours prior to the time at which the log states the object validation was performed. (Just in case: You sure the node hosting Fort has its date set up properly? Maybe check the timezone too.)
It's not a lie. My browser does return 404 on https://chloe.sobornost.net/rpki/news-public.xml.
Also seems to be correct. It doesn't exist: $ rsync rsync://rpkica.mckay.com/rpki/MCnet/UEh2SAvdIgPsUFdv92RSSaNqBnY.mft potato.mft
rsync: link_stat "/MCnet/UEh2SAvdIgPsUFdv92RSSaNqBnY.mft" (in rpki) failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1677) [Receiver=3.1.3]
I could go on and explain why I think the remaining error messages are probably fine, but you get the idea. Because it seems like you don't care about the validation logs, feel free to remove them. That's why On the other hand, if you think some of these errors should be turned into incidences, let's treat that as separate bugs. Ok, so that's that problem out of the way. Let's now think about the router link flapping. I still can't reproduce it, so would you be willing to run a custom version of Fort, which I'll set up to print loads of hopefully relevant information? |
Ok, just some errors can be ignored ... maybe they could be warning. So anyway it's working well. Thanks for the detailed explanation. |
Hypothesis: Fort hasn't finished the first validation during those first two attempts. So those "flaps" are actually retries. When Fort starts, it doesn't have any validation results to report, because it hasn't finished the first validation cycle. After the third retry, Fort is returning the validation data, so the link stays up. And the reason why I can't replicate it is because I'm using a different client, which behaves differently. Makes sense? Fort drops the following messages in the operation logs before and after the first validation is completed:
Do these line up with your router flaps? |
OS Debian 10.10
Linux rpki 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux
fort 1.5.1
config.json attached
config.json.txt
Every two minutes router sessions are restarted (max uptime is two minutes).
The first line is a fort-1.5.1 at about two minutes.
The second line is a fort-1.4.2 whose session is not restarted.
show validation session
me@mx>Session State Flaps Uptime #IPv4/IPv6 records
172.20.21.33 Up 55 00:01:58 227887/43730
172.30.37.33 Up 3 06:41:53 227887/43730
show validation session
me@mx>Session State Flaps Uptime #IPv4/IPv6 records
172.20.21.33 Up 55 00:01:59 227887/43730
172.30.37.33 Up 3 06:41:54 227887/43730
show validation session
me@mx>Session State Flaps Uptime #IPv4/IPv6 records
172.20.21.33 Up 56 00:00:00 227887/43730
172.30.37.33 Up 3 06:41:55 227887/43730
me@mx>show validation session
Session State Flaps Uptime #IPv4/IPv6 records
172.20.21.33 Up 56 00:00:01 227887/43730
172.30.37.33 Up 3 06:41:56 227887/43730
I had to set http transfer-timeout to 120, to avoid lots of timeout error in validation-log (not too bad)
Still have lots of errors in validation-log , in different flavors (at startup, and at each refresh)
fort.log attached
fort.log
The text was updated successfully, but these errors were encountered: