feat: recovering preHandleMetadata failure from sniffing
#769
Motivation
This PR tries to handle connection failures caused by Fake DNS cache invalidation. These usually happen in setups where caching DNS servers (like AdGuard Home) use Clash's Fake DNS server as their upstream, with policy-based routing implemented by examining the results of DNS queries.
Cache invalidation occurs when either of the following conditions is met:
- `cache.db` is corrupted for any reason (for example, by switching to Clash Premium and back; I recently migrated from Premium to Meta).
- `store-fake-ip` is set to `false` and some caching DNS servers have not yet synchronized with the new value.

Admittedly, for simpler setups the best solution is to clear the DNS cache everywhere, or at least on these caching DNS servers. But that requires people to track the state carefully by hand, which is distracting. By comparison, I prefer an automated solution like the one proposed here (honestly more of a workaround), since it costs less in more complicated scenarios such as home labs or small teams.
Approach
In this PR, we adapt the existing "sniffing" feature to resolve the issue described above. This is done in the following steps:
1. When `preHandleMetadata` fails to find the reverse mapping of a destination IP address, do not exit immediately. Instead, use some variables to delay the error reporting.
2. Sniffing is performed by `TCPSniff`; we only make it return a flag indicating whether a domain was discovered.
3. If `TCPSniff` returns `true`, we clear the failure flag set in step 1 and continue the connection. Otherwise, we give up, just as we did before this PR.

I have tested this locally and am quite satisfied with the solution: both the "tls certificate error" and "no route to host" errors reported by Chrome or the OS occur much less often while the whole policy-based-routing system is recovering from an unexpected restart.
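
For reviewers who want the control flow at a glance, the three steps under Approach can be sketched roughly as below. This is a simplified illustration, not the code in this PR: `metadata`, `fakeIPTable`, `preHandle`, `sniff`, and `handle` are hypothetical stand-ins, and the real `preHandleMetadata` and `TCPSniff` have different signatures and work on live connections. The `sniffedHost` parameter is an assumption that stands in for whatever domain the sniffer extracts from the payload.

```go
package main

import (
	"errors"
	"fmt"
)

// metadata is a hypothetical, simplified stand-in for connection metadata.
type metadata struct {
	DstIP string // destination IP as received
	Host  string // domain name; empty until resolved or sniffed
}

// fakeIPTable is a hypothetical stand-in for the fake-IP reverse mapping (IP -> domain).
type fakeIPTable map[string]string

var errNoReverseMapping = errors.New("fake DNS record missing")

// preHandle mirrors step 1: a missing reverse mapping is returned as an error
// so that the caller can decide when (or whether) to report it.
func preHandle(m *metadata, table fakeIPTable) error {
	host, ok := table[m.DstIP]
	if !ok {
		return errNoReverseMapping
	}
	m.Host = host
	return nil
}

// sniff mirrors step 2: it returns true only when a domain was discovered.
// In this sketch a discovered domain always overwrites m.Host (an assumption).
func sniff(m *metadata, sniffedHost string) bool {
	if sniffedHost == "" {
		return false
	}
	m.Host = sniffedHost
	return true
}

// handle ties the steps together: the pre-handle error is held back,
// cleared if sniffing finds a domain, and reported otherwise.
func handle(m *metadata, table fakeIPTable, sniffedHost string) error {
	pendingErr := preHandle(m, table) // step 1: delay the error report
	if sniff(m, sniffedHost) {        // step 2: did sniffing find a domain?
		pendingErr = nil // step 3: recovered, continue the connection
	}
	return pendingErr // non-nil means give up, as before
}

func main() {
	table := fakeIPTable{"198.18.0.5": "example.org"}

	// Stale fake IP, but the payload reveals a domain: the connection continues.
	recovered := &metadata{DstIP: "198.18.0.9"}
	fmt.Println(handle(recovered, table, "example.com"), recovered.Host)

	// Stale fake IP and nothing sniffed: the delayed error is reported.
	lost := &metadata{DstIP: "198.18.0.9"}
	fmt.Println(handle(lost, table, ""), lost.Host)
}
```

The point of holding the error in a local variable is that the pre-PR behavior is preserved whenever sniffing finds nothing, so only connections that would otherwise have failed are affected.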