nslookup fails inside a container using the default DNS server #216

Closed
doctorpangloss opened this issue Mar 30, 2022 · 26 comments
Labels
bug Something isn't working Networking Connectivity and network infrastructure

Comments

@doctorpangloss

Describe the bug
nslookup fails inside a container.

To Reproduce

docker run -it --rm mcr.microsoft.com/windows/servercore:ltsc2022
Microsoft Windows [Version 10.0.20348.587]
(c) Microsoft Corporation. All rights reserved.

C:\>nslookup www.google.com
Server:  UnKnown
Address:  172.30.224.1

C:\>nslookup www.google.com 8.8.8.8
Server:  dns.google
Address:  8.8.8.8

Non-authoritative answer:
Name:    www.google.com
Addresses:  2607:f8b0:4005:80e::2004
          216.58.195.68
*** UnKnown can't find www.google.com: Server failed

Expected behavior
nslookup should work against the default DNS server.

Configuration:

  • Edition: Windows 11 21H2
  • Base Image being used: mcr.microsoft.com/windows/servercore:ltsc2022
  • Container engine: docker
  • Container Engine version:
Client:
Cloud integration: v1.0.22
Version:           20.10.12
API version:       1.41
Go version:        go1.16.12
Git commit:        e91ed57
Built:             Mon Dec 13 11:44:07 2021
OS/Arch:           windows/amd64
Context:           default
Experimental:      true

Server: Docker Desktop 4.5.1 (74721)
Engine:
 Version:          20.10.12
 API version:      1.41 (minimum version 1.24)
 Go version:       go1.16.12
 Git commit:       459d0df
 Built:            Mon Dec 13 11:42:13 2021
 OS/Arch:          windows/amd64
 Experimental:     false

Additional context

  • Restarting the Docker daemon does not resolve the issue.
@doctorpangloss doctorpangloss added the bug Something isn't working label Mar 30, 2022
@ghost ghost added the triage New and needs attention label Mar 30, 2022
@cwilhit cwilhit added Networking Connectivity and network infrastructure and removed triage New and needs attention labels Mar 31, 2022
@cwilhit
Contributor

cwilhit commented Mar 31, 2022

Thanks for opening this and providing repro steps. I've confirmed the repro and have opened MSFT internal 38776581 for reference.

@michbern-ms
Contributor

There are a few articles about how Windows nslookup fails if the primary DNS server fails, even when the secondary DNS server is working fine:

https://defaultroot.com/index.php/2019/10/08/nslookup-default-behaviour-during-failover-of-primary-dns/#:~:text=Windows%20nslookup%20will%20always%20use%20the%20primary%20DNS,in%20the%20nslookup%20command%3A%20So%20all%20is%20well%21
https://social.technet.microsoft.com/Forums/en-US/b1977a50-c482-4daf-b113-63e87b9430d3/secondary-dns-does-not-resolve-160-the-nslookup-requests-windows-customer-160-when-the-primary

The second article notes that ping is a better basic test of DNS.

@doctorpangloss Just so that we can understand severity, is this a blocking issue for you or are you noting an unexpected behavior that is not blocking? Thanks!

@doctorpangloss
Author

Sounds good if this is a general Windows issue...

@doctorpangloss
Author

doctorpangloss commented Feb 1, 2023

I am reopening this because it seems almost every Windows container user encounters it.

@lippertmarkus

I've been having this issue since the last Windows Update, to 10.0.22621.1105.

@jsturtevant

Does Resolve-DnsName work? My understanding is that Resolve-DnsName is the tool the networking team prefers for DNS lookups, because the two tools use different resolvers. We call out using Resolve-DnsName in the Kubernetes docs: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#dns-windows

On Windows, there are multiple DNS resolvers that can be used. As these come with slightly different behaviors, using the Resolve-DNSName powershell cmdlet for name query resolutions is recommended.

@lippertmarkus

unfortunately not

@doctorpangloss
Author

"Just so that we can understand severity, is this a blocking issue for you or are you noting an unexpected behavior that is not blocking? Thanks!"

It's really hard to say. There's a real issue here. For example, would a Go application use the same mechanism as nslookup or as Resolve-DnsName?

@microsoft-github-policy-service
Contributor

This issue has been open for 30 days with no updates.
no assignees, please provide an update or close this issue.

@microsoft-github-policy-service
Contributor

This issue has been open for 30 days with no updates.
@sbangari, @MikeZappa87, please provide an update or close this issue.

@burkhat

burkhat commented Sep 15, 2023

We see the same behaviour in our Kubernetes environment with Windows nodes.
nslookup doesn't work; Resolve-DnsName works without any problem.

@sam-sla

sam-sla commented Sep 15, 2023

We have also been facing name-resolution issues for approximately one week. I think this is related to the same problem as #386 and #420.

@MikeZappa87

MikeZappa87 commented Oct 23, 2023

The primary DNS server is defaulting to the default gateway. Did you intend that, by any chance? The secondary DNS servers will resolve. However, a workaround that allows this to work is to create a new Docker NAT network with gateway DNS disabled and run the container on it:
docker network create -d "nat" --subnet "10.240.0.0/24" -o com.docker.network.windowsshim.disable_gatewaydns=true natgw

docker run -it --rm --net=natgw mcr.microsoft.com/windows/servercore:ltsc2022

You could possibly try deleting the nat network and creating it with the options above as well. Let me know if this works!

@MikeZappa87

Unfortunately, while disabling the gateway DNS resolves the issue with DNS queries, it breaks the internal Docker DNS that resolves a container's IP by its container name.

@davhdavh

davhdavh commented Dec 11, 2023

Any actual workarounds for this?

> docker run -it --rm mcr.microsoft.com/windows/servercore:ltsc2022
> ipconfig -all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : 34eba5103fea
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft Hyper-V Network Adapter
   Physical Address. . . . . . . . . : 00-15-5D-14-4B-41
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::585b:6496:8770:bb8e%4(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.17.91.239(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . : 172.17.80.1
   DHCPv6 IAID . . . . . . . . . . . : 67114333
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-2D-08-2A-B6-00-15-5D-14-4B-41
   DNS Servers . . . . . . . . . . . : 172.17.80.1
                                       192.168.2.253
   NetBIOS over Tcpip. . . . . . . . : Disabled

Ping 172.17.80.1 => FAIL
Ping 192.168.2.253 => FAIL
Ping 8.8.8.8 => SUCCESS
nslookup using 172.17.80.1 => FAIL
nslookup using 192.168.2.253 => FAIL
nslookup using 8.8.8.8 => SUCCESS

I tried:

@florianehmke

Facing the same issue, any updates here? I have no workaround and for me the issue appeared out of nowhere.

@grcusanz

grcusanz commented Mar 20, 2024

A lot of this is mentioned in bits and pieces above, but here is what's going on, which I confirmed in our lab:

  1. The Windows implementation of nslookup uses its own internal implementation of the DNS protocol and will only query one DNS server. By default, this is the first DNS server in the list. This differs from nslookup on Linux, which will retry with other servers in the resolv.conf file.

  2. For containers, the first DNS server in the list, and the one nslookup uses by default, is the Docker DNS resolver. This is required in order to resolve container IPs by name. Unfortunately, due to the Docker DNS issue Zappa linked above, the Docker resolver does not currently forward requests to an external DNS server. This is in the process of being fixed in the Moby repo.
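
The difference between the two lookup strategies in points 1 and 2 can be sketched in a few lines of Python. This is only a toy model, not how either tool is actually implemented: the two "servers" are hypothetical in-memory stand-ins (docker_resolver, upstream_resolver), the container name "web" and its address are made up for illustration, and the only real data is the www.google.com address from the repro at the top of this issue.

```python
from typing import Callable, Optional, Sequence

# A "server" is modeled as a function: name -> address, or None when it cannot answer.
Server = Callable[[str], Optional[str]]


def first_server_only(name: str, servers: Sequence[Server]) -> Optional[str]:
    """Model of the Windows nslookup behavior described above: ask only the first server."""
    return servers[0](name) if servers else None


def try_each_server(name: str, servers: Sequence[Server]) -> Optional[str]:
    """Model of a resolver that falls through the list until some server answers."""
    for server in servers:
        answer = server(name)
        if answer is not None:
            return answer
    return None


def docker_resolver(name: str) -> Optional[str]:
    # Hypothetical: knows container names only and does not forward upstream.
    return {"web": "172.30.224.10"}.get(name)


def upstream_resolver(name: str) -> Optional[str]:
    # Hypothetical: knows public names only (address taken from the repro output).
    return {"www.google.com": "216.58.195.68"}.get(name)


servers = [docker_resolver, upstream_resolver]
print(first_server_only("www.google.com", servers))  # None
print(try_each_server("www.google.com", servers))    # 216.58.195.68
print(first_server_only("web", servers))             # 172.30.224.10
```

In this model the first server fails for public names, so a single-server lookup fails even though the list as a whole can answer, which matches the symptom reported in this issue.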

Given the above, our general recommendation is to use the Resolve-DnsName cmdlet instead. Resolve-DnsName uses the built-in DNS client for the OS, which will retry with all of the available DNS servers. Unlike nslookup, it also works with configurations that use newer DNS technologies such as DNSSEC, DNS-over-HTTPS (DoH), and DNS-over-TLS (DoT). This is the best way to determine whether DNS is functioning within the container. Any application that relies on Windows to do the DNS lookup will get the same behavior as Resolve-DnsName.
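
For applications, one way to see what the OS resolver returns, rather than what nslookup returns, is to call the platform's getaddrinfo. Here is a minimal Python sketch, assuming Python is available in the container. resolve_via_os is a hypothetical helper name, and its getaddrinfo parameter exists only so the function can be exercised without a network. Python's socket.getaddrinfo delegates to the operating system's getaddrinfo, which on Windows is expected to go through the OS DNS client rather than nslookup's own implementation.

```python
import socket
from typing import Callable, List


def resolve_via_os(hostname: str, getaddrinfo: Callable = socket.getaddrinfo) -> List[str]:
    """Return the unique IP addresses the OS resolver reports for hostname.

    Returns an empty list if the lookup fails.
    """
    try:
        infos = getaddrinfo(hostname, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    addresses: List[str] = []
    for info in infos:
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address for both IPv4 and IPv6.
        ip = info[4][0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling resolve_via_os("www.google.com") inside the container and comparing the result with the nslookup output should show whether an application-level lookup is affected.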

If you really want to use nslookup you can, but be aware of the above limitations.

You may also be having DNS connection issues outside of the container host. To narrow that down, use pktmon to trace the packet to see if it leaves the container host in the correct format. In my environment 8.8.8.8 is blocked somewhere on the network. I can confirm it is not an issue with the Windows Container host by doing the following:

On the container host:

   PS C:\> pktmon filter remove
   PS C:\> pktmon filter add -t tcp -p 53
   PS C:\> pktmon filter add -t udp -p 53
   PS C:\> pktmon start --capture

In the container:

   PS C:\> nslookup bing.com 8.8.8.8
   DNS request timed out.
       timeout was 2 seconds.
   Server:  UnKnown
   Address:  8.8.8.8

   DNS request timed out.
       timeout was 2 seconds.
   DNS request timed out.
       timeout was 2 seconds.
   DNS request timed out.
       timeout was 2 seconds.
   DNS request timed out.
       timeout was 2 seconds.
   *** Request to UnKnown timed-out

Back on the container host:

   PS C:\> pktmon stop
   PS C:\> pktmon etl2txt PktMon.etl
   PS C:\> notepad pktmon.txt

In Notepad, I can look at the Appearance # to find the last appearance of the DNS request packet, check which component it was last seen on, and confirm that the packet looks correct:

   [02]0000.0000::2024-03-20 14:13:45.720095500 [Microsoft-Windows-PktMon] PktGroupId 281474976710677, PktNumber 1, Appearance 14, Direction Tx , Type Ethernet , Component 6, Edge 1, Filter 2, OriginalSize 80, LoggedSize 80 
   	00-15-5D-C8-8E-16 > E8-B5-D0-2C-24-40, ethertype IPv4 (0x0800), length 80: 10.127.130.152.59312 > 8.8.8.8.53: 1+ PTR? 8.8.8.8.in-addr.arpa. (38)

Further down in the file I can see that Component 6 is the ethernet adapter:

   [00]1D7C.0AFC::2024-03-20 14:13:59.067489700 [Microsoft-Windows-PktMon] Component 6, Type Miniport , Name netvsc.sys, Microsoft Hyper-V Network Adapter #2 
   [00]1D7C.0AFC::2024-03-20 14:13:59.067489900 [Microsoft-Windows-PktMon] Property: Component 6, PhysAddress  = 0x00155DC88E16 
   [00]1D7C.0AFC::2024-03-20 14:13:59.067490200 [Microsoft-Windows-PktMon] Property: Component 6, NdisMedium  = Ethernet  

Whenever the last appearance is the ethernet adapter it's safe to assume the packet left the machine. I verified that the IP addresses are correct, and that the destination MAC address is the physical ethernet switch. Since I never see a response in the pktmon log I know that I never received a reply on the container host.
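
The manual check above (find the last Appearance of a packet, then look up which component that was) can also be scripted. This is a rough Python sketch, assuming the text produced by pktmon etl2txt looks like the two excerpts quoted in this comment. The regular expressions are derived only from those excerpts and may need adjusting for other pktmon versions. last_appearances and component_names are hypothetical helper names.

```python
import re
from typing import Dict, Tuple

# Patterns inferred from the excerpts in this comment (an assumption, not a pktmon spec).
PACKET_RE = re.compile(
    r"PktGroupId (\d+), PktNumber (\d+), Appearance (\d+),.*?Component (\d+),"
)
COMPONENT_RE = re.compile(r"Component (\d+), Type (\w+)\s*, Name (.+?)\s*$")


def last_appearances(log_text: str) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map (PktGroupId, PktNumber) -> (highest Appearance seen, Component id there)."""
    last: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for line in log_text.splitlines():
        match = PACKET_RE.search(line)
        if not match:
            continue
        group, number, appearance, component = map(int, match.groups())
        key = (group, number)
        if key not in last or appearance > last[key][0]:
            last[key] = (appearance, component)
    return last


def component_names(log_text: str) -> Dict[int, Tuple[str, str]]:
    """Map Component id -> (Type, Name) from the component description lines."""
    names: Dict[int, Tuple[str, str]] = {}
    for line in log_text.splitlines():
        match = COMPONENT_RE.search(line)
        if match:
            names[int(match.group(1))] = (match.group(2), match.group(3))
    return names


# The two relevant lines from the log excerpts above.
SAMPLE = """
[02]0000.0000::2024-03-20 14:13:45.720095500 [Microsoft-Windows-PktMon] PktGroupId 281474976710677, PktNumber 1, Appearance 14, Direction Tx , Type Ethernet , Component 6, Edge 1, Filter 2, OriginalSize 80, LoggedSize 80
[00]1D7C.0AFC::2024-03-20 14:13:59.067489700 [Microsoft-Windows-PktMon] Component 6, Type Miniport , Name netvsc.sys, Microsoft Hyper-V Network Adapter #2
"""

names = component_names(SAMPLE)
for (group, number), (appearance, component) in last_appearances(SAMPLE).items():
    print(group, number, "last seen at appearance", appearance, "on", names.get(component))
```

If the component reported for the last appearance is a Miniport such as the Hyper-V network adapter, that corresponds to the "packet left the machine" case described above.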

I will leave this issue open for a few days longer. If anyone can show a pktmon log that suggests the container host dropped the packet incorrectly (not including the Moby issue above), I can look into that. If not, I'll close this issue.

@doctorpangloss
Author

It would make sense to me to have documentation somewhere stating that nslookup should simply not be used on Windows.

@grcusanz

@doctorpangloss Thanks for the suggestion, I've submitted a PR to the container networking docs with this information.
