Ping class responds TimedOut instead of reading the response when the ICMP Time-to-Live is exceeded #73232
Comments
I couldn't figure out the best area label to add to this issue. If you have write permissions, please help me learn by adding exactly one area label.
I would label it as 'area-System.Net' and 'os-linux' but don't have write permissions.
Tagging subscribers to this area: @dotnet/ncl
There are two implementations of The other implementation spawns the
@filipnavara thanks for the note: I have confirmed by cloning the internal My container log reveals at startup
I had already read the source code for I do hope I am doing something wrong, because that's easier to fix :) but I don't think so at this point. I have been looking at this for the best part of two days. I tried your suggestions anyway but got nowhere; I'm too much of a Linux newbie there. I think it does not matter, as that affects the branch of code that is not being used, i.e. the call out to the ping utility. My code is definitely entitled to call the raw sockets branch of the code.
May be worth setting up a complete runnable repro with a container, @IainStevenson. I'm not sure we would be able to reproduce it otherwise.
@wfurt thanks, I was just working on that very thing to help out. The project I am working on right now is here: https://github.com/IainStevenson/network.monitor and if you clone it and checkout the It should take about 10-20 seconds tracing the route to 8.8.8.8 and then start pinging it every 10 seconds for quite some time. The logic it runs is as follows. It performs a traceroute exercise to 8.8.8.8 by walking through a maximum of 30 hops (TTL values) to find each host node on the route to that address. On Windows (by starting netmon.cli) it does find each address that responds and then pings all of them every 10 seconds; in my case that's about a dozen addresses. On Linux, via the Docker startup, it only finds 8.8.8.8 and pings that every 10 seconds. The data is output to the Please give me a shout if you have any problems with that repo. BTW: I have a Mac mini, so I may get a chance to see if it's working on there, but I know less about that than Linux LOL, it's too new to me.
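The traceroute logic described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code in network.monitor: `probe` is a hypothetical stand-in for a .NET Ping.Send call with PingOptions.Ttl set, and the status strings simply reuse IPStatus names. The two fake probes model the Windows behaviour (every hop answers) and the Linux symptom reported in this thread (intermediate hops come back as TimedOut, so only 8.8.8.8 is found).

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical probe: takes a TTL, returns (status, responding address).
# Stands in for Ping.Send(...) with PingOptions.Ttl = ttl; not a real API.
Probe = Callable[[int], Tuple[str, Optional[str]]]
Hop = Tuple[int, str, Optional[str]]


def trace_route(probe: Probe, max_hops: int = 30) -> List[Hop]:
    """Walk TTL 1..max_hops, recording each hop; stop once the target answers."""
    hops: List[Hop] = []
    for ttl in range(1, max_hops + 1):
        status, address = probe(ttl)
        hops.append((ttl, status, address))
        if status == "Success":
            break
    return hops


def hosts_to_monitor(hops: List[Hop]) -> List[str]:
    """Keep only hops that actually replied; these get pinged every 10 seconds."""
    return [addr for _, status, addr in hops
            if status in ("TtlExpired", "Success") and addr]


# Illustrative route used by the fake probes below.
ROUTE = ["172.17.0.1", "192.168.1.1", "10.0.0.1", "8.8.8.8"]


def windows_like_probe(ttl: int) -> Tuple[str, Optional[str]]:
    # Intermediate routers report TTL expiry; the last hop is the target.
    if ttl < len(ROUTE):
        return "TtlExpired", ROUTE[ttl - 1]
    return "Success", ROUTE[-1]


def linux_like_probe(ttl: int) -> Tuple[str, Optional[str]]:
    # The symptom in this issue: TTL-expiry replies are never surfaced.
    if ttl < len(ROUTE):
        return "TimedOut", "0.0.0.0"
    return "Success", ROUTE[-1]


if __name__ == "__main__":
    print(hosts_to_monitor(trace_route(windows_like_probe)))
    # ['172.17.0.1', '192.168.1.1', '10.0.0.1', '8.8.8.8']
    print(hosts_to_monitor(trace_route(linux_like_probe)))
    # ['8.8.8.8']
```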
OK, that was quite easy to set up on my Mac, and it's behaving the same there too. That's Docker Desktop for Mac with Linux containers, so it's a Linux-specific problem. I will try running the code natively on the Mac next and see.
Sorry folks, emergency stop. In that repo, the netmon.cli project from network.monitor.sln, which is executed via the docker compose startup, is using net6, which as I understand it missed the boat for the last change anyway. In that repo is a small PingFixTest project; when you build and run its Dockerfile, you get the simple code I posted originally. I haven't upgraded the whole repo to the .NET 7 preview yet because of this bug/issue. Let me know if you want that done and I will put in the work for the whole repo.
OK, I built a Docker image of PingFixTest and ran it as a Linux image in Docker for Mac under net7, and it fails as described. So across all my environments: in Linux it is a problem, on the Mac it's fine, on Windows it's fine. In reproduction terms, this is what I did: git clone repo
See that it fails
See that it works. I just repeated these instructions on Windows and Mac, both with Linux containers in Docker for that OS.
I am going to go out on a limb here and suggest this may be an endianness problem: Windows works, Mac works, Linux fails. The code paths in Ping.RawSockets.cs will result in a TimedOut response in the calling code if the returned message Identifier is not the same as the one configured for the socket. This fits the observed problem profile: the message is seen returning (in Wireshark) but is 'ignored', resulting in a default timeout. I am wondering if there is a way to test it with your working debug versions to see what is happening there. I can't get my environment to build in debug or I would have tested my theory already. I am also wondering how to do this without a debug environment on a Linux development host, which I don't have. I can't find enough of the source code to see whether MemoryMarshal.Read takes care of endianness or not. This is the only OS difference I can think of at this point, so I thought I'd share.
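For context on the Identifier check: an ICMP Time Exceeded message (type 11) does not carry the echo Identifier in its own header. Per RFC 792, its header bytes 4-7 are unused and it quotes the original IP header plus the first 8 bytes of the original datagram, so the Identifier has to be read from that embedded echo header. ICMP fields are big-endian on the wire whatever the host's byte order. The Python sketch below assumes an IPv4 buffer that starts at the outer IP header (Linux raw IPv4 sockets include it on receive); `parse_icmp_reply` and `fake_ipv4` are hypothetical helpers, not the logic in Ping.RawSockets.cs.

```python
import struct
from typing import Optional, Tuple

ICMP_ECHO_REPLY = 0
ICMP_TIME_EXCEEDED = 11


def ipv4_header_len(packet: bytes) -> int:
    # The low nibble of the first byte is the IHL, counted in 32-bit words.
    return (packet[0] & 0x0F) * 4


def parse_icmp_reply(packet: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (icmp_type, identifier, sequence) from an IPv4 packet, or None.

    Echo Reply: identifier/sequence are in the reply's own header.
    Time Exceeded: they are in the quoted original echo request.
    """
    if len(packet) < 20:
        return None
    icmp = packet[ipv4_header_len(packet):]
    if len(icmp) < 8:
        return None
    icmp_type = icmp[0]
    if icmp_type == ICMP_ECHO_REPLY:
        # "!" = network (big-endian) byte order.
        ident, seq = struct.unpack("!HH", icmp[4:8])
        return icmp_type, ident, seq
    if icmp_type == ICMP_TIME_EXCEEDED:
        # Quoted original IP header + first 8 bytes of its payload.
        inner_ip = icmp[8:]
        if len(inner_ip) < 20:
            return None
        inner_icmp = inner_ip[ipv4_header_len(inner_ip):]
        if len(inner_icmp) < 8:
            return None
        ident, seq = struct.unpack("!HH", inner_icmp[4:8])
        return icmp_type, ident, seq
    return None


def fake_ipv4(payload: bytes) -> bytes:
    # Minimal 20-byte IPv4 header (version 4, IHL 5), other fields zeroed for the demo.
    return bytes([0x45]) + bytes(19) + payload


if __name__ == "__main__":
    echo_request = struct.pack("!BBHHH", 8, 0, 0, 0x1234, 7)  # type 8, id 0x1234, seq 7
    time_exceeded = fake_ipv4(struct.pack("!BBHI", 11, 0, 0, 0) + fake_ipv4(echo_request))
    print(parse_icmp_reply(time_exceeded))  # (11, 4660, 7)
    # Reading the same two bytes little-endian gives a different Identifier.
    print(hex(struct.unpack("<H", b"\x12\x34")[0]))  # 0x3412
```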
Well, I found my Linux container is little-endian (same as Windows), so maybe not.
I'm trying to reproduce it under WSL. I do get similar symptoms, but I still need to confirm that it's ultimately the same cause.
@filipnavara I was wondering that myself; the common denominators so far are Docker and Linux. I will also work on a WSL environment here to see if I can eliminate Docker from the picture. I am certainly learning lots with this one :)
What happens on my WSL is that the ReceiveFrom call doesn't even get the ICMP timeout replies. It just times out.
While I was downloading a WSL distro, I ran the PingFixTest code through a Windows container on Docker and it worked as expected. I modified my code to loop from hop 1 to 30 and exit on success. Linux container:
Windows container:
So that agrees with your results. Something about the Linux environment, where the code dives down into Ping.RawSockets.cs, has a problem with the socket calls.
@filipnavara I guess the next question is: are you getting a reply showing up in Wireshark that is getting lost, or is there genuinely nothing coming back? I am seeing replies coming back in. Shall I post my saved session for you?
@filipnavara I think it may be a Linux default firewall issue. I read here that this may help.
I am going to switch back to Linux containers and try it, to see if it is the problem and if that helps.
@filipnavara Ah, how the thick plottens! I eventually checked the firewall rules by adding run arguments for Docker to allow me to:
{
  "profiles": {
    "PingFixTest": {
      "commandName": "Project"
    },
    "Docker": {
      "commandName": "Docker",
      "DockerfileRunArguments": "--cap-add=NET_ADMIN"
    }
  }
}
And I installed iptables and sudo
only to find it was already accepting everything.
So I dug deeper and installed
So the container is seeing the packets arriving. I can only assume that 'ICMP time exceeded in-transit' equals TtlExpired.
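For reference, 'ICMP time exceeded in-transit' is ICMP type 11, code 0 (RFC 792), which matches the 'Status: TtlExpired' that the same test reports on Windows. Below is a small lookup sketch for a few common IPv4 type/code pairs. The status names are .NET IPStatus members, but the table itself is an illustrative, incomplete assumption rather than a copy of the runtime's mapping, and falling back to "Unknown" is a choice made for this sketch.

```python
from typing import Dict, Tuple

# (ICMP type, code) -> IPStatus-style name. Illustrative subset only.
ICMPV4_STATUS: Dict[Tuple[int, int], str] = {
    (0, 0): "Success",                     # Echo Reply
    (3, 0): "DestinationNetworkUnreachable",
    (3, 1): "DestinationHostUnreachable",
    (3, 3): "DestinationPortUnreachable",
    (11, 0): "TtlExpired",                 # Time Exceeded: TTL exceeded in transit
    (11, 1): "TtlReassemblyTimeExceeded",  # Time Exceeded: fragment reassembly
}


def icmpv4_status(icmp_type: int, code: int) -> str:
    # Anything outside the table is reported as "Unknown" in this sketch.
    return ICMPV4_STATUS.get((icmp_type, code), "Unknown")


if __name__ == "__main__":
    print(icmpv4_status(11, 0))  # TtlExpired
```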
@filipnavara Oooh err, this does not look good. :( I stepped up the detail and can see a load of bad cksum on incoming data.
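If it helps to check those 'bad cksum' reports independently of tcpdump: the ICMP checksum is the standard Internet checksum from RFC 1071, the ones' complement of the ones' complement sum of the message's 16-bit words. Recomputing it over a received ICMP message, including its checksum field, yields 0 when the checksum is intact. A minimal Python sketch; `checksum_ok` is a hypothetical helper and the message bytes are made up for the demo.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over `data` (odd lengths padded with a zero byte)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_ok(icmp_message: bytes) -> bool:
    # With a correct checksum field the folded sum is 0xFFFF, so this yields 0.
    return internet_checksum(icmp_message) == 0


if __name__ == "__main__":
    # Echo request (type 8) with a zeroed checksum field, id 0x1234, seq 1, payload "abcd".
    msg = struct.pack("!BBHHH", 8, 0, 0, 0x1234, 1) + b"abcd"
    csum = internet_checksum(msg)
    msg = msg[:2] + struct.pack("!H", csum) + msg[4:]
    print(checksum_ok(msg))  # True
    corrupted = msg[:-1] + bytes([msg[-1] ^ 0xFF])
    print(checksum_ok(corrupted))  # False
```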
@filipnavara Well, the Linux ping utility also gets those, so that is not a factor.
Triage: @IainStevenson it seems you are the only person hitting it. Do you have an isolated repro (e.g. in containers) that you could share? I don't see how else we can move forward here ... unless we missed something in the thread.
This issue has been marked
@karelz I can hit it on WSL, so it's reproducible. I'm just not familiar enough with how raw sockets work on Linux, though. I've only got as far as finding out that
Do we get nothing? (And I'm not sure I would trust WSL.)
It's WSL2, regular
@wfurt @karelz The repo posted here has a solution in it that is isolated and produces the problem reliably. I apologise for the rather messy commentary above, and I will repeat the instructions here. You can ignore most of my solution code and hone in on the isolated problem repo within it, called Which was written to prove that the earlier bug fix does not work in the Linux container context in the latest preview, even though the change is included. The test repo was written following the bug fixer's suggestion on how to test it. The following solution is set up to use .NET 7 preview 6. On a Windows/Mac terminal:
If you run that in a Linux container in Docker on your development host, you see this:
If you run the same project code in a Windows container, you see this:
I have replicated this on a Windows and a Mac host. There are comments above showing that when you run the same code natively on your (Windows/Mac) host, it works as expected. I understand that things work differently on the different base OSes; I am just proving that Ping works when the underlying layer works. I have confirmed through experimentation that on the Linux host it is traversing the code path that uses sockets rather than calling out externally to the Linux OS. Therefore there is some problem with seeing the returning bytes from the pinged host in the socket on Linux hosts using the sockets layer. From what we have all observed, the packets are arriving into the Linux container host operating system AS THEY SHOULD, just as they do when the ping utility works correctly on that container OS. IMHO Sockets is the problem, and/or there is a problem recognising the return data in the Ping socket handlers. My apologies if I am off the mark here. Basically, I believe my repo reliably proves that it is broken. Please understand I will be delighted if it's a code problem of mine. Personally, I don't think this is a Ping problem but a deeper problem with the sockets layer on Linux containers when specifying a low TTL value. Just my 2 cents' worth.
To make life simpler, I created a clean, isolated version here
Triage: Given that this is not a regression in 7.0, moving to 8.0. |
Some more info that may be of use. I enabled '.NET Framework source stepping' in Visual Studio under Tools > Options > Debugging > General. When I re-ran the test in the container, it threw this exception:
Which corresponds with the comment above from @filipnavara: ReceiveFrom receiving no response. Clearly this Sockets exception is trapped somewhere further down and surfaces at the Ping class as a TimedOut response, which suggests the problem lies within the Sockets library. I had a look at 'System.Net.Sockets.Socket.ReceiveFrom' and saw that it CAN emit 'NetEventSource' information, but I don't know how to set that up in my environment to find out what Sockets is experiencing.
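To make the suspected mechanism concrete: if the receive call never sees the ICMP reply, its timeout becomes the only outcome, and a caller that traps the exception can only report a timed-out status with an empty address, which is the shape of the 'Status: TimedOut Address: 0.0.0.0' output in this issue. The Python sketch below models that pattern with a plain UDP socket on loopback. It is an analogy under that assumption, not how System.Net.Sockets or Ping are implemented, and `receive_reply` is a hypothetical helper.

```python
import socket
from typing import Tuple


def receive_reply(sock: socket.socket, timeout_s: float) -> Tuple[str, str, bytes]:
    """Wait for one datagram; report a timeout as a status instead of raising."""
    sock.settimeout(timeout_s)
    try:
        data, (addr, _port) = sock.recvfrom(65535)
    except socket.timeout:
        # The exception is swallowed here; the caller only sees the status.
        return "TimedOut", "0.0.0.0", b""
    return "Received", addr, data


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx:
        rx.bind(("127.0.0.1", 0))
        # Nothing was sent, so the receive times out.
        print(receive_reply(rx, 0.2))  # ('TimedOut', '0.0.0.0', b'')
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
            tx.sendto(b"ping", rx.getsockname())
        print(receive_reply(rx, 1.0))  # ('Received', '127.0.0.1', b'ping')
```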
Just so everyone is sure that something actually came back from the ICMP request, this is the 'tcpdump' of such a test;
BTW: The 'ICMP time exceeded in-transit' had me looking twice but that is what you get from a 'traceroute -I 8.8.8.8' |
I appreciate the debugging you do! I was already convinced that the TTL packets reach the Linux VM (confirmed by Sorry for the lack of feedback; I am hampered both by a lack of understanding of the raw socket details and by health issues.
@filipnavara YW. Sorry to hear about your health. I am a dog with a bone on this one. It's not a big earth-shaking problem, but I don't like issues like this, and worse, I don't like not being able to solve it myself :) I am reading the Sockets code now to try to come up with a theory on why it's happening, but that code is scary! Plus I have learned loads of new stuff. After 38 years in IT, it's been a rare treat in that respect.
This may be because we call
Description
With reference to #61465, it appears that the fix already produced is not working, at least on Linux containers.
I read the above issue and it matched my scenario, so I was expecting to update my framework/SDKs and move on.
I noted that the fix was not included in 6.0.7, so I pushed on to the .NET 7 preview.
Reproduction Steps
Note: it's a few hops from the container to my outer WAN address, 192.168.1.1.
IPv6
I don't have the facility to test IPv6.
Code used.
The code suggested for a test in issue #61465 was used, with the TTL changed to suit my environment.
Dockerfile
Used to generate the container.
Visual Studio generated it, and there is nothing untoward here.
Expected behavior
Works on Windows. It does:
Status: TtlExpired Address: 192.168.1.1
Works in a Linux container:
It fails with:
Status: TimedOut Address: 0.0.0.0
Actual behavior
Docker log file
Status: TimedOut Address: 0.0.0.0
Please note: I tried running the container with the default Docker network and again with --network host. It made no difference.
Regression?
No response
Known Workarounds
No response
Configuration
Dev environment
Windows 10 Pro 21H2 19044.1826
Visual Studio 2022 Version 17.2.6
SDK .NET 7 preview 6 installed and allowed.
Docker Desktop 4.10.1 (82475) is currently the newest version available.
System.Net.Ping file version 7.0.22.32404.
Other information
Linux command line on the running container
I installed the necessary utilities.
'172.17.0.1' being the Docker bridge network gateway. By now I expected to see the above result. Note that it expires after the max hop count (default 30) and does not even get the Success from the final hop to the address (15).
I find it interesting that the Docker network gateway response works but the ones outside do not.
However interesting that may be, the test code fails in the Ping class at the Docker gateway address, hop 1, so that may be a red herring.
I also expected to see the above result, as it forces traceroute to use ICMP messages (which Ping.cs is doing), and it all works wonderfully via traceroute.exe :)
Note: the first three hops are Docker and my equipment (I have dual WAN), hop 4 is my entry point to the internet, and hops 5-9 are EE mobile network internal class A addresses. Then there are hops from the UK to the USA.
This is very normal here.
So the container is connected to the internet and its networking is performing as expected.
My conclusion is that the fix for issue #61465 is either not enough or Missing In Action in this release for some reason.
Further information
Wireshark data going out of and coming into the container during the code run.
Request
Response
Please let me know if there is anything I can help with in resolving this.