FileStream.Windows.cs can freeze on Interop.Kernel32.Readfile #26850
Comments
Do you actually mean an infinite loop (busy loop), i.e. the code continually returns to that jmp? |
Great question. No, the jump is frozen in place, not fluttering back and forth. Here is a minidump: As for a simpler reproduction... I will see what I can do. Additional information: the CPU is an Intel i7 4970K and the build is x64. |
Tracked it down to Process RedirectStandardInput and RedirectStandardOutput. If I had to guess, multiple dotnet hosts are writing to a central host but there is a file being read from. @adamsitnik Is my understanding correct? (Based on ConsoleHost -> SynchronousProcessOutputLoggerWithDiagnoser) |
It's not returning from the read on the file stream because this is actually a pipe to another process and the other process still has its end of the pipe open / that process hasn't gone away... so this code reading from the pipe is just going to sit blocked waiting for something more to read or for the pipe to close. |
Thanks Stephen. That makes sense, but I wonder why it only affects my library call and, occasionally, the BenchmarkDotNet build in Travis CI. I will figure it out. If a pipe is blocked, should it not throw an exception? I get one when trying to read or write a locked file handle in a stream, so why not with a pipe? In the assembly, shouldn't the JMP hit the supposed lock check and fall back? Anyway, I believe the issue is the old, old, old RedirectInput/RedirectOutput deadlock from synchronous reads: a standard redirected-input and redirected-output deadlock while writing to a file or middleman. That middleman then hangs, which is what happens with BenchmarkDotNet. All it needs is a flush and close on the input side. |
No. The whole point of doing a read on a pipe is to wait for data to read to be available. So it'll block until either data is available or the pipe is closed. |
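For anyone landing here later, the blocking behavior described above is not specific to FileStream or .NET; it is how anonymous pipes behave at the OS level. Here is a minimal sketch in Python (not C#, and not the code from this issue) that shows it. The function name and the delay values are made up for illustration. The first read waits until the writer sends data, and the second read waits until the writer closes its end, at which point it returns an empty result (EOF) rather than throwing.

```python
import os
import threading
import time


def demo_blocking_pipe_read(delay=0.2):
    """Illustrative only: show that a pipe read waits for data or for EOF."""
    read_fd, write_fd = os.pipe()

    def writer():
        # Simulate a slow producer: send data late, then close even later.
        time.sleep(delay)
        os.write(write_fd, b"hello")
        time.sleep(delay)
        os.close(write_fd)  # closing the write end is what produces EOF

    start = time.monotonic()
    thread = threading.Thread(target=writer)
    thread.start()

    first = os.read(read_fd, 1024)   # blocks until the writer sends data
    waited_for_data = time.monotonic() - start

    second = os.read(read_fd, 1024)  # blocks until the write end is closed
    waited_for_eof = time.monotonic() - start

    thread.join()
    os.close(read_fd)
    return first, second, waited_for_data, waited_for_eof


if __name__ == "__main__":
    first, second, t_data, t_eof = demo_blocking_pipe_read()
    print(first, second, round(t_data, 2), round(t_eof, 2))
```

If the writer never closed its end (for example, because the process on the other side is stuck), the second read would simply never return, which matches the frozen ReadFile call in this issue.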
Thanks Stephen, this sounds like an implementation-related issue. I will close this ticket. Armed with this knowledge, I should be able to resolve it now that I know for sure what I am looking for. Would StreamReader.Peek block? For example, if the while loop condition is StreamReader.EndOfStream, the property getter is the blocking call, but does Peek block as well? |
@houseofcat Could you please explain how to resolve this hang? We hit a similar hang in our CI build. |
@frank-dong-ms It can be a hung process that blocks the read because the preceding pipeline or file accessor holds an exclusive write lock. This normally doesn't occur these days, but in some scenarios it can when redirected input and output are being used. In my case, I created an infinite loop that exposed a small bug in BenchmarkDotNet that never released the TestHost, so EOF was never written to the output pipeline and the read at the end of the output froze. It's a read call, but Kernel32 waits indefinitely to read and the EOF never comes. If you have some code, I am more than happy to take a look at a workaround. Note that this method is used by ConsoleStream, so "file" may be a bad word for me to use. |
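One generic way to keep a reader from waiting forever on a child that never exits is to put a timeout on the read and kill the child when it expires. The sketch below shows the idea in Python, using only the standard subprocess module. It is not BenchmarkDotNet's code or its fix; the child script, the function name, and the timeout value are hypothetical and chosen only for illustration. The child prints one line and then stalls, so EOF never arrives on its own.

```python
import subprocess
import sys

# Hypothetical child: writes one line, then stalls without closing stdout.
CHILD_SCRIPT = "import time; print('partial result', flush=True); time.sleep(60)"


def read_child_output(timeout_seconds=2.0):
    """Illustrative only: read a child's stdout without hanging forever."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # communicate() reads until EOF, which only arrives once the child
        # closes its end of the pipe (normally when it exits).
        output, _ = proc.communicate(timeout=timeout_seconds)
        return output, False
    except subprocess.TimeoutExpired:
        # The child is stuck: kill it so its end of the pipe is closed,
        # then collect whatever it managed to write before stalling.
        proc.kill()
        output, _ = proc.communicate()
        return output, True


if __name__ == "__main__":
    output, timed_out = read_child_output()
    print(repr(output), timed_out)
```

The trade-off is that a timeout only bounds the wait. It does not fix whatever keeps the child alive, so the underlying loop or stalled host still needs to be tracked down.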
Thanks @houseofcat. Our benchmark hang only happens on CI with .NET Core 3.0 and is not reproducible locally. I got the call stack below when the benchmark hung for 20 minutes, and it looks exactly like yours: I have pasted several CI builds with hangs below; not sure if you have access to them: I have several questions:
Below is the entry point of our benchmark test: |
1.) If BenchmarkDotNet has a test running in an unreleased host (dubbed TestHost) that never finishes for programmatic reasons (i.e. a bug on the code side), then fixing the bug allows that TestHost to finish. This naturally releases the stream (after publishing the EOF) and everything works out. This only applies if the cause was a test that ran long when it wasn't supposed to, or an infinite loop. Different machines process jobs at different rates, so a CI build node may be much slower than your desktop at running code and tests. That could factor into the behavior difference. Maybe it's not infinite, just a great deal slower than anticipated. That would usually cause a CI/CD pipeline to time out and look like the issue at hand.
2.) @stephentoub really pointed out that my reproduction code examples were trash. I dug and dug into the real code, optimizing wherever I could. What really made a difference was fixing my hot infinite loop, but what made it much more noticeable was overloading the TaskScheduler with too many async Tasks that had hot body loops. In some cases it worked; in others, it never got past the first or second task. This is a common mistake for rookies and experts alike. Sometimes invoking
3.) I would have to deep dive into your code and see what comes up. I would use logic first and, assuming it's related to .NET Core's behavior, isolate what is different code-wise with .NET Core. Is it possible to "dockerize" the Travis CI/CD environment so you can run your code manually in Docker locally, where you can attach a debugger? You could try moving to .NET Core 3.1 and see if there is an adjustment to the code or a minor hotfix that solves your issue; SDK 3.1.2/200 I think are the latest versions. We could also look at the MSIL from the machine that built the code to run the benchmark and compare it to local. This could provide some insight into the behavior differences. This may be the most beneficial.
I don't think 1 or 2 really apply, looking at your suite of tests. I apologize that I couldn't be more helpful. If I had to guess, the Iris test is frozen awaiting data inputs or writing data inputs to the pipeline. Here is a similar issue: |
Tracked an infinite loop situation triggered by BenchmarkDotNet.
https://github.com/dotnet/corefx/blob/015ee21f6fe1eadad478d8c9409c95cf98c2b121/src/Common/src/CoreLib/System/IO/FileStream.Windows.cs#L1197
My project with Benchmark ready to go.
https://github.com/thyams/CookedRabbit
Opened an incident with BenchmarkDotNet and optimized their StreamReader loop:
dotnet/BenchmarkDotNet#830
Other issue is abated. Now it just never returns from FileStream.Windows.cs line 1197. That means the underlying issue was probably this to begin with.
Frozen on this:
Corresponding line:
Function Scope, Disassembly with Bytes: