Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starting many threaded processes breaks asynchronous read operations when using WaitForExit(int) #108395

Open
leppie opened this issue Sep 30, 2024 · 9 comments
Labels
area-System.Diagnostics.Process needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@leppie
Copy link
Contributor

leppie commented Sep 30, 2024

Description

While modernizing IronScheme tooling, I have upgraded my basic test runner from .NET 2.0/3.5 to newer frameworks.

The runner is simple, it just creates a process and run some native Scheme tests and then checks the output. For this I use asynchronous read operations.

To speed up tests, I decided to just run tests in parallel.

This is however when things started to break.

Data from StandardOutput and StandardError started getting truncated.

I also use the WaitForExit overload with a timeout (in ms) parameter. The process does not timeout however and exits normally.

Further testing indicated this bug has been around since .NET 4.0 as running it on .NET 3.5 works correctly.

However when using WaitForExit() without any arguments (or -1 aka Timeout.Infinite), no versions has this issue.

The issue also only seems to popup reliably when using more than threads than half or more than your processor count (16 logical on my desktop, fails with 8, but fine for 6 threads, does the same on a CI box with only 1 logical processor, 2 or more in this case).

Reproduction Steps

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net2.0;net472;net8.0;net9.0</TargetFrameworks>
    <LangVersion>9.0</LangVersion>
  </PropertyGroup>
</Project>
using System;
using System.Diagnostics;
using System.Text;
using System.Threading;
using System.IO;

const string host = "1.1.1.1";

int threads = args.Length > 0 ? int.Parse(args[0]) : Environment.ProcessorCount; // pc/2 = safer, pc/2-2 = almost always safe, pc = occasionally safe, pc*2 = never safe

int dc = threads;

var done = new ManualResetEvent(false);

Console.WriteLine($"Pinging {host} with {threads} threads.");

object LOCK = new object();

for (int i = 0; i < threads; i++)
{
    var pc = i;
    ThreadPool.QueueUserWorkItem(s => 
    {
        try
        {
            StartPing(host, pc);
        }
        catch (Exception ex)
        {
            lock(LOCK)
            {
                Console.WriteLine($"Ping {pc} error: {ex.Message}");
            }
        }
        finally
        {
            lock (LOCK)
            {
                dc--;

                if (dc == 0)
                {
                    done.Set();
                }
            }
        }
    }, pc);
}

done.WaitOne();

Console.WriteLine("All done");
Console.ReadLine();

// just any process that has some multiline output that does not finish instantly
void StartPing(string host, int pc)
{
    var output = new StringWriter();

    using var p = new Process
    {
        StartInfo = new ProcessStartInfo
        {
            FileName = "ping.exe",
            RedirectStandardOutput = true,
            StandardOutputEncoding = Encoding.UTF8,
            UseShellExecute = false,
            CreateNoWindow = true,
            Arguments = host
        }
    };

    p.OutputDataReceived += (s, e) =>   
    {
        lock (LOCK)
        {
            output.WriteLine(e.Data);
        }
    };

    lock (LOCK)
    {
        Console.WriteLine($"Ping {pc} start.");
    }

    if (!p.Start())
    {
        throw new Exception("bad");
    }

    p.BeginOutputReadLine();
    
    var exited = p.WaitForExit(int.MaxValue); // fails on all but .NET 2
    //var exited = true; p.WaitForExit(); // works fine on all

    if (!exited)
    {
        throw new Exception("didnt exit");
    }

    if (p.ExitCode != 0)
    {
        throw new Exception($"ExitCode = {p.ExitCode}");
    }

    var outputTxt = output.ToString().TrimEnd(Environment.NewLine.ToCharArray());

    if (!outputTxt.Contains("Average"))
    {
        throw new Exception($"Truncated output: \"{outputTxt}\"");
    }

    lock (LOCK)
    {
        Console.WriteLine($"Ping {pc} done.");
    }
}

Expected behavior

All threads should complete with expected output.

Actual behavior

Some threads has truncated output.

Regression?

Yes, works on .NET 2.0/3.5.

Tested on .NET 4.7.2, .NET 8.0 and .NET 9.0 on Windows, fails.
Tested on .NET 9.0 Linux, fails. (use ping -c 4 1.1.1.1 and check for avg in output)

Known Workarounds

No response

Configuration

Windows 10. Ubuntu Linux. Code is run in release.

Other information

Excuse the crudity of my repro code (and also has to compile for .NET 2). Also tried to keep any other issues from intereferring.

As I dont really bother with multithreaded code, it is not really obvious what caused this regression.

This code is likely the issue.
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Windows.cs#L180

Update:

Fails under Linux too and has same 'suspect' code.
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Unix.cs#L208

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Sep 30, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Sep 30, 2024
@leppie
Copy link
Contributor Author

leppie commented Sep 30, 2024

cc @stephentoub

@jkotas jkotas added area-System.Diagnostics.Process and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Sep 30, 2024
@jeffhandley jeffhandley added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Oct 1, 2024
@jeffhandley jeffhandley added this to the Future milestone Oct 1, 2024
@leppie
Copy link
Contributor Author

leppie commented Oct 2, 2024

I just hit this again with Exec in MSBuild. And that was just a single task. Both Windows and Linux.

But I might have found a reason this happens.

My StandardOutput printed everything. But everything that was printed was just a single line to Console.Out at a time.
My StandardError missed a whole last section.

It is basically printing 3 multiline strings to Console.Error. The first 2 were shown, but not the last one.

I suspect that was the same as I before, and by luck, ping on both Windows and Linux also seems to do some multiline print at the end.

I have no idea what threads/threadpool pressure have to do with this.

@jkotas

This comment was marked as off-topic.

@am11
Copy link
Member

am11 commented Oct 4, 2024

cc @tmds, @adamsitnik, for insights / improvement ideas.

@tmds
Copy link
Member

tmds commented Oct 4, 2024

This is what WaitForExit does:

https://github.com/dotnet/runtime/blob/8d85ada66dad8019f308bae3a6311bf2282e8a05/src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Unix.cs#L203C23-L214

That is: if you specify a timeout, it does NOT wait for the output to have been fully received.

This has always been the behavior for .NET Core, and from the report it is also the behavior for .NET 4.7.2 (but perhaps not .NET 2.0/3.5).

After WaitForExit completed with a timeout, you can call it again with the infinite timeout to wait for all output to be received.

Or, if you don't want to wait indefinitely to receive the output, it is also possible to call WaitForExitAsync with a CancellationToken that times out. The CancellationToken will also be used while waiting for the output to be fully received.

@leppie
Copy link
Contributor Author

leppie commented Oct 6, 2024

After WaitForExit completed with a timeout

The issue has nothing to do with timeouts. The process exits normally (specifically why I used int.MaxValue).

you can call it again with the infinite timeout to wait for all output to be received.

Can this bit of info be added to docs?

@tmds
Copy link
Member

tmds commented Oct 6, 2024

After WaitForExit completed with a timeout

I meant: returns when called with a timeout argument.

@jkotas
Copy link
Member

jkotas commented Oct 6, 2024

you can call it again with the infinite timeout to wait for all output to be received.

Can this bit of info be added to docs?

The docs say "To ensure that asynchronous event handling has been completed, call the WaitForExit() overload that takes no parameter after receiving a true from this overload.".

It does the exact same thing as calling the method with an infinite timeout. except that it is more succinct code.

@jkotas

This comment was marked as off-topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Diagnostics.Process needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
Development

No branches or pull requests

5 participants