Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process cancelled tasks from SerialStream.Unix read/write request queues when a timeout occurs #55188

Closed
wants to merge 1 commit into from

Conversation

dseeg
Copy link

@dseeg dseeg commented Jul 6, 2021

SerialStream.Unix creates SerialStreamIORequest tasks and adds them to read/write queues. These queues are normally processed by the private method DoIORequest when IOLoop encounters Interop.PollEvents.POLLIN or Interop.PollEvents.POLLOUT events.

If the serial line is quiet, POLLIN/POLLOUT events won't occur, so DoIORequest is never called. This results in SerialStreamIORequests building up in the read/write queues.

This pull request fixes this by calling a new private method ProcessCancelledIORequests that clears cancelled tasks from the front of the queue. It is called before TimeoutExceptions are thrown.

Resolves #55146

@ghost
Copy link

ghost commented Jul 6, 2021

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

SerialStream.Unix creates SerialStreamIORequest tasks and adds them to read/write queues. These queues are normally processed by the private method DoIORequest when IOLoop encounters Interop.PollEvents.POLLIN or Interop.PollEvents.POLLOUT events.

If the serial line is quiet, POLLIN/POLLOUT events won't occur, so DoIORequest is never called. This results in SerialStreamIORequests building up in the read/write queues.

This pull request fixes this by calling a new private method ProcessCancelledIORequests that clears cancelled tasks from the front of the queue. It is called before TimeoutExceptions are thrown.

Resolves #55146

Author: dseeg
Assignees: -
Labels:

area-System.IO

Milestone: -

@dnfadmin
Copy link

dnfadmin commented Jul 6, 2021

CLA assistant check
All CLA requirements met.

@dseeg
Copy link
Author

dseeg commented Jul 6, 2021

I created a the new method to specifically check for cancelled tasks rather than relying on just checking if the task was completed like DoIORequest does. My concern was if Task1 was cancelled but Task2 completed successfully and was waiting to be processed, calling DoIORequest to clear the the cancelled Task1 could pull the successfully completed Task2 from the queue prematurely.

Because this is working entirely with private members, I'm not sure how I would create automated tests for this.

@dseeg
Copy link
Author

dseeg commented Jul 6, 2021

@adamsitnik unsure who I should tag on this (so apologies in advance for the spam), but what can I do about these failing checks? They look like infrastructure failures, or failures stemming from infrastructure failures.
ex, The HTTP request to 'GET ...' has timed out after 100000ms.

@adamsitnik adamsitnik requested review from krwq, wfurt and jozkee July 9, 2021 16:36
@adamsitnik adamsitnik added this to the 6.0.0 milestone Jul 9, 2021
@adamsitnik
Copy link
Member

They look like infrastructure failures, or failures stemming from infrastructure failures.

I've hit the re-run button (which is available only for maintainers). I am afraid that the only other way of re-running the failed runs is by closing and re-opening the PR or by pushing anything to the repo (which kind of sucks)

@dseeg
Copy link
Author

dseeg commented Jul 9, 2021

Ha, totally understood. That's not entirely foreign to me either -- I run into similar problems at work.

If those are acceptable alternative methods, I can keep trying until it passes or something comes up that can't be traced back to a timeout.

@dseeg dseeg closed this Jul 9, 2021
@dseeg dseeg reopened this Jul 9, 2021
@terrajobst terrajobst added the community-contribution Indicates that the PR has been added by a community member label Jul 19, 2021
@jeffhandley
Copy link
Member

@krwq This PR is assigned to you for follow-up/decision before the RC1 snap. The checks did all pass after rerunning.

@adamsitnik
Copy link
Member

@krwq ping

@jozkee
Copy link
Member

jozkee commented Aug 11, 2021

@dseeg were you able to test this with a real serial device?

@dseeg
Copy link
Author

dseeg commented Aug 11, 2021

@jozkee I originally encountered the issue when I was using a Zigbee radio module connected to a Raspberry Pi via serial pins. I recreated the issue and tested my fix using an Arduino connected to a Raspberry Pi via USB.

I made a sample project to reproduce the issue if you have an Arduino, a Raspberry Pi, and some spare time: https://github.com/dseeg/SerialPortMemoryLeak

I'll admit that the exact conditions for this to occur for others are probably unlikely. In my case, it took some strangely written code and a 4 day weekend for me to become aware of it. Let me know if there's anything you would like me to explain, address, change, etc.

@@ -807,6 +811,22 @@ private static int DoIORequest(ConcurrentQueue<SerialStreamIORequest> q, Request
return 0;
}

private static void ProcessCancelledIORequests(ConcurrentQueue<SerialStreamIORequest> q)
Copy link
Member

@krwq krwq Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will be thread safe as is. Currently there is multiple threads which can enqueue and only one dequeuing. With this change you will have multiple threads potentially dequeueing. Since we do Peek & Dequeue here and IOLoop you might end up with one thread peeking and different one dequeuing. See i.e.

// assumes dequeue-ing happens on a single thread

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead consider calling this inside of the IOLoop in the idle state (make sure to write a comment about why we do that)

private static void ProcessCancelledIORequests(ConcurrentQueue<SerialStreamIORequest> q)
{
// assumes dequeue-ing happens on a single thread
while (q.TryPeek(out SerialStreamIORequest r))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider cleaning all completed instead, similarly as done here:

if (r.IsCompleted)
{
q.TryDequeue(out _);
// take another item since we haven't processed anything
continue;
}

@krwq
Copy link
Member

krwq commented Aug 12, 2021

sorry for late review, I missed this notification - consider pinging me on teams/e-mail or assign this to my backlog for more reliable response

@carlossanlop
Copy link
Member

@krwq thanks for helping review this.

@adamsitnik @jozkee I think we should move this to 7.0 for the following reasons:

I'll admit that the exact conditions for this to occur for others are probably unlikely. In my case, it took some strangely written code and a 4 day weekend for me to become aware of it.

  • @krwq mentioned a potential thread safety concern.
  • We have little time to get this verified. The CI most likely won't catch regressions, so testing will have to be done manually.
  • The issue this PR is fixing was already marked for 7.0.

I'll move it to 7.0, but if someone has a good reason to keep it in 6.0, please let us know in the comments.

@carlossanlop carlossanlop modified the milestones: 6.0.0, 7.0.0 Aug 13, 2021
@krwq
Copy link
Member

krwq commented Oct 5, 2021

@dseeg what's the status? there's some unaddressed feedback left here

@dseeg
Copy link
Author

dseeg commented Oct 5, 2021

@krwq I made changes a while back based on your feedback but I was unable test it in a way I that I felt comfortable with it. A few other things came up in life so this was brought down a few pegs on my priority list. I should have notified someone. Apologies.

I did get a ping about a week ago that suggests this may already have been resolved with the release of 6.0. I'll check and confirm.

If I find that it's still not fixed, I'll need to see what I can do to rebuild my physical testing environment.

@jeffhandley
Copy link
Member

I should have notified someone. Apologies

It was my mistake not ensuring we were following up in a timely manner. I appreciate your understanding on our delays.

I did get a ping about a week ago that suggests this may already have been resolved with the release of 6.0. I'll check and confirm.

If I find that it's still not fixed, I'll need to see what I can do to rebuild my physical testing environment.

Thanks for your help here! I'm going to mark the PR as a Draft for the time being. Let us know if you need anything in the meantime on this.

@jeffhandley jeffhandley marked this pull request as draft October 9, 2021 06:52
@jeffhandley
Copy link
Member

By the way, marking this as a draft gives us 30 days to conclude whether or not we should proceed with the PR before it's auto-closed as inactive.

@dseeg
Copy link
Author

dseeg commented Oct 10, 2021

@jeffhandley I'm going to just go ahead and close it. For some reason I'm unable to build the project as I was able to before and it's preventing me from testing this in a way I'd feel comfortable about it, and I'm having immense trouble finding the time to identify why the build process no longer works.

I'm also not satisfied with the changes I'm trying to test; I'm worried it could introduce performance issues. Which leads me to this snippet I found when looking at the contributing readme: Maintainers will not merge changes that have narrowly-defined benefits, due to compatibility risk. I think the benefits here are pretty narrow. As I mentioned before, I only encountered this issue because I was doing something in a silly way.

@dseeg dseeg closed this Oct 10, 2021
@jeffhandley
Copy link
Member

Thank you, @dseeg. Do you think we should also close the backing #55146 issue too?

I hope that this was still a positive experience for you overall and that you will consider submitting more pull requests into the .NET libraries. There are a quite a few other IO issues up-for-grabs at the moment (query) if you're interested.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO.Ports community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible SerialPort memory leak on Raspbian (all Unix?)
8 participants