-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Override Stream.CopyTo on FileStream #17978
Override Stream.CopyTo on FileStream #17978
Conversation
I assume that we are doing this because of the handcrafted implementation is faster. Could you please collect some numbers to prove it? |
For reference - dotnet/corefx#11569 was PR with the handcrafted async implementation. |
I should have included more details on my thoughts in the issue I opened. My specific thought was that at a minimum FileStream.CopyTo should delegate to CopyToAsync in the case where it's in async mode. FileStream is a strange beast in that it's configured into one of two modes at construction time, either sync mode or async mode, the difference being whether it's created to use overlapped I/O on Windows, which needs to be specified at the time the handle is created. Then all subsequent operations on the stream need to match, so if an async operation is performed on a sync stream, the implementation just queues a work item to the thread pool that calls the synchronous method, and conversely if a sync method is called on an async stream, it delegates to the XxAsync method and then blocks waiting for it to complete. There is overhead associated with setting up each async operation, but there's even more overhead when a sync method is called on an async stream, as you not only get that per-operation async overhead, you also get the overhead of the blocking waiting for the result. FileStream's CopyToAsync override is there to minimize the async overhead for each operation when in async mode. With CopyTo not overridden, it just uses the base Stream's CopyTo, which will end up repeatedly calling Read and Write, each of which on an async FileStream will end up incurring the maximum overhead. So at a minimum I was thinking we should do (pseudo-code): public override void CopyTo(Stream destination, int bufferSize)
{
if (_asyncMode && GetType() == typeof(FileStream))
{
CopyToAsync(destination, bufferSize).GetAwaiter().GetResult();
}
else
{
base.CopyTo(destination, bufferSize);
}
} so that in the case where CopyTo is used on an async FileStream, we only incur the blocking and async overheads once rather than per read and write operation. It's possible all of the additional duplication this PR provides also has benefits, but that's much less likely to be a given, and as Jan suggests would really need to be measured (as would my suggested change, but I'm more inclined to believe it'll have measurable impact). |
Thanks for the feedback. When I had my nose in this I was actually wondering where the perceived benefit was. Other than saving on a few jumps and bypassing some fairly inexpensive logic there's not much difference. Now knowing your thoughts, things make more sense but it does raise a question for me - and I now realise I should have asked more questions earlier ;) I wonder if this change is effectively promoting sync over async and would contribute further to FileStream being a "strange beast". I wonder if rather than silently helping people do the wrong thing it would be better to work on a way towards helping them not get into this position in the first place and at the same time work towards having a less complicated internal implementation. I'm just spitballing here but the type of thing I'm wondering is if having ISyncStream and IAsyncStream interfaces would help. The current Stream type would have to implement both interfaces implicitly however additional implementations (e.g. SyncFileStream, AsyncMemoryStream, etc) would implement one of the interfaces implicitly and the other explicitly. They'd then bring in "base" functionality through composition rather than inheritance. The explicit implementations could probably be kept trivial - I'm not sure how often anyone would want to cast between I*Stream types? Basically, I wonder if with this type of thing you'd end up with a new set of stream types that are more difficult to use incorrectly and at the same time have simpler internals. I would fully accept that if I cast an AsyncFileStream to ISyncStream that these synchronous operations are sub-optimal but at least I'd have made the conscious decision to do this. Put another way, the API is making it easier for me to do the right thing than it is to potentially make a mistake. Having said all this, I'm perfectly happy to continue with what's been asked for. |
The other sync and async methods on FileStream already do this, so this isn't introducing a new concept, rather it's just doing the consistent thing in CopyTo that also happens to be significantly more efficient when used in this situation. I don't think introducing new interfaces would provide a benefit in this case. |
Cool, understood. I'll update PR and post figures in the next day or so. |
@jkotas and @stephentoub, I've done some further testing and submitted a new PR based on the feedback above. Unfortunately, My machine isn't suitable for this level of performance testing due to the amount of background noise and I'd say someone else would have to run perf tests on this before reaching any level of confidence. From my perspective there is perhaps up to a 5% improvement with large files (10MB to 64MB) however there seemed to be a more noticeable degradation for smaller files (maybe ~40% slower for 64KB files). The tests I ran on the original PR did not show a marked difference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
Interesting. I would have expected a larger impact than that. I'll take a look... |
👍 I'd be very interested to hear how things look for you |
So from what I can see, the problem with smaller files is that this change really doesn't buy much and incurs additional cost in the process. CopyTo{Async} by default uses an ~80K buffer, which means the "old" implementation would end up doing a single read. The benefits of CopyToAsync are that it does the equivalent setup for a read but then reuses that set up across all subsequent operations, so if it would have only done a single read, it's really not providing any benefit. And in the process it's incurring additional overheads, like turning |
Thanks for the clarification. Perhaps I should try adding a test for the streams length and if it's less than some multiple of the buffer size (figured out through testing) call through to the base implementation, otherwise call through to CopyToAsync? |
I agree. I've just done a little more testing on this on files ranging from 10KB to 1GB. today I'm seeing a slight degradation on the smaller files and a slight improvement with bigger files. The degradation today isn't as pronounced as I was seeing the other day, <10% (which proves my machine isn't suitable for this type of testing) and I'm also not sure it's worth adding additional complexity for a potentially tiny gain. I'll bow out of this one for now. |
Thanks for your efforts on this, @tdinucci. Let's go ahead and close this and the issue for now; seems like there's not much of a "there there". Good investigation, though. |
Addresses https://github.com/dotnet/corefx/issues/29479
This is relevant to Windows only, just as CopyToAsync is.
A PR for unit tests is about to be submitted to corefx.