-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FileStream] update Position after ReadAsync completes #56093
[FileStream] update Position after ReadAsync completes #56093
Conversation
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsWhile working on the blog post I've realized that there is a difference in when the position is updated for The Unix strategy updates the position after the operation completes: runtime/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/UnixFileStreamStrategy.cs Line 45 in 7c72b72
IMO we should align both implementations. We can treat it as an edge case bug fix for cases where file length would get updated (in case of a file shared for writing) between the call to Line 41 in b823f14
and the call to Line 55 in b823f14
I am soon going to provide the benchmark results but I don't expect any regressions. Actually, we might even notice a small improvement for small files, where we are going to omit the call to
|
There is +- 4% gain for reading small files:
|
// touch the file pointer location at all. We will adjust it | ||
// ourselves, but only in memory. This isn't threadsafe. | ||
_filePosition += destination.Length; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that any Windows code being ported from .NET Framework to .NET 6 that was issuing concurrent reads based on position will now be broken. Just want to confirm that's intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have already announced that for FileStream
with buffering enabled, which is the default setting so I think it's OK: #50858
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other option is to enable consistency in the other direction: always update the position before the operation. Net result of this would be that on Unix we'd need to also patch the position back after a read if fewer bytes were read. But if we're saying that you can't trust the position until after the operation completes anyway, maybe that's ok and a smaller break?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other option is to enable consistency in the other direction: always update the position before the operation
On Unix, it would require us to perform a sys-call to obtain the file length (which we should not cache on Unix, because file locking is only advisory and others can still write to file) to make sure that the position is not >= EOF
. If we would set _position > EOF
, the next write operation could create a sparse file. Sample:
File.WriteAllBytes(path, new byte[100]);
using var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite, bufferSize: 0, FileOptions.Asynchronous);
fs.ReadAsync(new byte[123]); // no await, the position is 123
fs.WriteAsync(new byte[] { 1 }); // the byte would get written at position 123, not 100
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub you are right that this could break users who issue concurrent reads in .NET Framework. With #50858 we break them only in a way that the reads are not concurrent, they are serialized. With this change they could perform multuple reads at the same offset without knowing that. I am going to close the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change they could perform multuple reads at the same offset without knowing that. I am going to close the PR
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I still wonder if we should change the Unix behavior, though, to first update the position and then patch it back after the read completes. It would seem to enable more compatibility with existing Windows code, even though with the Unix implementation either way you shouldn't be relying on the Position while there's an active async operation in flight.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like the fact that we used to pretend that issuing concurrent reads and writes is OK with non-thread-safe FileStream
. In a perfect world, we would always update the position after the operation completes and give compiler errors for invalid usage.
But getting back to your question I think that we can only choose the lesser evil here:
- If we update the position after read completes, we have great perf (single sys-call), but the users who are misusing
FileStream
in the following way:- issuing multiple concurrent reads when buffering is disabled and the reads are not awaited: they end up reading content from the same offset (silent error)
- issuing write before async read operation completes: they end up overwriting the file content (silent error)
- If we update the position before read starts using current file length:
- everyone has to pay for the perf penalty (additional sys-call)
- if someone has extended the file in the meantime we still need to patch the position. Perhaps we could use some opportunistic caching anyway?
- If we update the position before the read starts assuming that we are going to read entire buffer (
_position += buffer.Length
):- there is no perf penalty (we are not checking the file length up-front)
- issuing concurrent reads should be fine, as long as
pread
returns0
when reading fromoffset > EOF
- issuing write before async read operation completes: we might end up with a sparse file. This should be extremely rare, as this is extreme misuse.
@stephentoub which option do you like the most? I am going to verify the pread
result for offset > EOF
in the meantime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably guessed given my previous comments, but I think option 3 (update position before read starts and then backpatch it if necessary) is the least bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like the fact that we used to pretend that issuing concurrent reads and writes is OK with non-thread-safe FileStream
100% agree. If we were doing it from scratch today, we would not do that.
Http3_MsQuic test failures are tracked in #56090. Should be fixed and merged shortly. Sorry for the inconvenience! |
While working on the blog post I've realized that there is a difference in when the position is updated for
FileStream.ReadAsync()
The Unix strategy updates the position after the operation completes:
runtime/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/UnixFileStreamStrategy.cs
Line 45 in 7c72b72
IMO we should align both implementations.
We can treat it as an edge case bug fix for cases where file length would get updated (in case of a file shared for writing) between the call to
Length
:runtime/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/AsyncWindowsFileStreamStrategy.cs
Line 41 in b823f14
and the call to
ReadFile
:runtime/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/AsyncWindowsFileStreamStrategy.cs
Line 55 in b823f14
I am soon going to provide the benchmark results but I don't expect any regressions. Actually, we might even notice a small improvement for small files, where we are going to omit the call to
Length