Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FileStream] update Position after ReadAsync completes #56093

Closed

Conversation

adamsitnik
Copy link
Member

While working on the blog post I've realized that there is a difference in when the position is updated for FileStream.ReadAsync()

The Unix strategy updates the position after the operation completes:

// This implementation updates the file position after the operation completes, rather than before.

IMO we should align both implementations.

We can treat it as an edge case bug fix for cases where file length would get updated (in case of a file shared for writing) between the call to Length:

and the call to ReadFile:

(SafeFileHandle.OverlappedValueTaskSource? vts, int errorCode) = RandomAccess.QueueAsyncReadFile(_fileHandle, destination, positionBefore, cancellationToken);

I am soon going to provide the benchmark results but I don't expect any regressions. Actually, we might even notice a small improvement for small files, where we are going to omit the call to Length

@adamsitnik adamsitnik added this to the 6.0.0 milestone Jul 21, 2021
@adamsitnik adamsitnik requested review from stephentoub and jozkee July 21, 2021 14:02
@ghost
Copy link

ghost commented Jul 21, 2021

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

While working on the blog post I've realized that there is a difference in when the position is updated for FileStream.ReadAsync()

The Unix strategy updates the position after the operation completes:

// This implementation updates the file position after the operation completes, rather than before.

IMO we should align both implementations.

We can treat it as an edge case bug fix for cases where file length would get updated (in case of a file shared for writing) between the call to Length:

and the call to ReadFile:

(SafeFileHandle.OverlappedValueTaskSource? vts, int errorCode) = RandomAccess.QueueAsyncReadFile(_fileHandle, destination, positionBefore, cancellationToken);

I am soon going to provide the benchmark results but I don't expect any regressions. Actually, we might even notice a small improvement for small files, where we are going to omit the call to Length

Author: adamsitnik
Assignees: -
Labels:

area-System.IO

Milestone: 6.0.0

@adamsitnik
Copy link
Member Author

There is +- 4% gain for reading small files:

Method Toolchain fileSize userBufferSize options Mean Ratio Allocated
ReadAsync \pr\corerun.exe 1024 1024 Asynchronous 103.4 us 0.96 5 KB
ReadAsync \main\corerun.exe 1024 1024 Asynchronous 107.8 us 1.00 5 KB

// touch the file pointer location at all. We will adjust it
// ourselves, but only in memory. This isn't threadsafe.
_filePosition += destination.Length;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that any Windows code being ported from .NET Framework to .NET 6 that was issuing concurrent reads based on position will now be broken. Just want to confirm that's intentional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have already announced that for FileStream with buffering enabled, which is the default setting so I think it's OK: #50858

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option is to enable consistency in the other direction: always update the position before the operation. Net result of this would be that on Unix we'd need to also patch the position back after a read if fewer bytes were read. But if we're saying that you can't trust the position until after the operation completes anyway, maybe that's ok and a smaller break?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option is to enable consistency in the other direction: always update the position before the operation

On Unix, it would require us to perform a sys-call to obtain the file length (which we should not cache on Unix, because file locking is only advisory and others can still write to file) to make sure that the position is not >= EOF. If we would set _position > EOF, the next write operation could create a sparse file. Sample:

File.WriteAllBytes(path, new byte[100]);
using var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite, bufferSize: 0, FileOptions.Asynchronous);
fs.ReadAsync(new byte[123]); // no await, the position is 123
fs.WriteAsync(new byte[] { 1 }); // the byte would get written at position 123, not 100

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub you are right that this could break users who issue concurrent reads in .NET Framework. With #50858 we break them only in a way that the reads are not concurrent, they are serialized. With this change they could perform multuple reads at the same offset without knowing that. I am going to close the PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change they could perform multuple reads at the same offset without knowing that. I am going to close the PR

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I still wonder if we should change the Unix behavior, though, to first update the position and then patch it back after the read completes. It would seem to enable more compatibility with existing Windows code, even though with the Unix implementation either way you shouldn't be relying on the Position while there's an active async operation in flight.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the fact that we used to pretend that issuing concurrent reads and writes is OK with non-thread-safe FileStream. In a perfect world, we would always update the position after the operation completes and give compiler errors for invalid usage.

But getting back to your question I think that we can only choose the lesser evil here:

  • If we update the position after read completes, we have great perf (single sys-call), but the users who are misusing FileStream in the following way:
    • issuing multiple concurrent reads when buffering is disabled and the reads are not awaited: they end up reading content from the same offset (silent error)
    • issuing write before async read operation completes: they end up overwriting the file content (silent error)
  • If we update the position before read starts using current file length:
    • everyone has to pay for the perf penalty (additional sys-call)
    • if someone has extended the file in the meantime we still need to patch the position. Perhaps we could use some opportunistic caching anyway?
  • If we update the position before the read starts assuming that we are going to read entire buffer (_position += buffer.Length):
    • there is no perf penalty (we are not checking the file length up-front)
    • issuing concurrent reads should be fine, as long as pread returns 0 when reading from offset > EOF
    • issuing write before async read operation completes: we might end up with a sparse file. This should be extremely rare, as this is extreme misuse.

@stephentoub which option do you like the most? I am going to verify the pread result for offset > EOF in the meantime

Copy link
Member

@stephentoub stephentoub Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably guessed given my previous comments, but I think option 3 (update position before read starts and then backpatch it if necessary) is the least bad.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the fact that we used to pretend that issuing concurrent reads and writes is OK with non-thread-safe FileStream

100% agree. If we were doing it from scratch today, we would not do that.

@karelz
Copy link
Member

karelz commented Jul 21, 2021

Http3_MsQuic test failures are tracked in #56090. Should be fixed and merged shortly. Sorry for the inconvenience!

@adamsitnik adamsitnik requested a review from stephentoub July 22, 2021 07:15
@adamsitnik adamsitnik closed this Jul 27, 2021
@adamsitnik adamsitnik deleted the updatePositionAfterReadCompletes branch July 27, 2021 14:13
@ghost ghost locked as resolved and limited conversation to collaborators Aug 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants