Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileStream.Flush(true) doesn't flush device buffer on macOS, while it does so on Windows #28444

Closed
maxim-saplin opened this issue Jan 17, 2019 · 25 comments · Fixed by #44443
Closed
Assignees
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@maxim-saplin
Copy link

Actual:
FileStream.Flush(OS) calls fsync() on macOS, on Windows it calls FlushFileBuffers().

fsync() - doesn't force device's write buffer flush, Window's FlushFileBuffers() does.

Expected:
macOS's implementation of the Flush() method forces device's buffer flush via fcntl(F_FULLFSYNC)

@JeremyKuhne
Copy link
Member

From what I can tell, fsync() does flush on Linux, so this is only a Mac issue. Marking this one up-for-grabs if anyone wants to pull together a PR.

@ptoonen
Copy link
Contributor

ptoonen commented Mar 20, 2019

I'll take a look at this one. As far as I see, even though a simple fsync with the F_FULLFSYNC parameter could be enough, although fnctl provides a better guarantee. According to OSX' man fsync man page:

Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence.

Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written. The disk drive may also re-order the data so that later writes may be present, while earlier writes are not.

This is not a theoretical edge case. This scenario is easily reproduced with real world workloads and drive power failures.

For applications that require tighter guarantees about the integrity of their data, Mac OS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLFSYNC to ensure that their data is written in the order they expect. Please see fcntl(2) for more detail.

Since we don't know what application is being written, I tend to agree on @maxim-saplin's thoughts on using fcntl.

One footnote there is that fcntl may not guarantee this on APFS and that it may have a performance impact.

F_FULLFSYNC Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). This is currently implemented on HFS, MS-DOS (FAT), and Universal Disk Format (UDF) file systems. The operation may take quite a while to complete. Certain FireWire drives have also been known to ignore the request to flush their buffered data.

Are we sure we want to include this? Or maybe make it optional? Then there's the question if we're going to solve it in the Unix interop using an ifdef statement or create an entire new implementation.

@JeremyKuhne
Copy link
Member

Are we sure we want to include this?

I think it is the right thing to do, but I'm totally open to counter arguments. Consistency with legacy behavior here has been the main design goal.

Or maybe make it optional?

Maybe in the future we should allow dialing this in? FileStream needs some serious work, I'd like to see a more thorough overhaul with a fully rationalized set of configuration options I think.

Then there's the question if we're going to solve it in the Unix interop using an ifdef statement

That would be where I would start. It's the common pattern. I presume we can look for F_FULLSYNC being defined?

@stephentoub
Copy link
Member

Has this caused real problems for apps using .NET Core? We shouldn't make it significantly slower to achieve "consistency with legacy behavior" that's affecting very few people.

@maxim-saplin
Copy link
Author

maxim-saplin commented Mar 21, 2019

@stephentoub it did cause trouble with my perf. measuring app when on Windows I noticed that Flush() and Flush(true) show 10x difference in performance while there was not difference on macOS.
I think the issue here is not about slowdowns (someone going extra bit and explicitly putting this parameter would expect the slowdowns) it's that someone wanting resilient writes and calling Flush(true) would get them on Windows but not on macOS.

Though I might agree that scenarios where this inconsistency can be critical are very rare. If the IO part would require solid overhaul fixing this issue is not worth introducing new ones.

@stephentoub
Copy link
Member

someone going extra bit and explicitly putting this parameter would expect the slowdowns

This is a key part of my concern: it's not actually pay for play as you suggest, because FlushAsync() does this. If we change that, such that the only way you pay this cost is if you opt in via Flush(true), then I'm fine with it. But that's obviously a behavioral change as well.

@ptoonen
Copy link
Contributor

ptoonen commented Mar 21, 2019

@maxim-saplin @stephentoub imho this change should make it even though it makes it slower. Setting the true bit, means you probably do want to ensure it's synced to disk and don't want the 'loosey goosey' promise that OSX currently makes with an fsync. I'd say that if people already expect that performance hit (because they explicitly chose to set the bit), it won't be a very big issue.

Maybe the behavior change can be implemented when FileStream is revised as @JeremyKuhne implicated would happen?

@stephentoub
Copy link
Member

I'd say that if people already expect that performance hit (because they explicitly chose to set the bit), it won't be a very big issue.

As I mentioned, I'm ok with the perf hit on macOS when explicitly calling Flush(true). I'm more concerned about it when just using FlushAsync().

@ptoonen
Copy link
Contributor

ptoonen commented Mar 21, 2019

Ah, I missed that one. Yes, that would be an issue - especially if you're actually awaiting that flush. What do you propose here? We could create a new explicit FlushAsync(FlushToDisk) version but that would imply changing the existing behavior. This wouldn't be my first choice.

The other option I see is to change the InterOp behavior of Flush(true). This however would imply either changing behavior for all UNIX systems to use Fcntl in favor of FSync or moving the function to a Linux and OSX specific version.

Option two would be my preferred option.

@stephentoub
Copy link
Member

What do you propose here?

The viable options as I see them are:

  1. Do nothing.
  2. Change FlushAsync() to do the equivalent of Flush() not Flush(true), and then make Flush(true) as strong as we desire.
  3. Change the Flush(true) behavior and leave FlushAsync() as is, which would mean some internal reworkings and having both mechanisms available for the implementation to call.

For (2), the reasoning descriped in the comment in the src for why it's ok to do so with FlushAsync() but not Flush() seems flawed. And as to my knowledge there's no asynchronous equivalent at the OS level exposed for flushToDisk==true, I wouldn't want to expose a FlushAsync(true)… if a developer wants to Task.Run(() => Flush(true)), they can do so.

My preference is (2). We just need to convince ourselves that we're ok in a major release taking the behavioral change that FlushAsync no longer has the stronger flushToDisk semantics.

@ptoonen
Copy link
Contributor

ptoonen commented Mar 31, 2019

Right, am back home. Gave it some thought after seeing your response and I guess that might be the best option. I am living a bit in the past where I'm still weary of breaking changes but with .NET core that really isn't the same anymore :)

I've implemented the change but am seeing some unexpected test failures which I'm trying to pinpoint. Will update as soon as I've solved that.

@Anipik
Copy link
Contributor

Anipik commented May 4, 2019

@ptoonen are you still working on it ?

@ptoonen
Copy link
Contributor

ptoonen commented May 4, 2019

@Anipik yes I am, but am only slowly making progress here. One issue at a time. I hope to have it working somewhere within the next two weeks.

@Anipik
Copy link
Contributor

Anipik commented May 4, 2019

okay sounds good

@danmoseley
Copy link
Member

@ptoonen how is it going? Just wondering whether it misses 3.0 at this point.

@ptoonen
Copy link
Contributor

ptoonen commented May 31, 2019 via email

@danmoseley
Copy link
Member

@stephentoub you had a concern above about (2) being potentially a breaking change. We do have one more preview left in which breaking changes are possible - Preview 7. But it's the last one. Say we take this change in a week, would you feel that we have sufficient runway to validate this change is not unacceptably breaking before final release?

@stephentoub
Copy link
Member

@danmosemsft, it's "breaking" from a machine-wide reliability perspective: we'd stop forcing a flush to disk in a case where we were previously doing so. No amount of previews are going to tell us whether that's a problem or not. We just have to make a call. I'm ok with it. I think it's a mistake that FlushAsync was previously doing the same thing as Flush(true) when Flush() is equivalent to Flush(false).

I will go ahead and make the FlushAsync change now, separating it from this issue.

@ptoonen
Copy link
Contributor

ptoonen commented Jun 2, 2019

I did not succeed today. Although the change seems incredibly easy (and I think it may be), there are some tests that fail where I don't expect them to. If anyone else has any ideas, I'd gladly hear them. Otherwise I'm afraid I'm going to have to let this one go.

@stephentoub
Copy link
Member

@ptoonen, which tests? Can you share your branch? Someone might be able to pick up where you left off. Thanks!

@carlossanlop
Copy link
Member

@ptoonen are you still working on this? Do you have a branch you can share?

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the Future milestone Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@JeremyKuhne JeremyKuhne removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@carlossanlop
Copy link
Member

@ptoonen I'll remove the assignment. If you get a chance to take a look at this again, feel free to reassign the issue to yourself.

@jozkee
Copy link
Member

jozkee commented Nov 7, 2020

My preference is (2). We just need to convince ourselves that we're ok in a major release taking the behavioral change that FlushAsync no longer has the stronger flushToDisk semantics.

@stephentoub I would personally prefer suggestion 3 over 2; that is keep using fsync for FlushAsync() and change Flush(true) to use fcntl(F_FULLFSYNC). Wouldn't be better to just avoid the breaking change in FlushAsync? I guess the downside would be that we wouldn't be consistent between sync and async.

@stephentoub
Copy link
Member

stephentoub commented Nov 7, 2020

The consistency is key. Stream.Flush() / FlushAsync() should be functionally equivalent except for the one piece that one does sync writes and the other async writes. They shouldn't have different levels of strength / reliability. But more importantly, FlushAsync() doing the equivalent of Flush(true) (which is not part of Stream) means it's failing at its one goal, to be async.

@stephentoub
Copy link
Member

stephentoub commented Nov 7, 2020

Actually, I already fixed that part, a year and a half ago:
dotnet/coreclr#24902
It's still frustratingly doing synchronous work, but that's covered by a different issue.

This issue is now just about making Flush(true) behave however is desired.

@ghost ghost closed this as completed in #44443 Nov 11, 2020
@jozkee jozkee modified the milestones: Future, 6.0.0 Nov 11, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 14, 2020
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
10 participants