What is the recommended File IO API usage pattern with regards to ext4 delayed allocations? #32600
you can do
I could certainly call
Beyond these examples I would expect many, many more libraries omit calling Flush(true) without a way to intervene. Every such usage violates POSIX guarantees, as kernel/file system developers would argue: all of them can result in correct metadata but missing file contents when the system crashes or loses power.
Given my environment, I'm willing to pay the "expensive" part to make sure the weird limbo state between no file and a correct file with contents doesn't occur, especially as it avoids bricking embedded instrumentation. If the status quo has to remain (I really think it shouldn't), could there be an opt-in mechanism that makes Flush always behave as Flush(true) on FileStream? For the above library, I have moved its storage folder to a tmpfs, as I don't need the file to be persistent. That is of course a (lucky) special case. For the status quo to be on the safe side, it looks like I have to turn off delayed allocation, which is an important performance optimization and helps avoid file system fragmentation in ext4…
Triage: When making an async flush, the cost of the call happens in the background, so it isn't equal to the cost of calling
I've taken a closer look at the issue and the referenced blog post (https://lwn.net/Articles/322823/). I really like this comment:

fd = open("file.new", O_TRUNC|O_WRONLY|O_CREAT);
write(fd, "hi", 2);
close(fd);
rename("file.new", "file");
If the rename can happen before the data is written and we get an empty file as the end result, I am inclined to say that it's not a .NET bug. The article also mentions the patch that solved the problem:
I was curious to see if I could repro it using Ubuntu 18.04 and an ext4 drive with default mount settings:

adam@adsitnik-ubuntu:~/projects/repros/FileMoveExt4$ cat /proc/mounts | grep test_ext4
/dev/sdb1 /media/adam/test_ext4 ext4 rw,nosuid,nodev,relatime,data=ordered 0 0

Using the following app:

using System;
using System.IO;
namespace FileMoveExt4
{
class Program
{
static void Main(string[] args)
{
string directoryPath = args[0];
string srcFilePath = Path.Combine(directoryPath, $"src_{Guid.NewGuid()}");
string dstFilePath = Path.Combine(directoryPath, $"dst_{Guid.NewGuid()}");
File.WriteAllText(path: srcFilePath, contents: "abc");
File.Move(srcFilePath, dstFilePath);
Console.WriteLine($"The file content: \'{File.ReadAllText(dstFilePath)}\'");
}
}
}

And I was not able to repro it:

adam@adsitnik-ubuntu:~/projects/repros/FileMoveExt4$ dotnet run -- "/media/adam/test_ext4"
The file content: 'abc'

@Tragetaschen could you please try my repro app? Is there any chance that you are using an old kernel version that does not contain the mentioned patch? Or am I doing something wrong?
For me, it was a one-time event, and I don't have access to any of the development environments anymore. The most important bit was that the embedded instrument frequently loses power by having its cable pulled. Any test that does not kill the machine will probably work fine, because ext4 will of course eventually write everything as expected if there's power to do it. Given the date of the OP, this was either kernel 4.19.y or 5.4.y (likely the latter; I migrated in pretty much the same time frame). But since the patch is from 2009, its inclusion should be almost guaranteed. Note, however, that the patch first runs the rename operation and only after that allocates the blocks. I don't know if this is still the case, but that would still leave the small window open.
@Tragetaschen I read your comment in 12830
We are rather not going to change the current behavior. @Tragetaschen @TheCollegedude would that solve the problem for you?
I would argue that not properly fsync'ing is the macOS approach to performance optimization, but I absolutely get that, with the breadth of platforms .NET runs on, it's too much to ask. :) Optimizing against these very specific POSIX semantics, as some file systems like ext4 have done, has caused a lot of damage over the years. I'm still not sure I understand the performance argument, especially since fsync is part of the async dispose path, but this doesn't matter. The app config switch sounds like a very viable option here to make sure .NET in its entirety can ensure the necessary POSIX semantics when needed.
I would on the other hand expect
I don't think it is: runtime/src/libraries/System.Private.CoreLib/src/System/IO/FileStream.cs Lines 502 to 507 in 3f0c2cc
runtime/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/BufferedFileStreamStrategy.cs Line 814 in 3f0c2cc
runtime/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/OSFileStreamStrategy.cs Lines 112 to 121 in 3f0c2cc
Let me start my Linux machine and run some benchmarks for you.
…and this nicely summarizes the entire argument about what POSIX guarantees and what it doesn't. Yes, this is what most people (sane people?) would expect, but in corner cases like system crashes or power losses, it's possible that some data has made it into persistent storage and some hasn't, because close doesn't actually write it to disk; fsync does.
I think I found it two years ago, but the code has changed a lot with the introduction of strategies.
Just to make sure: I get that not fsync'ing is faster than fsync'ing; no need to show this.
I wanted to make sure that we know the actual difference. Benchmark code:

private const string FilePath = "FileUtils.GetTestFilePath.txt";
[GlobalSetup]
public void Setup() => File.WriteAllBytes(FilePath, new byte[1024]);
[GlobalCleanup]
public void Cleanup() => File.Delete(FilePath);
[Benchmark]
public bool OpenClose()
{
using (FileStream fileStream = File.OpenWrite(FilePath))
{
return fileStream.IsAsync; // return something just to consume the stream
}
}
[Benchmark]
public bool OpenFlushToDiskClose()
{
using (FileStream fileStream = File.OpenWrite(FilePath))
{
fileStream.Flush(flushToDisk: true);
return fileStream.IsAsync;
}
}
[Benchmark]
public bool OpenWriteClose()
{
using (FileStream fileStream = File.OpenWrite(FilePath))
{
fileStream.WriteByte(1);
return fileStream.IsAsync;
}
}
[Benchmark]
public bool OpenWriteFlushToDiskClose()
{
using (FileStream fileStream = File.OpenWrite(FilePath))
{
fileStream.WriteByte(1);
fileStream.Flush(flushToDisk: true);
return fileStream.IsAsync;
}
}

Results (using ext4 file system):

BenchmarkDotNet=v0.13.1.1786-nightly, OS=ubuntu 18.04
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.5.22263.22
Flushing to disk when nothing was actually written is very expensive (x5), but flushing after writing a single byte seems to add no overhead (it actually does the opposite?!). Based on these findings, I believe we should perform more benchmarking (flushing after writing various numbers of bytes to the file, using SSD and non-SSD disks) and, based on the results, reconsider adding flushing to disk for certain file systems. We already know how to fetch the file system type:

runtime/src/libraries/System.Private.CoreLib/src/Microsoft/Win32/SafeHandles/SafeFileHandle.Unix.cs Line 468 in da2cf17
I have an embedded Linux platform with some flash storage where the file system is ext4 with default mount options. I'm running an ASP.NET Core web server with server-side Blazor, which by default stores encryption keys as an XML file in the home directory. Recently, I got a system where that XML file had zero length, and my application crashed when establishing the Blazor connection.
For reference, the code to store the XML file is here:
https://github.com/dotnet/aspnetcore/blob/v3.1.1/src/DataProtection/DataProtection/src/Repositories/FileSystemXmlRepository.cs#L135-L161
The above code synchronously writes to the file and then renames it from a temporary name to the final name. My file system contained the final file name, but with zero length. The ext4 default options include data=ordered, so metadata should be written after file contents, but delayed allocation is active. In this case, the zero-length file is THE sign of missing fsync(2)/fdatasync(2) invocations. For reference, here is an LWN article describing what looks to be the issue: https://lwn.net/Articles/322823/

Being an embedded system, there is no proper shutdown; power just gets cut frequently, and anything not persisted on disk is (expected to be) lost. But the above code pattern properly disposes, so I expect any of three states: no file, a temporary file with contents, or the final file with contents. For example, I never had issues with the SQLite database (except when file system journaling was disabled by accident…)
Is the above code snippet recommended in combination with ext4 delayed allocations?
Is everything properly fsync'ed in that code path?
Is there a missing pit of success here?