-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs::File reads first 4096 bytes only on Linux v5.9 and v5.10 #3803
Comments
That sounds weird. I just tried the code you provided on my local Linux laptop and it seems to work fine. Does this happen on raspberry pi only? @wmanley Thoughts? Your PR was released in 1.6.0, so it's possible that it is related to this. |
Does it fail on files shorter than 4096 bytes? Does it fail on files longer than 4096 bytes? |
I've had a look through #3518 and I can't see how it might cause this issue. I'd been looking for a place where we incorrectly @br0adcast: Could you try reverting 39706b1 and see if that fixes it? |
I've tried to reproduce this on my Raspberry Pi with kernel 4.14, but it works fine for me. I've confirmed using |
Looks like we need more info to repro. |
I have not found this problem on the Linux systems I have, including the 32-bit ones i686 and 64-bit ARM. How i reproduce it:
|
Yes, reverting 39706b1 fixes the problem. strace:
|
Hmm, seems like a kernel bug. What a shame - I guess #3518 will have to be reverted. What filesystem is the file on? |
ext4 |
@br0adcast: Would you consider bisecting the kernel to find out when this broke? |
@wmanley: The problem occurs after upgrading the kernel from 5.4.81 to 5.10.0 and higher. At least, I have found it in the following kernels:
|
I just reproduced a bug in an i686 virtual machine with Arch Linux 32 + lts kernel Linux 5.10.32-1.0-lts #1 SMP Thu, 22 Apr 2021 07:53:37 +0000 i686 GNU/Linux |
I've tried to reproduce this with qemu by following https://ldpreload.com/blog/git-bisect-run . Code: use tokio::io::AsyncReadExt;
use std::ffi::CStr;
fn main() -> std::io::Result<()> {
if std::process::id() == 1 {
//std::fs::create_dir("/dev")?;
if unsafe{libc::mount(
CStr::from_bytes_with_nul(b"devtmpfs\0").unwrap().as_ptr(),
CStr::from_bytes_with_nul(b"/dev\0").unwrap().as_ptr(),
CStr::from_bytes_with_nul(b"devtmpfs\0").unwrap().as_ptr(),
0, std::ptr::null())} != 0 {
return Err(std::io::Error::last_os_error());
};
}
let res = tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.unwrap()
.block_on(test());
match res {
Ok(true) => println!("Success\n"),
Ok(false) => println!("Failed\n"),
Err(e) => println!("Error: {:?}\n", e),
}
if std::process::id() == 1 {
unsafe{libc::reboot(libc::LINUX_REBOOT_CMD_POWER_OFF)};
}
Ok(())
}
async fn test() -> std::io::Result<bool> {
let mut file = tokio::fs::File::open("init").await?;
let mut contents = vec![];
file.read_to_end(&mut contents).await?;
println!("File size = {}", contents.len());
Ok(contents.len() != 4096)
} Compile with:
And build initrd with:
Kernel built with:
Run with:
The test passes. Maybe I need a real filesystem, and not just initrd. |
Yes, it fails with ext4. Created ext4 partition with file:
Modified the test program to mount this fs and inspect this file: use tokio::io::AsyncReadExt;
use std::ffi::CStr;
fn main() -> std::io::Result<()> {
if std::process::id() == 1 {
//std::fs::create_dir("/dev")?;
if unsafe{libc::mount(
CStr::from_bytes_with_nul(b"devtmpfs\0").unwrap().as_ptr(),
CStr::from_bytes_with_nul(b"/dev\0").unwrap().as_ptr(),
CStr::from_bytes_with_nul(b"devtmpfs\0").unwrap().as_ptr(),
0, std::ptr::null())} != 0 {
return Err(std::io::Error::last_os_error());
};
std::fs::create_dir("/myext")?;
if unsafe{libc::mount(
CStr::from_bytes_with_nul(b"/dev/sda\0").unwrap().as_ptr(),
CStr::from_bytes_with_nul(b"/myext\0").unwrap().as_ptr(),
CStr::from_bytes_with_nul(b"ext4\0").unwrap().as_ptr(),
0, std::ptr::null())} != 0 {
return Err(std::io::Error::last_os_error());
};
}
let res = tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.unwrap()
.block_on(test());
match res {
Ok(true) => println!("Success\n"),
Ok(false) => println!("Failed\n"),
Err(e) => println!("Error: {:?}\n", e),
}
if std::process::id() == 1 {
unsafe{libc::reboot(libc::LINUX_REBOOT_CMD_POWER_OFF)};
}
Ok(())
}
async fn test() -> std::io::Result<bool> {
let mut file = tokio::fs::File::open("/myext/myfile").await?;
let mut contents = vec![];
file.read_to_end(&mut contents).await?;
println!("File size = {}", contents.len());
Ok(contents.len() != 4096)
} Run with partition attached as drive:
Now to bisect... |
Bisecting with script
and commands:
Points to torvalds/linux@efa8480: "fs: RWF_NOWAIT should imply IOCB_NOIO" If that isn't a smoking gun then I don't know what is. Hmm, I've just tested the recently released v5.12 and it succeeds there, so maybe it's already been fixed. I'll bisect to find the fix. |
It was fixed in 5.11: torvalds/linux@06c0444 neither commit message mentions anything about bugs. The fix seems incidental. I'll also run some tests to see if this is limited to 32-bit systems. |
So given those commits, which kernel versions are affected? I'm guessing we can add a check for the kernel version on the first call and fall back to the old behavior on kernels with the bug. |
If the 'preadv' function returns 0, we can try to do a regular read after that, to double check if it is an EOF or a kernel bug. if 0 == bytes_read {
Err(std::io::Error::from_raw_os_error(libc::EAGAIN))
} else if bytes_read < 0 {
Err(std::io::Error::last_os_error())
} else {
/* preadv2 returns the number of bytes read, e.g. the number of bytes that have
* written into `unfilled`. So it's safe to assume that the data is now
* initialised */
dst.assume_init(bytes_read as usize);
dst.advance(bytes_read as usize);
Ok(())
} |
I've just tested and I can confirm that the bug exists on x86-64 too, so it's not just a 32-bit issue.
5.9 and 5.10 are broken according to my testing. Otherwise it works fine. I've only tested mainline linux, not any of the stable branches. I also can't speak to what's in the distro kernels. I'll run the test on a few stable kernels to see if they're affected too.
That would certainly work around the bug, but would incur thread synchronisation overhead that #3518 was intended to avoid. |
Stable kernel testing: 5.6.19: OK So there's nothing on the stable branch that affects this issue. Regarding checking the kernel version, something like this should do the job: fn has_buggy_preadv2() -> bool {
let mut uts : libc::utsname;
let res = unsafe{libc::uname(&mut uts as *mut utsname)};
if res < 0 {
// Getting kernel version failed, assume that it's buggy
return true;
}
uts.release[..4] == b"5.9." || uts.release[..5] == b"5.10."
} But I would like to talk to the kernel devs first. |
Sure, let me know what the kernel devs have to say. |
I'm running into the same problem on my amd64 machine with kernel 5.10.0-6-amd64 (Debian 5.10.28-1) and (edit) using an XFS filesystem. I observed that when the file that is being read is already in the page cache (e.g. newly written files, or test by running |
Here's the email: https://lore.kernel.org/linux-fsdevel/fea8b16d-5a69-40f9-b123-e84dcd6e8f2e@www.fastmail.com/T/#u . There's been no response yet, so I guess we should just work around it. I'll prepare a PR. |
So how are we going to test for kernel versions? |
@br0adcast: Would you mind changing the issue title to "fs::File reads first 4096 bytes only on Linux v5.9 and v5.10". That way it'll be easier for people to find this bug. |
@carllerche by calling EDIT: Oh, I didn't see @Darksonn's PR, which is exactly doing that. |
Sorry, I meant, how are we going to add multiple kernel versions to CI to test for kernel specific regressions in the future :) |
Just to make sure everyone heard about it: We released |
@praseodym wonderful. It doesn't matter with 4096 bytes(I have verified it and use 32,64,128,,, 4096, they all return 0 if I drop cache before preadv2 with RWF_NOWAIT flag). Look kernel commit fs: RWF_NOWAIT should imply IOCB_NOIO, with the change allowing read-ahead for IOCB_NOWAIT, we changed the RWF_NOWAIT semantics of only doing cached reads. So if we use sync or echo 1 to drop cache interface, then we will get 0. I plan to write a regression test to ltp community for this. |
To save the next person before they fall foul of it. See <https://lore.kernel.org/linux-fsdevel/fea8b16d-5a69-40f9-b123-e84dcd6e8f2e@www.fastmail.com/T/#u> and <tokio-rs/tokio#3803> for more information. Signed-off-by: Will Manley <will@williammanley.net> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Can this be closed now? The commit in question was reverted in 3fbcf1b |
Version
└── tokio v1.6.0
└── tokio-macros v1.2.0 (proc-macro)
Platform
Raspberry Pi 32-bit
Linux raspberrypi 5.10.17-v7l+ #1414 SMP Fri Apr 30 13:20:47 BST 2021 armv7l GNU/Linux
Description
After upgrading tokio from 1.5 to 1.6 only 4096 bytes can be read from the file
I tried this code:
and I get File size = 4096 for any file.
The text was updated successfully, but these errors were encountered: