-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1.48 beta broken on mips due to copy_file_range EOVERFLOW #78979
Comments
So the problem is that indexmap's build script uses autocfg to try to discover the availability of some Rust features, and it is likely breaking on mips for some reason. See https://zulip-archive.rust-lang.org/182449tcompilerhelp/46175petgraphwrongnumberoftypearguments.html for some previous discussion. I suspect the next step here is going to be identifying the cause of the problem -- you should be able to get logs from the autocfg build script somewhere in the build directory, along the lines of |
There's no API change: indexmap-rs/indexmap@1.5.1...1.6.0 The error you're seeing is because See indexmap-rs/indexmap#144 and indexmap-rs/indexmap#151 for more context. The |
Thanks for the context, digging through the log I see:
The first occurence of this error "could not copy A to B: Value too large for defined data type (os error 79)" is for indexmap but it also happens for several other crates, so the problem is likely something else, I'll keep investigating.. |
If they're all in |
|
It's the stage1 build of indexmap that is failing, so annoyingly I can't reproduce this outside of the rustc-build context with just |
@the8472 You were the only one to make changes to |
I will try reverting 4ddedd5 to see if this fixes the problem, will report back in a few hours. |
I wonder if mips (or your filesystem) doesn't like the request for rust/library/std/src/sys/unix/fs.rs Line 1233 in 9722952
|
That was already there in 1.47. However the general gist of that commit seems to be to copy MAX bytes regardless of the size of the file, which perhaps is not allowed on mips or something? It seems like a weird thing to want to do, anyway. |
I would be fine disabling copy_file_range on mips (or perhaps all platforms) for 1.48 at least since it seems like it's causing problems, presuming that fixes the bug here. I wouldn't want to revert just the latest commit though since it seems like it does fix real bugs (or at least claims to). |
The commit description says "race conditions where a file is truncated while copying from it. if we blindly trusted the file size this would lead to an infinite loop" but this doesn't sound right since MAX is always greater than the actual file size, and we only read it once. It could help in the situation {X: where a file is appended to while copying from it}, but again the previous behaviour would only copy fewer bytes than wanted without going into an infinite loop. The other parts of the commit sound legit. I will report back whether the revert helps the issue, but I suspect changing the |
The previous implementation queried the file size via Here's the
That's comparing signed and unsigned types, I'm not familiar with the C integer rules but I guess that this just happens to work for the file offset of 0 and fails on a subsequent try if the first copy_file_range does a short copy, i.e. not the whole requested range. If that is correct then it could also fail on other platforms if a short copy is encountered. This might also be dependent on kernel versions since copy_file_range behavior changed several times. I have an improved implementation that should (almost) handle this case in #75272 I say almost because I assumed this could only happen with I assume lifting that entire PR to beta would be inappropriate. If so I can extract the |
It's read in a loop. If the truncate happens between the |
I can't see this - 4ddedd5 removed one call to I am guessing this is the cause of EOVERFLOW but I have yet to confirm this, I should just write a short program to test it. Where is your |
The prior code was capped by the file length, whereas
With 32-bit unsigned With 64-bit |
I think we should either disable it on all platforms or add general EOVERFLOW handling and lower the chunk size so that an overflow is less likely to happen for anything but very large file offsets.
The length is not "read" in the loop (I was confused by the word read, I thought you meant
That's from the kernel sources. |
Right, but your commit changes |
I did not say that that particular change solves the infinite loop case on its own. The change changes the semantics and that's why the name was changed from
Odd, mipsel is 32bit, right? So this should never be able to overflow unless the file seek offset is in the range |
It appears that #define _GNU_SOURCE
#include <unistd.h>
#include <limits.h>
#include <stdio.h>
size_t max = INT_MAX + 1;
int main() {
FILE *fa = fopen("a.txt", "rb");
FILE *fb = fopen("b.txt", "wb");
printf("calling copy_file_range with len %d\n", max);
if (copy_file_range(fileno(fa), NULL, fileno(fb), NULL, max, 0) != 0) {
perror(NULL);
}
}
|
That would be a problem. The libc crate lists it as unsigned https://rust-lang.github.io/libc/mipsel-unknown-linux-gnu/libc/type.size_t.html |
Sorry my bad, I miscoded, it should be #define _GNU_SOURCE
#include <unistd.h>
#include <limits.h>
#include <stdio.h>
size_t max = ((size_t)INT_MAX);
int main() {
FILE *fa = fopen("a.txt", "rb");
FILE *fb = fopen("b.txt", "wb");
printf("calling copy_file_range with len %p\n", max);
if (copy_file_range(fileno(fa), NULL, fileno(fb), NULL, max, 0) != 0) {
perror(NULL);
}
}
|
"Invalid argument" is not EOVERFLOW however, so I'll have to investigate some more. |
It's good to know either way, then I'll lower the chunk size further to 1GB. |
The current implementation already deals with EINVAL by falling back to the userspace copy implementation. This isn't ideal but shouldn't be a release blocker. The question is under which circumstances EOVERFLOW happens. @Mark-Simulacrum what's the release timeline? How much time do we have for a fix and how small should it be kept? Rendering copy_file_range inert would be a 1-line change. A proper fix takes some more effort but I have most of the needed code already in a different PR. |
Any fix would ideally land in the next 72 hours, roughly. Please ping me and @pietroalbini on relevant PRs. I suspect that given the timeframe just rendering this code inert is our best bet. |
As a slight aside, @jrtc27 mentioned to me that mips64 does sign-extension by default, and the I'll launch a build to confirm that |
@infinity0 yeah right now the size has no relevance, that will change again with #75272
Well, that's a funny way to put it, but yes.
Ok, will do. |
@rustbot claim |
This seems better than fully disabling
I'm not sure what you mean -- |
Well, setting the length to 1GB is mostly about avoiding the EINVAL that @infinity0 reported in #78979 (comment) The actual issue is EOVERFLOW for which we haven't pinpointed the exact cause yet. It could also be due to 32bit calculations on mipsel or older kernel versions so it might happen for any large files too. I intend to handle this properly in #75272 We can try lowering the chunk size if you want, but that's leaves a bit more room for error than outright disabling it. |
Created two PRs against beta, pick one or the other. For nightly I'll update #75272 to cover this case too and hope we'll get that merged soon(ish). |
Keep an eye on when that lands -- if it's after the next branch date, it may need backporting to 1.49-beta. |
The build on mips with disabling the flag succeeded https://buildd.debian.org/status/fetch.php?pkg=rustc&arch=mipsel&ver=1.48.0%7Ebeta.8%2Bdfsg1-1%7Eexp3&stamp=1605339163&raw=0 I'm still waiting on shell access on a similar machine. |
[beta] always disable copy_file_range to avoid EOVERFLOW errors A bigger hammer as alternative to rust-lang#79007 Pro: will certainly fix the issue Cons: will disable copy_file_range for everyone Resolves rust-lang#78979
I couldn't reproduce the original EOVERFLOW issue on a mips machine with linux 4.19, the build succeeded even without patching away that flag. (So probably the existing EINVAL workaround was sufficient.) I'm trying to source a mips linux 5 box to try to reproduce it... |
@infinity0 It would be helpful if you could confirm that current beta branch (likely no release quite yet) fixes this issue. |
@Mark-Simulacrum with the flag set from true to false (as per #79008) the build succeeds, so I assume it works - sadly I don't have time to test the branch specifically before the Thursday release but I'm sure it'll be fine. |
Setting milestone to 1.49, we need to land #79008 on that branch as well. |
Assigning |
backported in #79838 -- I think this should be fixed? Closing. |
https://buildd.debian.org/status/fetch.php?pkg=rustc&arch=mipsel&ver=1.48.0%7Ebeta.8%2Bdfsg1-1%7Eexp1&stamp=1605149146&raw=0
The text was updated successfully, but these errors were encountered: