Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle impossible errors from the kernel with error.Unexpected instead of unreachable #6389

Open
jorangreef opened this issue Sep 21, 2020 · 10 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@jorangreef
Copy link
Contributor

On Linux, e.g. in os.zig the std lib will often interrogate errno and map to Zig errors, or unreachable if the std lib wants to assert that the std lib implementation would never cause an EINVAL or EFAULT, e.g.:

EINVAL => unreachable,
EFAULT => unreachable,

However, while implementing #6356, I was about to follow this pattern but then I realized that the kernel often overloads errors in new kernel versions, which is particularly the case for the io_uring syscalls.

This means that we might think our std lib implementation cannot cause EINVAL, and then the kernel adds a new feature which could, leading to undefined behavior instead of a safe error.

In other words, we need to start going through the std lib and make this usage of unreachable an anti-pattern because there's no way we can assert what the kernel can or cannot be returning like this.

@jorangreef jorangreef changed the title unsafe error handling in the std lib of overloaded kernel errors std: unsafe error handling of overloaded kernel errors Sep 21, 2020
@jorangreef jorangreef changed the title std: unsafe error handling of overloaded kernel errors std: unsafe handling of errors overloaded by the kernel Sep 21, 2020
@daurnimator daurnimator added the standard library This issue involves writing Zig code for the standard library. label Sep 21, 2020
@jorangreef
Copy link
Contributor Author

jorangreef commented Sep 21, 2020

Either that, or we need an unreachable which doesn't become undefined behavior in fast builds, or another separate keyword. To be clear, I don't mean to suggest it's necessarily a good idea to add keywords...

@pfgithub
Copy link
Contributor

Either that, or we need an unreachable which doesn't become undefined behavior in fast builds, or another separate keyword.

I think @panic() does this

@Rocknest
Copy link
Contributor

Rocknest commented Sep 21, 2020

Do you have any example of a syscall that have changed its behaviour, except ioring stuff? If its specific to that feature, then do not use this pattern with these apis.

@jorangreef
Copy link
Contributor Author

jorangreef commented Sep 21, 2020

I haven't looked, but it's not about changing behavior so much as adding behavior, i.e. two error states (one old, one new) map to one error code, and Zig only checks one error state (because that was the only error state that existed at the time in the man pages) or assumes unreachable.

@rohlem
Copy link
Contributor

rohlem commented Sep 23, 2020

I don't know much about Linux; maybe I'm spouting nonsense, but couldn't the opposite, hypothetically, be true as well? If a syscall is "upgraded" to loosen its preconditions, then what was previously an error now succeeds. If the program relies on that logic, it breaks semantically (which is its own class of undefined behaviour).

To me it seems like the actual solution would be to instead put an upper limit on the supported kernel version (so do a version check and @panic at run time if it is too high/new). While it's a shame supporting every new release would mean having to manually verify none of the syscall behaviour has changed, fwict that's the only reliable way to go about it.
(Really there'd need to be a "compatibility profile" mode when these changes happen, like OpenGL has, though it's of course easier to just not change the interface. Assuming the latter, I too would be inclined to think this is an io_uring-specific growth pain / symptom, though again, I know nothing.)

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 23, 2020
@andrewrk andrewrk added this to the 0.8.0 milestone Sep 23, 2020
@jorangreef
Copy link
Contributor Author

jorangreef commented Sep 24, 2020

but couldn't the opposite, hypothetically, be true as well? If a syscall is "upgraded" to loosen its preconditions, then what was previously an error now succeeds.

Thanks @rohlem, for the interesting inverse example. That's certainly true, but I think at least it wouldn't lead to undefined behavior from a safety point of view? The program simply wouldn't receive an error from the kernel, so this kind of code branch wouldn't be reached.

Unless, then again, it could be a safety issue if the program required the kernel to return an error, and fell back to unreachable if it did not.

To me it seems like the actual solution would be to instead put an upper limit on the supported kernel version (so do a version check and @Panic at run time if it is too high/new). While it's a shame supporting every new release would mean having to manually verify none of the syscall behaviour has changed, fwict that's the only reliable way to go about it.

Perhaps at first glance, but remember the kernel is free to backport features. Ultimately, checking kernel versions is a slippery slope and should be avoided in favor of relying on the error mechanism the kernel already exposes, albeit without using unreachable to assert what the kernel can/cannot return as an error.

@Rocknest
Copy link
Contributor

Its expected that OSes have some backward compatibility, if the system has a breaking change thats not our problem (everything would break if for example read syscall suddenly changed). All documented errors should be handled and nothing more.

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@andrewrk andrewrk added accepted This proposal is planned. contributor friendly This issue is limited in scope and/or knowledge of Zig internals. labels Nov 23, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021
@andrewrk andrewrk changed the title std: unsafe handling of errors overloaded by the kernel handle impossible errors from the kernel with error.Unexpected instead of unreachable Nov 23, 2021
@andrewrk
Copy link
Member

My only concern here is with the debug experience. For example the following application:

const std = @import("std");

test "close() twice" {
    var file = try std.fs.cwd().createFile("aoeuaoeu", .{});
    defer file.close();

    file.close();
}

The debug experience here is excellent:

[nix-shell:~/Downloads/zig/build-release]$ ./zig test test.zig 
Test [1/1] test "close() twice"... thread 2813 panic: reached unreachable code
/home/andy/Downloads/zig/lib/std/os.zig:252:18: 0x209312 in std.os.close (test)
        .BADF => unreachable, // Always a race condition.
                 ^
/home/andy/Downloads/zig/lib/std/fs/file.zig:188:21: 0x2080fa in std.fs.file.File.close (test)
            os.close(self.handle);
                    ^
/home/andy/Downloads/zig/build-release/test.zig:5:21: 0x20689d in test "close() twice" (test)
    defer file.close();
                    ^
/home/andy/Downloads/zig/lib/std/special/test_runner.zig:77:28: 0x22d302 in std.special.main (test)
        } else test_fn.func();
                           ^
/home/andy/Downloads/zig/lib/std/start.zig:525:22: 0x225eec in std.start.callMain (test)
            root.main();
                     ^
/home/andy/Downloads/zig/lib/std/start.zig:477:12: 0x20986e in std.start.callMainWithArgs (test)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/home/andy/Downloads/zig/lib/std/start.zig:391:17: 0x208486 in std.start.posixCallMainAndExit (test)
    std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
                ^
/home/andy/Downloads/zig/lib/std/start.zig:304:5: 0x208292 in std.start._start (test)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
error: the following test command crashed:
zig-cache/o/82032e287c7c598235bf900b138d0986/test /home/andy/Downloads/zig/build-release/zig

Let's just make sure it remains excellent after the changes incurred by this proposal.

@jorangreef
Copy link
Contributor Author

My only concern here is with the debug experience.

Yes, thanks for the example.

On second thought I think we should actually leave the status quo as is, because it's a better experience, and safe in Debug and ReleaseSafe even if the kernel does overload errors. It's also something that can be maintained across different kernel versions and they're not going to be overloading errors that often.

Alternatively, simply a @panic with clear explanation.

@andrewrk andrewrk removed contributor friendly This issue is limited in scope and/or knowledge of Zig internals. accepted This proposal is planned. labels Nov 23, 2021
@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@andrewrk andrewrk added accepted This proposal is planned. and removed accepted This proposal is planned. labels Apr 9, 2023
@richiejp
Copy link

richiejp commented Sep 6, 2023

Just to complicate things. Any syscall can produce any error due to seccompbpf or an LSM although usually it would be PERM, but maybe some people would use NOSYS. This is a very common scenario now with sandboxing and containers.

Syscalls that act on FDs can produce any error because FDs can point to anything. You can create a file system that represents DNS and start returning NSRCNAMELOOP on writes or reads. It doesn't have to be a mainline kernel, it can be an external module or FUSE.

Even with just the mainline kernel, if you actually mapped all the error codes that a syscall like read or setsockopt can return and someone may want to deal with, then you'll have a big long list of expected errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

7 participants