handle impossible errors from the kernel with error.Unexpected instead of unreachable #6389

jorangreef · 2020-09-21T10:48:01Z

On Linux, e.g. in os.zig the std lib will often interrogate errno and map to Zig errors, or unreachable if the std lib wants to assert that the std lib implementation would never cause an EINVAL or EFAULT, e.g.:

EINVAL => unreachable,
EFAULT => unreachable,

However, while implementing #6356, I was about to follow this pattern but then I realized that the kernel often overloads errors in new kernel versions, which is particularly the case for the io_uring syscalls.

This means that we might think our std lib implementation cannot cause EINVAL, and then the kernel adds a new feature which could, leading to undefined behavior instead of a safe error.

In other words, we need to start going through the std lib and make this usage of unreachable an anti-pattern because there's no way we can assert what the kernel can or cannot be returning like this.

The text was updated successfully, but these errors were encountered:

jorangreef · 2020-09-21T10:54:00Z

Either that, or we need an unreachable which doesn't become undefined behavior in fast builds, or another separate keyword. To be clear, I don't mean to suggest it's necessarily a good idea to add keywords...

pfgithub · 2020-09-21T13:35:17Z

Either that, or we need an unreachable which doesn't become undefined behavior in fast builds, or another separate keyword.

I think @panic() does this

Rocknest · 2020-09-21T18:03:44Z

Do you have any example of a syscall that have changed its behaviour, except ioring stuff? If its specific to that feature, then do not use this pattern with these apis.

jorangreef · 2020-09-21T18:33:25Z

I haven't looked, but it's not about changing behavior so much as adding behavior, i.e. two error states (one old, one new) map to one error code, and Zig only checks one error state (because that was the only error state that existed at the time in the man pages) or assumes unreachable.

rohlem · 2020-09-23T21:42:36Z

I don't know much about Linux; maybe I'm spouting nonsense, but couldn't the opposite, hypothetically, be true as well? If a syscall is "upgraded" to loosen its preconditions, then what was previously an error now succeeds. If the program relies on that logic, it breaks semantically (which is its own class of undefined behaviour).

To me it seems like the actual solution would be to instead put an upper limit on the supported kernel version (so do a version check and @panic at run time if it is too high/new). While it's a shame supporting every new release would mean having to manually verify none of the syscall behaviour has changed, fwict that's the only reliable way to go about it.
(Really there'd need to be a "compatibility profile" mode when these changes happen, like OpenGL has, though it's of course easier to just not change the interface. Assuming the latter, I too would be inclined to think this is an io_uring-specific growth pain / symptom, though again, I know nothing.)

jorangreef · 2020-09-24T10:03:24Z

but couldn't the opposite, hypothetically, be true as well? If a syscall is "upgraded" to loosen its preconditions, then what was previously an error now succeeds.

Thanks @rohlem, for the interesting inverse example. That's certainly true, but I think at least it wouldn't lead to undefined behavior from a safety point of view? The program simply wouldn't receive an error from the kernel, so this kind of code branch wouldn't be reached.

Unless, then again, it could be a safety issue if the program required the kernel to return an error, and fell back to unreachable if it did not.

To me it seems like the actual solution would be to instead put an upper limit on the supported kernel version (so do a version check and @Panic at run time if it is too high/new). While it's a shame supporting every new release would mean having to manually verify none of the syscall behaviour has changed, fwict that's the only reliable way to go about it.

Perhaps at first glance, but remember the kernel is free to backport features. Ultimately, checking kernel versions is a slippery slope and should be avoided in favor of relying on the error mechanism the kernel already exposes, albeit without using unreachable to assert what the kernel can/cannot return as an error.

Rocknest · 2020-09-25T18:27:15Z

Its expected that OSes have some backward compatibility, if the system has a breaking change thats not our problem (everything would break if for example read syscall suddenly changed). All documented errors should be handled and nothing more.

andrewrk · 2021-11-23T06:47:32Z

My only concern here is with the debug experience. For example the following application:

const std = @import("std");

test "close() twice" {
    var file = try std.fs.cwd().createFile("aoeuaoeu", .{});
    defer file.close();

    file.close();
}

The debug experience here is excellent:

[nix-shell:~/Downloads/zig/build-release]$ ./zig test test.zig 
Test [1/1] test "close() twice"... thread 2813 panic: reached unreachable code
/home/andy/Downloads/zig/lib/std/os.zig:252:18: 0x209312 in std.os.close (test)
        .BADF => unreachable, // Always a race condition.
                 ^
/home/andy/Downloads/zig/lib/std/fs/file.zig:188:21: 0x2080fa in std.fs.file.File.close (test)
            os.close(self.handle);
                    ^
/home/andy/Downloads/zig/build-release/test.zig:5:21: 0x20689d in test "close() twice" (test)
    defer file.close();
                    ^
/home/andy/Downloads/zig/lib/std/special/test_runner.zig:77:28: 0x22d302 in std.special.main (test)
        } else test_fn.func();
                           ^
/home/andy/Downloads/zig/lib/std/start.zig:525:22: 0x225eec in std.start.callMain (test)
            root.main();
                     ^
/home/andy/Downloads/zig/lib/std/start.zig:477:12: 0x20986e in std.start.callMainWithArgs (test)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/home/andy/Downloads/zig/lib/std/start.zig:391:17: 0x208486 in std.start.posixCallMainAndExit (test)
    std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
                ^
/home/andy/Downloads/zig/lib/std/start.zig:304:5: 0x208292 in std.start._start (test)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
error: the following test command crashed:
zig-cache/o/82032e287c7c598235bf900b138d0986/test /home/andy/Downloads/zig/build-release/zig

Let's just make sure it remains excellent after the changes incurred by this proposal.

jorangreef · 2021-11-23T06:54:53Z

My only concern here is with the debug experience.

Yes, thanks for the example.

On second thought I think we should actually leave the status quo as is, because it's a better experience, and safe in Debug and ReleaseSafe even if the kernel does overload errors. It's also something that can be maintained across different kernel versions and they're not going to be overloading errors that often.

Alternatively, simply a @panic with clear explanation.

richiejp · 2023-09-06T23:35:05Z

Just to complicate things. Any syscall can produce any error due to seccompbpf or an LSM although usually it would be PERM, but maybe some people would use NOSYS. This is a very common scenario now with sandboxing and containers.

Syscalls that act on FDs can produce any error because FDs can point to anything. You can create a file system that represents DNS and start returning NSRCNAMELOOP on writes or reads. It doesn't have to be a mainline kernel, it can be an external module or FUSE.

Even with just the mainline kernel, if you actually mapped all the error codes that a syscall like read or setsockopt can return and someone may want to deal with, then you'll have a big long list of expected errors.

jorangreef changed the title ~~unsafe error handling in the std lib of overloaded kernel errors~~ std: unsafe error handling of overloaded kernel errors Sep 21, 2020

jorangreef changed the title ~~std: unsafe error handling of overloaded kernel errors~~ std: unsafe handling of errors overloaded by the kernel Sep 21, 2020

jorangreef mentioned this issue Sep 21, 2020

std: add io_uring library #6356

Merged

daurnimator added the standard library This issue involves writing Zig code for the standard library. label Sep 21, 2020

andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 23, 2020

andrewrk added this to the 0.8.0 milestone Sep 23, 2020

jorangreef mentioned this issue Nov 2, 2020

Should EINVAL always be unreachable? #6925

Open

xackus mentioned this issue Nov 3, 2020

stage2: ask for more file descriptors #6921

Merged

andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021

andrewrk added accepted This proposal is planned. contributor friendly This issue is limited in scope and/or knowledge of Zig internals. labels Nov 23, 2021

andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021

andrewrk changed the title ~~std: unsafe handling of errors overloaded by the kernel~~ handle impossible errors from the kernel with error.Unexpected instead of unreachable Nov 23, 2021

andrewrk removed contributor friendly This issue is limited in scope and/or knowledge of Zig internals. accepted This proposal is planned. labels Nov 23, 2021

andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022

andrewrk added accepted This proposal is planned. and removed accepted This proposal is planned. labels Apr 9, 2023

andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023

andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023

This was referenced Aug 8, 2023

sync stdout run into unreachable code #16735

Closed

add KillError.NoSuchProcess #16816

Closed

squeek502 mentioned this issue Aug 15, 2023

std.os | Fix EFAULT read error #16827

Closed

squeek502 mentioned this issue Sep 26, 2023

panic in std.os.poll, INVAL is not unreachable #17285

Open

GalaxySnail mentioned this issue Nov 11, 2023

std.os.fsync, File.sync hit unreachable EINVAL for special files #17950

Open

rootbeer mentioned this issue Mar 27, 2024

std.posix.accept does not return when interrupted. #19445

Closed

rootbeer mentioned this issue Apr 22, 2024

std.posix.accept: improve EINVAL's error name from SocketNotListening to SocketOrAddrLenIncorrect. #19722

Open

karlseguin mentioned this issue Oct 11, 2024

Linux system calls may return unreachable or unexpected error codes #10776

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

handle impossible errors from the kernel with error.Unexpected instead of unreachable #6389

handle impossible errors from the kernel with error.Unexpected instead of unreachable #6389

jorangreef commented Sep 21, 2020

jorangreef commented Sep 21, 2020 •

edited

Loading

pfgithub commented Sep 21, 2020

Rocknest commented Sep 21, 2020 •

edited

Loading

jorangreef commented Sep 21, 2020 •

edited

Loading

rohlem commented Sep 23, 2020 •

edited

Loading

jorangreef commented Sep 24, 2020 •

edited

Loading

Rocknest commented Sep 25, 2020

andrewrk commented Nov 23, 2021

jorangreef commented Nov 23, 2021

richiejp commented Sep 6, 2023

handle impossible errors from the kernel with error.Unexpected instead of unreachable #6389

handle impossible errors from the kernel with error.Unexpected instead of unreachable #6389

Comments

jorangreef commented Sep 21, 2020

jorangreef commented Sep 21, 2020 • edited Loading

pfgithub commented Sep 21, 2020

Rocknest commented Sep 21, 2020 • edited Loading

jorangreef commented Sep 21, 2020 • edited Loading

rohlem commented Sep 23, 2020 • edited Loading

jorangreef commented Sep 24, 2020 • edited Loading

Rocknest commented Sep 25, 2020

andrewrk commented Nov 23, 2021

jorangreef commented Nov 23, 2021

richiejp commented Sep 6, 2023

jorangreef commented Sep 21, 2020 •

edited

Loading

Rocknest commented Sep 21, 2020 •

edited

Loading

jorangreef commented Sep 21, 2020 •

edited

Loading

rohlem commented Sep 23, 2020 •

edited

Loading

jorangreef commented Sep 24, 2020 •

edited

Loading