-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PoC] lib: os: fdtable: Add syscalls for POSIX read/write/close/ioctl operations #25443
Conversation
@andrewboie, long-standing technical debt. Plan of work: conversion for read/write/close is pushed. Will see if CI passes. There're existing socket tests which could be easily (albeit in Do-Repeat-Yourself manner) converted to test read/write. But this test also calls more stuff, like fcntl. That in turn is based on ioctl. And ioctl is used for some kernel-level stuff. So, there's a bit of scope creep, and changes would be exactly "leaf in dependency graph" and "isolated", as there can be potential changes to socket family implementations (or maybe not, didn't look into that yet). @cfriedt: FYI (and please review once finished, currently it's not, per above). |
5ad8073
to
3d50cab
Compare
lib/os/fdtable.c
Outdated
*/ | ||
ssize_t write(int fd, const void *buf, size_t sz) | ||
{ | ||
return _write(fd, buf, sz); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting "undefined reference to _write
" and "undefined reference to _read
" in libc_nano.a . It doesn't jump out at me immediately why that might be the case.
https://app.shippable.com/github/zephyrproject-rtos/zephyr/runs/73259/5/console
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... But it should explain why this spent so much time in the backlog - dealing with all these naming (ab)uses across different targets is a bit boring (and I'm not particularly interested in kernel/userspace split ;-) ). Anyway, fixing it.
6127e52
to
6c4206e
Compare
Ok, read/write/close syscall are now clean re: build. Will proceed with ioctl tomorrow, then test is the next step after that. |
6c4206e
to
c732f3d
Compare
For the record, to elaborate what happened there. Prebuilt Newlib, as shipped with Zephyr SDK is call into write to dump some error on stdout/stderr. The issue is that the actual function called is inconsistent among the builds for different architectures. For example, x86 calls write(), while ARM - _write(). My original idea was to make _write() a syscall, while write() a normal userspace func, that due to the above, I fell back to Linux-like convention of sys_write() being a syscall, and write()/_write() userspace funcs. Thinking about it, ARM newlib build is right here, while x86 is wrong. The whole idea of underscore-prefixed functions is that they don't provide full POSIX semantics. So, if x86 called _write, we wouldn't need to define write() for !CONFIG_POSIX_API case, and @cfriedt would never get confused by calling write() on a socket in this case, and getting output dumped to console. Actually, that would be the right way to resolve the issue, so let me even try to submit zephyrproject-rtos/sdk-ng#221 (though I'm not sure who would be looking into that). |
In the meantime, pushed a commit with ioctl() conversion to syscall. |
c732f3d
to
6dfe2eb
Compare
6dfe2eb
to
ff160a3
Compare
And it's the usual suspect, @andrewboie, I'm sure there's nothing new in this #if 0, it was expected all along (and again, why it spent so much time in backlog). |
All checks passed. checkpatch (informational only, not a failure)
Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages. |
Otherwise, 3rd commit in series patches an existing socket test to add read()/write() testcases. |
@dleach02 That's the problem: the individual arguments needs to be validated. They could contain pointer values, and if so every single one of them needs to be bounded and checked for appropriate access. We can't just unmarshal them on the kernel side and call it a day handing them off to the implementation; if the implementation treats any of the args as a pointer and not an integral value we have a potential exploit because none of these pointers have been validated. What we have here is a variadic interface to send up to N arbitrary uintptr_t arguments to the kernel, with no knowledge on the kernel side what these arguments actually represent, since the verification is being done in a generic verifier for any ioctl. This generic interface will not work, the particular semantics of the arguments entirely depend on what the particular function is. Which is why I asked to just demux ioctl calls in userspace and send them each as different syscalls. Now that I think of it, there's also the problem that there's no accounting for what threads can use what file descriptors, unless I am missing something a malicious thread could manipulate any file descriptor active in the system in the current design. That's something that should probably be looked at separately. |
1f96bf4
to
ebad591
Compare
Yep. That's a biggie. |
Indeed, that's a "biggie": File descriptors are per process, not per thread. So whoever will implement support for multiple processes in Zephyr, will also make sure that fdtable is per-process. |
ebad591
to
60a7af8
Compare
@andrewboie, I understand what you are saying. We should clarify (if it isn't already done), what the rules are for validating the parameters in a system call. It seems like the rule you are articulating is that all parameters must be properly typed and known at the time of the call and they must be able to be properly verified. Hiding the type of the parameter is not allowed. I'm assuming the baseline problem is that the IOCTL is directed at different underlying drivers/interfaces and as such, the interpretation of the data is different depending on the driver needs. The generic interface of the IOCTL will not be able to be supported via a syscall. If the "syscall" happens lower where the user space code takes in the IOCTL and then breaks out the parameters of the veriadic that could potentially be a syscall point of entry that could validate the parameters? |
Of course. I didn't want to go on a diatribe, but obviously processes, permissions, pre-opened file descriptors (stdin, stdout, stderr), virtual devfs, console on serial port, etc. However, currently Zephyr does attempt to separate thread permissions based on (physical / virtual) address space and metadata. So even though the entire "process" has a flat set of file descriptors, certain threads really shouldn't be able to write to some file descriptors and until processes are implemented, fd's may need to have permissions associated with a thread or thread group. |
@cfriedt: Show me a reference to the general purpose POSIX standard which says that, and we'll discuss it (later). |
60a7af8
to
af60bba
Compare
Permission accounting is a must-have in order to introduce these syscalls, you wouldn't want Minecraft to be able read Firefox's file descriptors at-will. I have a solution I'm testing to fix this for sockets (we used to have it but it broke, it uses kernel object permissions) that should be generalize-able About ioctl(): consider for a moment that although having syscalls with arbitrary variadic argument types is a nightmare, nobody I know of against having an extensible vectored driver interface mechanism...there's no requirement to use ioctl()'s interface specifically, ioctls aren't portable anyway and you can always write wrappers have a look at how Windows does it: https://docs.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol?redirectedfrom=MSDN their definition with a device, handle, and in/out buffers would be trivial to validate in a syscall |
224e5b1
to
648c847
Compare
As far as |
Fdtable is a kernel-level structure, so functions accessing it, like read(), write(), close(), should be syscalls. This conversion however emphasizes a difference between "syscall" and "C standard library function". C stdlib functions should be represented by just a standard linkage symbol, and can be used even without prototype. Syscall is however a special code sequence, which may require special attributes and thus a prototype for usage. To deal with these distinction, we define a convention that "sys_name" is a syscall, while "name" is a normal C function (implemented as a wrapper around syscall). (And we also need to define "_name" to interface with Newlib (actually, just "_write".)) Fixes: zephyrproject-rtos#25407 Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
This is similar to previously added syscalls for read/write/close. sys_ioctl accepts va_list. fcntl() is clearly rewritten as a userspace operation, delegating to sys_ioctl(). Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
Just a test for send()/recv() duplicated to call write()/read() instead. Duplication is not ideal, but to deal with it, we would need some no-nonsense metagramming capabilities. Sadly, C classically have problems with that. Standard way to instantiate different functions based on the same template is to use preprecessor, but that requires suffixing each line with a backslash, and thus is pretty unwieldy. A more modern approach would be to use always-inline function as a template, but always-inline functions aren't part of the C standard, and just inline functions don't guarantee inlinability. (Besides metaprogramming, a mundane-programming solution is possible, but somehow that was considered cumersome too.) Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
…a_list As we cannot pass va_list across userspace/kernel boundary (because we don't control its internel representation, and it may be quite, complex), just get rid of it on the lowest level of ioctl machinery. Pass just explicit array of machine words (uintptr_t) to the ioctl vmethod implementation. On the API convenience layer, there's still support for variadic arguments calls (both on the level of userspace ioctl() function, and for internal in-kernel usage), with explicit marshalling from va_list to array of uintptr_t's. Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
Following a refactor to allow pass ioctl arguments across userspace/kernel boundary. Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
648c847
to
58a19b8
Compare
Ok, so I split off changes for just read()/write() and added a dedicated test for them: #26117 |
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
Fdtable is a kernel-level structure, so functions accessing it, like
read(), write(), close(), should be syscalls. This conversion however
emphasizes a difference between "syscall" and "C standard library
function". C stdlib functions should be represented by just a
standard linkage symbol, and can be used even without prototype.
Syscall is however a special code sequence, which may require special
attributes and thus a prototype for usage. To deal with these
distinction, we define a convention that "sys_name" is a syscall,
while "name" is a normal C function (implemented as a wrapper around
syscall). (And we also need to define "_name" to interface with Newlib
(actually, just "_write".))
Fixes: #25407
Signed-off-by: Paul Sokolovsky paul.sokolovsky@linaro.org