-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Newlib: inconsistent usage of write() vs _write() in builds for different architectures to dump error messages to console #221
Comments
This is submitted as offshot of zephyrproject-rtos/zephyr#25443 (comment) . cc @cfriedt . More information is expected to follow. |
Can you clarify, when you say the toolchain calls write(), what do you mean? Trying to understand what the function call chain you are seeing on x86 that differs from arm. |
Not toolchain, but pre-built Newlib static lib as included with the Zephyr SDK toolchains. I'll post steps to reproduce a bit later. |
Sure, I guess having a backtrace comparison between x86 and arm would be useful to see the difference. |
…tation To show that Newlib builds for different architectures (as included in Zephyr SDK) inconsistently call either write() or _write() (dependending on the architecture Newlib was built for, again). Related to zephyrproject-rtos/sdk-ng#221 . Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
Well, I'm not sure about backtrace, the best way to expose an issue is link-time error. So, let's comment definitions of both write() and _write() in https://github.com/zephyrproject-rtos/zephyr/blob/master/lib/libc/newlib/libc-hooks.c#L183 . The diff is in top commit of https://github.com/pfalcon/zephyr/commits/newlib-write-vs-_write (pfalcon/zephyr@65dafeb as of now, may be rebased). Then let's build
So, what we see here is that write() (without underscore) gets called (among Zephyr in-tree cases) from:
Which is Newlib prebuilt libc.a as shipped with Zephyr SDK (0.11.2, as can be see). Let's compare with errors for an Arm board:
So, we now see that prebuilt Newlib tries to call _write() (with underscore), specifically from:
That pinpoints the issue reported: when built for different architectures, the same module within Newlib (specifically, But looking closer, we see that the situation is even more tangled: for different architectures (or straight, boards), different Newlib variants are used, e.g. for qemu_x86 "full" Newlib is used by default, while for frdm_k64f - "nano". It's easy to imagine that such a difference in Newlib variants was done as a "feature" and with the best intention, but it ends up just another inconsistency among platforms/boards. |
Ok, so given that we spotted "full" vs "nano" Newlib discrepancy, let's check whether write vs _write discrepancy is actually caused by it:
So, nope, the original hypothesis appear to hold - the discrepancy is per-architecture, not per-Newlib-variant. |
Properly working write() (dispatching to various types of file descriptors) requires CONFIG_POSIX_API=y. Without it, we have just _write(), which works only with a console. Sadly, due to difficiencies of Newlib prebuilt libraries for some (but not others) architectures, as shipped as part of Zephyr SDK, we also must alias write() to _write() for the case of CONFIG_POSIX_API=n (simply because prebuilt Newlib for some architectures calls into it, leading to link errors if "write" symbol is not defined). This issue is tracked on Zephyr SDK side as zephyrproject-rtos/sdk-ng#221 . Until it's resolved, at least warn users who may call write() for something else than STDOUT_FILENO/STDERR_FILENO without defining CONFIG_POSIX_API=y. Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
To elaborate on the choices we have, until we have this fixed, we have to apply pretty ugly workarounds like zephyrproject-rtos/zephyr#25479 (where we effectively say "we can't get it right, so d'oh, it's your problem"). Another alternative is to keep good people like @cfriedt confused (which is the current situation). |
From the prebuilt toolchain (version 0.11.2):
|
@cfriedt: Yep. Maybe if you're on it, you could grep libs for other archs. |
@cfriedt, thanks! Mind that when you post a new comment, people involved a ticket get notification (and that's how most follow multitude of their tickets, I guess), but when you edit, they don't. It's certainly not a problem here, thanks for finding time to do that. Just e.g. in PR reviews, it makes sense to post new info as a new comment (or if it makes more sense to edit existing, to post a short notice that previous info was udpated). Thanks. Anyway, looking at the details above, it's clear that x86 is not the only arch which calls Ok, when other tasks permit, I'll have a look at Newlib source to figure out if something can be done about it. |
Add a zephyr case to newlib/configure.host so we get the same config on all arch's. Fix the fact that some arch's like x86 were getting -DMISSING_SYSCALL_NAMES set in newlib which we don't want. Fixes: zephyrproject-rtos#221 Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
@pfalcon can you test builds.zephyrproject.org/zephyrproject-rtos/sdk-ng/223/zephyr-sdk-0.11.3-pr-223-setup.run -- this seems to address the issue. |
@galak: Will do, thanks! |
@galak - Hopefully you caught it too, but it seemed to be a similar case for |
the fix should match behavior for all the syscalls. Try out |
Ok, to test this, we'd once again need to specify what's the desired, and thus expected, behavior. In my list it's:
And the methodology to test will be:
|
Sadly, the testing fails even on p.1. Actually, doing black-box testing everything seems to be ok: after removal of all aliases, samples/net/sockets/big_http_download links (it didn't before). But as soon as we peek what's inside, we see:
So, the usual suspect from the bloatland, And where that write() comes from, given that we no longer define it? Well:
So well, now Newlib defines it itself with a callchain of write() -> _write_r() -> _write(). And of course, if we try to call write() from the app side, it will just dump to stdout. So, the issue is not resolved. |
Odd, when I build qemu_x86 I get:
|
@pfalcon can you do a shasum on |
Nevermind, if I remove the ALIAS I get the same behavior as you
Are saying that is how newlib works, or how it should work? Has So now I'm not sure what behavior you are wanting out of newlib? |
@galak: So, it's tricky. If I use zephyr-sdk-0.11.3-pr-223 with pristine master d8560f698b5d9aae25ad1ba8ff1ac84e2cc38ab8, I also get
But that's go to be the effect of |
I'm ready to look at Newlib source to confirm. |
That's we want it to do, to resolve the write() problem for users. |
Then we're hosed and zephyrproject-rtos/zephyr#25479 is the only way. Well, we can also build our stuff with |
Here's where it comes from in big_http_download. Sure, for embedded project, we may assume that developers can call their memcpy() right? (But then to be fair, a usual case for __chk_fail() is something like sprintf(), that's where people love to overflow their buffers, but then we wouldn't use it, would we?) |
Comes from: https://sourceware.org/git/?p=newlib-cygwin.git;a=commitdiff;h=3e8fc7d9f21329d5a98ec3cf6de138bce9bc2c05 So, a software developer said "While GCC also provides an implementation in libssp, it is completely broken (CVE-2016-4973, RHBZ#1324759) and seemingly unfixable", and well, made their turn on adding an implementation suitable for embedded libc. Actually wait, the repo is called "newlib-cygwin", so it's Cygwin libc, and nobody advertises embedded-suitability, and if the world uses it as such, that's world's problem ;-). Bottom line: we of course can try to fix the issue in upstream ;-). |
So, as can be seen from the full patch link posted in the previous comment, here it would be:
(Though it may be defined depending on _FORTIFY_SOURCE which is kinda "public" interface to that feature (at least AFAIK)). |
But, obvious eureka: we can define __chk_fail on our side, problem solved. (Maybe). Will try. |
Seems to work, top commit of https://github.com/pfalcon/zephyr/commits/mo-more-write-alias . So, now I would need to test it with just 0.11.3, and see if there's value in #223 . |
Ok, I think there's still value in #223 as it makes all the newlib's consistent across all the arch's we have in the sdk. |
Ack. Let me still play with (and ponder about) it, to see whether we could get an ideal behavior of write() not being defined at all by newlib. |
We should really have a Zephyr fork of newlib (and upstream them once it is mature enough) so that we can address these types of issues: This also came up a while back when we were looking into the thread safety and re-entrancy issues. |
If there's someone who's going to maintain it - why not.
There were a few ideas, like moving Newlib on the Zephyr source tree side. All such ideas are bound by balance of the actual benefits achieved (vs overheads), practicality, and effort required (and available). |
Add a zephyr case to newlib/configure.host so we get the same config on all arch's. Fix the fact that some arch's like x86 were getting -DMISSING_SYSCALL_NAMES set in newlib which we don't want. Fixes: #221 Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
Cleaned-up __chk_fail() implementation as produced here was submitted as zephyrproject-rtos/zephyr#26135. |
Newlib build included with x86 toolchain calls write() in [TBF], while e.g. arm - _write(). This is inconsistent and causes confusion. x86 way of calling write() also requires it to be defined, but default implementation (without POSIX subsystem enabled) is able to write only to a console. But people don't know that, and call it e.g. on sockets, with output dumped to console. All these problems could be avoided in x86 build also used _write() (the whole idea of underscore-prefixed function is that "it works kind like posix, but doesn't support full posix functionality").
The text was updated successfully, but these errors were encountered: