Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libc: newlib: libc-hooks: Warn that there's no write() without POSIX_API #25479

Closed

Conversation

pfalcon
Copy link
Contributor

@pfalcon pfalcon commented May 20, 2020

Properly working write() (dispatching to various types of file
descriptors) requires CONFIG_POSIX_API=y. Without it, we have just
_write(), which works only with a console.

Sadly, due to difficiencies of Newlib prebuilt libraries for some
(but not others) architectures, as shipped as part of Zephyr SDK,
we also must alias write() to _write() for the case of
CONFIG_POSIX_API=n (simply because prebuilt Newlib for some
architectures calls into it, leading to link errors if "write" symbol
is not defined).

This issue is tracked on Zephyr SDK side as
zephyrproject-rtos/sdk-ng#221 . Until it's
resolved, at least warn users who may call write() for something else
than STDOUT_FILENO/STDERR_FILENO without defining CONFIG_POSIX_API=y.

Signed-off-by: Paul Sokolovsky paul.sokolovsky@linaro.org

Properly working write() (dispatching to various types of file
descriptors) requires CONFIG_POSIX_API=y. Without it, we have just
_write(), which works only with a console.

Sadly, due to difficiencies of Newlib prebuilt libraries for some
(but not others) architectures, as shipped as part of Zephyr SDK,
we also must alias write() to _write() for the case of
CONFIG_POSIX_API=n (simply because prebuilt Newlib for some
architectures calls into it, leading to link errors if "write" symbol
is not defined).

This issue is tracked on Zephyr SDK side as
zephyrproject-rtos/sdk-ng#221 . Until it's
resolved, at least warn users who may call write() for something else
than STDOUT_FILENO/STDERR_FILENO without defining CONFIG_POSIX_API=y.

Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
@pfalcon
Copy link
Contributor Author

pfalcon commented May 20, 2020

cc @cfriedt

Proposed as an interim measure until (and if) zephyrproject-rtos/sdk-ng#221 is resolved.

@pfalcon
Copy link
Contributor Author

pfalcon commented May 26, 2020

cc @stephanosio

@pfalcon
Copy link
Contributor Author

pfalcon commented May 26, 2020

@cfriedt, There were no comments from you, and I would expect them. I made it clear (somewhere in zephyrproject-rtos/sdk-ng#221) that I'm not happy with this "solution". But even if zephyrproject-rtos/sdk-ng#221 (comment) fixes it properly (still TBC), it will be available only with the next SDK release, and we will be able to rely on it when that SDK release will be made mandatory minimum version. That may take months. So, judge how much were you confused and frustrated being hit by this issue, and is merging this is a reasonable interim measure. (I can't judge that, for me it's oh-so-clear that you don't use write() without CONFIG_POSIX_API ;-) ).

And I bump this ticket because we've got another very similar case: #25599 . And such cases may become pervasive (so, sure, one of the possible decisions can be "let's wait until we will be swamped by these issue reports, and cumulative user effort wasted on this will be counted in man-months").

And for me, just investigating #25599 is predicated on the case of this PR. And likely, it is to be resolved in the same way (i.e., attempting to call _read() without prior configuration will print a warning at runtime that d'oh, this doesn't work).

@cfriedt
Copy link
Member

cfriedt commented May 26, 2020

@pfalcon - sorry for the delay. It's tricky. scanf is also c89, c99, so it doesn't fall into the same boat as POSIX API calls. Which (to me) implies that scanf should always Just Work (TM).
But if the weak reference prints out a message saying to "consider enabling CONFIG_CONSOLE", then I think that should work for #25599.

Aha - I missed this part underneath the diff.

__weak FUNC_ALIAS(_write, write, int);

I agree, this makes much more sense now.

@pfalcon
Copy link
Contributor Author

pfalcon commented May 26, 2020

@cfriedt, thanks for approval, but I wish, there were few words said about this patch. As it stands, all words belong to #25599 ;-). Let me reply though:

scanf is also c89, c99, so it doesn't fall into the same boat as POSIX API calls. Which (to me) implies that scanf should always Just Work (TM).

I don't follow the logic. What's so special about those "c89", "c99" things? Do you hint that Zephyr mandates support for those things (and only those things, like _Complex of C11 is out of luck)? Nope, didn't hear about that. If that were true, patches like #16511 wouldn't be stuck in the queue for a year (and there're more instances, which were just shot down).

Now to elaborate my logic: Zephyr should support out of the box only things which don't incur high overhead. Any other feature should be optional, gated behind configuration. Note that configuration can be both compile-time and run-time. E.g. here we're talking about CONFIG_POSIX_API compile-time option, and #25599 talks about the need to call __stdin_hook_install() at runtime. So yes, write() and scanf() are instances of the same issue for me, as described.

And the main issue is how to ensure that, if configuration for these options weren't done, that the behavior of the program was not confusing to the user.

Finally, how scanf() and POSIX are connected. We'll never get people satisfied. As soon as they get just scanf() working, they will want to get my_app <file.txt working under Zephyr (by the usual close(0); dup(file_fd); stanza). So, there will be no bliss until POSIX support is pervasive. And it's far from that, for example, there's no /dev/console, even though some commenters in #25599 think it already should be there (how? who'd have implemented it, when and why?)

Copy link
Member

@stephanosio stephanosio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder this should be an assert so that it does not end up in a production build. We might as well just hard-fail here anyway since such an implementation should never get past initial testing stage.

@cfriedt
Copy link
Member

cfriedt commented May 26, 2020

@cfriedt, thanks for approval, but I wish, there were few words said about this patch. As it stands, all words belong to #25599 ;-). Let me reply though:

scanf is also c89, c99, so it doesn't fall into the same boat as POSIX API calls. Which (to me) implies that scanf should always Just Work (TM).

I don't follow the logic. What's so special about those "c89", "c99" things? Do you hint that Zephyr

I'm just saying c89 and c99 came before posix, so we can't predecate scanf support on posix.

Now to elaborate my logic: Zephyr should support out of the box only things which don't incur high overhead. Any other feature should be optional, gated behind configuration. Note that configuration

Agreed.

And the main issue is how to ensure that, if configuration for these options weren't done, that the behavior of the program was not confusing to the user.

Agreed.

Finally, how scanf() and POSIX are connected. We'll never get people satisfied. As soon as they get just scanf() working, they will want to get my_app <file.txt working under Zephyr (by the usual close(0); dup(file_fd); stanza). So, there will be no bliss until POSIX support is pervasive. And it's far from that, for example, there's no /dev/console, even though some commenters in #25599 think it already should be there (how? who'd have implemented it, when and why?)

I would say just one step at a time for now. The warnings are a good policy for now, as the overhead is on the order of a few bytes.

Technically you can have scanf() without POSIX. It was part of ANSI C before POSIX came along.
https://www.csse.uwa.edu.au/programming/ansic-library.html

For sure, POSIX shell constructs like piping and IO redirection would require CONFIG_POSIX, but is the Zephyr shell application POSIX?? I doubt it.

The devfs virtual filesystem is a completely different story. Having overlayable filesystems alone (so mounted filesystems) is way off. My guess is that Zephyr only supports flat filesystems right now, and even then, that it probably isn't hooked into the POSIX API, which is quite fair. There is a not insignificant overhead there.

@pfalcon
Copy link
Contributor Author

pfalcon commented May 26, 2020

@stephanosio

I wonder this should be an assert

Are asserts enabled by default?

@cfriedt
Copy link
Member

cfriedt commented May 26, 2020

@stephanosio

I wonder this should be an assert

Are asserts enabled by default?

Not unless you are running tests.

default y if TEST(=y)

@stephanosio
Copy link
Member

Are asserts enabled by default?

No, but it would (if not, should) be enabled when you are developing/debugging your firmware anyway.

@pfalcon
Copy link
Contributor Author

pfalcon commented May 26, 2020

@stephanosio

No

Let me remind you why - because we have so many ASSERT's, that enabling that produces so bloated binaries, that in many realistic scenarios, next step after enabling them is immediately disabling them. Dark side of "quality code", whoa.

be enabled when you are developing/debugging your firmware anyway.

If we stipulate that, we can stipulate many other things, like "people should call __foo_bar() to get well-known C stdlib function working", etc. Not our usecase. Our usecase (from #25599) is newcomers putting together a sample which would work on a typical C system, and it doesn't work as expected in Zephyr. It should either work by default, or if we can't give that (hopefully we can't), it should fail-fast by default (== not link). Failing that, it should fail-slow or warn (IMHO, there's no much difference between two, as long as "warn" case leads to deterministic behavior) by default.

So, thanks for discussion, that's what I was looking for, but I don't think ASSERT would help with the usecase, that's why this patch printk's a, hopefully visible, warning message.

@pfalcon
Copy link
Contributor Author

pfalcon commented May 26, 2020

(Previous comment was reply to @stephanosio, sorry for copy-paste error in the initial comment posted.)

@stephanosio
Copy link
Member

but I don't think ASSERT would help with the usecase, that's why this patch printk's a, hopefully visible, warning message.

I don't really have a strong opinion on this since it is only going to be temporary; though, I still think it would be nice to have this not end up in a production build.

After all, this is one of those "extra few (well, more than few in this case actually) bytes" added to the smallest possible firmware image you are trying to produce; but once again, this is only temporary so I guess it is fine.

@pfalcon
Copy link
Contributor Author

pfalcon commented May 26, 2020

@cfriedt

I'm just saying c89 and c99 came before posix,

For reference, I don't agree. For me, "posix" means "original unix API, written down on a "standard" paper and massaged somewhat". c89/c99 are much more massaged things and thus, "less classy".

so we can't predecate scanf support on posix.

Then let me be explicit to avoid any confusion: I'm personally not interested to apply random-dirty-hacks to get scanf() working "somehow". I may be interested to make scanf() work "fully", but that's such a long pyramid climb, that it won't happen anytime soon. I'm sharing the formulation of such an approach with you (as someone clearly interested in contributing to POSIX subsys) and @stephanosio (who seems to be interested in scanf() issue) for feedback and to coordinate future development plans.

I would say just one step at a time for now.

Sure. But we should know which path our steps will lead us. Actually, we should choose a final point we want to achieve, a path leading to it, and step that way.

The warnings are a good policy for now, as the overhead is on the order of a few bytes.

Sad thing is that one part of me doesn't agree with that, or rather, has concerns. Because, as by now should be clear, scanf, fdtable, and many other things aren't enabled by default because they're too bloated. So, we don't enable them, to keep code as minimal as possible. But in a wicked twist of fate, we now need to add long warning messages, which both not resolve the issue and make the code much less minimal than it could be. And that my part reminds rules of byte arithmetic: half-byte is "little", one byte is a whole byte, and 2 bytes are "a few" (not little!). Here, we're adding gigantic, bloaty 50+ byte messages. (My other part counters that with "enabling fdtable would certainly add more bytes", but that's weak consolation, given that the idea was to have 0 bytes overhead in case fdtable is not enabled.)

Bottom line: @cfriedt if you secretly grow a proposal on making fdtable to be enabled by default, then go ahead! I can't promise that I personally will be +1 on it, pretty likely, I will be -1 ("more than half-byte is already too much"), but we definitely need fresh perspectives from different people ;-).

@pfalcon
Copy link
Contributor Author

pfalcon commented May 26, 2020

@cfriedt, for reference:

My guess is that Zephyr only supports flat filesystems right now, and even then, that it probably isn't hooked into the POSIX API, which is quite fair.

Nope, it's much more rosy (but not completely rosy as you may imagine). Non-flat filesystems are supported (but you may wonder whether there's really "s" at the end of filesystems (by now I don't remember if they finished littlefs integration); likewise, you may wonder if chdir is there). And of course, it's hooked in POSIX API, why would there be fdtable mini-subsys in the first place? (But there's no generic open() :-D ).

There is a not insignificant overhead there.

The overhead is contained in fdtable. As you imagine, it's written to be minimal (why do you think even close() is routed via ->ioctl? Because function prolog/epilog have non-negligible overhead, so it makes sense to have 1 func with a switch() rather than 5 separate funcs). That's why I was rather saddened by the need to add syscall support for that, as it adds non-negligible bloat to it (and why it wasn't added for a long, and being done only now in #25443)

@pfalcon
Copy link
Contributor Author

pfalcon commented Jun 12, 2020

Some updates here:

  1. We confirmed that "ideal solution" of failing the build link-time can't be achieved. Newlib just provides "convenience" wrapper-definitions for "missing" functions. So, where those wrappers are actually mis-convenience, our only option is to redefine them on our side - exactly the approach take here. (To mention it for completeness, there's a talk and even work on setting up Zephyr fork of Newlib, but I don't think we can "fix" this issue there, as the purpose of the fork is definitely to serve as a staging area before submitting to upstream, not to rehash Newlib paradigm and approach.)
  2. Thinking about what @stephanosio said, I guess he's right, we shouldn't just warn about incompatible use of calls vs config options, we actually should abort the app. Except it shouldn't be ASSERT which can be compiled in or out, it should be outright k_oops(). This approach is followed in e.g. libc: newlib: libc-hooks: Provide our own implementation of __chk_fail() #26135.
  3. And yeah, libc: newlib: libc-hooks: Provide our own implementation of __chk_fail() #26135 is a related patch. It will allow us to clearly separate write() and _write(), and so make precisely targeted fail-when-called-without-proper-config versions.

@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Aug 12, 2020
@pfalcon pfalcon removed the Stale label Aug 12, 2020
@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Oct 12, 2020
@github-actions github-actions bot closed this Oct 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: C Library C Standard Library Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants