Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scanf() not functional with newlib out of the box #25599

Closed
jaslob opened this issue May 25, 2020 · 20 comments
Closed

scanf() not functional with newlib out of the box #25599

jaslob opened this issue May 25, 2020 · 20 comments
Labels
area: C Library C Standard Library area: POSIX POSIX API Library Stale

Comments

@jaslob
Copy link

jaslob commented May 25, 2020

Describe the bug
I'm new to Zephyr and did a small test application.
I'm able to build the application but when I try to run the application I'm not able to enter an integer and I receive an error.
I receive the same error when I just add two floats without using scanf.

main.c

#include <zephyr.h>
#include <sys/printk.h>
#include <math.h>
#include <stdio.h>

void main(void)
{
        int i;
        int a=1;
	int b=2;
	printf("Enter an Integer: \n");
	scanf("%d", &i);
	int result = a + b;
	printf("result: %d\n", result);
        printf("scanned value: %d\n", i);

Example for float:
        /*float a=1.2;
	float b=2.3;
	float result = a + b;
	printf("result: %f\n", result);*/
}

prj.conf looks like:

    CONFIG_NEWLIB_LIBC=y
    CONFIG_NEWLIB_LIBC_FLOAT_SCANF=y
    CONFIG_NEWLIB_LIBC_FLOAT_PRINTF=y

To Reproduce
Steps to reproduce the behavior:

  1. west build -p auto -b qemu_x86 -d build
  2. cd build
  3. west build -t run

Expected behavior
Enter an Integer:

  • wait for me to enter an integer -
    result: 3
    scanned value: "entered integer"

Impact
annoyance, showstopper

Screenshots or console output
Booting from ROM..Optimal CONFIG_X86_MMU_PAGE_POOL_PAGES 7
*** Booting Zephyr OS build v2.3.0-rc1-82-gc8a30ecacc50 ***
Enter an Integer:
result: 3
scanned value: 583
FAILED: zephyr/CMakeFiles/run
cd /home/user/zephyrproject/zephyr/project/test/build && /home/user/Zephyr/zephyr-sdk-0.11.2/sysroots/x86_64-pokysdk-linux/usr/bin/qemu-system-i386 -m 9 -cpu qemu32,+nx,+pae -device isa-debug-exit,iobase=0xf4,iosize=0x04 -no-reboot -nographic -net none -pidfile qemu.pid -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel /home/user/zephyrproject/zephyr/project/test/build/zephyr/zephyr.elf
ninja: build stopped: subcommand failed.
FATAL ERROR: command exited with status 1: /usr/bin/cmake --build /home/user/zephyrproject/zephyr/project/test/build --target run.

Environment (please complete the following information):

  • OS: Clear Linux
  • Toolchain: Zephyr SDK
  • Version: 0.11.2

Additional context
Note that the samples hello world and synchonization worked just fine. And just adding 2 integer works as well. Not sure if I made a mistake somewhere or if Qemu hast problems with scanf and float.

@jaslob jaslob added the bug The issue is a bug, or the PR is fixing a bug label May 25, 2020
@carlescufi carlescufi added the area: QEMU QEMU Emulation label May 25, 2020
@carlescufi
Copy link
Member

carlescufi commented May 25, 2020

I suspect this is an issue with scanf not being connected properly to the console, since samples/subsys/console/getline/ works fine. @stephanosio any ideas?
EDIT: In fact, the result is very similar if I run it on a board too, so this has nothing to do with qemu but rather with scanf itself. CC @pfalcon and @andrewboie

@carlescufi carlescufi changed the title Problems running application on Qemu scanf() not functional with newlib May 25, 2020
@stephanosio
Copy link
Member

stephanosio commented May 26, 2020

NOTE: This problem is not specific to qemu_x86, I am also seeing this on qemu_cortex_m0 and qemu_cortex_m3. Looks like some kind of stack/buffer overflow.

I set an initial value for i and see that the variable is getting preserved, so it looks as though scanf is not writing to the var at all.

@stephanosio
Copy link
Member

stephanosio commented May 26, 2020

So what is happening is that the newlib _read hook function (called by scanf) calls _stdin_hook function pointer, and this is set to no-op (return 0) unless you call __stdin_hook_install to install one.

Try something like:

...
#include <console/console.h>
...

void __stdin_hook_install(unsigned char (*hook)(void));

static unsigned char my_stdin_hook(void)
{
        int ch = console_getchar();

        if (ch == '\r') {
                console_putchar('\r');
                console_putchar('\n');
        } else {
                console_putchar(ch);
        }

        return ch;
}

void main(void)
{
...
        console_init();
        __stdin_hook_install(my_stdin_hook);
...
}
CONFIG_CONSOLE_SUBSYS=y
CONFIG_CONSOLE_GETCHAR=y

@stephanosio
Copy link
Member

@pfalcon (and other console subsystem authors if any) I see that __stdin_hook_install is not called alongside __stdout_hook_install in the console backend drivers.

Was that a deliberate decision or an oversight by the backend driver authors?

@stephanosio
Copy link
Member

stephanosio commented May 26, 2020

ninja: build stopped: subcommand failed.
FATAL ERROR: command exited with status 1: /usr/bin/cmake --build > > > 
/home/user/zephyrproject/zephyr/project/test/build --target run.

As for the qemu_x86 failure exit on run, we DO have stack problem. Setting CONFIG_MAIN_STACK_SIZE=2048 (default is 1024) will resolve the problem.

@andrewboie Maybe we should consider increasing the default MAIN_STACK_SIZE for qemu_x86?

@stephanosio stephanosio added question and removed bug The issue is a bug, or the PR is fixing a bug labels May 26, 2020
@stephanosio
Copy link
Member

stephanosio commented May 26, 2020

Removing bug label, since this is technically not a bug, but rather a misconfiguration or lack thereof.

This could potentially lead to an enhancement in which the stdin hook is installed by default by the console backend drivers.

p.s. If, for some reason, we must not install the stdin hook in the backend drivers, the documentation could definitely use some improvements.

@pfalcon
Copy link
Contributor

pfalcon commented May 26, 2020

So what is happening is that the newlib _read hook function (called by scanf) calls _stdin_hook function pointer, and this is set to no-op (return 0) unless you call __stdin_hook_install to install one.

See also #25479 for more fun.

@pfalcon
Copy link
Contributor

pfalcon commented May 26, 2020

@pfalcon (and other console subsystem authors if any) I see that __stdin_hook_install is not called alongside __stdout_hook_install in the console backend drivers.

Was that a deliberate decision or an oversight by the backend driver authors?

To answer this question, we would first need to answer following question: "Is it reasonable to expect scanf() to work out of the box in the default Zephyr configuration?" Whoever thinks yes, please raise your hands. I for example think "no", because in the default Zephyr configuration, it should produce "hello world" app in 512 code bytes max (already violated) and be able to run full BLE stack on chips with 128K ROM/16K RAM.

And if it's unclear where I lead to, supporting scanf() out of the box means incur significant overheads, almost certainly to be paid by all users, even those not using stuff like scanf() (and that's 99.9+%). Actually, I don't think that anybody so far wanted to use scanf() on Zephyr at all, far less out of the box. So, no surprise it doesn't work out of the box.

Summing up, there're 2 questions to address:

  • How much (with what priority) we want to support scanf() and the like at all?
  • Do we want to support them out of the box?

I personally would like to see POSIX subsys to be more integrated into Zephyr core (== more of stuff work out of the box), but that doesn't include scanf(), etc. And I'm not ready to lead on the transformation. FYI @cfriedt.

What's clear that if doesn't work (OOB or at all), we shouldn't make users confused about that, and instead should fail fast. E.g. #25479 is a step in that direction: it's reasonable to call write() on a socket, but who'd know one needs to enable CONFIG_POSIX_API for that? Well, now users will know. (zephyrproject-rtos/sdk-ng#221 (comment) is even better solution, with the next SDK release, users may get a link error unless they did so (to be tested to work like that, testing help welcome)).

cc @nashif, @andrewboie, @galak

@pfalcon pfalcon added area: C Library C Standard Library area: POSIX POSIX API Library and removed area: QEMU QEMU Emulation labels May 26, 2020
@pfalcon pfalcon changed the title scanf() not functional with newlib scanf() not functional with newlib out of the box May 26, 2020
@stephanosio
Copy link
Member

stephanosio commented May 26, 2020

I can't really think of any sane usage of scanf in Zephyr, but either way __stdin_hook_install is not necessarily limited to scanf and should probably be installed alongside __stdout_hook_install -- that is, if we are installing __stdout_hook_install to begin with; or at least, there should be a configuration option to enable doing so in the console subsystem to keep things more sane.

As for scanf, if you are enabling newlib, you are already past the point of producing an ultra small image, so I am not sure if the "minimalist approach" argument is fully valid in this case.

@cfriedt
Copy link
Member

cfriedt commented May 26, 2020

From my perspective, CONFIG_CONSOLE should be independent of CONFIG_NEWLIB_LIBC, but if someone wants stdio from newlib, then CONFIG_CONSOLE should be a default.

Maybe there should be a config NEWLIB_STDIO option that has a select CONSOLE line. That would help to formalize this quirk, I would say. CONFIG_NEWLIB_STDIO could default to y if CONFIG_NEWLIB_LIBC was selected (or not).

There may be a very niche use case where someone does not want to use "standard" io in the console sense. In that case, I would say that CONFIG_NEWLIB_LIBC should be selected, but CONFIG_NEWLIB_STDIO should not be selected.

Normally, user code really shouldn't be calling any __ prefixed functions (aside from maybe gcc builtins?). But perhaps it's an acceptable thing in the very niche use case above.

@carlescufi
Copy link
Member

Maybe there should be a config NEWLIB_STDIO option that has a select CONSOLE line. That would help to formalize this quirk, I would say. CONFIG_NEWLIB_STDIO could default to y if CONFIG_NEWLIB_LIBC was selected (or not).

That's exactly what crossed my mind yesterday when looking through those Kconfig options.

@stephanosio
Copy link
Member

Note that the initiator of stdout (and potentially stdin) hook installation is currently not newlib, but the console backend drivers, e.g:

static void uart_console_hook_install(void)
{
__stdout_hook_install(console_out);
__printk_hook_install(console_out);
}

This is not specific to newlib or any libc for that matter. I believe the original intention was to make this a libc type-independent stdin/stdout hook installation interface (which, for now, only newlib is using).

Also note that there is CONFIG_STDOUT_CONSOLE, which is somewhat similar to what you suggest above; but this config does not seem to be properly enforced.

In general, I think the console subsystem as whole needs major refactoring.

@andrewboie
Copy link
Contributor

I agree with @stephanosio that the way we handle console hooks needs to be rewritten, it's some very very old stuff from the Wind River days that pre-dates anyone's involvement who's currently on the project. Some attention here would be gratefully received for anyone who has time/inclination.

I haven't looked at what it would take to fix this particular issue, but @pfalcon's question "How much (with what priority) we want to support scanf() and the like at all?" is relevant in that scanf() is widely regarded as a security disaster, on par with gets(). There are workarounds (such as in https://stackoverflow.com/questions/1621394/how-to-prevent-scanf-causing-a-buffer-overflow-in-c ) which I imagine ought to be functional though.

@andrewboie Maybe we should consider increasing the default MAIN_STACK_SIZE for qemu_x86?

the C library printf() and scanf() use tons of stack space. Not many tests use them, it might be better to just increase stack size of those particular tests, but I don't have a strong opinion about it. Is it just x86 that this falls over?

@stephanosio
Copy link
Member

the C library printf() and scanf() use tons of stack space. Not many tests use them, it might be better to just increase stack size of those particular tests, but I don't have a strong opinion about it. Is it just x86 that this falls over?

@andrewboie I haven't done a comprehensive testing, but I didn't see any such failures on the ARM platforms while testing this. I do recall seeing the x86 QEMU seldom crashing due to similar stack size related issues, so I thought it would be nice to address this at some point. But given where this discussion is going, I think it is better to wait for now.

@pfalcon
Copy link
Contributor

pfalcon commented May 26, 2020

I do recall seeing the x86 QEMU seldom crashing due to similar stack size related issues, so I thought it would be nice to address this at some point. But given where this discussion is going, I think it is better to wait for now.

I see regular regressions in stack usages, i.e. it constantly growing, and various sample around crash. I regularly submit patches to bump stack sizes, but follow a policy of doing minimal bumps (say, by 64 bytes), so we continue to catch regressions. On first few occasions I cc: @nashif and @andrewboie on these cases, but never get a response, so I assume nobody cares about stack growth (and thus constant slip in minimum requirements for Zephyr, it's clear that we're by now far from "Zephyr can run in 8K", no matter what those 8K refer to, RAM or ROM).

I personally would find bumping default stack size from 1K to 2K unacceptable. But yes, that's pretty much what more and more sample require now.

(And we talk about stack size, we don't even talk about 64-bit archs, which of course require 2x times more stack. If I was to do it right, I'd add option to automatically 2x stack sizes on such platforms. (Well, who knows, maybe it's already there.))

@andrewboie
Copy link
Contributor

I assume nobody cares about stack growth

It's possible to conduct this discussion without using unnecessary hyperbole like this.

Consider for a moment that both me and @nashif get hundreds, if not over a thousand github emails every day.

If something you see is broken, file a bug on it. That includes minimum stack sizes. Be data-driven and constructive.

@andrewboie
Copy link
Contributor

@andrewboie I haven't done a comprehensive testing, but I didn't see any such failures on the ARM platforms while testing this. I do recall seeing the x86 QEMU seldom crashing due to similar stack size related issues, so I thought it would be nice to address this at some point. But given where this discussion is going, I think it is better to wait for now.

@stephanosio Is this a deterministic failure or something that fails intermittently.
I would expect stack overflows to be deterministic and hence caught by CI 100% of the time. If it only overflows sporadically that would be very weird.

@stephanosio
Copy link
Member

stephanosio commented May 26, 2020

Is this a deterministic failure or something that fails intermittently.
I would expect stack overflows to be deterministic and hence caught by CI 100% of the time. If it only overflows sporadically that would be very weird.

@andrewboie The failures are deterministic; they occur mostly when I am adding random code for testing and/or playing with the configurations, which is to be expected (it seems x86 is either on the verge of stack overflow with the default configs, or just particularly stack hungry; though I haven't done an in depth analysis on this).

If I keep seeing related failures in the future, I will file an issue/PR for it

@jaslob
Copy link
Author

jaslob commented May 27, 2020

Thank you for your help.

@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: C Library C Standard Library area: POSIX POSIX API Library Stale
Projects
None yet
Development

No branches or pull requests

6 participants