Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libc: common: declare strnlen() and strtok_r() in string.h #68261

Conversation

cfriedt
Copy link
Member

@cfriedt cfriedt commented Jan 30, 2024

Rules A.4 and A.5 of the coding guidelines allow the POSIX strnlen() and strtok_r() functions to be used anywhere within Zephyr, including in the kernel.

These are not part of ISO C and typically require _POSIX_C_SOURCE to be declared. However, in Zephyr, POSIX and the interfaces it provides, are optional. _POSIX_C_SOURCE should not be used to compile the kernel and core components of the system, which is also specified by Rules A.4 and A.5 of the coding guidelines.

Declare strnlen() and strtok_r() with _POSIX_C_SOURCE or with _ZEPHYR_SOURCE so that they can be provided in a way that does not conflict with coding guidelines.

Rules A.4 and A.5 of the coding guidelines allow the POSIX
strnlen() and strtok_r() functions to be used anywhere within
Zephyr, including in the kernel.

These are not part of ISO C and typically require
_POSIX_C_SOURCE to be declared. However, in Zephyr, POSIX
and the interfaces it provides, are optional, and should not
be used to compile the kernel and core components of the
system, which is also specified by Rules A.4 and A.5 of the
coding guidelines.

Declare the POSIX strnlen() function with _POSIX_C_SOURCE or
with _ZEPHYR_SOURCE so that it can be used easily by any
C library (including those without POSIX support or those
without knowledge of _ZEPHYR_SOURCE).

Signed-off-by: Christopher Friedt <cfriedt@meta.com>
@cfriedt cfriedt force-pushed the declare-strnlen-and-strtok-r-in-libc-common branch from aa0f8d4 to c78c690 Compare January 30, 2024 01:43
@cfriedt
Copy link
Member Author

cfriedt commented Jan 30, 2024

$ twister -i -T tests/lib/c_lib/
INFO    - Using 'zephyr' toolchain.
INFO    - Selecting default platforms per test case
INFO    - Building initial testsuite list...
INFO    - Writing JSON report /home/cfriedt/zephyrproject/zephyr/twister-out/testplan.json
INFO    - JOBS: 32
INFO    - Adding tasks to the queue...
INFO    - Added initial list of jobs to queue
INFO    - Total complete:  800/ 800  100%  skipped:  295, failed:    0, error:    0
INFO    - 20 test scenarios (800 test instances) selected, 295 configurations skipped (285 by static filter, 10 at runtime).
INFO    - 505 of 800 test configurations passed (100.00%), 0 failed, 0 errored, 295 skipped with 0 warnings in 1192.91 seconds
INFO    - In total 9592 test cases were executed, 6328 skipped on 40 out of total 675 platforms (5.93%)
INFO    - 505 test configurations executed on platforms, 0 test configurations were only built.
INFO    - Saving reports...
INFO    - Writing JSON report /home/cfriedt/zephyrproject/zephyr/twister-out/twister.json
INFO    - Writing xunit report /home/cfriedt/zephyrproject/zephyr/twister-out/twister.xml...
INFO    - Writing xunit report /home/cfriedt/zephyrproject/zephyr/twister-out/twister_report.xml...
INFO    - Run completed

@keith-packard
Copy link
Collaborator

keith-packard commented Jan 30, 2024

What prevents this file from being included when using a library that does have Zephyr support (such as Minimal libc or picolibc?). I would expect this file to only be used when the underlying C library is known to define these functions using these specific signatures for two reasons:

  1. It's nice to get a simple 'function undefined' error when compiling files using them, rather than a link error.
  2. C libraries have habit of 'complexifying' the function definition process; the actual declarations might be wrapped in layers of attributes and/or renaming tricks.

extern "C" {
#endif

#if _POSIX_C_SOURCE >= 200809L || defined(_ZEPHYR_SOURCE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to declare these functions when _POSIX_C_SOURCE is set -- the underlying C library will do that for you.

Copy link
Collaborator

@keith-packard keith-packard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to remove the checks for _POSIX_C_SOURCE

@aescolar
Copy link
Member

I have created #68278 so we can agree on the direction before we implement it. So if you don't mind let's hold implementing it until we agree. I set a DNM label to reflect this.

@aescolar aescolar added the DNM This PR should not be merged (Do Not Merge) label Jan 30, 2024
@aescolar
Copy link
Member

However, in Zephyr, POSIX and the interfaces it provides, are optional.

I think it is very relevant to mention that Zephyr's POSIX_API library is one thing (which I think should be limited to the just the OS abstraction side), and the POSIX family of standards another (A superset of the ISO C standard, for which a very big part can be, and is, implemented by the C library and has no relationship with the OS)

Our coding guidelines do not forbid setting _POSIX_C_SOURCE and there is nothing wrong with doing so, beyond one checkpatch check which I think should be corrected.

@cfriedt cfriedt requested a review from andyross January 30, 2024 11:45
@cfriedt
Copy link
Member Author

cfriedt commented Jan 30, 2024

What prevents this file from being included when using a library that does have Zephyr support (such as Minimal libc or picolibc?).

Nothing. Also, it does not seem to be necessary. However, those are trivial changes that could be added (if it were necessary). Although, technically we can just move the declaration from the minimal C library to here.

I would expect this file to only be used when the underlying C library is known to define these functions using these specific signatures for two reasons:

  1. It's nice to get a simple 'function undefined' error when compiling files using them, rather than a link error.

What link error exactly? This scenario seems kind of hypothetical.

  1. C libraries have habit of 'complexifying' the function definition process; the actual declarations might be wrapped in layers of attributes and/or renaming tricks.

Ok.. this also seems very hand-wavy. In the end, the compiler should see approximately the equivalent, standard function declaration.

Copy link
Member

@aescolar aescolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before throwing ourselves into solutions like this which can easily result in trouble down the road, I would prefer to discuss if the assumption about needing to expose this by default is necessary #68278.

#endif

#if _POSIX_C_SOURCE >= 200809L || defined(_ZEPHYR_SOURCE)
size_t strnlen(const char *s, size_t maxlen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I don't think it is correct to provide prototypes to C library functions like this. As we cannot know if some C library is defining this as a macro, or as a static inline, or..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I don't think it is correct to provide prototypes to C library functions like this. As we cannot know if some C library is defining this as a macro, or as a static inline, or..

That seems fairly hypothetical (albeit easily accounted for, if necessary).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is certainly not hypothetical. As @keith-packard mentioned #68261 (comment) it is very normal for C libraries to name the implementation something else, use a common implementation hidden behind macros, fall into compiler builtins, or in general do any type of tricks for one reason or another.

Copy link
Member Author

@cfriedt cfriedt Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is certainly not hypothetical.

Is that what is happening here? Does it happen with any of the other C libraries we support?

Specifically, are there conflicting definitions for strnlen and strtok_r?

E.g. @evgeniy-paltsev - does this happen with your toolchain? @marc-hb does this happen with your toolchain?

So far, I'm just trying to test the hypothesis. If there are any concrete cases, that would make this scenario less hypothetical.

@cfriedt
Copy link
Member Author

cfriedt commented Jan 30, 2024

Before throwing ourselves into solutions like this which can easily result in trouble down the road, I would prefer to discuss if the assumption about needing to expose this by default is necessary #68278.

I'm curious as to what "trouble down the road" could be. That seems to be fairly hypothetical.

@aescolar
Copy link
Member

I'm curious as to what "trouble down the road" could be. That seems to be fairly hypothetical.

For ex., if the c library defined one of this as a macro renaming the call to some underlaying implementation or doing some steps before calling into such, this earlier prototype would be meaningless and the same warning would appear.
If the C library defines an incompatible prototype (for ex. inlined), the result would be a compile warning or error.
It really depends on what the C library is doing. And given that Zephyr users have a tendency to have their own compilers and C libraries, some of them quite custom, I would be careful with something like this.

@cfriedt
Copy link
Member Author

cfriedt commented Jan 30, 2024

I'm curious as to what "trouble down the road" could be. That seems to be fairly hypothetical.

For ex., if the c library defined one of this as a macro renaming the call to some underlaying implementation or doing some steps before calling into such, this earlier prototype would be meaningless and the same warning would appear. If the C library defines an incompatible prototype (for ex. inlined), the result would be a compile warning or error. It really depends on what the C library is doing. And given that Zephyr users have a tendency to have their own compilers and C libraries, some of them quite custom, I would be careful with something like this.

The "For ex." implies that the situation described is still hypothetical.

I would like to know if any of the forecasted doomsday events do occur in practice so that reviewers can make an informed, evidence-based decision rather than one based on skepticism, popularity, or FUD.

@keith-packard
Copy link
Collaborator

Ok.. this also seems very hand-wavy. In the end, the compiler should see approximately the equivalent, standard function declaration.

I've dealt with glibc headers for some time, and they have all kinds of magic; for ABI and API compatibility, to optimize some operations to inlines, for _FORTIFY_SOURCE support, etc. The result of including <string.h> is to provide the strnlen and strtok_r APIs for applications to use, but how those include files manage that can vary wildly.

Any declaration of C library symbols may be specific to the C library being used, which is why I changed #67040 to provide newlib-specific definitions of the non ISO-C functions.

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 30, 2024

Oh great! Another, incredibly hairy discussion about C/C++ standards and compatibility issues. What better way to start the week?

So it sounds like my name has now become synonym with "ancient toolchain". Is any sort of fame good? Can't decide.

Rules A.4 and A.5 of the coding guidelines allow the POSIX strnlen() and strtok_r() functions to be used anywhere within Zephyr, including in the kernel.

If "including in the kernel", then why are they listed in A.5 but not in A.4? Just curious.

It is certainly not hypothetical. As @keith-packard mentioned #68261 (comment) it is very normal for C libraries to name the implementation something else, use a common implementation hidden behind macros, fall into compiler builtins, or in general do any type of tricks for one reason or another.

I don't know about this specifically but nothing there would surprise me anymore after thesofproject/sof#3880 where I discovered that the GCC optimizer can sometimes (!) invoke memset() even when you don't have it and don't use it anywhere! This resulted in a very puzzling linker failure.

These sound even less hypothetical when mentioned by people with as much expertise as @keith-packard and @aescolar.

I feel like I have other, older and better forgotten but similar scars.

Specifically, are there conflicting definitions for strnlen and strtok_r?

I can test some of our toolchains but I could use more specific guidance. Not just to save time but also to help me being accurate and relevant. So is there some specific Zephyr test/sample you would like me to go and try to compile? With which libc? With and without this commit I presume? Any other PR to test? Any specific warning to make sure is enabled? If still interested maybe ask in #68278 instead?

FWIW a quick git grep shows neither is used in SOF. strnlen_s() is not use either. You probably didn't care about that.

In 2019, sof commit fcbf763ac83b added implementations of Annex K memset_s() and memcpy_s() to
sof/src/arch/xtensa/include/arch/string.h and they are currently used in SOF. I bet it was done to comply with some Microsoft-inspired rule. Probably off-topic too.

@keith-packard
Copy link
Collaborator

If "including in the kernel", then why are they listed in A.5 but not in A.4? Just curious.

strnlen is listed in A.4, but strtok_r is not? Odd.

I don't know about this specifically but nothing there would surprise me anymore after thesofproject/sof#3880 where I discovered that the GCC optimizer can sometimes (!) invoke memset() even when you don't have it and don't use it anywhere! This resulted in a very puzzling linker failure.

When -ffreestanding is not defined, the compiler is free to use the hosted C API, including memset. And that's a good thing, because the alternative is to have local initializers like int a[100] = {}; open-code a memset inline. This is one of the "benefits" of getting rid of -ffreestanding.

FWIW a quick git grep shows neither is used in SOF. strnlen_s() is not use either. You probably didn't care about that.

Heh. Annex K considered harmful?

@cfriedt
Copy link
Member Author

cfriedt commented Jan 31, 2024

Closing this in favour of #67040 as (I believe) that solves the majority of the issues we encountered with strnlen / strtok_r and newlib in a scalable way.

@cfriedt cfriedt closed this Jan 31, 2024
@ycsin
Copy link
Member

ycsin commented Feb 1, 2024

Whoever saw this might also be interested in #68381

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: C Library C Standard Library DNM This PR should not be merged (Do Not Merge)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants