Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base os: remove longstanding layering violation #75513

Open
6 of 13 tasks
cfriedt opened this issue Jul 5, 2024 · 46 comments · May be fixed by #75348
Open
6 of 13 tasks

base os: remove longstanding layering violation #75513

cfriedt opened this issue Jul 5, 2024 · 46 comments · May be fixed by #75348
Assignees
Labels
area: Base OS Base OS Library (lib/os) area: Bluetooth area: C Library C Standard Library area: C++ area: File System area: Modem Drivers area: native port Host native arch port (native_sim) area: Networking area: POSIX POSIX API Library bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Milestone

Comments

@cfriedt
Copy link
Member

cfriedt commented Jul 5, 2024

Describe the bug
In the Base OS, fdtable.h introduced a layering violation (in the first (?) commit) in which the POSIX header <sys/types.h> is included and the POSIX type ssize_t is used in a way that can be described as "below the line" (the line being where the POSIX API lives in the software stack). Such a layering violation creates a dependency cycle at the API level. As noted in the original commit, the wrong types seem to be added to satisfy some need of the native_posix platform.

As part of the base OS, the layering violation spread throughout the Zephyr project into the following areas (and likely more):

  • Base OS
  • Networking
  • Filesystem
  • Modem Driversd
  • Network Drivers
  • Bluetooth simulator
  • Native drivers
  • C Library
  • C++
  • POSIX API

Aside from extensive manual testing with twister and CI not catching a strange set of bugs, this layering violation was also the root cause for #75205 (as demonstrated by #75348 archived here with testing details here).

Please also mention any information which could help others to understand
the problem you're facing:

  • What target platform are you using? all
  • What have you tried to diagnose or workaround this issue? found the root cause
  • Is this a regression? Technically no, it was just always broken f484bba

To Reproduce
Steps to reproduce the behavior:

  1. See multipe tests/posix & tests/lib/c_lib/thrd build failures due to missing time.h definitions #75205

Expected behavior
Developers and reviewers adding code which does not introduce dependency cycles and layering violations.

Impact
This bug caused planned features to be removed from the v3.7.0 release. If it is not fixed, it will mean that the next LTS release cycle will still be quite full of these layering violations and will prevent further bugfixes from being backported to all of the aforementioned areas.

Logs and console output

Environment (please complete the following information):

  • OS: any
  • Toolchain (e.g Zephyr SDK, ...): all
  • Commit SHA or Version used: c9708ff

Additional context

The simpler workaround was deemed to simple. So maintainers asked for the more complicated fix. There is a PR linked to demonstrate the more complicated fix.

This bug is blocking the following (TSC-approved) features from being re-added to v3.7.0.

POSIX_FD_MGMT

POSIX_DEVICE_IO

This bug is blocking other architectural fixes that have plagued Zephyr for multiple (actually all) LTS release cycles already

This bug is also blocking numerous the stability improvements mentioned here.

@cfriedt cfriedt added bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug area: Bluetooth area: Networking area: C Library C Standard Library area: File System area: native port Host native arch port (native_sim) area: POSIX POSIX API Library area: C++ area: Base OS Base OS Library (lib/os) area: Modem Drivers labels Jul 5, 2024
@cfriedt cfriedt added this to the v3.7.0 milestone Jul 5, 2024
@cfriedt cfriedt assigned cfriedt and nashif and unassigned cfriedt Jul 5, 2024
@kartben
Copy link
Collaborator

kartben commented Jul 5, 2024

@cfriedt can you please replace the bit.ly shortened URLs with direct links to the resources (e.g. the respective pull requests/issues/Gists)?

@nashif
Copy link
Member

nashif commented Jul 5, 2024

@cfriedt base os is a collection of many things, maintained by many folks, so I do not think I am the right assignee. I consider fstable is part of the posix subsystem. Probably we should change that in the MAINTAINER file.

@nashif nashif assigned cfriedt and unassigned nashif Jul 5, 2024
@cfriedt
Copy link
Member Author

cfriedt commented Jul 5, 2024

@cfriedt base os is a collection of many things, maintained by many folks, so I do not think I am the right assignee. I consider fstable is part of the posix subsystem. Probably we should change that in the MAINTAINER file.

@nashif - you are the maintainer of the Base OS. Fstable is not at all a part of POSIX, as other parts of Zephyr use it independently of POSIX. But that certainly has historically not stopped excessive cross-pollination of APIs.

Don't worry, I've already spent some time cleaning up this mess and will be making a PR shortly.

@cfriedt cfriedt assigned nashif and unassigned cfriedt Jul 5, 2024
@cfriedt
Copy link
Member Author

cfriedt commented Jul 5, 2024

@cfriedt can you please replace the bit.ly shortened URLs with direct links to the resources (e.g. the respective pull requests/issues/Gists)?

@kartben - feel free to do so yourself. You have edit permissions, correct? Editing the bug report is not a priority for me. Fixing the bug is (as it is High Priority).

@nashif
Copy link
Member

nashif commented Jul 5, 2024

commit f484bbaa264f3a9640ca52477e3038f912ea5642
Author: Paul Sokolovsky <paul.sokolovsky@linaro.org>
Date:   Mon Oct 8 13:55:03 2018 +0300

    lib: posix: Implement generic file descriptor table

    The table allows to wrap read/write (i.e. POSIX-compatible) semantics
    of any I/O object in POSIX-compatible fd (file descriptor) handling.
    Intended I/O objects include files, sockets, special devices, etc.

    The table table itself consists of (underlying obj*, function table*)
    pairs, where function table provides entries for read(), write, and
    generalized ioctl(), where generalized ioctl handles all other
    operations, up to and including closing of the underlying I/O object.

    Fixes: #7405

    Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>

as it was introduced by the previous posix maintainer

@nashif nashif removed their assignment Jul 5, 2024
@nashif
Copy link
Member

nashif commented Jul 5, 2024

As noted in the original commit, the wrong types seem to be added to satisfy some need of the native_posix platform.

what original commit? @aescolar is this needed for native_posix/native_sim?

@aescolar
Copy link
Member

aescolar commented Jul 5, 2024

@nashif It's a bit difficult to guess what a long gone maintainer meant in a 6 year old FIXME comment which may or may not apply. What I can see is that that comment mentions off_t which is not used in that header. It also mentions ssize_t, which would indicate if anything that it was needed to get that type for some C library/ies. And yes, for POSIX compliant C libraries, you can get ssize_t from sys/types.h, but for glibc or pico also from stdio.h or a bunch of other headers.
Beyond that I can only speculate, but I cannot guess any need for "native_posix" itself to have that header included there.

@cfriedt is the issue here that you would like that ssize_t is not used in most of Zephyr because it is not a ISO C standard type, but defined in the POSIX spec?
If so, I think the best way forward would be to create an RFC to discuss it, and if there is agreement that it should not be used, an enhancement issue could be created.

As mentioned before, the current state of the code not being directly compatible with adding a new feature or supporting some refactoring is not a bug. It is just normal development.

In any case, and as also mentioned before, you should not see all that is defined in the POSIX specs as a single entity or a single layer. Some parts are simple extensions to the ISO C library types, some other simple utility functions which extend those provided by the ISO standard, some other parts specify APIs an OS should provide for different types of high level functionality, and much more.
In that regard, they cannot be seen as provided by a single component, as that would lead to misunderstandings and something architecturally not sound.

@aescolar
Copy link
Member

aescolar commented Jul 5, 2024

@cfriedt can you please replace the bit.ly shortened URLs with direct links to the resources (e.g. the respective pull requests/issues/Gists)?

@kartben - feel free to do so yourself.

@cfriedt if you have any interest in anybody using those links, please replace them yourself. Shortened links may be malicious, and you cannot expect others to click on them. It is not reasonable to expect others to clean up after you.

@aescolar
Copy link
Member

aescolar commented Jul 5, 2024

As mentioned before this seems better suited for an RFC so I convert the issue.
@cfriedt please do not convert it back, as that will just create noise and waste time. If you want we can discuss it in the release meeting.
I'd ask you to please clean up the description as it is all over the place and hampers understanding on what is it that you would like.

@aescolar aescolar removed the bug The issue is a bug, or the PR is fixing a bug label Jul 5, 2024
@henrikbrixandersen
Copy link
Member

In responding to @henrikbrixandersen , I was just realizing that we do have access to the picolibc/newlib types for ssize_t and off_t even when POSIX is not enabled;

Right, so why can't we use those with the Zephyr regardless of POSIX being enabled or not?

@keith-packard
Copy link
Collaborator

Right, so why can't we use those with the Zephyr regardless of POSIX being enabled or not?

Hrm. How much of a kludge are you interested in here? We could define ssize_t and off_t inside of Zephyr header files even when _POSIX_C_SOURCE wasn't defined. That would violate the POSIX spec, but we could make it work for specific C libraries where we knew how to work with their implementation. For picolibc and newlib:

#if defined(CONFIG_PICOLIBC) || defined(CONFIG_NEWLIB_LIBC)
#include <sys/types.h>
#ifndef _OFF_T_DECLARED
typedef	__off_t		off_t;
#define	_OFF_T_DECLARED
#endif
#ifndef _SSIZE_T_DECLARED
typedef _ssize_t ssize_t;
#define	_SSIZE_T_DECLARED
#endif
#endif

You'd need stanzas for other POSIX-compatible C libraries and a stanza for configurations using the Zephyr POSIX layer.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 10, 2024

Right, so why can't we use those with the Zephyr regardless of POSIX being enabled or not?

A few reasons:

  1. it's not great to disable an API and simultaneously try to use it (this should be obvious)
  2. the aforementioned dependency cycle

But also,

  1. why bother, when one can simply design a sound architecture, without hacks or dependency cycles?

The thing that is a major red flag for me here, is that you are defending the architecture with (objectively negative) traits 1 and 2 (antipatterns), and actively advocating against the (much cleaner) architecture with naturally ordered dependencies.

Do you care to elaborate more on your perspective?

@aescolar
Copy link
Member

aescolar commented Jul 10, 2024

The whole problem here is that these types must be defined by the POSIX layer..

I think it would be really nice to start by having a common understanding of what we expect defined in the C library and what defined/redefined/overriden by different parts of Zephyr's POSIX API compatibility components, or the C library integration.
And in that same light, what types do we expect to come from the C library or the others.

Just an overall indication of the aim, and in detail for this types in question would help.

Picolibc isn't built against the Zephyr definitions, so having Zephyr define these sizes will make for ABI fun as the Zephyr code will use Zephyr definitions and the Picolibc code will use Picolibc definitions.

Indeed.

Anyhow, I ask this because I do not know what is the long term plan here. We can stop using ssize_t in most of Zephyr. But then what?

Note

The POSIX standard specifies many components from different layers. Between those a superset of the C library.
For C library headers (https://pubs.opengroup.org/onlinepubs/9699919799/idx/head.html),
for some it simply relies on the standard definition (for ex. stdbool.h https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/stdbool.h.html), for others specifies the standard + extensions (for ex. string.h https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/string.h.html )
for others specifies something not defined by the ISO C standard (For ex. sys/types.h https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html)
In any case, as some components of the C library are an API to the OS, we cannot just see it as "one layer" in Zephyr context. Neither can we see the POSIX API compatibility components as just one layer.

sys/types.h is an annoying header because it is required to provide both some types which are very coupled to different parts of the OS abstraction components, and others which are quite basic (i.e. ssize_t or even size_t) and used in low level C library functions.
Unfortunately it is also a header C libraries tend to include from many other headers.

In any case, I think it would also help if we stopped claiming we are fixing a "layer violation" from a component, while that component is by design / or the long term goal is to override C library headers, or using the C library internal definitions or guards. This is spaghetti architecture. So please let's focus on the actual technical issues and proposals.


As a practical example, if you compile using picolibc in non-POSIX mode ... then these types are not defined

@keith-packard are you sure about this?

$ mkdir build && cd build
$ cmake -GNinja -DBOARD=qemu_x86 ../samples/hello_world/ -DCONFIG_PICOLIBC=y && ninja
# And now I just retrigger a preprocessor only pass on samples/hello_world/main.c (with or without -D__ZEPHYR__=1 to avoid _ZEPHYR_SOURCE)
$ /opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/bin/x86_64-zephyr-elf-gcc -DKERNEL -DK_HEAP_MEM_POOL_SIZE=0 -DPICOLIBC_LONG_LONG_PRINTF_SCANF -D__LINUX_ERRNO_EXTENSIONS__  -Izephyr/build/zephyr/include/generated/zephyr -Izephyr/include -Izephyr/build/zephyr/include/generated -Izephyr/soc/intel/atom -Izephyr/soc/intel/atom/. -isystem zephyr/lib/libc/common/include -m32  -fno-strict-aliasing -Os -imacros zephyr/build/zephyr/include/generated/zephyr/autoconf.h -fno-printf-return-value -fno-common -g -gdwarf-4 -fdiagnostics-color=always -m32 -msoft-float -Wa,--divide --sysroot=/opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/x86_64-zephyr-elf -imacros zephyr/include/zephyr/toolchain/zephyr_stdint.h -Wall -Wformat -Wformat-security -Wno-format-zero-length -Wdouble-promotion -Wno-pointer-sign -Wpointer-arith -Wexpansion-to-defined -Wno-unused-but-set-variable -Werror=implicit-int -fno-pic -fno-pie -fno-asynchronous-unwind-tables -ftls-model=local-exec -fno-reorder-functions --param=min-pagesize=0 -fno-defer-pop -fmacro-prefix-map=zephyr/samples/hello_world=CMAKE_SOURCE_DIR -fmacro-prefix-map=zephyr=ZEPHYR_BASE -fmacro-prefix-map=zephyrproject=WEST_TOPDIR -ffunction-sections -fdata-sections -mpreferred-stack-boundary=2 -mno-mmx -mno-sse --specs=picolibc.specs -std=c99 -MD -MT CMakeFiles/app.dir/src/main.c.obj -MF CMakeFiles/app.dir/src/main.c.obj.d -o CMakeFiles/app.dir/src/main.c.obj -c zephyr/samples/hello_world/src/main.c -E -dI -dU
# Note the extra -E -dI -dU compared to the plain build command
$ grep -C 3  ssize_t ./CMakeFiles/app.dir/src/main.c.obj 
#undef __machine_size_t_defined
# 182 "/opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/picolibc/include/sys/_types.h" 3 4
typedef unsigned int __size_t;
#undef __machine_ssize_t_defined
# 198 "/opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/picolibc/include/sys/_types.h" 3 4
typedef signed int _ssize_t;
#define unsigned signed
# 209 "/opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/picolibc/include/sys/_types.h" 3 4
typedef _ssize_t __ssize_t;
...
typedef _ssize_t ssize_t;
...
ssize_t getline(char **restrict lineptr, size_t *restrict n, FILE *restrict stream);
ssize_t getdelim(char **restrict lineptr, size_t *restrict n, int delim, FILE *restrict stream);

Note
https://github.com/picolibc/picolibc/blob/main/newlib/libc/stdio/stdio.h#L34 (says it needs it)
https://github.com/picolibc/picolibc/blob/main/newlib/libc/stdio/stdio.h#L225 (defines on its own if some other include didnt yet)
https://github.com/picolibc/picolibc/blob/main/newlib/libc/stdio/stdio.h#L550 (uses it unconditionally)

(EDIT after talk with @keith-packard , corrected links, conclusion is the same)
https://github.com/picolibc/picolibc/blob/main/newlib/libc/tinystdio/stdio.h#L46
https://github.com/picolibc/picolibc/blob/main/newlib/libc/tinystdio/stdio.h#L324

@cfriedt
Copy link
Member Author

cfriedt commented Jul 10, 2024

The whole problem here is that these types must be defined by the POSIX layer..

I think it would be really nice to start by having a common understanding of what we expect defined in the C library and what defined/redefined/overriden by different parts of Zephyr's POSIX API

Most of the types that must be defined by the POSIX layer are defined in terms of primitives provided by the OS.

Zephyr should really not be an outlier here.

If Zephyr communicates those primitives to picolibc in a manner that is directly compatible with picolibc (and newlib), then there should be no problems for anyone.

Picolibc isn't built against the Zephyr definitions, so having Zephyr define these sizes will make for ABI fun as the Zephyr code will use Zephyr definitions and the Picolibc code will use Picolibc definitions.

Ideally picolibc would be built against the Zephyr definitions like most other operating systems already do for most C libraries. That's kind of the natural dependency ordering.

Forcing the opposite introduces a lot of unnecessary complexity.

Anyhow, I ask this because I do not know what is the long term plan here. We can stop using ssize_t in most of Zephyr. But then what?

Removing dependency cycles / layering violations has been part of the long term plan for a couple of years already.

[!NOTE]
The POSIX standard specifies many components from different layers. Between those a superset of the C library.

For C library headers (https://pubs.opengroup.org/onlinepubs/9699919799/idx/head.html),

"C library headers" is a slight misnomer - most of these headers are not part of ISO C and certainly a C library can be conformant without them. So calling them "C library headers" is misleading.

These are POSIX headers for the most part.

for others specifies the standard + extensions (for ex. string.h https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/string.h.html )

Yes, unfortunately, POSIX chose to piggyback on some existing ISO C headers, so there are extensions to ISO C (at least one or two Option Groups).

Planned stabilization fixes include consolidating e.g. posix/time.h and posix/signal.h with their libc counterparts. String.h is not really an issue in Zephyr because it was never "forked" in POSIX.

This should reduce complexity significantly.

for others specifies something not defined by the ISO C standard (For ex. sys/types.h https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html)

In any case, as some components of the C library are an API to the OS, we cannot just see it as "one layer" in Zephyr context.

The big difference is that POSIX is not an API for the OS to use whereas ISO C is (at least the parts that do not involve threading or I/O).

Neither can we see the POSIX API compatibility components as just one layer.

On the contrary, it's pretty easy to see the POSIX (C language) API as one layer (with many features) that sits above the operating system. It is, by definition, the Portable Operating System Interface.

sys/types.h is an annoying header

It's just a POSIX header. It's only annoying when it's being (ab)used in ways that were never intended.

because it is required to provide both some types which are very coupled to different parts of the OS abstraction components, and others which are quite basic (i.e. ssize_t or even size_t) and used in low level C library functions.

POSIX functions. It would be good to stop burring the lines.

size_t is ISO C (not POSIX). Like many other types (bool, int32_t, etc) POSIX simply requires that it is provided by the C library.

Unfortunately it is also a header C libraries tend to include from many other headers.

Newlib (and by extension picolibc) were originally designed for POSIX runtimes (GNU), so although <sys/types.h> is a POSIX header, and used in many other POSIX headers (publicly), it would not surprise me to learn that it may also be used privately to implement ISO C functionality within newlib or picolibc.

In any case, I think it would also help if we stopped claiming we are fixing a "layer violation" from a component

I would stop pointing out the layering violation if it weren't broken.

So maybe fixing that is a good idea?

This is spaghetti architecture.

Spaghetti architecture is what we have today in Zephyr as a result of many years of tech debt. Perhaps understanding what is part of POSIX and what is part of ISO C several years ago would have changed things.

I would be happy to continue cleaning it up though.

So please let's focus on the actual technical issues and proposals.

I would be happy to continue focusing on technical details as well.

@aescolar
Copy link
Member

aescolar commented Jul 11, 2024

Most of the types that must be defined by the POSIX layer are defined in terms of primitives provided by the OS[..]
If Zephyr communicates those primitives to picolibc[..]

And how do you plan to do that, given that we support other C libraries, including C libraries provided by proprietary toolchains?

So calling them "C library headers" is misleading.

It may be misleading, but C libraries tend to provide many of them, and use them themselves.

These are POSIX headers for the most part.

The POSIX (2017 version) includes all 25 C99 headers, of which it expands these:

The big difference is that POSIX is not an API for the OS to use whereas ISO C is[..]
On the contrary, it's pretty easy to see the POSIX (C language) API as one layer (with many features) that sits above the operating system.

The POSIX standard extends the C library with things as low level as, for ex.,
a definition of LONG_BIT (number of bits in a long) in limits.h,
or requiring macros for the constants for e or pi from math.h.

It also extends it with other lowly things like ssize_t in stdio.h (and SSIZE_MAX in limits.h)
or "bare metal" functions like strnlen() in string.h,
which could easily be considered below the OS.

size_t is ISO C (not POSIX).

I mentioned size_t because the posix standard requires sys/types.h to provide it.
Even though you can also expect it from many other plain C standard headers as stddef.h
So one could expect that, if you are going to provide your own sys/types.h you would
need to rely on its C library definition or at least ensure you are aligned with it.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 11, 2024

Most of the types that must be defined by the POSIX layer are defined in terms of primitives provided by the OS[..]
If Zephyr communicates those primitives to picolibc[..]

And how do you plan to do that, given that we support other C libraries, including C libraries provided by proprietary toolchains?

The most obvious way is how things have been done for the last 30 years or so.

But there are many ways.

Each library has nuances, and may require attention here or there, which we already do. So this seems to be very much business as usual.

So calling them "C library headers" is misleading.

It may be misleading, but C libraries tend to provide many of them, and use them themselves.

Sure.

And non-POSIX C libraries? Probably not.

The point is that C and POSIX are not synonymous. Related, yes. There was a time when pthreads and rt were both separate libraries too (although some have moved away from that approach).

These are POSIX headers for the most part.

The POSIX (2017 version) includes all 25 C99 headers, of which it expands these:

I don't foresee these being problematic.

The big difference is that POSIX is not an API for the OS to use whereas ISO C is[..]
On the contrary, it's pretty easy to see the POSIX (C language) API as one layer (with many features) that sits above the operating system.

The POSIX standard extends the C library with things as low level as, for ex.,
a definition of LONG_BIT (number of bits in a long) in limits.h,
or requiring macros for the constants for e or pi from math.h.
It also extends it with other lowly things like ssize_t in stdio.h (and SSIZE_MAX in limits.h)

Yes.. constants.

or "bare metal" functions like strnlen() in string.h,
which could easily be considered below the OS.

Yes, although there is an Option Group for the extension (POSIX_C_LIB_EXT) which is already supported in Zephyr. I suggest we use it.

size_t is ISO C (not POSIX).

I mentioned size_t because the posix standard requires sys/types.h to provide it.
Even though you can also expect it from many other plain C standard headers as stddef.h
So one could expect that, if you are going to provide your own sys/types.h you would
need to rely on its C library definition or at least ensure you are aligned with it.

Yes. That is correct.

All of the above is mostly independent of the issue that this bug addresses though.

It's better to not distract from the actual topic.

@aescolar
Copy link
Member

aescolar commented Jul 12, 2024

[cfriedt:] The whole problem here is that these types must be defined by the POSIX layer..

[aescolar:] I think it would be really nice to start by having a common understanding of what we expect defined in the C library and what defined/redefined/overriden by different parts of Zephyr's POSIX API

[cfriedt:] Most of the types that must be defined by the POSIX layer are defined in terms of primitives provided by the OS[..]
If Zephyr communicates those primitives to picolibc[..]

[aescolar:] And how do you plan to do that, given that we support other C libraries, including C libraries provided by proprietary toolchains?

[cfriedt:] The most obvious way is how things have been done for the last 30 years or so.

@cfriedt I ask this because it is very relevant for this hole topic. And we need a much more precise answer than that to understand what is the best way forward.

  • Do you intend that the {Zephyr's POSIX compatibility layer} provides its own sys/types.h? overriding the C library one?
    If so:
  • How do you intend to ensure ssize_t and off_t are ABI aligned with the C library defined ones? (and not just with picolibc's)
  • When the {Zephyr's POSIX compatibility layer} is "off", the application (or any other C file) will still be able to include the C library sys/types.h(?) or is this something you intend to prevent? or do you intend to always get the {Zephyr's POSIX compatibility layer} one in the way?

@cfriedt
Copy link
Member Author

cfriedt commented Jul 12, 2024

I ask this because it is very relevant for this hole topic.

@aescolar
I disagree - it is characteristically irrelevant, as evident by the linked PR that is passing all tests.

This issue is fixing a 6-year old bug that is due to not using native zephyr types within the core OS and instead relying on types that might be defined at a higher layer (a dependency cycle).

The simple solution is breaking the dependency cycle and defining native Zephyr types to solve the use case (the cleaner architecture introduced in the linked PR). This was also suggested by @keith-packard.

And we need a much more precise answer than that to understand what is the best way forward.

The way forward is in the linked PR.

Here, I think you are using the same stalling tactics that you used in #67132, which is effectively bullying.

Sorry. I'm not wasting another 3 months of my life / peace of mind having the same kind of circular, irrelevant conversations with you.

If you do not want this (clean) architecture change in LTS, just say so instead of pretending that this issue is something else.

Simply because you are here defending bad architectural decisions (which I fully predicted would happen), I think it should be evident now to Zephyr users why this technical debt is still present, 6 years later, and why it will not be fixed as part of your LTS release.

Maybe if you have legitimate concerns with the linked PR, comment there?

@aescolar
Copy link
Member

@cfriedt , it is very sad that you perceive purely technical discussions, in which others are asking about the architectural choices or trying to provide you information, as attacks or bulling.
This last PR you linked was rejected simply because it was very wrong from a technical standpoint.
The discussions in that PR were long and circular because you rejected to accept you were wrong, and kept repeating the same incorrect statements. Others having the patience to try to point your mistakes or providing you pointers to the correct information is just an attempt to help; and not, as you seem to perceive it, an attack.
On the other hand you also seem to not realize that some of us just keep ignoring your attacks, provocations and baseless accusations.
But even if we do so, these are not only not helpful to anybody but actually quite harmful for the project.
So I ask you to please stop behaving this way.

@aescolar
Copy link
Member

aescolar commented Jul 12, 2024

it is characteristically irrelevant [..] that might be defined

The linked PR is doing many changes, including changes to stable APIs, and depending on what is planned after, those changes may help little.
Also, given the magnitude of the changes, and what you pointed at from this issue description I think it is reasonable to clarify things, specially when there may be missunderstandings driving the choices.

In the issue description you claimed that this ssize_t / sys/types.h was the cause for some of the regressions introduced by #73978
I clarified that this was not the case in #75513 (comment)
The issue was related to the {Zephyr's POSIX compatibility layer} overriding C library headers, and doing it without proper care.

Nevertheless you had proposed a fix in 468003b , which I indeed rejected because it would create a worse problem by redefining this ssize_t and off_t types in fdtable.h (a Zephyr header), in a manner that would likely be incompatible with the C library definition, causing either build erros or ABI breakages.

Similarly you have linked to an old issue #10436
which, unlike what you describe, relates to a very old state in which ssize_t was being incorrectly defined in a similar place as where you were trying to define it in that rejected PR.

All that and other comments from you led me to understand you may want to follow up that linked PR with others overriding a possible C library sys/types.h from the {Zephyr's POSIX compatibility layer}. Given the effect of such a change, its complexity, likely drawbacks, and relationship with the current proposed PR. It did seem logical to ask what are your plans in that regard.

Moreover you keep repeating as a mantra that we have "layer violations". Yet you seem to ignore that, by its very design, the POSIX extensions to the C library are deepely intermingled with the C library.
If one were to see either of the C library or the POSIX extensions as just a layer each, it would be on its own a "layer violation".
Similarly if one were to see the C library itself as just one component, one could claim it is also by sitting both below and above the kernel (say basic types vs malloc) committing layer violations we should fix.

In that light, and given the state of the {Zephyr's POSIX compatibility layer}, and other possible future claims of "layer violations", I also considered it logical to ask what other headers, APIs or functions you consider should or should not be provided by the C library or the {Zephyr's POSIX compatibility layer}.

@aescolar
Copy link
Member

aescolar commented Jul 12, 2024

Ignoring all above,
regarding the use of ssize_t, off_t, and sys/types.h in Zephyr:
I acknowledge that that header and types are not part of the ISO C standard.
That is, Zephyr today requires a C library which provides a non ISO C header, sys/types.h, where those 2 types are defined.
This requirement means that if a user has a C library which has no or so little support for the POSIX extensions as to not have that, that user would need to add on his own the sys/types.h header with those 2 definitions.
This is a relatively similar situation as with the strnlen() requirement set by Zephyr, with the difference that we are not documenting this anywhere.
I see two possible approaches in that regard:

  1. A change akin to your PR, in which we remove all uses of that header and types, changing a lot of stable APIs, and forbid the use of those types. The benefit being that we do not set a requirement of the C library or its integration providing that header and types.

  2. A more pragmatic approach, where, like for strnlen(), we document that C libraries to be used with Zephyr or their integration need to provide sys/types.h incuding those 2 types definitions. The benefit in this case is that we do not need to change any API or other Zephyr code.

I simply present possible options for a problem. How we should proceed would be up to the TSC.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 12, 2024

@cfriedt , it is very sad that you perceive purely technical discussions, in which others are asking about the architectural choices or trying to provide you information, as attacks or bulling

I'm sorry that you find it sad. Fortunately, I'm not the only one who perceives your rhetoric this way.

This last PR you linked was rejected simply because it was very wrong from a technical standpoint. The discussions in that PR were long and circular because you rejected to accept you were wrong

I'm not sure how pointing directly to the sections in the POSIX specification supporting my position could potentially be misconstrued as wrong. The PR was also approved several times?

No... it was bullying. By you and a few others.

I would be happy to quote people who have said as much.

From my perspective, you typically find any possible solution that supports your thesis to be confirmation that your thesis is correct, which is a logical fallacy. This happens over, and over, and over again.

[[ This section was deleted on behalf of the Code of Conduct investigation team ]]

There certainly is more than 1 way to fix whatever issue that was not detected by CI, but at the root of the problem are all of the endless hacks to do with native_sim and native_posix over the years.

It's extremely simple to demonstrate that, and I have done so using two separate approaches.

I do not doubt that you may have found some other issue that was incidentally tripped due to lack of test coverage in CI. It's really not the point. The point is that there is a layering violation.

Previously, you had criticized me for not replacing short links fast enough, saying "don't make others clean up after your mess". Even though I stated I had priorities and changing the links was not the highest. I eventually fixed the links - when I had time. The irony is, that many of the Zephyr maintainers are constantly cleaning up your mess.

I've come to recognize that practically anything I do in the Zephyr project results in you criticizing me, personally, rather than the technical details of the work.

I had hoped to avoid this ugliness in a public forum. It's exhausting dealing with you.

The solution provided in #75348 fixes the root of the problem, so that the Zephyr project incurs less technical debt in the long run.

Others having the patience to try to point your mistakes or providing you pointers to the correct information is just an attempt to help; and not, as you seem to perceive it, an attack.

Yes, throw more shade at me. Thanks... thanks for this.

On the other hand you also seem to not realize that some of us just keep ignoring your attacks, provocations and baseless accusations.

I would love to ignore yours as well.

But even if we do so, these are not only not helpful to anybody but actually quite harmful for the project. So I ask you to please stop behaving this way.

I will say the same thing to you, @aescolar.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 12, 2024

Ignoring all above, regarding the use of ssize_t, off_t, and sys/types.h in Zephyr: I acknowledge that that header and types are not part of the ISO C standard.

OK. I'm happy that you acknowledge one of two standards in play here?

That is, Zephyr today requires a C library which provides a non ISO C header, sys/types.h

All of the evidence available says that you are wrong. Zephyr in no way requires POSIX types to be used below the POSIX layer in the stack. POSIX code requires sys/types.h.

As evidence, observe that #75348 is passing for all test cases.

As corollary, observe that

  1. using a feature that is not selected is generally bad practice (i.e. an antipattern)
  2. a lower layer in the stack depending on types from a layer higher in the stack creates a dependency loop (i.e. is an antipattern)

As another very strong corollary, observe that most (all?) other open source operating systems have contributed compatibility layers to Newlib, Picolibc, and other C libraries with POSIX API implementations, so that POSIX types could be defined using the primitives of the Operating System.

This would be natural dependency ordering, and does not result in dependency cycles.

Please provide evidence (not just your opinion, or personal attacks on me) that justifies your position that the linked PR is incorrect.

I would accept a non-contrived failing test as evidence to your counter-argument and invited you to find one last week, which you have failed to do.

This requirement ...

Including <sys/types.h> and using POSIX types in Zephyr's core is fortunately not a requirement. That would be baking in an antipattern into Zephyr, which would very much be a mistake.

It sounds a lot more like someone defending their bad architecture; a hack that was maybe convenient to ship a feature that was forgotten about and buried.

Water under the bridge. Let's fix it and move on.

means that if a user ...

This is FUD.

  1. A change akin to your PR, in which we remove all uses of that header and types, changing a lot of stable APIs, and forbid the use of those types. The benefit being that we do not set a requirement of the C library or its integration providing that header and types.

I think some variant of this is probably safe - except obviously, the header and types should be available for POSIX code.

This highlights one of the down-sides of Newlib and Picolibc. The fact that many of the POSIX base definitions (including off_t and ssize_t, but also many other function declarations, etc) are not guarded by _POSIX_C_SOURCE. It almost seems like an implicit assumption of Newlib is that it is always used on POSIX or POSIX-like systems, which may explain some of the confusion.

  1. A more pragmatic approach, where, like for strnlen(), we document that C libraries to be used with Zephyr or their integration need to provide sys/types.h incuding those 2 types definitions. The benefit in this case is that we do not need to change any API or other Zephyr code.

This would be introducing more technical debt, so I am against Option 2 (a.k.a. sweeping more dirt under the carpet).

For strnlen(), we should really just use the POSIX_C_LIB_EXT Option Group (which corresponds to CONFIG_POSIX_C_LIB_EXT.

I simply present possible options for a problem. How we should proceed would be up to the TSC.

Sure.

@kartben
Copy link
Collaborator

kartben commented Jul 12, 2024

No... it was bullying. By you and a few others.

At the risk of stating the obvious, we will not tolerate bullying and this would very much be a violation of our Code of Conduct.

I would encourage you to reach out to conduct@zephyrproject.org to report these incidents so that they can be investigated and appropriate action can be taken.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 15, 2024

No... it was bullying. By you and a few others.

At the risk of stating the obvious, we will not tolerate bullying and this would very much be a violation of our Code of Conduct.

I would encourage you to reach out to conduct@zephyrproject.org to report these incidents so that they can be investigated and appropriate action can be taken.

I did that. I think you, Alberto & I were going to have a call together, but a call was never scheduled. It might have helped the situation though.

@nashif
Copy link
Member

nashif commented Jul 16, 2024

No... it was bullying. By you and a few others.

There is a very serious allegation of mobbing here. This goes beyond any dispute or conflict between 2 individuals.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 16, 2024

Picolibc isn't built against the Zephyr definitions, so having Zephyr define these sizes will make for ABI fun as the Zephyr code will use Zephyr definitions and the Picolibc code will use Picolibc definitions.

@keith-packard , @henrikbrixandersen - <zephyr/types.h> is included just about everywhere in Zephyr, at least indirectly, and I think it could be used to do what <linux/types.h> does too.

Brix's proposal is effectively 1 change on top of the first commit in #75348, and would alleviate the need for the majority of the second commit.

It's not a solution per se, but it works for Linux (and seems to be a workaround for a historical bug of a similar nature). It would need additional conditions set up in case one library or another uses a different macro style to guard type definitions. There are only a finite number of libc's, and certainly those who use a proprietary libc can either add their macro to Zephyr or simply define it via some compatibility header.

Would it be possible to adopt Linux's solution and something like what is below to guarantee that the types match that of the libc (if provided)?

#ifdef _POSIX_C_SOURCE
BUILD_ASSERT(sizeof(ssize_t) == sizeof(k_ssize_t));
BUILD_ASSERT(sizeof(off_t) == sizeof(k_off_t));
BUILD_ASSERT(alignof(ssize_t) == alignof(k_ssize_t));
BUILD_ASSERT(alignof(off_t) == alignof(k_off_t));
BUILD_ASSERT(((ssize_t)-1) == ((k_ssize_t)-1));
BUILD_ASSERT(((off_t)-1) == ((k_off_t)-1));
#endif

It's important to note

  • <linux/types.h> goes on to unconditionally define POSIX types such as blkcnt_t (I think it's just the one type, but there may be more)
  • <sys/types.h> from picolibc (and newlib) defines POSIX types without checking _POSIX_C_SOURCE

Of course, Linux is always "a POSIX OS".

And, AFAIK, the assumption in Picolibc and Newlib are, that if a POSIX header is included, the application intends to use a POSIX API (similarly, why many functions in the base definitions are not guarded with the application conformance macro). AFAIK, there are not many cases of Newlib being used outside of a POSIX (or BSD, or GNU) environment. I probably would add _POSIX_C_SOURCE guard around base types and definitions, if I were to write a "new libc" from scratch.

In theory, Zephyr is not that far off from being "a POSIX OS", but we still would like to disable POSIX by default.

Rather than changing every API signature, as is done in the second commit in #75348, I think a fair compromise would be to remove direct inclusion of <sys/types.h> in core parts of Zephyr, and instead rely on <zephyr/types.h>. Technically speaking, this is a workaround for the dependency cycle.

If that is the approach that we take (a workaround that we commit to, effectively forever) we should agree to minimize cross-pollination of POSIX and Zephyr types and inclusion of POSIX headers "below the line" (native platform aside), since it does create a dependency cycle, and that sort of thing can cause nasty and sometimes not-easily-predictable results. There still are dependency cycles in Zephyr (e.g. CONFIG_NET_SOCKETS_POSIX_NAMES is deprecated - I would love to remove it).

Of course, we could use #75348 as-is, add the same build assertions above, and avoid the historical workaround that Linux needed to make, and that would technically still be a far cleaner solution than baking either a workaround or a dependency cycle into Zephyr's API. Zephyr would have a more separable interface from POSIX in that case, which is A Good Thing, IMHO.

@nashif
Copy link
Member

nashif commented Jul 16, 2024

Brix's proposal is effectively 1 change on top of the first commit in #75348, and would alleviate the need for the majority of the second commit.

@cfriedt can you provide a quick draft of how this would look like so we can have something to look at later during review? If not, you can speak to it as well, but code always helps (does not need to pass....)

@nashif nashif added this to the v4.0.0 milestone Jul 16, 2024
@cfriedt cfriedt removed the RFC Request For Comments: want input from the community label Sep 1, 2024
@cfriedt
Copy link
Member Author

cfriedt commented Sep 1, 2024

Created a separate RFC issue

#77856

@cfriedt
Copy link
Member Author

cfriedt commented Sep 3, 2024

can you provide a quick draft of how this would look like so we can have something to look at later during review? If not, you can speak to it as well, but code always helps (does not need to pass....)

@nashif - the "quick" draft was #75348 (although it's hard to call it quick, since it took several days of work). With #77856 I would like to take a more pragmatic approach, with PRs made in small, more reviewable parts, and a fully bisectable git history.

@mmahadevan108
Copy link
Collaborator

Lowering the priority to medium based on discussion in the TSC meeting on Nov 13th.

@mmahadevan108 mmahadevan108 added priority: medium Medium impact/importance bug and removed priority: high High impact/importance bug labels Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Base OS Base OS Library (lib/os) area: Bluetooth area: C Library C Standard Library area: C++ area: File System area: Modem Drivers area: native port Host native arch port (native_sim) area: Networking area: POSIX POSIX API Library bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

7 participants