base os: remove longstanding layering violation #75513
Comments
@cfriedt can you please replace the bit.ly shortened URLs with direct links to the resources (e.g. the respective pull requests/issues/Gists)? |
@cfriedt base os is a collection of many things, maintained by many folks, so I do not think I am the right assignee. I consider fstable is part of the posix subsystem. Probably we should change that in the MAINTAINER file. |
@nashif - you are the maintainer of the Base OS. Fstable is not at all a part of POSIX, as other parts of Zephyr use it independently of POSIX. But that certainly has historically not stopped excessive cross-pollination of APIs. Don't worry, I've already spent some time cleaning up this mess and will be making a PR shortly. |
@kartben - feel free to do so yourself. You have edit permissions, correct? Editing the bug report is not a priority for me. Fixing the bug is (as it is High Priority). |
as it was introduced by the previous posix maintainer |
what original commit? @aescolar is this needed for native_posix/native_sim? |
@nashif It's a bit difficult to guess what a long-gone maintainer meant in a 6-year-old FIXME comment which may or may not apply. What I can see is that that comment mentions off_t, which is not used in that header. It also mentions ssize_t, which would indicate, if anything, that it was needed to get that type for some C library/ies. And yes, for POSIX compliant C libraries, you can get ssize_t from sys/types.h, but for glibc or pico also from stdio.h or a bunch of other headers. @cfriedt is the issue here that you would prefer ssize_t not be used in most of Zephyr because it is not an ISO C standard type, but defined in the POSIX spec? As mentioned before, the current state of the code not being directly compatible with adding a new feature or supporting some refactoring is not a bug. It is just normal development. In any case, and as also mentioned before, you should not see all that is defined in the POSIX specs as a single entity or a single layer. Some parts are simple extensions to the ISO C library types, some others are simple utility functions which extend those provided by the ISO standard, some other parts specify APIs an OS should provide for different types of high level functionality, and much more. |
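To make the point about where ssize_t comes from concrete, here is a minimal sketch. It assumes a hosted, POSIX-conforming C library (for example glibc on Linux) rather than a Zephyr build, and the helper name ssize_is_signed is made up for illustration. Requesting POSIX.1-2008 and including sys/types.h is enough to obtain the type, and limits.h then supplies SSIZE_MAX.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 declarations */

#include <assert.h>
#include <limits.h>    /* SSIZE_MAX: a POSIX extension to this ISO C header */
#include <stddef.h>    /* size_t: plain ISO C */
#include <sys/types.h> /* ssize_t, off_t: POSIX */

/* Hypothetical helper: returns 1 when ssize_t is a signed type. */
static int ssize_is_signed(void)
{
	return ((ssize_t)-1) < 0;
}

/*
 * Assumption: on common ABIs ssize_t is as wide as size_t. POSIX itself only
 * requires that it can hold values in the range [-1, SSIZE_MAX].
 */
static_assert(sizeof(ssize_t) == sizeof(size_t),
	      "ssize_t and size_t differ in width on this target");
```

Without the feature-test macro, a strictly conforming ISO C build (for example -std=c99 on some libraries) may not expose ssize_t at all, which is the crux of the disagreement in this thread.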
@cfriedt if you have any interest in anybody using those links, please replace them yourself. Shortened links may be malicious, and you cannot expect others to click on them. It is not reasonable to expect others to clean up after you. |
As mentioned before, this seems better suited for an RFC, so I am converting the issue. |
Right, so why can't we use those with Zephyr regardless of POSIX being enabled or not? |
Hrm. How much of a kludge are you interested in here? We could define
You'd need stanzas for other POSIX-compatible C libraries and a stanza for configurations using the Zephyr POSIX layer. |
A few reasons:
But also,
The thing that is a major red flag for me here, is that you are defending the architecture with (objectively negative) traits 1 and 2 (antipatterns), and actively advocating against the (much cleaner) architecture with naturally ordered dependencies. Do you care to elaborate more on your perspective? |
I think it would be really nice to start by having a common understanding of what we expect to be defined in the C library and what is to be defined/redefined/overridden by different parts of Zephyr's POSIX API compatibility components, or the C library integration. Just an overall indication of the aim, and in detail for the types in question, would help.
Indeed. Anyhow, I ask this because I do not know what the long-term plan is here. We can stop using Note The POSIX standard specifies many components from different layers. Among them is a superset of the C library.
In any case, I think it would also help if we stopped claiming we are fixing a "layer violation" from a component, while that component is by design / or the long term goal is to override C library headers, or using the C library internal definitions or guards. This is spaghetti architecture. So please let's focus on the actual technical issues and proposals.
@keith-packard are you sure about this?
$ mkdir build && cd build
$ cmake -GNinja -DBOARD=qemu_x86 ../samples/hello_world/ -DCONFIG_PICOLIBC=y && ninja
# And now I just retrigger a preprocessor only pass on samples/hello_world/main.c (with or without -D__ZEPHYR__=1 to avoid _ZEPHYR_SOURCE)
$ /opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/bin/x86_64-zephyr-elf-gcc -DKERNEL -DK_HEAP_MEM_POOL_SIZE=0 -DPICOLIBC_LONG_LONG_PRINTF_SCANF -D__LINUX_ERRNO_EXTENSIONS__ -Izephyr/build/zephyr/include/generated/zephyr -Izephyr/include -Izephyr/build/zephyr/include/generated -Izephyr/soc/intel/atom -Izephyr/soc/intel/atom/. -isystem zephyr/lib/libc/common/include -m32 -fno-strict-aliasing -Os -imacros zephyr/build/zephyr/include/generated/zephyr/autoconf.h -fno-printf-return-value -fno-common -g -gdwarf-4 -fdiagnostics-color=always -m32 -msoft-float -Wa,--divide --sysroot=/opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/x86_64-zephyr-elf -imacros zephyr/include/zephyr/toolchain/zephyr_stdint.h -Wall -Wformat -Wformat-security -Wno-format-zero-length -Wdouble-promotion -Wno-pointer-sign -Wpointer-arith -Wexpansion-to-defined -Wno-unused-but-set-variable -Werror=implicit-int -fno-pic -fno-pie -fno-asynchronous-unwind-tables -ftls-model=local-exec -fno-reorder-functions --param=min-pagesize=0 -fno-defer-pop -fmacro-prefix-map=zephyr/samples/hello_world=CMAKE_SOURCE_DIR -fmacro-prefix-map=zephyr=ZEPHYR_BASE -fmacro-prefix-map=zephyrproject=WEST_TOPDIR -ffunction-sections -fdata-sections -mpreferred-stack-boundary=2 -mno-mmx -mno-sse --specs=picolibc.specs -std=c99 -MD -MT CMakeFiles/app.dir/src/main.c.obj -MF CMakeFiles/app.dir/src/main.c.obj.d -o CMakeFiles/app.dir/src/main.c.obj -c zephyr/samples/hello_world/src/main.c -E -dI -dU
# Note the extra -E -dI -dU compared to the plain build command
$ grep -C 3 ssize_t ./CMakeFiles/app.dir/src/main.c.obj
#undef __machine_size_t_defined
# 182 "/opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/picolibc/include/sys/_types.h" 3 4
typedef unsigned int __size_t;
#undef __machine_ssize_t_defined
# 198 "/opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/picolibc/include/sys/_types.h" 3 4
typedef signed int _ssize_t;
#define unsigned signed
# 209 "/opt/zephyr-sdk/zephyr-sdk-0.16.8/x86_64-zephyr-elf/picolibc/include/sys/_types.h" 3 4
typedef _ssize_t __ssize_t;
...
typedef _ssize_t ssize_t;
...
ssize_t getline(char **restrict lineptr, size_t *restrict n, FILE *restrict stream);
ssize_t getdelim(char **restrict lineptr, size_t *restrict n, int delim, FILE *restrict stream); Note |
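The "#define unsigned signed" line in the preprocessor output above appears to be the technique newlib and picolibc use to derive _ssize_t from the compiler's own size type. A minimal sketch of that technique follows, assuming GCC or Clang (both predefine __SIZE_TYPE__); demo_ssize_t is a made-up name and not something either library defines.

```c
#include <assert.h>
#include <stddef.h>

/*
 * __SIZE_TYPE__ expands to something like "long unsigned int". While the
 * macro below is active, the "unsigned" token in that expansion is rewritten
 * to "signed", which yields a signed type of exactly the same width.
 */
#define unsigned signed
typedef __SIZE_TYPE__ demo_ssize_t; /* hypothetical name, for illustration */
#undef unsigned

static_assert(sizeof(demo_ssize_t) == sizeof(size_t),
	      "demo_ssize_t must be as wide as size_t");
static_assert((demo_ssize_t)-1 < 0, "demo_ssize_t must be signed");
```

The appeal of this approach is that the signed type cannot drift from size_t in width, whatever the target ABI. The cost is that it relies on a compiler-specific macro, so other toolchains would need their own stanza.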
Most of the types that must be defined by the POSIX layer are defined in terms of primitives provided by the OS. Zephyr should really not be an outlier here. If Zephyr communicates those primitives to picolibc in a manner that is directly compatible with picolibc (and newlib), then there should be no problems for anyone.
Ideally picolibc would be built against the Zephyr definitions like most other operating systems already do for most C libraries. That's kind of the natural dependency ordering. Forcing the opposite introduces a lot of unnecessary complexity.
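As a rough illustration of the "natural dependency ordering" described above, the sketch below keeps everything in one file but separates two layers by comment. All names (os_ssize_t, os_buf_read, px_ssize_t, px_read) are hypothetical and are not taken from Zephyr, picolibc, or the linked PR. The OS layer includes no POSIX headers, and the POSIX-like layer is defined purely in terms of the OS layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- OS layer (hypothetical names): includes no POSIX headers ---- */

/* Assumption: ptrdiff_t is a signed type as wide as size_t on the target. */
typedef ptrdiff_t os_ssize_t;
static_assert(sizeof(os_ssize_t) == sizeof(size_t),
	      "os_ssize_t width assumption does not hold on this target");

/* Hypothetical OS primitive: copy up to cap bytes, return the count or -1. */
static os_ssize_t os_buf_read(void *dst, size_t cap, const void *src, size_t len)
{
	if (dst == NULL || src == NULL) {
		return -1;
	}

	size_t n = (len < cap) ? len : cap;

	memcpy(dst, src, n);
	return (os_ssize_t)n;
}

/* ---- POSIX-like layer (hypothetical): depends only on the layer above ---- */

typedef os_ssize_t px_ssize_t;

static px_ssize_t px_read(void *dst, size_t cap, const void *src, size_t len)
{
	return os_buf_read(dst, cap, src, len);
}
```

The dependency arrow points one way only: removing the POSIX-like layer leaves the OS layer compiling unchanged, which is not the case when an OS header pulls in sys/types.h.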
Removing dependency cycles / layering violations has been part of the long term plan for a couple of years already.
"C library headers" is a slight misnomer - most of these headers are not part of ISO C and certainly a C library can be conformant without them. So calling them "C library headers" is misleading. These are POSIX headers for the most part.
Yes, unfortunately, POSIX chose to piggyback on some existing ISO C headers, so there are extensions to ISO C (at least one or two Option Groups). Planned stabilization fixes include consolidating e.g. posix/time.h and posix/signal.h with their libc counterparts. String.h is not really an issue in Zephyr because it was never "forked" in POSIX. This should reduce complexity significantly.
The big difference is that POSIX is not an API for the OS to use whereas ISO C is (at least the parts that do not involve threading or I/O).
On the contrary, it's pretty easy to see the POSIX (C language) API as one layer (with many features) that sits above the operating system. It is, by definition, the Portable Operating System Interface.
It's just a POSIX header. It's only annoying when it's being (ab)used in ways that were never intended.
POSIX functions. It would be good to stop blurring the lines. size_t is ISO C (not POSIX). Like many other types (bool, int32_t, etc) POSIX simply requires that it is provided by the C library.
Newlib (and by extension picolibc) were originally designed for POSIX runtimes (GNU), so although
I would stop pointing out the layering violation if it weren't broken. So maybe fixing that is a good idea?
Spaghetti architecture is what we have today in Zephyr as a result of many years of tech debt. Perhaps understanding what is part of POSIX and what is part of ISO C several years ago would have changed things. I would be happy to continue cleaning it up though.
I would be happy to continue focusing on technical details as well. |
And how do you plan to do that, given that we support other C libraries, including C libraries provided by proprietary toolchains?
It may be misleading, but C libraries tend to provide many of them, and use them themselves.
The POSIX (2017 version) includes all 25 C99 headers, of which it expands these:
The POSIX standard extends the C library with things as low level as, for ex., It also extends it with other lowly things like ssize_t in stdio.h (and SSIZE_MAX in limits.h)
I mentioned size_t because the posix standard requires |
The most obvious way is how things have been done for the last 30 years or so. But there are many ways. Each library has nuances, and may require attention here or there, which we already do. So this seems to be very much business as usual.
Sure. And non-POSIX C libraries? Probably not. The point is that C and POSIX are not synonymous. Related, yes. There was a time when pthreads and rt were both separate libraries too (although some have moved away from that approach).
I don't foresee these being problematic.
Yes.. constants.
Yes, although there is an Option Group for the extension (
Yes. That is correct. All of the above is mostly independent of the issue that this bug addresses though. It's better to not distract from the actual topic. |
@cfriedt I ask this because it is very relevant for this whole topic. And we need a much more precise answer than that to understand what the best way forward is.
|
@aescolar This issue is fixing a 6-year old bug that is due to not using native zephyr types within the core OS and instead relying on types that might be defined at a higher layer (a dependency cycle). The simple solution is breaking the dependency cycle and defining native Zephyr types to solve the use case (the cleaner architecture introduced in the linked PR). This was also suggested by @keith-packard.
The way forward is in the linked PR. Here, I think you are using the same stalling tactics that you used in #67132, which is effectively bullying. Sorry. I'm not wasting another 3 months of my life / peace of mind having the same kind of circular, irrelevant conversations with you. If you do not want this (clean) architecture change in LTS, just say so instead of pretending that this issue is something else. Simply because you are here defending bad architectural decisions (which I fully predicted would happen), I think it should be evident now to Zephyr users why this technical debt is still present, 6 years later, and why it will not be fixed as part of your LTS release. Maybe if you have legitimate concerns with the linked PR, comment there? |
@cfriedt, it is very sad that you perceive purely technical discussions, in which others are asking about the architectural choices or trying to provide you information, as attacks or bullying. |
The linked PR is making many changes, including changes to stable APIs, and depending on what is planned after, those changes may help little. In the issue description you claimed that this ssize_t / sys/types.h was the cause for some of the regressions introduced by #73978. Nevertheless you had proposed a fix in 468003b, which I indeed rejected because it would create a worse problem by redefining the ssize_t and off_t types in fdtable.h (a Zephyr header), in a manner that would likely be incompatible with the C library definition, causing either build errors or ABI breakages. Similarly you have linked to an old issue #10436. All that and other comments from you led me to understand you may want to follow up that linked PR with others overriding a possible C library sys/types.h from the {Zephyr's POSIX compatibility layer}. Given the effect of such a change, its complexity, likely drawbacks, and relationship with the current proposed PR, it did seem logical to ask what your plans are in that regard. Moreover you keep repeating as a mantra that we have "layer violations". Yet you seem to ignore that, by its very design, the POSIX extensions to the C library are deeply intermingled with the C library. In that light, and given the state of the {Zephyr's POSIX compatibility layer}, and other possible future claims of "layer violations", I also considered it logical to ask what other headers, APIs or functions you consider should or should not be provided by the C library or the {Zephyr's POSIX compatibility layer}. |
Ignoring all above,
I simply present possible options for a problem. How we should proceed would be up to the TSC. |
I'm sorry that you find it sad. Fortunately, I'm not the only one who perceives your rhetoric this way.
I'm not sure how pointing directly to the sections in the POSIX specification supporting my position could potentially be misconstrued as wrong. The PR was also approved several times? No... it was bullying. By you and a few others. I would be happy to quote people who have said as much. From my perspective, you typically find any possible solution that supports your thesis to be confirmation that your thesis is correct, which is a logical fallacy. This happens over, and over, and over again. [[ This section was deleted on behalf of the Code of Conduct investigation team ]] There certainly is more than 1 way to fix whatever issue that was not detected by CI, but at the root of the problem are all of the endless hacks to do with native_sim and native_posix over the years. It's extremely simple to demonstrate that, and I have done so using two separate approaches. I do not doubt that you may have found some other issue that was incidentally tripped due to lack of test coverage in CI. It's really not the point. The point is that there is a layering violation. Previously, you had criticized me for not replacing short links fast enough, saying "don't make others clean up after your mess". Even though I stated I had priorities and changing the links was not the highest. I eventually fixed the links - when I had time. The irony is, that many of the Zephyr maintainers are constantly cleaning up your mess. I've come to recognize that practically anything I do in the Zephyr project results in you criticizing me, personally, rather than the technical details of the work. I had hoped to avoid this ugliness in a public forum. It's exhausting dealing with you. The solution provided in #75348 fixes the root of the problem, so that the Zephyr project incurs less technical debt in the long run.
Yes, throw more shade at me. Thanks... thanks for this.
I would love to ignore yours as well.
I will say the same thing to you, @aescolar. |
OK. I'm happy that you acknowledge one of two standards in play here?
All of the evidence available says that you are wrong. Zephyr in no way requires POSIX types to be used below the POSIX layer in the stack. POSIX code requires As evidence, observe that #75348 is passing for all test cases. As corollary, observe that
As another very strong corollary, observe that most (all?) other open source operating systems have contributed compatibility layers to Newlib, Picolibc, and other C libraries with POSIX API implementations, so that POSIX types could be defined using the primitives of the Operating System. This would be natural dependency ordering, and does not result in dependency cycles. Please provide evidence (not just your opinion, or personal attacks on me) that justifies your position that the linked PR is incorrect. I would accept a non-contrived failing test as evidence to your counter-argument and invited you to find one last week, which you have failed to do.
Including It sounds a lot more like someone defending their bad architecture; a hack that was maybe convenient to ship a feature that was forgotten about and buried. Water under the bridge. Let's fix it and move on.
This is FUD.
I think some variant of this is probably safe - except obviously, the header and types should be available for POSIX code. This highlights one of the down-sides of Newlib and Picolibc. The fact that many of the POSIX base definitions (including
This would be introducing more technical debt, so I am against Option 2 (a.k.a. sweeping more dirt under the carpet). For
Sure. |
At the risk of stating the obvious, we will not tolerate bullying and this would very much be a violation of our Code of Conduct. I would encourage you to reach out to conduct@zephyrproject.org to report these incidents so that they can be investigated and appropriate action can be taken. |
I did that. I think you, Alberto & I were going to have a call together, but a call was never scheduled. It might have helped the situation though. |
There is a very serious allegation of mobbing here. This goes beyond any dispute or conflict between 2 individuals. |
@keith-packard , @henrikbrixandersen - Brix's proposal is effectively 1 change on top of the first commit in #75348, and would alleviate the need for the majority of the second commit. It's not a solution per se, but it works for Linux (and seems to be a workaround for a historical bug of a similar nature). It would need additional conditions set up in case one library or another uses a different macro style to guard type definitions. There are only a finite number of libc's, and certainly those who use a proprietary libc can either add their macro to Zephyr or simply define it via some compatibility header. Would it be possible to adopt Linux's solution and something like what is below to guarantee that the types match that of the libc (if provided)?
#ifdef _POSIX_C_SOURCE
BUILD_ASSERT(sizeof(ssize_t) == sizeof(k_ssize_t));
BUILD_ASSERT(sizeof(off_t) == sizeof(k_off_t));
BUILD_ASSERT(alignof(ssize_t) == alignof(k_ssize_t));
BUILD_ASSERT(alignof(off_t) == alignof(k_off_t));
BUILD_ASSERT(((ssize_t)-1) == ((k_ssize_t)-1));
BUILD_ASSERT(((off_t)-1) == ((k_off_t)-1));
#endif
It's important to note
Of course, Linux is always "a POSIX OS". And, AFAIK, the assumption in Picolibc and Newlib is that if a POSIX header is included, the application intends to use a POSIX API (similarly, why many functions in the base definitions are not guarded with the application conformance macro). AFAIK, there are not many cases of Newlib being used outside of a POSIX (or BSD, or GNU) environment. I probably would add In theory, Zephyr is not that far off from being "a POSIX OS", but we still would like to disable POSIX by default. Rather than changing every API signature, as is done in the second commit in #75348, I think a fair compromise would be to remove direct inclusion of If that is the approach that we take (a workaround that we commit to, effectively forever) we should agree to minimize cross-pollination of POSIX and Zephyr types and inclusion of POSIX headers "below the line" (native platform aside), since it does create a dependency cycle, and that sort of thing can cause nasty and sometimes not-easily-predictable results. There still are dependency cycles in Zephyr (e.g. Of course, we could use #75348 as-is, add the same build assertions above, and avoid the historical workaround that Linux needed to make, and that would technically still be a far cleaner solution than baking either a workaround or a dependency cycle into Zephyr's API. Zephyr would have a more separable interface from POSIX in that case, which is A Good Thing, IMHO. |
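For readers who want to try the build-assertion idea outside a Zephyr tree, here is a self-contained sketch using C11 static_assert in place of BUILD_ASSERT. It assumes a hosted POSIX C library, and the k_ssize_t / k_off_t typedefs are hypothetical stand-ins (plain long) rather than the definitions from #75348. One detail worth noting: when two integer types have the same width, comparing (T)-1 values for equality passes even if one of them is unsigned, because of the usual arithmetic conversions, so the sketch checks signedness explicitly with a hypothetical IS_SIGNED_TYPE macro.

```c
#define _POSIX_C_SOURCE 200809L /* expose ssize_t and off_t */

#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-ins for Zephyr-native types; real definitions may differ. */
typedef long k_ssize_t;
typedef long k_off_t;

/* Hypothetical helper: evaluates to 1 for signed integer types, 0 otherwise. */
#define IS_SIGNED_TYPE(t) (((t)-1) < (t)0)

/* Same size and alignment as the C library's types. */
static_assert(sizeof(ssize_t) == sizeof(k_ssize_t), "ssize_t size mismatch");
static_assert(sizeof(off_t) == sizeof(k_off_t), "off_t size mismatch");
static_assert(alignof(ssize_t) == alignof(k_ssize_t), "ssize_t alignment mismatch");
static_assert(alignof(off_t) == alignof(k_off_t), "off_t alignment mismatch");

/* Same signedness as the C library's types. */
static_assert(IS_SIGNED_TYPE(ssize_t) == IS_SIGNED_TYPE(k_ssize_t),
	      "ssize_t signedness mismatch");
static_assert(IS_SIGNED_TYPE(off_t) == IS_SIGNED_TYPE(k_off_t),
	      "off_t signedness mismatch");
```

If the library's off_t is widened (for example by building with a 64-bit file offset option on a 32-bit target), the second assertion fails at compile time, which is exactly the kind of mismatch these checks are meant to surface.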
@cfriedt can you provide a quick draft of what this would look like so we can have something to look at later during review? If not, you can speak to it as well, but code always helps (does not need to pass....) |
Created a separate RFC issue |
@nashif - the "quick" draft was #75348 (although it's hard to call it quick, since it took several days of work). With #77856 I would like to take a more pragmatic approach, with PRs made in small, more reviewable parts, and a fully bisectable git history. |
Lowering the priority to medium based on discussion in the TSC meeting on Nov 13th. |
Describe the bug
In the Base OS, fdtable.h introduced a layering violation (in the first (?) commit) in which the POSIX header <sys/types.h> is included and the POSIX type ssize_t is used in a way that can be described as "below the line" (the line being where the POSIX API lives in the software stack). Such a layering violation creates a dependency cycle at the API level. As noted in the original commit, the wrong types seem to be added to satisfy some need of the native_posix platform.
As part of the base OS, the layering violation spread throughout the Zephyr project into the following areas (and likely more):
Aside from extensive manual testing with twister and CI not catching a strange set of bugs, this layering violation was also the root cause for #75205 (as demonstrated by #75348 archived here with testing details here).
Please also mention any information which could help others to understand the problem you're facing:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Developers and reviewers adding code which does not introduce dependency cycles and layering violations.
Impact
This bug caused planned features to be removed from the v3.7.0 release. If it is not fixed, it will mean that the next LTS release cycle will still be quite full of these layering violations and will prevent further bugfixes from being backported to all of the aforementioned areas.
Logs and console output
Environment (please complete the following information):
Additional context
The simpler workaround was deemed too simple, so maintainers asked for the more complicated fix. There is a PR linked to demonstrate the more complicated fix.
This bug is blocking the following (TSC-approved) features from being re-added to v3.7.0.
POSIX_FD_MGMT
ftello() #74100
fseeko() #74099
dup2() #74098
dup() #74097
POSIX_DEVICE_IO
fdopen() #66932
fileno() #66938
pread() #66946
pselect() #66947
pwrite() #66948
This bug is blocking other architectural fixes that have plagued Zephyr for multiple (actually all) LTS release cycles already
This bug is also blocking the numerous stability improvements mentioned here.