
core os: migrate away from posix types (ssize_t, off_t) below the posix layer #77856

Open
48 tasks
cfriedt opened this issue Sep 1, 2024 · 6 comments
Labels: Architecture Review (Discussion in the Architecture WG required), RFC (Request For Comments: want input from the community)


@cfriedt
Member

cfriedt commented Sep 1, 2024

Introduction

This RFC addresses the use of POSIX types within Zephyr. We outline when POSIX types were originally introduced, the problems that have arisen as a result, and possible courses of action.

This RFC was created to provide some rationale for PR #75348, which received 7 approvals within just a few days of being posted.

Small Historical Digression

The ssize_t type limits the size of individual I/O transactions while also being able to signal an error condition (via -1). It must be able to represent at least the range [-1, 2^15 - 1], i.e. [-1, 32767] (see limits.h). In Zephyr, ssize_t is also used to signal errors via negative errno return values. A 32-bit value is sufficient here, since individual I/O transaction sizes rarely exceed 2^31 bytes. There is typically no added cost to using 64 bits for this type on 64-bit systems, so ssize_t typically matches the word size of the machine.
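As a rough illustration of the convention described above, the sketch below returns a byte count or a negative errno value through a plain ISO C int, which is what Options 2 through 4 propose for ssize_t. The function demo_read() and its backing data are hypothetical and exist only for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backing data, for illustration only. */
static const char demo_data[] = "hello";

/*
 * Hypothetical read routine: copy up to len bytes starting at pos into buf.
 * Returns the number of bytes copied (>= 0) or a negative errno value,
 * the same convention used with ssize_t, but with ISO C int.
 */
static int demo_read(size_t pos, void *buf, size_t len)
{
	const size_t size = sizeof(demo_data) - 1; /* exclude the NUL terminator */

	if (buf == NULL || pos > size) {
		return -EINVAL;
	}
	if (len > size - pos) {
		len = size - pos; /* short read at the end of the data */
	}
	if (len > (size_t)INT_MAX) {
		len = (size_t)INT_MAX; /* the count must fit in the int return value */
	}
	memcpy(buf, demo_data + pos, len);
	return (int)len;
}
```

Callers keep testing ret < 0 for errors exactly as they would with ssize_t; only the declared type changes.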

Meanwhile, the off_t type describes extent sizes as well as absolute and relative positions within extents. These should normally be at least as large as an I/O transaction. However, if extent sizes are only as large as an I/O transaction on 32-bit systems, they are severely limited: we would not be able to describe SD cards larger than 2 GiB. One might naively assume that only storage devices of a few hundred KiB or MiB would be connected to a particular system, but once networked resources (in particular, networked filesystems) or, for example, a real-time device that processes video are considered, it makes significantly more sense to use a 64-bit type for off_t, even on 32-bit systems.
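To make the 2 GiB limit concrete, here is a hedged sketch that converts an SD card sector number into a byte offset, once with a signed 32-bit offset and once with a 64-bit one. The 512-byte sector size and both helper names are assumptions for illustration, not Zephyr APIs.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512u /* assumed SD card block size in bytes */

/*
 * Convert a sector number to a byte offset held in a signed 32-bit type.
 * Returns false if the offset would not fit (i.e. at or beyond 2 GiB).
 */
static bool demo_sector_to_off32(uint32_t sector, int32_t *off)
{
	if (sector > (uint32_t)INT32_MAX / DEMO_SECTOR_SIZE) {
		return false;
	}
	*off = (int32_t)(sector * DEMO_SECTOR_SIZE);
	return true;
}

/* The same conversion with a 64-bit offset cannot overflow for a 32-bit sector number. */
static int64_t demo_sector_to_off64(uint32_t sector)
{
	return (int64_t)sector * DEMO_SECTOR_SIZE;
}
```

Sector 4194304 (exactly 2 GiB into the card) is the first one the 32-bit variant cannot address, while the 64-bit variant handles it without special cases.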

POSIX formalized these particular scenarios in unistd.h with the following constants:

32-bit systems:
_POSIX_V7_ILP32_OFF32 (32-bit int, long, off_t, and pointer) is conservative
_POSIX_V7_ILP32_OFFBIG (32-bit int, long, and pointer; off_t of at least 64 bits) is future-proof

64-bit systems:
_POSIX_V7_LP64_OFF64 (32-bit int; 64-bit long, off_t, and pointer)
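The table above can be restated as a small lookup, sketched below under the assumption that type widths are supplied in bits. The enum and function names are hypothetical, and the code only encodes the table; it does not query unistd.h or sysconf().

```c
/* Hypothetical labels for the POSIX.1-2008 programming environments. */
enum demo_posix_env {
	DEMO_ENV_UNKNOWN,
	DEMO_ENV_ILP32_OFF32,  /* _POSIX_V7_ILP32_OFF32 */
	DEMO_ENV_ILP32_OFFBIG, /* _POSIX_V7_ILP32_OFFBIG */
	DEMO_ENV_LP64_OFF64,   /* _POSIX_V7_LP64_OFF64 */
};

/* Map type widths (in bits) to the matching environment from the table. */
static enum demo_posix_env demo_classify(int int_bits, int long_bits,
					 int ptr_bits, int off_bits)
{
	if (int_bits == 32 && long_bits == 32 && ptr_bits == 32) {
		if (off_bits == 32) {
			return DEMO_ENV_ILP32_OFF32;
		}
		if (off_bits >= 64) {
			return DEMO_ENV_ILP32_OFFBIG;
		}
	}
	if (int_bits == 32 && long_bits == 64 && ptr_bits == 64 && off_bits == 64) {
		return DEMO_ENV_LP64_OFF64;
	}
	return DEMO_ENV_UNKNOWN;
}
```

On a host, the inputs could come from expressions such as sizeof(int) * CHAR_BIT. A 64-bit data model with a 32-bit long (LLP64) matches none of the three entries.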

Why do these options exist? Why not always simply use OFFBIG or OFF64? Because 64-bit return values, 64-bit arithmetic, etc. can be costly on 32-bit systems that simply do not require them. This is a concern the Zephyr community needs to remain cognizant of.

Bug History

  • #10436: 6 years old, closed as stale (off_t and ssize_t were used as far back as 8 years ago)
  • eb0aaca: bits copied from d2258b0 despite its DNM label

Problem description

POSIX sits above the operating system and calls into kernel APIs, such as multi-threading and semaphores, and OS service APIs, such as the file system and networking, as shown in the diagram below.

[Diagram: Zephyr architecture layers, with POSIX above the kernel and OS service APIs]

However, there have historically been dependency cycles in the File System and Networking APIs (and potentially others). A dependency cycle arises when a lower layer depends on an upper layer, which in turn depends on the lower layer.

A significant amount of work has already been done to eliminate the dependency cycles from Zephyr’s Networking subsystem; in fact, the few remaining dependency cycles are now deprecated and slated for removal in the 4.1 release (#77069).

On the other hand, Zephyr’s base OS (including lib/os/fdtable.c) pulls in two types that are specific to POSIX: ssize_t and off_t. Neither of these types is part of ISO C or native to Zephyr. This results in a dependency cycle in the Base OS and File System areas.
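The cycle can be checked mechanically. The sketch below models a simplified, hypothetical layer graph (the layer names and edges are assumptions loosely based on the diagram) and uses a depth-first search to detect a back edge: adding a "base OS depends on POSIX" edge, which is what pulling in ssize_t and off_t amounts to, creates a cycle, and removing it restores an acyclic graph.

```c
#include <stdbool.h>

/* Hypothetical, simplified layers for illustration. */
enum demo_layer { DEMO_KERNEL, DEMO_BASE_OS, DEMO_FS, DEMO_POSIX, DEMO_NUM_LAYERS };

/* deps[a][b] is true when layer a depends on layer b. */
static bool demo_cycle_from(bool deps[DEMO_NUM_LAYERS][DEMO_NUM_LAYERS],
			    int node, int state[DEMO_NUM_LAYERS])
{
	state[node] = 1; /* node is on the current DFS path */
	for (int next = 0; next < DEMO_NUM_LAYERS; next++) {
		if (!deps[node][next]) {
			continue;
		}
		if (state[next] == 1) {
			return true; /* back edge: a lower layer reaches an upper one */
		}
		if (state[next] == 0 && demo_cycle_from(deps, next, state)) {
			return true;
		}
	}
	state[node] = 2; /* node fully explored */
	return false;
}

static bool demo_has_cycle(bool deps[DEMO_NUM_LAYERS][DEMO_NUM_LAYERS])
{
	int state[DEMO_NUM_LAYERS] = {0};

	for (int n = 0; n < DEMO_NUM_LAYERS; n++) {
		if (state[n] == 0 && demo_cycle_from(deps, n, state)) {
			return true;
		}
	}
	return false;
}
```

Usage: set POSIX→kernel, POSIX→FS, POSIX→base OS, FS→base OS, and base OS→kernel; the graph is acyclic until base OS→POSIX is added.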

The problems that arise are:

  • The dependency cycle has caused type and include-path conflicts when adding scheduled features required for Zephyr’s POSIX implementation (e.g. the v3.7 POSIX device I/O and file descriptor management features).
  • Every other component in Zephyr that uses fdtable.h has a forced dependency on the POSIX API (networking, file system, etc.).
  • Other components in Zephyr mimic this behaviour (e.g. the Flash API).
  • It should be possible to build Zephyr without POSIX support (e.g. with an ISO-only C library), yet core components depend on POSIX types.
  • Technical debt keeps increasing (so far we have 6+ years of principal plus interest).

Proposed change

The proposed change is to switch to standard ISO C types (and possibly Zephyr types) below the POSIX line, since the layers below it have no need to depend on POSIX at all.

Specifically, this RFC proposes Option 4 outlined below.

Note

Zephyr’s native architecture is considered a special case.

Detailed RFC

Proposed change (Detailed)

The preferred solution is detailed in Option 4 (below). In general, changes must be bisectable. So, start with the least invasive changes (e.g. adding new types and Kconfig options), then use unions to provide alternatives for e.g. function pointers, then convert the function pointers one area at a time, and finally remove the unnecessary POSIX types and inclusions.
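One way the union step could look is sketched below, assuming a hypothetical vtable with an explicit flag that records which union member a driver filled in. Calling a function through a union member of the wrong type is undefined behaviour, so the flag is essential. The struct, flag, and driver names are illustrative only and are not Zephyr's fdtable API; the legacy member still needs the host's <sys/types.h> for ssize_t during the transition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h> /* legacy ssize_t, needed only by the transitional member */

/* Hypothetical transitional vtable: each driver fills exactly one member. */
struct demo_fd_vtable {
	bool new_api; /* true if the int-returning member is valid */
	union {
		ssize_t (*read_legacy)(void *obj, void *buf, size_t sz);
		int (*read)(void *obj, void *buf, size_t sz);
	};
};

/* Dispatch through whichever member the driver provided. */
static int demo_fd_read(const struct demo_fd_vtable *vt, void *obj,
			void *buf, size_t sz)
{
	if (vt->new_api) {
		return vt->read(obj, buf, sz);
	}
	/* Assumes a single transfer fits in int, per the ssize_t discussion. */
	return (int)vt->read_legacy(obj, buf, sz);
}

/* Example drivers: one not yet converted, one converted. */
static ssize_t demo_old_read(void *obj, void *buf, size_t sz)
{
	(void)obj;
	(void)buf;
	return (ssize_t)sz;
}

static int demo_new_read(void *obj, void *buf, size_t sz)
{
	(void)obj;
	(void)buf;
	return (int)sz;
}
```

Converted and unconverted drivers can coexist in one build, so each area can be migrated in its own commit, and the legacy member and the include are deleted in a final commit.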

Option 1: Do nothing / cherry-pick or override POSIX types

Pros:

  • Preserves the status quo

Cons:

  • Sends a clear message that dependency cycles are a good thing
  • Sends a clear message that it’s OK to use a feature, even if it’s not enabled
  • Conflicts and implementation difficulties persist

Option 2: Use ISO C types only (int64_t)

  • ssize_t replaced with int
  • off_t replaced with int64_t

Pros:

  • Clean
  • No dependency cycles
  • No interoperability / range issues for extent sizes

Cons:

  • Performance impact on 32-bit-only systems

Option 3: Use ISO C types only (long)

  • ssize_t replaced with int
  • off_t replaced with long

Pros:

  • Clean
  • No dependency cycles
  • No performance impact on 32-bit-only systems
  • long is 32-bit on 32-bit systems and 64-bit on 64-bit systems (the ILP32 and LP64 data models)
  • No configuration required

Cons:

  • Interoperability / range issues for extent sizes on 32-bit systems

Option 4: Use ISO C + Zephyr types

  • ssize_t replaced with int
  • off_t replaced with k_off_t
  • k_off_t size via Kconfig

Pros:

  • Clean
  • No dependency cycles
  • 32-bit users given a choice between interoperability and performance
  • Kconfig helps simplify library configuration

Cons:
A compromise on 32-bit-only systems, resulting in one of the tradeoffs below. No negative consequences on 64-bit systems.

  • Performance impact on 32-bit-only systems (if CONFIG_OFFSET_64BIT=y)
  • Interoperability issues for offset sizes (if CONFIG_OFFSET_64BIT=n)
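A minimal sketch of how k_off_t could be selected by Kconfig is shown below. Only k_off_t and CONFIG_OFFSET_64BIT come from this RFC; the long fallback (mirroring Option 3), K_OFF_MAX, and demo_seek_rel() are assumptions for illustration, and the #define stands in for the Kconfig-generated configuration header.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for Kconfig output; remove for a CONFIG_OFFSET_64BIT=n build. */
#define CONFIG_OFFSET_64BIT 1

#ifdef CONFIG_OFFSET_64BIT
typedef int64_t k_off_t; /* interoperable: describes extents beyond 2 GiB */
#define K_OFF_MAX INT64_MAX
#else
typedef long k_off_t; /* word-sized: cheaper arithmetic on 32-bit targets */
#define K_OFF_MAX LONG_MAX
#endif

/*
 * Hypothetical helper: apply a relative seek to a non-negative position.
 * Returns 0 on success, -EINVAL for a negative result or input, and
 * -EOVERFLOW if the result does not fit in k_off_t.
 */
static int demo_seek_rel(k_off_t cur, k_off_t delta, k_off_t *out)
{
	if (cur < 0) {
		return -EINVAL;
	}
	if (delta > 0 && cur > K_OFF_MAX - delta) {
		return -EOVERFLOW;
	}
	if (delta < 0 && cur + delta < 0) {
		return -EINVAL;
	}
	*out = cur + delta;
	return 0;
}
```

Code written against k_off_t and K_OFF_MAX compiles unchanged under either setting; only the representable range and the arithmetic cost differ.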

The <sys/types.h> header shall no longer be included by the Zephyr kernel and core services.

Dependencies

Along with entries under include/, samples/, and tests/, changes are required in each of the following areas (see e.g. #75348):

  • drivers/bluetooth
  • drivers/display
  • drivers/eeprom
  • drivers/ethernet
  • drivers/flash
  • drivers/hwinfo
  • drivers/input
  • drivers/memc
  • drivers/mipi_dsi
  • drivers/misc
  • drivers/modem
  • drivers/net
  • drivers/retained_mem
  • drivers/sensor
  • drivers/wifi
  • lib/libc/minimal
  • lib/os
  • lib/utils
  • modules/canopennode
  • modules/lvgl
  • modules/openthread
  • subsys/bluetooth/audio
  • subsys/bluetooth/controller
  • subsys/bluetooth/host
  • subsys/bluetooth/mesh
  • subsys/bluetooth/services
  • subsys/bluetooth/shell
  • subsys/console
  • subsys/debug/coredump
  • subsys/dfu
  • subsys/fs
  • subsys/ipc
  • subsys/llext
  • subsys/logging
  • subsys/lorawan/nvm
  • subsys/mgmt/hawkbit
  • subsys/mgmt/mcumgr
  • subsys/mgmt/updatehub
  • subsys/net/lib/coap
  • subsys/net/lib/lwm2m
  • subsys/net/lib/mqtt_sn
  • subsys/net/lib/sockets
  • subsys/net/lib/websocket
  • subsys/retention
  • subsys/settings
  • subsys/shell
  • subsys/storage
  • subsys/usb/device_next

Concerns and Unresolved Questions

It would be very interesting to calculate the monthly compound interest rate (or even the annualized interest rate) of technical debt between 2016 (or 2018) and today, measured in lines of code.

Will add as needed.

Alternatives

Option 1 is the main alternative at this time, although it is not constructive.

Open to suggestions.

@cfriedt added the RFC (Request For Comments: want input from the community) and Architecture Review (Discussion in the Architecture WG required) labels on Sep 1, 2024
@cfriedt self-assigned this on Sep 1, 2024
@henrikbrixandersen
Member

POSIX sits above the operating system and calls-in to kernel APIs, such as multi-threading and semaphores, and OS Service APIs, such as file system and networking, as shown in the diagram below.

Where does that diagram originate from? I am not sure I agree that "POSIX" can be considered as just one layer in this regard.

@cfriedt
Member Author

cfriedt commented Sep 3, 2024

Where does that diagram originate from?

@henrikbrixandersen - this particular diagram was added in 72f52c9.

I am not sure I agree that "POSIX" can be considered as just one layer in this regard.

Would you care to elaborate?

@cfriedt
Member Author

cfriedt commented Sep 15, 2024

Two weeks have passed without any kind of elaboration. I'm not sure there needs to be a debate on the viewpoint above, since it's somewhat tangential.

The main purpose of this RFC is to resolve the dependency cycles introduced by the ssize_t and off_t types, choosing a suitable option out of the proposed options, and implementing the necessary changes in a controlled way.

@henrikbrixandersen
Member

Two weeks have expired without any kind of elaboration. I'm not sure if there needs to be a debate on the viewpoint above, since it's somewhat tangential.

Well, this is not the first time we are discussing this. My comments given in #75513 still stand.

@nashif
Member

nashif commented Sep 16, 2024

Where does that diagram originate from? I am not sure I agree that "POSIX" can be considered as just one layer in this regard.

That is probably originally from me. POSIX in this diagram is the POSIX portability layer implementation, i.e. the type definitions used prior to introducing the POSIX compatibility layer are not part of this box.

@cfriedt
Member Author

cfriedt commented Sep 16, 2024

Well, this is not the first time we are discussing this.

It certainly isn't.

My comments given in #75513 still stand.

Well, they are standing on a slippery slope.

So far, what props them up is a defense of two anti-patterns, plus the observation that one implementation blurred the lines as a workaround, and so we should apply the same workaround. Unfortunately, that does not seem like a convincing argument for Option 1.

Projects
Status: Todo
Status: No status

3 participants