Adding specification to handle the new C2x _BitInt type #191

mmalcomson · 2023-02-20T12:15:46Z

Here we add mappings from the new _BitInt type to machine types.

All that I believe needs to be changed in the ABI documents themselves are entries in the Mapping tables for the PCS rules.
I explain in the commit messages below the reasoning that this is all that is needed.

This PR also contains a rationale document.
As discussed I'm putting the initial proposed document up and going to add the comments I've already heard onto this github request (to keep the discussion as in the open as possible).

mmalcomson

Added comments from Kristof Beyls gave me out-of-band.

Also added two comments that I noticed myself around improperly updating the ChangeLogs of the ABI documents.

mmalcomson · 2023-02-20T12:18:37Z

aapcs32/aapcs32.rst

+  | 2023Q1| 5\ :sup:`th` February 2022          | Document mapping from C2x _BitInt(N) and unsigned _BitInt(N) types|
+  |       |                                     | to machine types.                                                 |
+  +-------+-------------------------------------+-------------------------------------------------------------------+


On reviewing my own patch -- I guess it shouldn't include an update in this table since this is the table.
I'm now starting to think that maybe this table is just for changes in releases, and that should be made upon a release.
Would appreciate a confirmation/refutation on that from someone more in touch with the releases @stuij ?

Similar for the aapcs64.rst document.

mmalcomson · 2023-02-20T12:19:02Z

aapcs64/aapcs64.rst

+| 2023Q1     | 5\ :sup:`th`       | - Document mapping from C2x _BitInt(N) and unsigned _BitInt(N)   |
+|            | February 2022      |   types to machine types.                                        |
+------------+--------------------+------------------------------------------------------------------+


As above with the aapcs32.rst changelog.

mmalcomson · 2023-02-20T12:20:11Z

design-documents/bit-precise-types.rst

+As the rationale mentioned, some applications have uses for a specific bit-width
+type.  In the case of writing C code which can be used to determine FPGA


nitpick: /As the rationale in that proposal mentions/

mmalcomson · 2023-02-20T12:21:08Z

design-documents/bit-precise-types.rst

+general familiarity of programmers with the representation.
+


nitpick: maybe it reads more easily if this were a 3-item bullet list?

mmalcomson · 2023-02-20T12:21:38Z

design-documents/bit-precise-types.rst

+For this particular type we are estimating that the use of ``_BitInt`` types
+will not be such that operations on these types are performance critical.
+


Less of a nitpick: I'm not sure we agreed that operations on these types
will never be performance critical, which the above sentence seems to
suggest?
I'd probably start by explaining the difficulty in defining ABI
rules/understanding the trade-offs as there is almost no code currently
using _BitInt, and it is next-to-impossible to predict exactly how it
will be used most often in the future.

Nonetheless, we need to make trade-off decisions.

Therefore, let's analyze some of the characteristics and features of
this type to try and make a best guess of how these types may get used
when target Arm CPUs.

(The specifics on not much code with _BitInt being performance critical
could be repeated after the below analysis of use cases?)

The justifications in N2763 mention that _BitInt to some degree exists to allow code that is used in high-level synthesis to have better performance by not having to participate in the usual arithmetic promotions, like promoting up to int (which is about 32 bits), and only promoting up to the other _BitInt in size (so that adding _BitInt(4) and _BitInt(7) gives you a _BitInt(7), not a 32-bit int, which is easier to translate to an FPGA).

At larger sizes, it's more of a nicety (being able to "natively" specify "bigints" that the compiler handles for you), but an extremely appealing nicety.

I've just updated the rationale with extra details -- would appreciate your opinion.

Around the high-level synthesis point, my thoughts were that the resulting code would be run in the performance-critical scenario on an FPGA, and would only be run on an Arm architecture for testing or the like.
So I was thinking that the incentives on the Arm ABI coming from that use-case were not quite so focussed on performance.
The possibility that this code would end up being run "in anger" on an Arm architecture later down the line does then add some incentive towards good performance -- though not as much as if the first-order use were to be on Arm CPU's.

Does that seems reasonable from your perspective?

mmalcomson · 2023-02-20T12:21:52Z

design-documents/bit-precise-types.rst

+There seem to be two different regimes for these types.  The "small" regime
+where bit-precise types could be stored in a single register, and the "large"
+regime where bit-precise types must span multiple registers.


nitpick: maybe say general purpose register? Is the term "general
purpose register" used in other parts of the ABI docs?

mmalcomson · 2023-02-20T12:22:11Z

design-documents/bit-precise-types.rst

+These types must be at least byte-aligned so they are addressable, and at least
+rounded to a byte boundary in size for ``sizeof``.  Since these types have an
+aesthetic similarity to bit-fields one might expect better packing in an array
+of ``_BitInt(24)`` than an array of ``int32_t`` types (i.e. packing as good as a
+byte-array).  However, this would require a low alignment of such types and that
+would mean loading and storing of even "small" sized ``_BitInt``'s crossing
+cache boundaries -- leading to an unnecessary performance hit and hindering any
+atomic operations on these.
+


nitpick on style: I'm wondering if it would be better to more explicitly
surface the pros and cons of a trade-off decision in the structure of
this document, rather than using full prose? For the example above, I'm
thinking of for example something like:

Trade-off decision 1: alignment of small-regime _BitInt

There are 2 obvious options:

option 1: 1 byte alignment
pros: list them
cons: list them

option 2: alignment rules same as 8/16/32/64-byte larger datatype (todo:
maybe improve the wording of this title/option)
pros: list them
cons: list them

Discussion of option 1 vs option 2.
Describe why we chose option 2 based on the information and estimates we
currently have available.

(I noticed you have something similar further down for "Representation
in bits". It might be good to have the same structure for analyzing
different trade-off choices to be made in this document?)

mmalcomson · 2023-02-20T12:22:31Z

design-documents/bit-precise-types.rst

+Representation in bits
+----------------------


Should the above title be "Representation in registers" instead of "...
in bits"?

mmalcomson · 2023-02-20T12:23:11Z

design-documents/bit-precise-types.rst

+As mentioned above, we do not expect operations on ``_BitInt`` types to be
+performance critical.  Given that providing a productive environment for


Overall, I'm not sure if we can expect _BitInt to never be used in
performance-critical code.
If people will write code targeting FPGAs with _BitInt, presumably that
code is performance-critical (why would anyone otherwise target an FPGA
for that code?). If such code becomes popular, at some point, there may
be good reasons to use the same source code to target CPUs too. In that
case, the _BitInt code would be performance critical...

Perhaps it's more coherent to say "As mentioned above, we do not expect operations on _BitInt types at large sizes to be performance critical." since the benefits of _BitInt allowing sidestepping the arithmetic promotions disappear for sizes larger than _BitInt(32).

workingjubilee · 2023-02-21T21:11:39Z

Regarding _BitInt, I would like to raise a concern about whether C's _BitInt(128) and AArch64's __int128 will be compatible in terms of layout, alignment, and parameter passing. If they are not, it may significantly complicate ABI handling and type definitions, especially for other languages that are not C, and on AArch64, FFI compatibility with __int128 is much more important than on x86-64 because it is used in the "hardware ABI" (it appears a lot in the layout of essential types). It also seems like a programmer footgun in general if they are inconsistent, even for just normal C code, as well as its relevance for STXP/LDXP.

It's been a while since I've read the AAPCS64 last so apologies if this has already been addressed and the spec as-is will guarantee that, it wasn't immediately obvious from my skimming of this PR whether this was the case.

I agree that I don't think these will be performance-critical types at the larger sizes, however.

mmalcomson · 2023-02-22T11:09:43Z

@workingjubilee Thanks for the feedback!

W.r.t. the _BitInt(128) yeah, the current spec does have them compatible with AArch64 __int128, but it's still good to hear that it was on someones mind since it increases our confidence in that choice.

The "performance critical" statement was a bit of a clumsy glossing-over of some analysis on my part.
I'm in the process of adding the details of the analysis to the rationale document, hopefully that would make it clear what the current thought process is -- I'd appreciate hearing your thoughts on whether the adjustment makes sense and is clear when I post it.

Cheers!

workingjubilee · 2023-02-22T18:48:27Z

Aha, excellent!

and on AArch64 _BitInt(128) has the same alignment and size as a __int128 which is
the largest type defined on that platform. This falls out of the fact that
double-register size maps to the largest integral Fundamental Data Type defined
on both platforms.

Ah, good, this was the part I missed. Thank you!

I'll be happy to look at things when you follow up!

Wilco1

I agree with choosing option B and C for AArch64/AArch32 but I'd reduce the major focus on atomics since I'd be surprised if they would ever be used - it's not even clear whether they are supported by the mid-end since bitints are like bitfields and would require extra work.

Wilco1 · 2023-02-27T22:59:05Z

design-documents/bit-precise-types.rst


 - On AArch32 this could cause surprises to developers, given that on this
  architecture small Fundamental Data Types are have zero- or sign-extended
  extra bits.  So a ``char`` would not have the same representation as a
  ``_BitInt(8)`` on this architecture.

+- If used in calls to variadic functions which were written for standard
+  integral types this can give surprising results.


It would be reasonable to normalize a bitint when passed as a variadic argument without an explicit cast to int so eg. printf works fine either way. This makes it equivalent to how a char is passed as a variadic argument.

Hmm, honestly I'd not seriously considered this since the way that the PCS is defined is through a two step process of
"source code type" -> "fundamental data type"
then
"fundamental data type" -> "rule for PCS passing".

Now you've pointed it out it does seem reasonable.
Not sure whether the exceptional case could lead to confusion -- will invite other feedback to see what people think.
@smithp35 @stuij @rsandifo-arm

Test @rearnsha (wanted to tag you, the comment box doesn't want to auto-complete your name, don't know if typing it fully out will properly link -- this is a test).

Wilco1 · 2023-02-28T11:21:33Z

design-documents/bit-precise-types.rst

+  ``_BitInt``'s will not cross cache boundaries.
+- Atomic loads and stores can be made on these objects.
+- The representation of bit-precise types of the same size as standard integer
+  types will have the same alignment and size in memory.


That last point is really the most important. The first 2 points are not too convincing since unaligned accesses are practically for free on modern cores, and LSE2 allows atomics on unaligned data.

Ooh -- thanks!
I did not realise the cost of unaligned accesses were essentially free (useful to know).

Yes, optimization guides mention things like "The Neoverse N2 handles most unaligned accesses without performance penalties.", so you are unlikely to notice a penalty in typical compiled code.

It's not just about the atomic RMW operations, you also (I think) want ordinary reads and writes of _BitInt variables to be single-copy-atomic and implemtable with ordinary loads and stores. This implies they should be aligned so as not to cross cache boundaries, and also not packed along with other objects (i.e. no packing a _BitInt(24) together with a _BitInt(8) in a 32-bit word).

Wilco1 · 2023-02-28T11:33:09Z

design-documents/bit-precise-types.rst

@@ -135,16 +390,19 @@ It has the following negatives:



Again, mentioning atomics as the first, most important thing does not make any sense. For large bitints you can't do atomics at all (no atomics on int arr[100] either!), small bitints up to int128 could support atomics similar to how bitfields could support atomics (typically using CAS due to needing extra normalization step). And it's not like LDSMAX/LDUMAX is a popular atomic given it's not even a supported builtin in GCC...

Yeah -- I had not realised the order of bullet points would imply a priority order.
The atomics were not supposed to be presented as the most important reason -- will adjust.

To make sure I'm not missing something -- what extra normalization step are you referring to?
Is it the normalisation required for the set of operations which are not OK for a given representation (e.g. if choosing option B there would be a normalisation required for /, but not +) or is it something else you've noticed?

Yes, I mean the standard normalization. So while you can't use LDADD with option C, you can still use a CAS or load exclusive loop with an extra zero/sign-extend. Since you can do this for every atomic operation, being able to use LDADD or LDSMAX directly might be nice but shouldn't count as an advantage. Not being able to use atomics at all would be a disadvantage for a particular layout if atomic support is essential (and I'm not convinced it is for bitint).

Wilco1 · 2023-02-28T11:45:55Z

design-documents/bit-precise-types.rst

+values.  However, when acting on "large" values (i.e. greater than the size of
+one register) it loses some of that benefit.  Storing to and from memory would
+also come at a cost for this representation.  This is also likely to be the most
+surprising representation for developers on an Arm platform.



Note there is no reason why a compiler couldn't do register operations in format A, while passing arguments/storing as C and assuming B when loading bitints or using parameters/return values. A seems most useful as an optimized internal representation (which is entirely private) rather than as the storage format.

👍 I'll add a paragraph explicitly mentioning that the compiler could always choose to use a different representation at non ABI boundaries.

stuij · 2023-03-01T10:55:55Z

@rearnsha

`_BitInt(N)` and `unsigned _BitInt(N)` are new integral types added for C23. These types have a bit width of `N` and define a different type for each `N`. Here we define the language mapping between these language types and the machine data types used throughout AAPCS64 through which our calling standard is defined. Along with the AAPCS32 commit, this closes ARM-software#175. The rationale for the choices in this patch are presented in a seperate commit with a rationale document. Some points of note around the language in this commit: -- This commit does not update any wording around bit-fields. The current wording under "Bit-fields subdivision" mentions that "A member of an aggregate that is a Fundamental Data Type may be subdivided into bit-fields". I do not believe this needs updating. While a _BitInt(N>128) can be subdivided into a bit-field at the *language level*, once it has been converted to a Machine Type such a subdivision would look like some number of quad-words which have not been subdivided (either fully used or fully unused) and either one or zero quad-words which have been subdivided. The alignment requirements of the _BitInt are also the same as the alignment requirements of this fundamental data type of a quad-word, which this paragraph uses to explain the resulting alignment requirements on the aggregate containing a bit-field. In the C/C++ language mapping description of "Bit-fields" we mention that a bit-field may have any integral type and since a bit-precise integer is an integral type this still holds. The explanation of where a field can be placed in this section relies on *alignment* requirements of the field type. For _BitInt this lines up with the discussion on *fundamental data types* at the machine-level, since the alignment requirements of a _BitInt are that of the "chunk" it is made up of, which is the fundamental data type of a quad-word. Hence I believe the current language does not need updating for bit-precise integers. -- _BitInt(N > 128) types and Homogeneous Aggregates. With this changeset, _BitInt(N>128) types are treated as arrays of __int128 values. Hence at the machine level they would be a Homogeneous Aggregate of quad-words. The wording in section 5.9.5 currently mentions "uniquely addressable Members". I am not sure what this is referring to, but would expect this is referring to addressable members of the base type. If that is the case then I don't believe anything needs to be updated. If it were referring to addressable members at the language level (which would be strange given the context) then this may need updating since one language-level _BitInt(256) type would not equat to one Fundamental Data Type. -- Combination of unspecified bits in _BitInt and C.16 in PCS rules. The mapping this commits defines from a _BitInt to a Machine Type specifies that the bits of the relevant Machine Type that are unused in a _BitInt(N) have unspecified value. The rules C.12 and C.16 of our parameter passing standard specify that when there are unused bits of a structure and/or Integral fundamental data type that are passed in registers, those unused bits are unspecified. The combination of these two rules means that e.g. when passing a _BitInt(2) across a PCS boundary in a register, bits [2-63] inclusive are unspecified. -- In-memory and in-register representations match. This commit only specifies the mapping from language level type to machine type. The machine type is then treated as it currently is in memory and in registers.

`_BitInt(N)` and `unsigned _BitInt(N)` are new integral types added for C23. These types have a bit width of `N` and define a different type for each `N`. Here we define the language mapping between these language types and the machine data types used throughout AAPCS32 through which our calling standard is defined. Along with the AAPCS64 commit, this closes ARM-software#175. The rationale for the choices in this patch are presented in a seperate commit with a rationale document. Some points of note around the language in this commit: -- This commit does not update any wording around bit-fields. The current wording under "Bit-fields" section 5.3.4 mentions that "A member of an aggregate that is a Fundamental Data Type may be subdivided into bit-fields". I do not believe this needs updating. While a _BitInt(N>64) can be subdivided into a bit-field at the *language level*, once it has been converted to a Machine Type, such a subdivision would look like some number of double-words which have not been subdivided (either fully used or fully unused) and either one or zero double-words which have been subdivided. The alignment requirements of the _BitInt are also the same as the alignment requirements of this fundamental data type of a double-word, which this paragraph uses to explain the resulting alignment requirements on the aggregate containing a bit-field. In the C/C++ language mapping description of "Bit-fields" we mention that a bit-field may have any integral type and since a bit-precise integer is an integral type this still holds. The explanation of where a field can be placed in this section relies on *alignment* requirements of the field type. For _BitInt this lines up with the discussion on *fundamental data types* at the machine-level, since the alignment requirements of a _BitInt are that of the "chunk" it is made up of, which is the fundamental data type of a double-word. Hence I believe the current language does not need updating for bit-precise integers. -- _BitInt(N > 64) types and Homogeneous Aggregates. With this changeset, _BitInt(N>64) types are treated as arrays of uint64_t values. Hence at the machine level they would be a Homogeneous Aggregate of double-words. -- Combination of unspecified bits in _BitInt and B.2 in PCS rules. The mapping this commits defines from a _BitInt to a Machine Type specifies that the bits of the relevant Machine Type that are unused in a _BitInt(N) have unspecified value. Rule B.2 of our parameter passing standard specifies that when there are unused bits in an integral Fundamental Data Type that is passed in registers, those unused bits are zero- or sign-extended to a full-word. The combination of this rule with the fact that _BitInt types are zero- or sign-extended to the Fundamental Data Type which they are passed in means that e.g. when passing a _BitInt(2) across a PCS boundary in a register, bits [2-63] inclusive are sign-extended. -- In-memory and in-register representations match. This commit only specifies the mapping from language level type to machine type. The machine type is then treated as it currently is in memory and in registers.

This type has been added into the C2x specification, alongside changes to describe how they are represented at a machine level we also add a design document describing the rationale behind the choices we made. One of the main contentious issues has been around the alignment of large _BitInt types. In this document we have chosen to specify that large _BitInt's are treated as if they were an array of double-register sized chunks. Other psABI's have chosen to represent large _BitInt's as an array of single-register sized chunks. This means that differences in alignment might be surprising to developers attempting to write portable code. However in contrast we believe that developers familiar with ABI's for Arm Architectures would be less surprised by this decision. N.b. I happened to also notice a stack overflow question which was suggesting the use of _BitInt(128) for u128. Given the discussion did not have any mention of the different ABI between this and __uint128 I would take it as evidence for a benefit to having the two integrals ABI match. https://stackoverflow.com/questions/16088282/is-there-a-128-bit-integer-in-gcc

mmalcomson · 2023-09-29T10:17:17Z

Just pinging everyone that has been watching this.
Things seem to have settled on a decision here -- if there are still objections that I've missed could someone raise them again.

Otherwise could someone @rearnsha @smithp35 @stuij have a double-check that everything seems fine to them and approve this?

sallyarmneale

Looks good to me.

design-documents/bit-precise-types.rst

sallyarmneale · 2023-10-02T11:07:35Z

design-documents/bit-precise-types.rst

+Use-cases known of so far
+-------------------------
+
+There seem to be two different regimes for these types.  The "small" regime


Suggested change

There seem to be two different regimes for these types. The "small" regime

There are two different regimes for these types. The "small" regime

I adjusted to "We believe there are two different regimes" (point being that we could be mistaken -- there could be more logical regimes to think about -- and want to convey that this is how we view things rather than something which we can be 100% confident on)

sallyarmneale · 2023-10-02T11:07:56Z

design-documents/bit-precise-types.rst

+
+Since this is a new type there is large uncertainty on how it will be used by
+programmers in the future.  Decisions we make here may also influence future
+usage.  Nonetheless we must make trade-off decisions with this uncertainty.  The


Suggested change

usage. Nonetheless we must make trade-off decisions with this uncertainty. The

usage. Therefore, we must make trade-off decisions with this uncertainty. The

design-documents/bit-precise-types.rst

sallyarmneale · 2023-10-02T11:27:36Z

design-documents/bit-precise-types.rst

+critical than performance concerns in this use-case.
+
+
+To help the compiler optimize (e.g. for auto vectorization)


Suggested change

To help the compiler optimize (e.g. for auto vectorization)

To help the compiler optimize (for example, for auto vectorization)

Adjusted this to (possibly for auto vectorization) -- hope that isn't falling into another ease-of-parsing pitfall ;-)

design-documents/bit-precise-types.rst

sallyarmneale · 2023-10-02T12:49:30Z

design-documents/bit-precise-types.rst

+
+- This would be a less familiar representation to programmers.  Especially the
+  fact that a ``_BitInt(8)`` would not have the same representation in a
+  register as a ``char`` could cause confusion (e.g. when debugging, or writing


Suggested change

register as a ``char`` could cause confusion (e.g. when debugging, or writing

register as a ``char`` could cause confusion (For example, when debugging, or writing

For some reason the "for example" reads clunkier than "e.g." to me.
I went with cause confusion (we imagine when debugging or writing assembly code).
Hopefully that avoids the ease-of-parsing difficulty for non-native speakers.

sallyarmneale

Some minor changes - mostly phrasing. Please do avoid using latin contractions (e.g., i.e, and etc) they're not good for non-native speakers.

mmalcomson · 2023-10-02T13:58:05Z

Please do avoid using latin contractions (e.g., i.e, and etc) they're not good for non-native speakers.

Noted -- thanks for the info!

sallyarmneale · 2023-10-02T14:01:36Z

design-documents/bit-precise-types.rst

+``unsigned _BitInt(N)``.  These are defined for integral ``N`` and each ``N`` is
+a different type.
+
+The proposal for these types can be found in following link.


Suggested change

The proposal for these types can be found in following link.

The proposal for these types can be found in following link:

design-documents/bit-precise-types.rst

sallyarmneale

Some minor changes - mostly phrasing. Please do avoid using latin contractions (e.g., i.e, and etc) they're not good for non-native speakers.

Mostly things that should help the document be easily understood for non-native speakers.

mmalcomson · 2023-10-03T12:36:23Z

@sallyarmneale Thanks for the reviews -- I've updated all the things you pointed at and upon the re-read found some places that I felt could be better worded as well.
N.b. to anyone else reading -- I also re-ordered the reasons in a few places.
A while ago Wilco pointed out that the ordering of reasons gave an impression that we weighted the behaviour around atomics higher than we actually did, and on re-reading it seems I had only re-ordered some of the mentions.

sallyarmneale

Looks good to me

mmalcomson requested review from stuij and smithp35 February 20, 2023 12:15

mmalcomson commented Feb 20, 2023

View reviewed changes

mmalcomson force-pushed the bitint branch from e99c4fd to 1e0e0fa Compare February 24, 2023 09:58

Wilco1 suggested changes Feb 28, 2023

View reviewed changes

mmalcomson mentioned this pull request May 30, 2023

Define _BitInt ABI #175

Closed

mmalcomson added 2 commits September 12, 2023 13:34

mmalcomson force-pushed the bitint branch from 2e8bb26 to 19b1e5a Compare September 15, 2023 10:58

mmalcomson force-pushed the bitint branch from dc77de6 to 2a3ec7e Compare September 27, 2023 16:15

sallyarmneale approved these changes Oct 2, 2023

View reviewed changes

sallyarmneale reviewed Oct 2, 2023

View reviewed changes

design-documents/bit-precise-types.rst Outdated Show resolved Hide resolved