diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index fa836e4..3db4550 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -109,6 +109,7 @@ document | owner | Github handle [Morello extensions to ELF for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/aaelf64-morello) | Silviu Baranga | @sbaranga-arm [Morello Descriptor ABI for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/descabi-morello) | Silviu Baranga | @sbaranga-arm [Memtag ABI Extension to ELF for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/memtagabielf64) | Mitch Phillips | @hctim +[C/C++ Atomics Application Binary Interface Standard for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/atomicsabi64) | Luke Geeson | @lukeg101 3. Merging the change diff --git a/README.md b/README.md index 571a0e0..973d82a 100644 --- a/README.md +++ b/README.md @@ -71,6 +71,7 @@ ELF for the Arm 64-bit Architecture | [aaelf64](a DWARF for the Arm 64-bit Architecture | [aadwarf64](aadwarf64/aadwarf64.rst) | [2020Q2](legacy-documents/aadwarf64/ihi0057_E/IHI0057_E_2020Q2_aadwarf64.pdf) C++ ABI for the Arm 64-bit Architecture | [cppabi64](cppabi64/cppabi64.rst) | [2020Q2](legacy-documents/cppabi64/ihi0059_E/IHI0059E_2020Q2_cppabi64.pdf) Vector Function ABI for the Arm 64-bit Architecture | [vfabia64](vfabia64/vfabia64.rst) | [2019Q2](legacy-documents/vfabia64/101129_1920/101129_1920_01_en.pdf) +C/C++ Atomics ABI for the Arm 64-bit Architecture | [atomicsabi64](atomicsabi64/atomicsabi64.rst) | n/a ### ABI for the Arm 64-bit Architecture with SVE support diff --git a/atomicsabi64/Arm_logo_blue_RGB.svg b/atomicsabi64/Arm_logo_blue_RGB.svg new file mode 100644 index 0000000..1f9a9ba --- /dev/null +++ b/atomicsabi64/Arm_logo_blue_RGB.svg @@ -0,0 +1,15 @@ + + + + + + diff --git a/atomicsabi64/CONTRIBUTIONS b/atomicsabi64/CONTRIBUTIONS new file mode 100644 index 0000000..113f5fa --- /dev/null +++ 
b/atomicsabi64/CONTRIBUTIONS @@ -0,0 +1,3 @@ +Contributions to this project are licensed under an inbound=outbound +model such that any such contributions are licensed by the contributor +under the same terms as those in the LICENSE file. diff --git a/atomicsabi64/LICENSE b/atomicsabi64/LICENSE new file mode 100644 index 0000000..aa6d839 --- /dev/null +++ b/atomicsabi64/LICENSE @@ -0,0 +1,22 @@ +This work is licensed under the Creative Commons +Attribution-ShareAlike 4.0 International License. To view a copy of +this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or +send a letter to Creative Commons, PO Box 1866, Mountain View, CA +94042, USA. + +Grant of Patent License. Subject to the terms and conditions of this +license (both the Public License and this Patent License), each +Licensor hereby grants to You a perpetual, worldwide, non-exclusive, +no-charge, royalty-free, irrevocable (except as stated in this +section) patent license to make, have made, use, offer to sell, sell, +import, and otherwise transfer the Licensed Material, where such +license applies only to those patent claims licensable by such +Licensor that are necessarily infringed by their contribution(s) alone +or by combination of their contribution(s) with the Licensed Material +to which such contribution(s) was submitted. If You institute patent +litigation against any entity (including a cross-claim or counterclaim +in a lawsuit) alleging that the Licensed Material or a contribution +incorporated within the Licensed Material constitutes direct or +contributory patent infringement, then any licenses granted to You +under this license for that Licensed Material shall terminate as of +the date such litigation is filed. diff --git a/atomicsabi64/README.md b/atomicsabi64/README.md new file mode 100644 index 0000000..24bea6b --- /dev/null +++ b/atomicsabi64/README.md @@ -0,0 +1,38 @@ +
+ +
+ +# C/C++ Atomics ABI for the Arm® 64-bit Architecture (AArch64) + + +## About this document + +This document describes the [Application Binary Interface for the use +of code generated by compiling C/C++ atomics targeting the Arm 64-bit architecture](atomicsabi64.rst). + +## About the license + +As identified more fully in the [LICENSE](LICENSE) file, this project +is licensed under CC-BY-SA-4.0 along with an additional patent +license. The language in the additional patent license is largely +identical to that in Apache-2.0 (specifically, Section 3 of Apache-2.0 +as reflected at https://www.apache.org/licenses/LICENSE-2.0) with two +exceptions. + +First, several changes were made related to the defined terms so as to +reflect the fact that such defined terms need to align with the +terminology in CC-BY-SA-4.0 rather than Apache-2.0 (e.g., changing +“Work” to “Licensed Material”). + +Second, the defensive termination clause was changed such that the +scope of defensive termination applies to “any licenses granted to +You” (rather than “any patent licenses granted to You”). This change +is intended to help maintain a healthy ecosystem by providing +additional protection to the community against patent litigation +claims. + +## Defects report + +Please report defects in the [Atomics Application Binary Interface (ABI) +for the Arm 64-bit architecture](atomicsabi64.rst) to the [issue tracker +page on GitHub](https://github.com/ARM-software/abi-aa/issues). diff --git a/atomicsabi64/TRADEMARK_NOTICE b/atomicsabi64/TRADEMARK_NOTICE new file mode 100644 index 0000000..9a7a725 --- /dev/null +++ b/atomicsabi64/TRADEMARK_NOTICE @@ -0,0 +1,8 @@ +The text of and illustrations in this document are licensed +under a Creative Commons Attribution–Share Alike 4.0 International +license ("CC-BY-SA-4.0”), with an additional clause on patents. +The Arm trademarks featured here are registered trademarks or +trademarks of Arm Limited (or its subsidiaries) in the US and/or +elsewhere. 
All rights reserved. Please visit +https://www.arm.com/company/policies/trademarks for more information +about Arm’s trademarks. diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst new file mode 100644 index 0000000..cf3d915 --- /dev/null +++ b/atomicsabi64/atomicsabi64.rst @@ -0,0 +1,1087 @@ +.. + Copyright (c) 2024, Arm Limited and its affiliates. All rights reserved. + CC-BY-SA-4.0 AND Apache-Patent-License + See LICENSE file for details + +.. |release| replace:: 2024Q1 +.. |date-of-issue| replace:: 19\ :sup:`th` August 2024 +.. |copyright-date| replace:: 2024 +.. |footer| replace:: Copyright © |copyright-date|, Arm Limited and its + affiliates. All rights reserved. + +.. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest +.. _AAELF64: https://github.com/ARM-software/abi-aa/releases +.. _CPPABI64: https://github.com/ARM-software/abi-aa/releases +.. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf +.. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836 +.. _OOPSLA: https://2024.splashcon.org/track/splash-2024-oopsla#event-overview +.. _RATIONALE: https://github.com/ARM-software/abi-aa/design-documents/atomics-ABI.rst + +********************************************************************************************* +C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture +********************************************************************************************* + +.. class:: version + +|release| + +.. class:: issued + +Date of Issue: |date-of-issue| + +.. class:: logo + +.. image:: Arm_logo_blue_RGB.svg + :scale: 30% + +.. section-numbering:: + +.. raw:: pdf + + PageBreak oneColumn + + +Preamble +======== + +Abstract +-------- + +This document describes the C/C++ Atomics Application Binary Interface for the +Arm 64-bit architecture. This document lists the valid mappings from C/C++ +Atomic Operations to sequences of AArch64 instructions. 
For further information +on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_]. + +Keywords +-------- + +C++, C, Application Binary Interface, ABI, AArch64, C++ ABI, generic C++ ABI, +Atomics, Concurrency + +Latest release and defects report +--------------------------------- + +Please check `C/C++ Atomics Application Binary Interface Standard for the Arm 64-bit Architecture +`_ for the latest +release of this document. + +Please report defects in this specification to the `issue tracker page +on GitHub +`_. + +.. raw:: pdf + + PageBreak + +Acknowledgement +--------------- + +This ABI was written as part of Luke Geeson’s PhD on testing the +compilation of concurrent C/C++ with assistance from Wilco Dijkstra from Arm's +Compiler Teams. + +It is an offshoot from a paper that will be presented at OOPSLA 2024 [OOPSLA_]: +*Mix Testing: Specifying and Testing ABI Compatibility Of C/C++ Atomics Implementations* +by Luke Geeson, James Brotherston, Wilco Dijkstra, Alastair Donaldson, Lee Smith, +Tyler Sorensen, and John Wickerson. + + + +Licence +------- + +This work is licensed under the Creative Commons +Attribution-ShareAlike 4.0 International License. To view a copy of +this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or +send a letter to Creative Commons, PO Box 1866, Mountain View, CA +94042, USA. + +Grant of Patent License. 
Subject to the terms and conditions of this +license (both the Public License and this Patent License), each +Licensor hereby grants to You a perpetual, worldwide, non-exclusive, +no-charge, royalty-free, irrevocable (except as stated in this +section) patent license to make, have made, use, offer to sell, sell, +import, and otherwise transfer the Licensed Material, where such +license applies only to those patent claims licensable by such +Licensor that are necessarily infringed by their contribution(s) alone +or by combination of their contribution(s) with the Licensed Material +to which such contribution(s) was submitted. If You institute patent +litigation against any entity (including a cross-claim or counterclaim +in a lawsuit) alleging that the Licensed Material or a contribution +incorporated within the Licensed Material constitutes direct or +contributory patent infringement, then any licenses granted to You +under this license for that Licensed Material shall terminate as of +the date such litigation is filed. + +About the license +----------------- + +As identified more fully in the Licence_ section, this project +is licensed under CC-BY-SA-4.0 along with an additional patent +license. The language in the additional patent license is largely +identical to that in Apache-2.0 (specifically, Section 3 of Apache-2.0 +as reflected at https://www.apache.org/licenses/LICENSE-2.0) with two +exceptions. + +First, several changes were made related to the defined terms so as to +reflect the fact that such defined terms need to align with the +terminology in CC-BY-SA-4.0 rather than Apache-2.0 (e.g., changing +“Work” to “Licensed Material”). + +Second, the defensive termination clause was changed such that the +scope of defensive termination applies to “any licenses granted to +You” (rather than “any patent licenses granted to You”). 
This change +is intended to help maintain a healthy ecosystem by providing +additional protection to the community against patent litigation +claims. + +Contributions +------------- + +Contributions to this project are licensed under an inbound=outbound +model such that any such contributions are licensed by the contributor +under the same terms as those in the `Licence`_ section. + +Trademark notice +---------------- + +The text of and illustrations in this document are licensed by Arm +under a Creative Commons Attribution–Share Alike 4.0 International +license ("CC-BY-SA-4.0”), with an additional clause on patents. +The Arm trademarks featured here are registered trademarks or +trademarks of Arm Limited (or its subsidiaries) in the US and/or +elsewhere. All rights reserved. Please visit +https://www.arm.com/company/policies/trademarks for more information +about Arm’s trademarks. + +Copyright +--------- + +Copyright (c) |copyright-date|, Arm Limited and its affiliates. All rights +reserved. + +.. raw:: pdf + + PageBreak + +.. contents:: + :depth: 3 + +.. raw:: pdf + + PageBreak + +About this document +=================== + +Change control +-------------- + +Current status and anticipated changes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following support level definitions are used by the Arm Atomics ABI +specifications: + +**Release** + Arm considers this specification to have enough implementations, which have + received sufficient testing, to verify that it is correct. The details of + these criteria are dependent on the scale and complexity of the change over + previous versions: small, simple changes might only require one + implementation, but more complex changes require multiple independent + implementations, which have been rigorously tested for cross-compatibility. + Arm anticipates that future changes to this specification will be limited to + typographical corrections, clarifications and compatible extensions. 
+ +**Beta** + Arm considers this specification to be complete, but existing + implementations do not meet the requirements for confidence in its release + quality. Arm may need to make incompatible changes if issues emerge from its + implementation. + +**Alpha** + The content of this specification is a draft, and Arm considers the + likelihood of future incompatible changes to be significant. + +All content in this document is at the **Alpha** quality level. + +Change History +-------------- + +If there is no entry in the change history table for a release, there are no +changes to the content of the document for that release. + +.. class:: atomicsabi64-change-history + +.. table:: + + +---------+------------------------------+-------------------------------------------------------------------+ + | Issue | Date | Change | + +=========+==============================+===================================================================+ + | 00alp0 | 19\ :sup:`th` August 2024. | Alpha Release. | + +---------+------------------------------+-------------------------------------------------------------------+ + + +References +---------- + +This document refers to, or is referred to by, the following documents. + +.. table:: + + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | Ref | External reference or URL | Title | + +=============+==============================================================+=============================================================================+ + | ARMARM_ | DDI 0487 | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | CSTD_ | ISO/IEC 9899:2018 | International Standard ISO/IEC 9899:2018 – Programming languages C. 
| + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | AAELF64_ | ELF for the Arm 64-bit Architecture (AArch64) | ELF for the Arm 64-bit Architecture (AArch64) | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | CPPABI64_ | C++ ABI for the Arm 64-bit Architecture (AArch64) | C++ ABI for the Arm 64-bit Architecture (AArch64) | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | RATIONALE_ | Rationale Document for C11 Atomics ABI | Rationale Document for C11 Atomics ABI | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | PAPER_ | CGO paper | Compiler Testing with Relaxed Memory Models | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + + +.. raw:: pdf + + PageBreak + +Terms and Abbreviations +----------------------- + +The C/C++ Atomics ABI for the Arm 64-bit Architecture uses the following terms and +abbreviations. + +AArch64 + The 64-bit general-purpose register width state of the Armv8 architecture. + +ABI + Application Binary Interface: + + 1. The specifications to which an executable must conform in order to + execute in a specific execution environment. For example, the + :title-reference:`Linux ABI for the Arm Architecture`. + + 2. A particular aspect of the specifications to which independently + produced relocatable files must conform in order to be statically + linkable and executable. For example, the C++ ABI for the Arm 64-bit + Architecture [CPPABI64_], or ELF for the Arm Architecture [AAELF64_]. 
+ +Arm-based + ... based on the Arm architecture ... + +Thread + A unit of computation (e.g. a POSIX thread) of a process, managed by the OS. + +Atomic Operation + An indivisible operation on a memory location. This can be a load, store, + exchange, compare, or arithmetic operation. Atomics may be used to define + higher level primitives including locks and concurrent queues. ISO C/C++ + defines a range of supported atomic types and operations. + +Concurrent Program + A C or C++ program that consists of one or more threads. Threads may + communicate with each other through memory locations, using both Atomic + Operations and standard memory accesses. + +Memory Order Parameter + The order of memory accesses as executed by each thread may not be the same + as the order they are written in the program. The Memory Order describes + how memory accesses are ordered with respect to other memory accesses or + Atomic Operations. ISO C/C++ defines a ``memory_order`` enum type for the set + of memory orders. + +Mapping + A mapping from an Atomic Operation to a sequence of AArch64 instructions. + +.. raw:: pdf + + PageBreak + +Overview +======== + +`AArch64 atomic mappings`_ defines the mappings from C/C++ atomic operations +to AArch64 that are interoperable. + +Arbitrary registers may be used in the mappings. Instructions marked with ``*`` +in the tables cannot use ``WZR`` or ``XZR`` as a destination register. This is +further detailed in `Special Cases`_. + +Only some variants of ``fetch_<op>`` are listed since the mappings are identical +except for a different ``<op>``. + +Atomic operations and Memory Order are abbreviated as follows: + +.. 
table:: + + +----------------------------------------------------+--------------------------------------+ + | Atomic Operation | Short form | + +====================================================+======================================+ + | ``atomic_store_explicit(...)`` | ``store(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_load_explicit(...)`` | ``load(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_thread_fence(...)`` | ``fence(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_exchange_explicit(...)`` | ``exchange(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_add_explicit(...)`` | ``fetch_add(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_sub_explicit(...)`` | ``fetch_sub(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_or_explicit(...)`` | ``fetch_or(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_xor_explicit(...)`` | ``fetch_xor(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_and_explicit(...)`` | ``fetch_and(...)`` | + +----------------------------------------------------+--------------------------------------+ + +.. 
table:: + + +----------------------------------------------------+--------------------------------------+ + | Memory Order Parameter | Short form | + +====================================================+======================================+ + | ``memory_order_relaxed`` | ``relaxed`` | + +----------------------------------------------------+--------------------------------------+ + | ``memory_order_acquire`` | ``acquire`` | + +----------------------------------------------------+--------------------------------------+ + | ``memory_order_release`` | ``release`` | + +----------------------------------------------------+--------------------------------------+ + | ``memory_order_acq_rel`` | ``acq_rel`` | + +----------------------------------------------------+--------------------------------------+ + | ``memory_order_seq_cst`` | ``seq_cst`` | + +----------------------------------------------------+--------------------------------------+ + +If there are multiple mappings for an Atomic Operation, the rows of the table +show the options: + +.. table:: + + +----------------------------------------------------+--------------------------------------+ + | Atomic Operation | AArch64 | + +========================================+===========+======================================+ + | ``store(loc,val,relaxed)`` | ARCH1 | ``option A`` | + + +-----------+--------------------------------------+ + | | ARCH2 | ``option B`` | + +----------------------------------------+-----------+--------------------------------------+ + +Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_LSE. + + +Suggestions and improvements to this specification may be submitted to the: +`issue tracker page on GitHub `_. 
+ + + +AArch64 atomic mappings +======================= + +Synchronization Fences +---------------------- + + +-----------------------------------------------------+--------------------------------------+ + | Fence | AArch64 | + +=====================================================+======================================+ + | ``atomic_thread_fence(relaxed)`` | .. code-block:: none | + | | | + | | NOP | + +-----------------------------------------------------+--------------------------------------+ + | ``atomic_thread_fence(acquire)`` | .. code-block:: none | + | | | + | | DMB ISHLD | + +-----------------------------------------------------+--------------------------------------+ + | ``atomic_thread_fence(release)`` | .. code-block:: none | + | | | + | ``atomic_thread_fence(acq_rel)`` | DMB ISH | + | | | + | ``atomic_thread_fence(seq_cst)`` | | + +-------------------------------------+---------------+--------------------------------------+ + +32-bit types +------------ + +In what follows, register ``X1`` contains the location ``loc`` and ``W2`` +contains ``val``. ``W0`` contains input ``exp`` in compare-exchange. The result is +returned in ``W0``. + +.. table:: + + +-----------------------------------------------------+--------------------------------------+ + | Atomic Operation | AArch64 | + +=====================================================+======================================+ + | ``store(loc,val,relaxed)`` | .. code-block:: none | + | | | + | | STR W2, [X1] | + +-----------------------------------------------------+--------------------------------------+ + | ``store(loc,val,release)`` | .. code-block:: none | + | | | + | ``store(loc,val,seq_cst)`` | STLR W2, [X1] | + +-----------------------------------------------------+--------------------------------------+ + | ``load(loc,relaxed)`` | .. 
code-block:: none | + | | | + | | LDR W2, [X1] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | LDAR W2, [X1] | + + +---------------+--------------------------------------+ + | | ``FEAT_RCPC`` | .. code-block:: none | + | | | | + | | | LDAPR W2, [X1] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,seq_cst)`` | .. code-block:: none | + | | | + | | LDAR W2, [X1] | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | SWP W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | SWPA W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | SWPL W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,acq_rel)`` | ``Armv8-A`` | .. 
code-block:: none | + | ``exchange(loc,val,seq_cst)`` | | | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | SWPAL W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | ADD W2, W2, W0 | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDADD W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | ADD W2, W2, W0 | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDADDA W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | ADD W2, W2, W0 | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDADDL W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,acq_rel)`` | ``Armv8-A`` | .. 
code-block:: none | + | ``fetch_add(loc,val,seq_cst)`` | | | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | ADD W2, W2, W0 | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDADDAL W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,relaxed,relaxed)`` | | | + | | | MOV W4, W0 | + | | | loop: | + | | | LDXR W0, [X1] | + | | | CMP W0, W4 | + | | | B.NE fail | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | | | fail: | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CAS W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,acquire,acquire)`` | | | + | | | MOV W4, W0 | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | CMP W0, W4 | + | | | B.NE fail | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | | | fail: | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASA W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,release,release)`` | | | + | | | MOV W4, W0 | + | | | loop: | + | | | LDXR W0, [X1] | + | | | CMP W0, W4 | + | | | B.NE fail | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | | | fail: | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. 
code-block:: none | + | | | | + | | | CASL W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,acq_rel,acquire)`` | | | + | | | MOV W4, W0 | + | ``compare_exchange_strong(`` | | loop: | + | ``loc,exp,val,seq_cst,seq_cst)`` | | LDAXR W0, [X1] | + | | | CMP W0, W4 | + | | | B.NE fail | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | | | fail: | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASAL W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + + +8-bit types +----------- + +The mappings for 8-bit types are the same as 32-bit types except they use the +``B`` variants of instructions. + + +16-bit types +------------ + +The mappings for 16-bit types are the same as 32-bit types except they use the +``H`` variants of instructions. + +64-bit types +------------ + +The mappings for 64-bit types are the same as 32-bit types except the registers +used are X-registers. + +128-bit types +------------- + +Since the access width of 128-bit types is double that of the 64-bit register +width, the following mappings use *pair* instructions, which require their own +table. + +In what follows, register ``X4`` contains the location ``loc``, ``X2`` and +``X3`` contain the input value ``val``. ``X0`` and ``X1`` contain input ``exp`` in +compare-exchange. The result is returned in ``X0`` and ``X1``. + +.. table:: + + +-----------------------------------------------------+--------------------------------------+ + | Atomic Operation | AArch64 | + +=====================================+===============+======================================+ + | ``store(loc,val,relaxed)`` | ``Armv8-A`` | .. 
code-block:: none | + | | | | + | | | loop: | + | | | LDXP XZR, X1, [X4] | + | | | STXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASP X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | STP X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``store(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP XZR, X1, [X4] | + | | | STLXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPL X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + + +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | DMB ISH | + | | | STP X2, X3, [X4] | + | +---------------+--------------------------------------+ + | |``FEAT_LRCPC3``| .. code-block:: none | + | | | | + | | | STILP X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``store(loc,val,seq_cst)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXP XZR, X1, [X4] | + | | | STLXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. 
code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPAL X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + + +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | DMB ISH | + | | | STP X2, X3, [X4] | + | | | DMB ISH | + | +---------------+--------------------------------------+ + | |``FEAT_LRCPC3``| .. code-block:: none | + | | | | + | | | STILP X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASP X0, X1, X0, X1, [X4] | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASPA X0, X1, X0, X1, [X4] | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | DMB ISHLD | + | +---------------+--------------------------------------+ + | |``FEAT_LRCPC3``| .. code-block:: none | + | | | | + | | | LDIAPP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,seq_cst)`` | ``Armv8-A`` | ..
code-block:: none | + | | | | + | | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASPA X0, X1, X0, X1, [X4] | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | LDAR X5, [X4] | + | | | LDP X0, X1, [X4] | + | | | DMB ISHLD | + | +---------------+--------------------------------------+ + | |``FEAT_LRCPC3``| .. code-block:: none | + | | | | + | | | LDAR X5, [X4] | + | | | LDIAPP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | STXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASP X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | SWPP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | STXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. 
code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPA X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | SWPPA X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | STLXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPL X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | SWPPL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | ``exchange(loc,val,seq_cst)`` | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | STLXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPAL X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | |``FEAT_LSE128``| .. 
code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | SWPPAL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | ADDS X0, X0, X2 | + | | | ADC X1, X1, X3 | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | ADDS X8, X0, X2 | + | | | ADC X9, X1, X3 | + | | | CASP X0, X1, X8, X9, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | ADDS X0, X0, X2 | + | | | ADC X1, X1, X3 | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | ADDS X8, X0, X2 | + | | | ADC X9, X1, X3 | + | | | CASPA X0, X1, X8, X9, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | ADDS X0, X0, X2 | + | | | ADC X1, X1, X3 | + | | | STLXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. 
code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | ADDS X8, X0, X2 | + | | | ADC X9, X1, X3 | + | | | CASPL X0, X1, X8, X9, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | ``fetch_add(loc,val,seq_cst)`` | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | ADDS X0, X0, X2 | + | | | ADC X1, X1, X3 | + | | | STLXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | ADDS X8, X0, X2 | + | | | ADC X9, X1, X3 | + | | | CASPAL X0, X1, X8, X9, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_or(loc,val,relaxed)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | LDSETP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_or(loc,val,acquire)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | LDSETPA X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_or(loc,val,release)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | LDSETPL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_or(loc,val,acq_rel)`` |``FEAT_LSE128``| .. 
code-block:: none | + | | | | + | ``fetch_or(loc,val,seq_cst)`` | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | LDSETPAL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_and(loc,val,relaxed)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MVN X0, X2 | + | | | MVN X1, X3 | + | | | LDCLRP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_and(loc,val,acquire)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MVN X0, X2 | + | | | MVN X1, X3 | + | | | LDCLRPA X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_and(loc,val,release)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MVN X0, X2 | + | | | MVN X1, X3 | + | | | LDCLRPL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_and(loc,val,acq_rel)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | ``fetch_and(loc,val,seq_cst)`` | | MVN X0, X2 | + | | | MVN X1, X3 | + | | | LDCLRPAL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,relaxed,relaxed)`` | | | + | | | loop: | + | | | LDXP X6, X7, [X4] | + | | | CMP X6, X0 | + | | | CCMP X7, X1, 0, EQ | + | | | CSEL X8, X2, X6, EQ | + | | | CSEL X9, X3, X7, EQ | + | | | STXP W5, X8, X9, [X4] | + | | | CBNZ W5, loop | + | | | MOV X0, X6 | + | | | MOV X1, X7 | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASP X0, X1, X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | ..
code-block:: none | + | ``loc,exp,val,acquire,acquire)`` | | | + | | | loop: | + | ``compare_exchange_strong(`` | | LDAXP X6, X7, [X4] | + | ``loc,exp,val,acquire,relaxed)`` | | CMP X6, X0 | + | | | CCMP X7, X1, 0, EQ | + | | | CSEL X8, X2, X6, EQ | + | | | CSEL X9, X3, X7, EQ | + | | | STXP W5, X8, X9, [X4] | + | | | CBNZ W5, loop | + | | | MOV X0, X6 | + | | | MOV X1, X7 | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASPA X0, X1, X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,release,relaxed)`` | | | + | | | loop: | + | | | LDXP X6, X7, [X4] | + | | | CMP X6, X0 | + | | | CCMP X7, X1, 0, EQ | + | | | CSEL X8, X2, X6, EQ | + | | | CSEL X9, X3, X7, EQ | + | | | STLXP W5, X8, X9, [X4] | + | | | CBNZ W5, loop | + | | | MOV X0, X6 | + | | | MOV X1, X7 | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASPL X0, X1, X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,acq_rel,acquire)`` | | | + | | | loop: | + | ``compare_exchange_strong(`` | | LDAXP X6, X7, [X4] | + | ``loc,exp,val,seq_cst,acquire)`` | | CMP X6, X0 | + | | | CCMP X7, X1, 0, EQ | + | | | CSEL X8, X2, X6, EQ | + | | | CSEL X9, X3, X7, EQ | + | | | STLXP W5, X8, X9, [X4] | + | | | CBNZ W5, loop | + | | | MOV X0, X6 | + | | | MOV X1, X7 | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. 
code-block:: none | + | | | | + | | | CASPAL X0, X1, X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + + + +Special Cases +============= + +Unused result in Read-Modify-Write atomics +------------------------------------------ + +``CAS``, ``SWP`` and ``LD`` instructions must not use the zero register if +the result is not used since it allows reordering of the read past a +``DMB ISHLD`` barrier. Affected instructions are marked with ``*``. + +Const-Qualified 128-bit Atomic Loads +------------------------------------ + +Const-qualified data containing 128-bit atomic types should not be placed +in read-only memory (such as the ``.rodata`` section). + +Before FEAT_LSE2, the only way to implement a single-copy 128-bit atomic load +is by using a Read-Modify-Write sequence. The write is not visible to +software if the memory is writeable. Compilers and runtimes should prefer the +FEAT_LSE2/FEAT_LRCPC3 sequence when available. + diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst new file mode 100644 index 0000000..2f12ecd --- /dev/null +++ b/design-documents/atomics-ABI.rst @@ -0,0 +1,321 @@ +.. + Copyright (c) 2023, Arm Limited and its affiliates. All rights reserved. + CC-BY-SA-4.0 AND Apache-Patent-License + See LICENSE file for details + +.. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest +.. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836 +.. _ATOMICS64: https://github.com/ARM-software/abi-aa/atomicsabi64/atomicsabi64.rst + +Rationale Document for C11 Atomics ABI. +*************************************** + +Scope +===== + +This document contains the design rationale for C/C++ Atomics Application +Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture +defined in ATOMICS64_. Nothing in this document +is part of the specification. The purpose is to record the rationale +for the specification as well as alternatives that were considered. 
+Any contradictions between this rationale and the specification shall +be resolved in favor of the specification. + +This document assumes that the reader is familiar with ATOMICS64_ +and the 32-bit build attributes defined in ATOMICS64_ and will use +concepts defined in these documents. + +Preamble +======== + +Background +---------- + +This document describes the rationale behind the ABI choices made for mapping +from C11 atomic operations to Arm AArch64 assembly sequences. + +From the perspective of the Arm ABI we have some decisions to +make: + +- We need to choose a baseline ABI (a set of mappings) that is compatible with all versions of the Armv8 architecture. +- The mappings should cover atomic accesses of various signs, sizes, and types accessible through C11 atomic operations using compiler profiles. + +We have identified the following trade-offs: + +- Performance of different mappings versus compatibility with all architectures. +- Whether certain compiler operations lead to unexpected behaviours. + +The use cases expanded upon below motivate the need for an atomics ABI: + +- The need for a baseline ABI. +- Knowing when an implementation departs from that baseline. +- Backwards compatibility of atomics as new mappings are added. +- Compatibility between compilers and runtimes. +- The need to constrain optimisations on specific atomic operations. +- Documenting the interoperable mappings. +- Providing a basis upon which ABI compatibility can be tested. + +References +---------- + +This document refers to, or is referred to by, the following documents. + +..
table:: + + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ + | Ref | External reference or URL | Title | + +=============+==============================================================+===============================================================================================+ + | ARMARM_ | DDI 0487 | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ + | PAPER_ | CGO paper | Compiler Testing with Relaxed Memory Models | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ + | ATOMICS64_ | Atomics ABI | C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ + + + +Note: At the time of writing C23 is not released, as such ISO C17 is considered +the latest published document. + +Known use-cases +--------------- + + +A Baseline: Describing current implementations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ABI we provide is a baseline specification that compilers should implement. +Compilers that implement the baseline specification are compatible across all versions +of the Armv8 architecture. Most of the mappings in the ABI are already implemented in +LLVM and GCC and this ABI ratifies a decade of established practice, and provides +alternatives where the current practice is incompatible. 
+ + +Sub-ABIs and ABI-islands: Departing from the baseline (or 'mainland') +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We do *not* require that compilers implement this ABI. Implementers can specify their own +ABI, whether it is a subset of the allowed mappings of the baseline ABI (a sub-ABI), or +uses different mappings altogether (an ABI-island). Currently, sub-ABIs and ABI-islands implicitly +arise with each new architecture release, and implementers quickly find new candidate mappings +that are performant on their machines. Such mappings are proposed or added to mainstream +compilers. However, due to the lack of a baseline specification or widespread +concurrency expertise, testing such mappings has been a challenge and concurrency bugs have been +unintentionally introduced into compilers when new mappings are added. + +We need a baseline ABI in order to determine if a given sub-ABI respects or departs +from the baseline. Adding command-line options is a logical consequence of defining such an ABI, +and makes it possible to track ABI compatibility of concurrent programs at compile time or link time, +rather than at runtime. It is the responsibility of the sub-ABI user to ensure code built +under their ABI does not mix with code built under the baseline. But a baseline must exist +for sub-ABI compatibility to be decided in the first place. + +Where a compiler implementation departs from the baseline completely (an ABI-island), +Arm cannot provide any statement on the compatibility of the extensions with respect +to the baseline specification. In the ABI-island case, which may be a known incompatibility +with the baseline, users should not mix ABIs. Whether a toolchain is able to diagnose +incompatibility is a quality-of-implementation (QoI) matter. + +Further, numerous parties have asked the ABI team whether a given atomics mapping is correct.
+Writing down the known cases helps engineers answer these queries without the concurrency +expertise required to come up with current compatible mappings. A future section of this document +could document common queries received by the ABI team, in order to assist implementers and +engineers with such issues. + +Backwards Compatibility and New Architecture Features +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Put another way, a baseline ABI helps decide whether new mappings are compatible. +Certain instructions (such as Load/Store-Pair instructions [ARMARM_]) have different +single-copy atomicity guarantees in different architecture versions. A baseline +decides which assembly sequences can be composed correctly (at least as far as testing can decide). + + +Compatibility Between Compilers and Runtimes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The above issues also apply when ensuring object files compiled with different compilers can be mixed. +For instance, LLVM and GCC code should be interoperable. At the time of writing we identified a number of +places where this does not hold, both when compiling to target the same architecture version, and when mixing +different (compatible) architecture versions. Further, the above issues are not limited to statically compiled +code. We found one instance where proposed mappings implemented in a JIT compiler would not be interoperable +with statically compiled code the runtime links against. Even if a JIT compiles under one set of mappings, and +is not subject to an ABI, it may still depend on other libraries or components that do have an ABI. + + +Constrain optimisations +~~~~~~~~~~~~~~~~~~~~~~~ + +Compiler optimisations applied to specific atomic operations can lead to unexpected behaviours. +The frequency of this behaviour justifies collecting these cases together to outline why they should not occur.
+For example, consider the following Concurrent Program:: + + // Shared-Memory Locations + _Atomic int* x; + _Atomic int* y; + + // Memory Order Parameter + #define relaxed memory_order_relaxed + #define release memory_order_release + #define acquire memory_order_acquire + + // Threads of Execution + void thread_0 () { + atomic_store_explicit(x,1,relaxed); + atomic_thread_fence(release); + atomic_store_explicit(y,1,relaxed); + } + + void thread_1 () { + atomic_exchange_explicit(y,2,release); + atomic_thread_fence(acquire); + int r0 = atomic_load_explicit(x,relaxed); + } + + +Under ISO C, the above Concurrent Program finishes execution in one of three +possible outcomes (a reference for this notation is found here [PAPER_]):: + + { thread_1:r0=0; y=1; } + { thread_1:r0=1; y=1; } + { thread_1:r0=1; y=2; } + +In this case the value read by the exchange on ``thread_1`` is not used, and a +compiler is free to remove references to unused data. It is not legal according +to this ABI for a compliant implementation to translate the program into +the following Assembly Sequences:: + + thread_0: + MOV W9,#1 + STR W9,[X2] + DMB ISH + STR W3,[X4] + + thread_1: + MOV W9,#2 + SWP W9, WZR, [X2] + DMB ISHLD + LDR W3,[X4] + +where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains +the address of ``y``, ``thread_1:X2`` contains the address of ``y``, and +``thread_1:X4`` contains the address of ``x``. + +The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly +Instruction, where its destination register is the zero register ``WZR``. The +``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly +Instruction. 
+ +Executing the compiled program on an Arm-based machine from a fixed initial +state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes, +according to the AArch64 Memory Model contained in §B2 of the Arm Architecture +Reference Manual [ARMARM_]:: + + { thread_1:r0=0; [y]=1; } + { thread_1:r0=0; [y]=2; } <-- Forbidden by source model, a bug! + { thread_1:r0=1; [y]=1; } + { thread_1:r0=1; [y]=2; } + +By comparing ``W3`` and the local variable ``r0`` of the original Concurrent +Program we see there is one additional outcome of executing the compiled +program that is not an outcome of executing the Concurrent Program. This is +because the Arm Architecture Reference Manual [ARMARM_] states that +*instructions where the destination register is WZR or XZR, are not regarded +as doing a read for the purpose of a DMB LD barrier.* + +In this case the compiler introduces another outcome of Execution. To avoid this +issue, a compiler must not rewrite the destination register to be the +zero register:: + + thread_0: + MOV W9,#1 + STR W9,[X2] + DMB ISH + STR W3,[X4] + + thread_1: + MOV W9,#2 + SWP W9, W10, [X2] + DMB ISHLD + LDR W3,[X4] + +Executing the compiled program on an Arm-based machine from a fixed initial +state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes, +according to the AArch64 Memory Model contained in §B2 of the Arm Architecture +Reference Manual [ARMARM_]:: + + { thread_1:r0=0; [y]=1; } + { thread_1:r0=1; [y]=1; } + { thread_1:r0=1; [y]=2; } + +As such, the unexpected outcome has disappeared. There are multiple Mappings +that exhibit this behaviour. Affected Assembly Sequences make use of ``SWP`` +and ``LD`` Assembly instructions. + +Documentation +~~~~~~~~~~~~~ + +The collective knowledge of atomics ABIs exists as numerous online discussions. +These discussions are neither authoritative nor persistent. Some discussions +are now inaccessible and others are out of date.
This is problematic given the +inherent complexity of relaxed memory concurrency, the difficulty of finding bugs, +and the possibility of user error. We believe an ABI is necessary to document +this corner of code generation. + + +The Mix Testing Process +----------------------- + +ABI compatibility must be testable. Concurrency is not trivial, and the ABI +presents a simplification of part of the problem that is understandable by +engineers. We provide a simple technique for testing ABI compatibility. +This technique reduces the difficulty of checking compatibility from a +problem of understanding concurrent executions, to the familiar testing +domain of comparing program outcomes of tests. This document does not +preclude other means of testing compatibility. + +We test for Compiler bugs. A Compiler Bug is defined as an outcome of a +compiled program execution (under the AArch64 Memory Model contained in +§B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not +an outcome of execution of the source Concurrent Program (under the +ISO C memory model). Consider the hypothetical example where a source +Concurrent Program finishes execution in one of three possible outcomes +(a reference for this notation is found here [PAPER_]):: + + { thread_0:r0=0, thread_1:r0=1 } + { thread_0:r0=1, thread_1:r0=0 } + { thread_0:r0=1, thread_1:r0=1 } + +and one compiled program execution run has the following possible outcomes +according to the AArch64 Memory Model contained in §B2 of the Arm +Architecture Reference Manual [ARMARM_]:: + + { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug! 
+ { thread_0:X3=0, thread_1:X3=1 } + { thread_0:X3=1, thread_1:X3=0 } + { thread_0:X3=1, thread_1:X3=1 } + +By comparing ``X3`` and the local variable ``r0`` of the original Concurrent +Program in this example we see there is one additional outcome of executing the +compiled program that is not an outcome of executing the source program (under +the respective models). This suggests the Mappings in question are +incompatible, and a compiler that implements them exhibits a Compiler Bug. To +ensure compatibility we therefore test for the absence of such outcomes of the +compiled programs when mixing all combinations of the above Mappings. We define +the *Mix Testing* process as follows: + +#. Take an arbitrary Concurrent Program. When executed on the C/C++ memory + model, it will produce outcomes *S*. +#. Split out the individual Atomic Operations from the initial concurrent + program into individual source files. +#. Compile each individual source file containing an Atomic Operation + using each Compiler Profile under test that generates Assembly Sequences + under a given Mapping. +#. Combine the Assembly Sequences from above into *multiple* possible Compiled + Programs. +#. Compute the outcomes of each compiled program under the AArch64 Memory Model + contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. This + yields a *set* *C* of compiled program outcome sets. +#. Check that each set of outcomes *c* in *C* is a subset of *S*. If any *c* + contains an outcome that is not in *S*, it exhibits a Compiler Bug and the + given Mappings are not interoperable.
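
The final comparison step of Mix Testing can be sketched in C as follows. This
is only an illustration, assuming the outcome sets have already been computed
by some other means (for example, a memory-model simulation) and that each
outcome is encoded as the final value observed by each of two threads. The
``outcome`` type and the ``count_compiler_bugs`` function are hypothetical
names introduced here; they are not part of the ABI or of any tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of one outcome: the value each thread observed. */
typedef struct {
    int thread_0;
    int thread_1;
} outcome;

/* Two outcomes are equal when both observed values match. */
static bool outcome_eq(outcome a, outcome b)
{
    return a.thread_0 == b.thread_0 && a.thread_1 == b.thread_1;
}

/* Linear membership test; outcome sets in litmus tests are small. */
static bool contains(const outcome *set, size_t n, outcome o)
{
    for (size_t i = 0; i < n; i++) {
        if (outcome_eq(set[i], o)) {
            return true;
        }
    }
    return false;
}

/*
 * Count the compiled-program outcomes (c) that are not source outcomes (S).
 * A result of 0 means c is a subset of S; any other result means the
 * compiled program exhibits at least one Compiler Bug.
 */
static size_t count_compiler_bugs(const outcome *source, size_t n_source,
                                  const outcome *compiled, size_t n_compiled)
{
    assert(source != NULL && compiled != NULL);

    size_t bugs = 0;
    for (size_t i = 0; i < n_compiled; i++) {
        if (!contains(source, n_source, compiled[i])) {
            bugs++;
        }
    }
    return bugs;
}
```

Applied to the hypothetical example above, with *S* holding the three source
outcomes and *c* holding the four compiled outcomes, the function reports one
outcome (both threads observing ``0``) that the source model forbids. A real
test harness would repeat this check for every compiled program in *C*.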
+ diff --git a/tools/common/check-rst-syntax.sh b/tools/common/check-rst-syntax.sh index cd99217..842bec5 100755 --- a/tools/common/check-rst-syntax.sh +++ b/tools/common/check-rst-syntax.sh @@ -38,6 +38,9 @@ declare -a docs=( # semihosting "semihosting" + + # atomics + "atomicsabi64" ) for doc in "${docs[@]}"; do diff --git a/tools/common/generate-release-links.sh b/tools/common/generate-release-links.sh index db00887..2774e78 100755 --- a/tools/common/generate-release-links.sh +++ b/tools/common/generate-release-links.sh @@ -57,6 +57,7 @@ cat <