diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index fa836e4..3db4550 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -109,6 +109,7 @@ document | owner | Github handle
[Morello extensions to ELF for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/aaelf64-morello) | Silviu Baranga | @sbaranga-arm
[Morello Descriptor ABI for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/descabi-morello) | Silviu Baranga | @sbaranga-arm
[Memtag ABI Extension to ELF for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/memtagabielf64) | Mitch Phillips | @hctim
+[C/C++ Atomics Application Binary Interface Standard for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/atomicsabi64) | Luke Geeson | @lukeg101
3. Merging the change
diff --git a/README.md b/README.md
index 571a0e0..973d82a 100644
--- a/README.md
+++ b/README.md
@@ -71,6 +71,7 @@ ELF for the Arm 64-bit Architecture | [aaelf64](a
DWARF for the Arm 64-bit Architecture | [aadwarf64](aadwarf64/aadwarf64.rst) | [2020Q2](legacy-documents/aadwarf64/ihi0057_E/IHI0057_E_2020Q2_aadwarf64.pdf)
C++ ABI for the Arm 64-bit Architecture | [cppabi64](cppabi64/cppabi64.rst) | [2020Q2](legacy-documents/cppabi64/ihi0059_E/IHI0059E_2020Q2_cppabi64.pdf)
Vector Function ABI for the Arm 64-bit Architecture | [vfabia64](vfabia64/vfabia64.rst) | [2019Q2](legacy-documents/vfabia64/101129_1920/101129_1920_01_en.pdf)
+C/C++ Atomics ABI for the Arm 64-bit Architecture | [atomicsabi64](atomicsabi64/atomicsabi64.rst) | n/a
### ABI for the Arm 64-bit Architecture with SVE support
diff --git a/atomicsabi64/Arm_logo_blue_RGB.svg b/atomicsabi64/Arm_logo_blue_RGB.svg
new file mode 100644
index 0000000..1f9a9ba
--- /dev/null
+++ b/atomicsabi64/Arm_logo_blue_RGB.svg
@@ -0,0 +1,15 @@
+
+
+
+
+
+
diff --git a/atomicsabi64/CONTRIBUTIONS b/atomicsabi64/CONTRIBUTIONS
new file mode 100644
index 0000000..113f5fa
--- /dev/null
+++ b/atomicsabi64/CONTRIBUTIONS
@@ -0,0 +1,3 @@
+Contributions to this project are licensed under an inbound=outbound
+model such that any such contributions are licensed by the contributor
+under the same terms as those in the LICENSE file.
diff --git a/atomicsabi64/LICENSE b/atomicsabi64/LICENSE
new file mode 100644
index 0000000..aa6d839
--- /dev/null
+++ b/atomicsabi64/LICENSE
@@ -0,0 +1,22 @@
+This work is licensed under the Creative Commons
+Attribution-ShareAlike 4.0 International License. To view a copy of
+this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or
+send a letter to Creative Commons, PO Box 1866, Mountain View, CA
+94042, USA.
+
+Grant of Patent License. Subject to the terms and conditions of this
+license (both the Public License and this Patent License), each
+Licensor hereby grants to You a perpetual, worldwide, non-exclusive,
+no-charge, royalty-free, irrevocable (except as stated in this
+section) patent license to make, have made, use, offer to sell, sell,
+import, and otherwise transfer the Licensed Material, where such
+license applies only to those patent claims licensable by such
+Licensor that are necessarily infringed by their contribution(s) alone
+or by combination of their contribution(s) with the Licensed Material
+to which such contribution(s) was submitted. If You institute patent
+litigation against any entity (including a cross-claim or counterclaim
+in a lawsuit) alleging that the Licensed Material or a contribution
+incorporated within the Licensed Material constitutes direct or
+contributory patent infringement, then any licenses granted to You
+under this license for that Licensed Material shall terminate as of
+the date such litigation is filed.
diff --git a/atomicsabi64/README.md b/atomicsabi64/README.md
new file mode 100644
index 0000000..24bea6b
--- /dev/null
+++ b/atomicsabi64/README.md
@@ -0,0 +1,38 @@
+
+
+
+
+# C/C++ Atomics ABI for the Arm® 64-bit Architecture (AArch64)
+
+
+## About this document
+
+This document describes the [Application Binary Interface for code
+generated by compiling C/C++ atomics for the Arm 64-bit architecture](atomicsabi64.rst).
+
+## About the license
+
+As identified more fully in the [LICENSE](LICENSE) file, this project
+is licensed under CC-BY-SA-4.0 along with an additional patent
+license. The language in the additional patent license is largely
+identical to that in Apache-2.0 (specifically, Section 3 of Apache-2.0
+as reflected at https://www.apache.org/licenses/LICENSE-2.0) with two
+exceptions.
+
+First, several changes were made related to the defined terms so as to
+reflect the fact that such defined terms need to align with the
+terminology in CC-BY-SA-4.0 rather than Apache-2.0 (e.g., changing
+“Work” to “Licensed Material”).
+
+Second, the defensive termination clause was changed such that the
+scope of defensive termination applies to “any licenses granted to
+You” (rather than “any patent licenses granted to You”). This change
+is intended to help maintain a healthy ecosystem by providing
+additional protection to the community against patent litigation
+claims.
+
+## Defects report
+
+Please report defects in the [Atomics Application Binary Interface (ABI)
+for the Arm 64-bit architecture](atomicsabi64.rst) to the [issue tracker
+page on GitHub](https://github.com/ARM-software/abi-aa/issues).
diff --git a/atomicsabi64/TRADEMARK_NOTICE b/atomicsabi64/TRADEMARK_NOTICE
new file mode 100644
index 0000000..9a7a725
--- /dev/null
+++ b/atomicsabi64/TRADEMARK_NOTICE
@@ -0,0 +1,8 @@
+The text of and illustrations in this document are licensed
+under a Creative Commons Attribution–Share Alike 4.0 International
+license (“CC-BY-SA-4.0”), with an additional clause on patents.
+The Arm trademarks featured here are registered trademarks or
+trademarks of Arm Limited (or its subsidiaries) in the US and/or
+elsewhere. All rights reserved. Please visit
+https://www.arm.com/company/policies/trademarks for more information
+about Arm’s trademarks.
diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
new file mode 100644
index 0000000..cf3d915
--- /dev/null
+++ b/atomicsabi64/atomicsabi64.rst
@@ -0,0 +1,1087 @@
+..
+ Copyright (c) 2024, Arm Limited and its affiliates. All rights reserved.
+ CC-BY-SA-4.0 AND Apache-Patent-License
+ See LICENSE file for details
+
+.. |release| replace:: 2024Q1
+.. |date-of-issue| replace:: 19\ :sup:`th` August 2024
+.. |copyright-date| replace:: 2024
+.. |footer| replace:: Copyright © |copyright-date|, Arm Limited and its
+ affiliates. All rights reserved.
+
+.. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest
+.. _AAELF64: https://github.com/ARM-software/abi-aa/releases
+.. _CPPABI64: https://github.com/ARM-software/abi-aa/releases
+.. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
+.. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836
+.. _OOPSLA: https://2024.splashcon.org/track/splash-2024-oopsla#event-overview
+.. _RATIONALE: https://github.com/ARM-software/abi-aa/design-documents/atomics-ABI.rst
+
+*********************************************************************************************
+C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture
+*********************************************************************************************
+
+.. class:: version
+
+|release|
+
+.. class:: issued
+
+Date of Issue: |date-of-issue|
+
+.. class:: logo
+
+.. image:: Arm_logo_blue_RGB.svg
+ :scale: 30%
+
+.. section-numbering::
+
+.. raw:: pdf
+
+ PageBreak oneColumn
+
+
+Preamble
+========
+
+Abstract
+--------
+
+This document describes the C/C++ Atomics Application Binary Interface for the
+Arm 64-bit architecture. This document lists the valid mappings from C/C++
+Atomic Operations to sequences of AArch64 instructions. For further information
+on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_].
+
+Keywords
+--------
+
+C++, C, Application Binary Interface, ABI, AArch64, C++ ABI, generic C++ ABI,
+Atomics, Concurrency
+
+Latest release and defects report
+---------------------------------
+
+Please check `C/C++ Atomics Application Binary Interface Standard for the Arm 64-bit Architecture
+`_ for the latest
+release of this document.
+
+Please report defects in this specification to the `issue tracker page
+on GitHub
+<https://github.com/ARM-software/abi-aa/issues>`_.
+
+.. raw:: pdf
+
+ PageBreak
+
+Acknowledgement
+---------------
+
+This ABI was written as part of Luke Geeson’s PhD on testing the
+compilation of concurrent C/C++ with assistance from Wilco Dijkstra from Arm's
+Compiler Teams.
+
+It is an offshoot from a paper that will be presented at OOPSLA 2024 [OOPSLA_]:
+*Mix Testing: Specifying and Testing ABI Compatibility Of C/C++ Atomics Implementations*
+by Luke Geeson, James Brotherston, Wilco Dijkstra, Alastair Donaldson, Lee Smith,
+Tyler Sorensen, and John Wickerson.
+
+
+
+Licence
+-------
+
+This work is licensed under the Creative Commons
+Attribution-ShareAlike 4.0 International License. To view a copy of
+this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or
+send a letter to Creative Commons, PO Box 1866, Mountain View, CA
+94042, USA.
+
+Grant of Patent License. Subject to the terms and conditions of this
+license (both the Public License and this Patent License), each
+Licensor hereby grants to You a perpetual, worldwide, non-exclusive,
+no-charge, royalty-free, irrevocable (except as stated in this
+section) patent license to make, have made, use, offer to sell, sell,
+import, and otherwise transfer the Licensed Material, where such
+license applies only to those patent claims licensable by such
+Licensor that are necessarily infringed by their contribution(s) alone
+or by combination of their contribution(s) with the Licensed Material
+to which such contribution(s) was submitted. If You institute patent
+litigation against any entity (including a cross-claim or counterclaim
+in a lawsuit) alleging that the Licensed Material or a contribution
+incorporated within the Licensed Material constitutes direct or
+contributory patent infringement, then any licenses granted to You
+under this license for that Licensed Material shall terminate as of
+the date such litigation is filed.
+
+About the license
+-----------------
+
+As identified more fully in the Licence_ section, this project
+is licensed under CC-BY-SA-4.0 along with an additional patent
+license. The language in the additional patent license is largely
+identical to that in Apache-2.0 (specifically, Section 3 of Apache-2.0
+as reflected at https://www.apache.org/licenses/LICENSE-2.0) with two
+exceptions.
+
+First, several changes were made related to the defined terms so as to
+reflect the fact that such defined terms need to align with the
+terminology in CC-BY-SA-4.0 rather than Apache-2.0 (e.g., changing
+“Work” to “Licensed Material”).
+
+Second, the defensive termination clause was changed such that the
+scope of defensive termination applies to “any licenses granted to
+You” (rather than “any patent licenses granted to You”). This change
+is intended to help maintain a healthy ecosystem by providing
+additional protection to the community against patent litigation
+claims.
+
+Contributions
+-------------
+
+Contributions to this project are licensed under an inbound=outbound
+model such that any such contributions are licensed by the contributor
+under the same terms as those in the `Licence`_ section.
+
+Trademark notice
+----------------
+
+The text of and illustrations in this document are licensed by Arm
+under a Creative Commons Attribution–Share Alike 4.0 International
+license (“CC-BY-SA-4.0”), with an additional clause on patents.
+The Arm trademarks featured here are registered trademarks or
+trademarks of Arm Limited (or its subsidiaries) in the US and/or
+elsewhere. All rights reserved. Please visit
+https://www.arm.com/company/policies/trademarks for more information
+about Arm’s trademarks.
+
+Copyright
+---------
+
+Copyright (c) |copyright-date|, Arm Limited and its affiliates. All rights
+reserved.
+
+.. raw:: pdf
+
+ PageBreak
+
+.. contents::
+ :depth: 3
+
+.. raw:: pdf
+
+ PageBreak
+
+About this document
+===================
+
+Change control
+--------------
+
+Current status and anticipated changes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following support level definitions are used by the Arm Atomics ABI
+specifications:
+
+**Release**
+ Arm considers this specification to have enough implementations, which have
+ received sufficient testing, to verify that it is correct. The details of
+ these criteria are dependent on the scale and complexity of the change over
+ previous versions: small, simple changes might only require one
+ implementation, but more complex changes require multiple independent
+ implementations, which have been rigorously tested for cross-compatibility.
+ Arm anticipates that future changes to this specification will be limited to
+ typographical corrections, clarifications and compatible extensions.
+
+**Beta**
+ Arm considers this specification to be complete, but existing
+ implementations do not meet the requirements for confidence in its release
+ quality. Arm may need to make incompatible changes if issues emerge from its
+ implementation.
+
+**Alpha**
+ The content of this specification is a draft, and Arm considers the
+ likelihood of future incompatible changes to be significant.
+
+All content in this document is at the **Alpha** quality level.
+
+Change History
+--------------
+
+If there is no entry in the change history table for a release, there are no
+changes to the content of the document for that release.
+
+.. class:: atomicsabi64-change-history
+
+.. table::
+
+ +---------+------------------------------+-------------------------------------------------------------------+
+ | Issue | Date | Change |
+ +=========+==============================+===================================================================+
+ | 00alp0 | 19\ :sup:`th` August 2024. | Alpha Release. |
+ +---------+------------------------------+-------------------------------------------------------------------+
+
+
+References
+----------
+
+This document refers to, or is referred to by, the following documents.
+
+.. table::
+
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+ | Ref | External reference or URL | Title |
+ +=============+==============================================================+=============================================================================+
+ | ARMARM_ | DDI 0487 | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile |
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+ | CSTD_ | ISO/IEC 9899:2018 | International Standard ISO/IEC 9899:2018 – Programming languages C. |
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+ | AAELF64_ | ELF for the Arm 64-bit Architecture (AArch64) | ELF for the Arm 64-bit Architecture (AArch64) |
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+ | CPPABI64_ | C++ ABI for the Arm 64-bit Architecture (AArch64) | C++ ABI for the Arm 64-bit Architecture (AArch64) |
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+ | RATIONALE_ | Rationale Document for C11 Atomics ABI | Rationale Document for C11 Atomics ABI |
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+ | PAPER_ | CGO paper | Compiler Testing with Relaxed Memory Models |
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+
+
+.. raw:: pdf
+
+ PageBreak
+
+Terms and Abbreviations
+-----------------------
+
+The C/C++ Atomics ABI for the Arm 64-bit Architecture uses the following terms and
+abbreviations.
+
+AArch64
+ The 64-bit general-purpose register width state of the Armv8 architecture.
+
+ABI
+ Application Binary Interface:
+
+ 1. The specifications to which an executable must conform in order to
+ execute in a specific execution environment. For example, the
+ :title-reference:`Linux ABI for the Arm Architecture`.
+
+ 2. A particular aspect of the specifications to which independently
+ produced relocatable files must conform in order to be statically
+ linkable and executable. For example, the C++ ABI for the Arm 64-bit
+ Architecture [CPPABI64_], or ELF for the Arm Architecture [AAELF64_].
+
+Arm-based
+ ... based on the Arm architecture ...
+
+Thread
+ A unit of computation (e.g. a POSIX thread) of a process, managed by the OS.
+
+Atomic Operation
+ An indivisible operation on a memory location. This can be a load, store,
+ exchange, compare, or arithmetic operation. Atomics may be used to define
+ higher level primitives including locks and concurrent queues. ISO C/C++
+ defines a range of supported atomic types and operations.
+
+Concurrent Program
+ A C or C++ program that consists of one or more threads. Threads may
+ communicate with each other through memory locations, using both Atomic
+ Operations and standard memory accesses.
+
+Memory Order Parameter
+ The order in which a thread's memory accesses are executed may differ from
+ the order in which they are written in the program. The Memory Order
+ Parameter of an Atomic Operation describes how that operation is ordered
+ with respect to other memory accesses and Atomic Operations. ISO C/C++
+ defines a ``memory_order`` enum type for the set of memory orders.
+
+Mapping
+ A mapping from an Atomic Operation to a sequence of AArch64 instructions.
+
+.. raw:: pdf
+
+ PageBreak
+
+Overview
+========
+
+`AArch64 atomic mappings`_ defines the interoperable mappings from C/C++
+Atomic Operations to sequences of AArch64 instructions.
+
+Arbitrary registers may be used in the mappings. Instructions marked with ``*``
+in the tables cannot use ``WZR`` or ``XZR`` as a destination register. This is
+further detailed in `Special Cases`_.
+
+Only some variants of ``fetch_<op>`` are listed since the mappings are identical
+except for a different ``<op>``.
+
+Atomic operations and Memory Order are abbreviated as follows:
+
+.. table::
+
+ +----------------------------------------------------+--------------------------------------+
+ | Atomic Operation | Short form |
+ +====================================================+======================================+
+ | ``atomic_store_explicit(...)`` | ``store(...)`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``atomic_load_explicit(...)`` | ``load(...)`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``atomic_thread_fence(...)`` | ``fence(...)`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``atomic_exchange_explicit(...)`` | ``exchange(...)`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``atomic_fetch_add_explicit(...)`` | ``fetch_add(...)`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``atomic_fetch_sub_explicit(...)`` | ``fetch_sub(...)`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``atomic_fetch_or_explicit(...)`` | ``fetch_or(...)`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``atomic_fetch_xor_explicit(...)`` | ``fetch_xor(...)`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``atomic_fetch_and_explicit(...)`` | ``fetch_and(...)`` |
+ +----------------------------------------------------+--------------------------------------+
+
+.. table::
+
+ +----------------------------------------------------+--------------------------------------+
+ | Memory Order Parameter | Short form |
+ +====================================================+======================================+
+ | ``memory_order_relaxed`` | ``relaxed`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``memory_order_acquire`` | ``acquire`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``memory_order_release`` | ``release`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``memory_order_acq_rel`` | ``acq_rel`` |
+ +----------------------------------------------------+--------------------------------------+
+ | ``memory_order_seq_cst`` | ``seq_cst`` |
+ +----------------------------------------------------+--------------------------------------+
+
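As an illustration of the short forms, the following C11 sketch spells out the full ``<stdatomic.h>`` calls that a few of them abbreviate. The names ``counter`` and ``short_form_demo`` are hypothetical and exist only for this example; they are not part of this ABI. Which AArch64 sequence a compiler emits for each call depends on the enabled architecture features, as described in the mapping tables.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared location, used only for this illustration. */
static _Atomic uint32_t counter;

/* Exercises a few of the abbreviated operations and returns the
   value finally observed in `counter`. */
uint32_t short_form_demo(void)
{
    /* store(loc,val,relaxed) */
    atomic_store_explicit(&counter, 5u, memory_order_relaxed);

    /* fetch_add(loc,val,acq_rel): yields the value held before the add */
    uint32_t before_add =
        atomic_fetch_add_explicit(&counter, 3u, memory_order_acq_rel);

    /* exchange(loc,val,seq_cst): yields the value held before the swap */
    uint32_t before_swap =
        atomic_exchange_explicit(&counter, before_add + 10u,
                                 memory_order_seq_cst);
    (void)before_swap; /* 8 here, because the add above stored 5 + 3 */

    /* load(loc,acquire) */
    return atomic_load_explicit(&counter, memory_order_acquire);
}
```

Called from a single thread, ``short_form_demo()`` stores 5, adds 3, then swaps in 5 + 10, so the final load observes 15.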
+If there are multiple mappings for an Atomic Operation, the rows of the table
+show the options:
+
+.. table::
+
+ +----------------------------------------------------+--------------------------------------+
+ | Atomic Operation | AArch64 |
+ +========================================+===========+======================================+
+ | ``store(loc,val,relaxed)`` | ARCH1 | ``option A`` |
+ + +-----------+--------------------------------------+
+ | | ARCH2 | ``option B`` |
+ +----------------------------------------+-----------+--------------------------------------+
+
+Each ARCH entry names either the base architecture (Armv8-A) or an architecture extension such as FEAT_LSE.
+
+
+Suggestions and improvements to this specification may be submitted to the
+`issue tracker page on GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
+
+
+
+AArch64 atomic mappings
+=======================
+
+Synchronization Fences
+----------------------
+
+ +-----------------------------------------------------+--------------------------------------+
+ | Fence | AArch64 |
+ +=====================================================+======================================+
+ | ``atomic_thread_fence(relaxed)`` | .. code-block:: none |
+ | | |
+ | | NOP |
+ +-----------------------------------------------------+--------------------------------------+
+ | ``atomic_thread_fence(acquire)`` | .. code-block:: none |
+ | | |
+ | | DMB ISHLD |
+ +-----------------------------------------------------+--------------------------------------+
+ | ``atomic_thread_fence(release)`` | .. code-block:: none |
+ | | |
+ | ``atomic_thread_fence(acq_rel)`` | DMB ISH |
+ | | |
+ | ``atomic_thread_fence(seq_cst)`` | |
+ +-------------------------------------+---------------+--------------------------------------+
+
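A common use of the fences in the table above is the message-passing idiom, sketched below in C11. The names ``payload``, ``ready``, ``publish`` and ``try_consume`` are hypothetical. In real use ``publish`` and ``try_consume`` would run on different threads; the sketch only shows where the fences sit relative to the relaxed flag accesses.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain, non-atomic data (hypothetical) */
static atomic_bool ready;  /* flag; static storage, so it starts as false */

/* Writer side: make `payload` visible before the flag is set. */
void publish(int value)
{
    payload = value;
    /* fence(release): listed as DMB ISH in the fence table */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: returns true and fills *out once the flag has been seen. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* fence(acquire): listed as DMB ISHLD in the fence table */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

The release fence before the flag store pairs with the acquire fence after the flag load, so a reader that sees ``ready`` set also sees the earlier write to ``payload``.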
+32-bit types
+------------
+
+In what follows, register ``X1`` contains the location ``loc`` and ``W2``
+contains ``val``. ``W0`` contains input ``exp`` in compare-exchange. The result is
+returned in ``W0``.
+
+.. table::
+
+ +-----------------------------------------------------+--------------------------------------+
+ | Atomic Operation | AArch64 |
+ +=====================================================+======================================+
+ | ``store(loc,val,relaxed)`` | .. code-block:: none |
+ | | |
+ | | STR W2, [X1] |
+ +-----------------------------------------------------+--------------------------------------+
+ | ``store(loc,val,release)`` | .. code-block:: none |
+ | | |
+ | ``store(loc,val,seq_cst)`` | STLR W2, [X1] |
+ +-----------------------------------------------------+--------------------------------------+
+ | ``load(loc,relaxed)`` | .. code-block:: none |
+ | | |
+ | | LDR W2, [X1] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``load(loc,acquire)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | LDAR W2, [X1] |
+ + +---------------+--------------------------------------+
+ | | ``FEAT_RCPC`` | .. code-block:: none |
+ | | | |
+ | | | LDAPR W2, [X1] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``load(loc,seq_cst)`` | .. code-block:: none |
+ | | |
+ | | LDAR W2, [X1] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``exchange(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXR W0, [X1] |
+ | | | STXR W3, W2, [X1] |
+ | | | CBNZ W3, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | SWP W2, W0, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``exchange(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDAXR W0, [X1] |
+ | | | STXR W3, W2, [X1] |
+ | | | CBNZ W3, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | SWPA W2, W0, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``exchange(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXR W0, [X1] |
+ | | | STLXR W3, W2, [X1] |
+ | | | CBNZ W3, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | SWPL W2, W0, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``exchange(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none |
+ | ``exchange(loc,val,seq_cst)`` | | |
+ | | | loop: |
+ | | | LDAXR W0, [X1] |
+ | | | STLXR W3, W2, [X1] |
+ | | | CBNZ W3, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | SWPAL W2, W0, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_add(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXR W0, [X1] |
+ | | | ADD W4, W2, W0 |
+ | | | STXR W3, W4, [X1] |
+ | | | CBNZ W3, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDADD W2, W0, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_add(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDAXR W0, [X1] |
+ | | | ADD W4, W2, W0 |
+ | | | STXR W3, W4, [X1] |
+ | | | CBNZ W3, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDADDA W2, W0, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_add(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXR W0, [X1] |
+ | | | ADD W4, W2, W0 |
+ | | | STLXR W3, W4, [X1] |
+ | | | CBNZ W3, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDADDL W2, W0, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_add(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none |
+ | ``fetch_add(loc,val,seq_cst)`` | | |
+ | | | loop: |
+ | | | LDAXR W0, [X1] |
+ | | | ADD W4, W2, W0 |
+ | | | STLXR W3, W4, [X1] |
+ | | | CBNZ W3, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDADDAL W2, W0, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none |
+ | ``loc,exp,val,relaxed,relaxed)`` | | |
+ | | | MOV W4, W0 |
+ | | | loop: |
+ | | | LDXR W0, [X1] |
+ | | | CMP W0, W4 |
+ | | | B.NE fail |
+ | | | STXR W3, W2, [X1] |
+ | | | CBNZ W3, loop |
+ | | | fail: |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CAS W0, W2, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none |
+ | ``loc,exp,val,acquire,acquire)`` | | |
+ | | | MOV W4, W0 |
+ | | | loop: |
+ | | | LDAXR W0, [X1] |
+ | | | CMP W0, W4 |
+ | | | B.NE fail |
+ | | | STXR W3, W2, [X1] |
+ | | | CBNZ W3, loop |
+ | | | fail: |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASA W0, W2, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none |
+ | ``loc,exp,val,release,release)`` | | |
+ | | | MOV W4, W0 |
+ | | | loop: |
+ | | | LDXR W0, [X1] |
+ | | | CMP W0, W4 |
+ | | | B.NE fail |
+ | | | STLXR W3, W2, [X1] |
+ | | | CBNZ W3, loop |
+ | | | fail: |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASL W0, W2, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none |
+ | ``loc,exp,val,acq_rel,acquire)`` | | |
+ | | | MOV W4, W0 |
+ | ``compare_exchange_strong(`` | | loop: |
+ | ``loc,exp,val,seq_cst,seq_cst)`` | | LDAXR W0, [X1] |
+ | | | CMP W0, W4 |
+ | | | B.NE fail |
+ | | | STLXR W3, W2, [X1] |
+ | | | CBNZ W3, loop |
+ | | | fail: |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASAL W0, W2, [X1] * |
+ +-------------------------------------+---------------+--------------------------------------+
+
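The compare-exchange rows above return the observed value in ``W0`` whether or not the store happens. At the C level this corresponds to the ``expected`` argument being overwritten on failure. The following is a minimal sketch of that behaviour; ``cas_observe`` is a hypothetical helper, not an API defined by this ABI or by ISO C.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: tries to replace `expected` with `desired` at `loc`.
   Reports success through *succeeded and returns the value that was
   observed at `loc`, which equals `expected` only on success. */
uint32_t cas_observe(_Atomic uint32_t *loc, uint32_t expected,
                     uint32_t desired, bool *succeeded)
{
    /* compare_exchange_strong(loc,exp,val,acq_rel,acquire) */
    *succeeded = atomic_compare_exchange_strong_explicit(
        loc, &expected, desired,
        memory_order_acq_rel, memory_order_acquire);

    /* On failure the call has copied the current value into `expected`. */
    return expected;
}
```

For example, with the location holding 7, ``cas_observe(&x, 7, 9, &ok)`` succeeds and returns 7; a second call with the same expected value of 7 fails, leaves the location at 9 and returns 9.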
+
+8-bit types
+-----------
+
+The mappings for 8-bit types are the same as 32-bit types except they use the
+``B`` variants of instructions.
+
+
+16-bit types
+------------
+
+The mappings for 16-bit types are the same as 32-bit types except they use the
+``H`` variants of instructions.
+
+64-bit types
+------------
+
+The mappings for 64-bit types are the same as 32-bit types except the registers
+used are X-registers.
+
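At the source level, the access width is chosen by the atomic object's type. The sketch below uses the fixed-width types from ``<stdint.h>`` with the same ``fetch_add`` operation at 8 and 64 bits; ``add8`` and ``add64`` are hypothetical wrapper names. The instruction variants actually emitted are decided by the compiler, following the rules stated above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* 8-bit object: per the text above, mapped with the B instruction variants. */
uint8_t add8(_Atomic uint8_t *loc, uint8_t val)
{
    /* fetch_add(loc,val,relaxed) on an 8-bit location; returns the old value */
    return atomic_fetch_add_explicit(loc, val, memory_order_relaxed);
}

/* 64-bit object: per the text above, mapped using X-registers. */
uint64_t add64(_Atomic uint64_t *loc, uint64_t val)
{
    /* fetch_add(loc,val,seq_cst) on a 64-bit location; returns the old value */
    return atomic_fetch_add_explicit(loc, val, memory_order_seq_cst);
}
```

Unsigned atomic arithmetic wraps at the object's width, so adding 10 to an 8-bit location holding 250 leaves 4 in memory and returns 250.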
+128-bit types
+-------------
+
+Since the access width of 128-bit types is double that of the 64-bit register
+width, the following mappings use *pair* instructions, which require their own
+table.
+
+In what follows, register ``X4`` contains the location ``loc``, ``X2`` and
+``X3`` contain the input value ``val``. ``X0`` and ``X1`` contain input ``exp`` in
+compare-exchange. The result is returned in ``X0`` and ``X1``.
+
+.. table::
+
+ +-----------------------------------------------------+--------------------------------------+
+ | Atomic Operation | AArch64 |
+ +=====================================+===============+======================================+
+ | ``store(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXP XZR, X1, [X4] |
+ | | | STXP W5, X2, X3, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | CASP X0, X1, X2, X3, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE2`` | .. code-block:: none |
+ | | | |
+ | | | STP X2, X3, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``store(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXP XZR, X1, [X4] |
+ | | | STLXP W5, X2, X3, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | CASPL X0, X1, X2, X3, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE2`` | .. code-block:: none |
+ | | | |
+ | | | DMB ISH |
+ | | | STP X2, X3, [X4] |
+ | +---------------+--------------------------------------+
+ | |``FEAT_LRCPC3``| .. code-block:: none |
+ | | | |
+ | | | STILP X2, X3, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``store(loc,val,seq_cst)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDAXP XZR, X1, [X4] |
+ | | | STLXP W5, X2, X3, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | CASPAL X0, X1, X2, X3, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE2`` | .. code-block:: none |
+ | | | |
+ | | | DMB ISH |
+ | | | STP X2, X3, [X4] |
+ | | | DMB ISH |
+ | +---------------+--------------------------------------+
+ | |``FEAT_LRCPC3``| .. code-block:: none |
+ | | | |
+ | | | STILP X2, X3, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``load(loc,relaxed)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXP X0, X1, [X4] |
+ | | | STXP W5, X0, X1, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASP X0, X1, X0, X1, [X4] |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE2`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``load(loc,acquire)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDAXP X0, X1, [X4] |
+ | | | STXP W5, X0, X1, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASPA X0, X1, X0, X1, [X4] |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE2`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | DMB ISHLD |
+ | +---------------+--------------------------------------+
+ | |``FEAT_LRCPC3``| .. code-block:: none |
+ | | | |
+ | | | LDIAPP X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``load(loc,seq_cst)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDAXP X0, X1, [X4] |
+ | | | STXP W5, X0, X1, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASPA X0, X1, X0, X1, [X4] |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE2`` | .. code-block:: none |
+ | | | |
+ | | | LDAR X5, [X4] |
+ | | | LDP X0, X1, [X4] |
+ | | | DMB ISHLD |
+ | +---------------+--------------------------------------+
+ | |``FEAT_LRCPC3``| .. code-block:: none |
+ | | | |
+ | | | LDAR X5, [X4] |
+ | | | LDIAPP X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``exchange(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXP X0, X1, [X4] |
+ | | | STXP W5, X2, X3, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | CASP X0, X1, X2, X3, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ | +---------------+--------------------------------------+
+ | |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MOV X0, X2 |
+ | | | MOV X1, X3 |
+ | | | SWPP X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``exchange(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDAXP X0, X1, [X4] |
+ | | | STXP W5, X2, X3, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | CASPA X0, X1, X2, X3, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ | +---------------+--------------------------------------+
+ | |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MOV X0, X2 |
+ | | | MOV X1, X3 |
+ | | | SWPPA X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``exchange(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXP X0, X1, [X4] |
+ | | | STLXP W5, X2, X3, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | CASPL X0, X1, X2, X3, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ | +---------------+--------------------------------------+
+ | |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MOV X0, X2 |
+ | | | MOV X1, X3 |
+ | | | SWPPL X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``exchange(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | ``exchange(loc,val,seq_cst)`` | | loop: |
+ | | | LDAXP X0, X1, [X4] |
+ | | | STLXP W5, X2, X3, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | CASPAL X0, X1, X2, X3, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ | +---------------+--------------------------------------+
+ | |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MOV X0, X2 |
+ | | | MOV X1, X3 |
+ | | | SWPPAL X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_add(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXP X0, X1, [X4] |
+ | | | ADDS X0, X0, X2 |
+ | | | ADC X1, X1, X3 |
+ | | | STXP W5, X0, X1, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | ADDS X8, X0, X2 |
+ | | | ADC X9, X1, X3 |
+ | | | CASP X0, X1, X8, X9, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_add(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDAXP X0, X1, [X4] |
+ | | | ADDS X0, X0, X2 |
+ | | | ADC X1, X1, X3 |
+ | | | STXP W5, X0, X1, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | ADDS X8, X0, X2 |
+ | | | ADC X9, X1, X3 |
+ | | | CASPA X0, X1, X8, X9, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_add(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | | | loop: |
+ | | | LDXP X0, X1, [X4] |
+ | | | ADDS X0, X0, X2 |
+ | | | ADC X1, X1, X3 |
+ | | | STLXP W5, X0, X1, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | ADDS X8, X0, X2 |
+ | | | ADC X9, X1, X3 |
+ | | | CASPL X0, X1, X8, X9, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_add(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none |
+ | | | |
+ | ``fetch_add(loc,val,seq_cst)`` | | loop: |
+ | | | LDAXP X0, X1, [X4] |
+ | | | ADDS X0, X0, X2 |
+ | | | ADC X1, X1, X3 |
+ | | | STLXP W5, X0, X1, [X4] |
+ | | | CBNZ W5, loop |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | LDP X0, X1, [X4] |
+ | | | loop: |
+ | | | MOV X6, X0 |
+ | | | MOV X7, X1 |
+ | | | ADDS X8, X0, X2 |
+ | | | ADC X9, X1, X3 |
+ | | | CASPAL X0, X1, X8, X9, [X4] |
+ | | | CMP X0, X6 |
+ | | | CCMP X1, X7, 0, EQ |
+ | | | B.NE loop |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_or(loc,val,relaxed)`` |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MOV X0, X2 |
+ | | | MOV X1, X3 |
+ | | | LDSETP X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_or(loc,val,acquire)`` |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MOV X0, X2 |
+ | | | MOV X1, X3 |
+ | | | LDSETPA X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_or(loc,val,release)`` |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MOV X0, X2 |
+ | | | MOV X1, X3 |
+ | | | LDSETPL X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_or(loc,val,acq_rel)`` |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | ``fetch_or(loc,val,seq_cst)`` | | MOV X0, X2 |
+ | | | MOV X1, X3 |
+ | | | LDSETPAL X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_and(loc,val,relaxed)`` |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MVN X0, X2 |
+ | | | MVN X1, X3 |
+ | | | LDCLRP X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_and(loc,val,acquire)`` |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MVN X0, X2 |
+ | | | MVN X1, X3 |
+ | | | LDCLRPA X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_and(loc,val,release)`` |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | | | MVN X0, X2 |
+ | | | MVN X1, X3 |
+ | | | LDCLRPL X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``fetch_and(loc,val,acq_rel)`` |``FEAT_LSE128``| .. code-block:: none |
+ | | | |
+ | ``fetch_and(loc,val,seq_cst)`` | | MVN X0, X2 |
+ | | | MVN X1, X3 |
+ | | | LDCLRPAL X0, X1, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none |
+ | ``loc,exp,val,relaxed,relaxed)`` | | |
+ | | | loop: |
+ | | | LDXP X6, X7, [X4] |
+ | | | CMP X6, X0 |
+ | | | CCMP X7, X1, 0, EQ |
+ | | | CSEL X8, X2, X6, EQ |
+ | | | CSEL X9, X3, X7, EQ |
+ | | | STXP W5, X8, X9, [X4] |
+ | | | CBNZ W5, loop |
+ | | | MOV X0, X6 |
+ | | | MOV X1, X7 |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASP X0, X1, X2, X3, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none |
+ | ``loc,exp,val,acquire,acquire)`` | | |
+ | | | loop: |
+ | ``compare_exchange_strong(`` | | LDAXP X6, X7, [X4] |
+ | ``loc,exp,val,acquire,relaxed)`` | | CMP X6, X0 |
+ | | | CCMP X7, X1, 0, EQ |
+ | | | CSEL X8, X2, X6, EQ |
+ | | | CSEL X9, X3, X7, EQ |
+ | | | STXP W5, X8, X9, [X4] |
+ | | | CBNZ W5, loop |
+ | | | MOV X0, X6 |
+ | | | MOV X1, X7 |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASPA X0, X1, X2, X3, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none |
+ | ``loc,exp,val,release,relaxed)`` | | |
+ | | | loop: |
+ | | | LDXP X6, X7, [X4] |
+ | | | CMP X6, X0 |
+ | | | CCMP X7, X1, 0, EQ |
+ | | | CSEL X8, X2, X6, EQ |
+ | | | CSEL X9, X3, X7, EQ |
+ | | | STLXP W5, X8, X9, [X4] |
+ | | | CBNZ W5, loop |
+ | | | MOV X0, X6 |
+ | | | MOV X1, X7 |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASPL X0, X1, X2, X3, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none |
+ | ``loc,exp,val,acq_rel,acquire)`` | | |
+ | | | loop: |
+ | ``compare_exchange_strong(`` | | LDAXP X6, X7, [X4] |
+ | ``loc,exp,val,seq_cst,acquire)`` | | CMP X6, X0 |
+ | | | CCMP X7, X1, 0, EQ |
+ | | | CSEL X8, X2, X6, EQ |
+ | | | CSEL X9, X3, X7, EQ |
+ | | | STLXP W5, X8, X9, [X4] |
+ | | | CBNZ W5, loop |
+ | | | MOV X0, X6 |
+ | | | MOV X1, X7 |
+ | +---------------+--------------------------------------+
+ | | ``FEAT_LSE`` | .. code-block:: none |
+ | | | |
+ | | | CASPAL X0, X1, X2, X3, [X4] |
+ +-------------------------------------+---------------+--------------------------------------+
+
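The arithmetic inside the ``fetch_add`` and ``fetch_and`` rows above can be sketched in C. This models only the data computation on a register pair, not the atomicity. The type and function names are hypothetical, and treating the first register of each pair as the low half is an assumption read off the ``ADDS``/``ADC`` ordering.

```c
#include <stdint.h>

/* Hypothetical type: a 128-bit value held as two 64-bit halves,
   standing in for a register pair such as X0 (low) and X1 (high). */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Mirrors ADDS (low halves, producing a carry) followed by ADC
   (high halves plus that carry). */
u128_pair add_pair(u128_pair a, u128_pair b) {
    u128_pair r;
    r.lo = a.lo + b.lo;
    uint64_t carry = r.lo < a.lo;   /* unsigned wrap-around means carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Mirrors MVN, MVN, then a bit-clear: clearing the bits that are set
   in ~val gives the same result as AND-ing with val. */
u128_pair and_via_clear(u128_pair old, u128_pair val) {
    u128_pair mask = { ~val.lo, ~val.hi };   /* MVN on each half */
    u128_pair r = { old.lo & ~mask.lo, old.hi & ~mask.hi };
    return r;
}
```

The second helper shows why the ``fetch_and`` rows invert ``val`` first: the pair instruction clears bits, so it needs the complement of the AND mask.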
+
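Several ``FEAT_LSE`` rows in the 128-bit table build an operation from a compare-and-swap retry loop. The shape of that loop can be sketched in portable C as below. The helper name is hypothetical, and a 64-bit location stands in for the 128-bit pair so that the sketch does not depend on 128-bit atomic support in the toolchain.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: exchange built from compare-and-swap, following
   the shape of the FEAT_LSE exchange rows (an initial plain load, then
   a CAS retry loop). Relaxed ordering is used throughout. */
uint64_t exchange_via_cas(_Atomic uint64_t *loc, uint64_t val) {
    /* Initial guess at the current contents (the LDP step). */
    uint64_t expected = atomic_load_explicit(loc, memory_order_relaxed);

    /* On failure, expected is refreshed with the value actually found,
       much as CASP writes the observed value back to its first pair. */
    while (!atomic_compare_exchange_strong_explicit(
               loc, &expected, val,
               memory_order_relaxed, memory_order_relaxed)) {
        /* retry with the refreshed expected value */
    }
    return expected;   /* the value that was replaced */
}
```

The initial load does not need to be atomic with respect to the swap: a stale guess only costs one extra iteration, because the compare step rejects it.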
+
+Special Cases
+=============
+
+Unused result in Read-Modify-Write atomics
+------------------------------------------
+
+``CAS``, ``SWP`` and ``LD`` instructions must not use the zero register as the
+destination when the result is unused, because doing so allows the read to be
+reordered past a ``DMB ISHLD`` barrier. Affected instructions are marked with ``*``.
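At the source level, the affected pattern is a Read-Modify-Write whose return value is discarded and which is followed by an acquire fence. A minimal sketch, with hypothetical variable and function names:

```c
#include <stdatomic.h>

/* Hypothetical shared locations, for illustration only. */
static _Atomic int x;
static _Atomic int y;

/* The exchange result is discarded. Under the rule above, the compiler
   must still give the instruction a real destination register rather
   than the zero register, so that the acquire fence orders the read
   performed by the exchange before the later load of x. */
int reader(void) {
    (void)atomic_exchange_explicit(&y, 2, memory_order_release);
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&x, memory_order_relaxed);
}
```

Nothing in the C source changes; the constraint applies only to the register the compiler chooses for the unused result.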
+
+Const-Qualified 128-bit Atomic Loads
+------------------------------------
+
+Const-qualified data containing 128-bit atomic types should not be placed
+in read-only memory (such as the ``.rodata`` section).
+
+Before FEAT_LSE2, the only way to implement a single-copy atomic 128-bit load
+is to use a Read-Modify-Write sequence. The write is not visible to software
+if the memory is writeable, but it can fault if the memory is read-only.
+Compilers and runtimes should prefer the FEAT_LSE2/FEAT_LRCPC3 sequence when
+available.
+
diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst
new file mode 100644
index 0000000..2f12ecd
--- /dev/null
+++ b/design-documents/atomics-ABI.rst
@@ -0,0 +1,321 @@
+..
+ Copyright (c) 2023, Arm Limited and its affiliates. All rights reserved.
+ CC-BY-SA-4.0 AND Apache-Patent-License
+ See LICENSE file for details
+
+.. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest
+.. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836
+.. _ATOMICS64: https://github.com/ARM-software/abi-aa/atomicsabi64/atomicsabi64.rst
+
+Rationale Document for C11 Atomics ABI.
+***************************************
+
+Scope
+=====
+
+This document contains the design rationale for C/C++ Atomics Application
+Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture
+defined in ATOMICS64_. Nothing in this document
+is part of the specification. The purpose is to record the rationale
+for the specification as well as alternatives that were considered.
+Any contradictions between this rationale and the specification shall
+be resolved in favor of the specification.
+
+This document assumes that the reader is familiar with ATOMICS64_ and uses
+concepts defined in that document.
+
+Preamble
+========
+
+Background
+----------
+
+This document describes the rationale behind the ABI choices made for mapping
+from C11 atomic operations to Arm AArch64 assembly sequences.
+
+From the perspective of the Arm ABI, we have some decisions to make:
+
+- We need to choose a baseline ABI (a set of mappings) that is compatible with all versions of the Armv8 architecture.
+- The mappings should cover atomic accesses of various sign, size, and type accessible through C11 atomic operations using compiler profiles.
+
+We have identified the following trade-offs:
+
+- Performance of different mappings versus compatibility with all architectures.
+- Whether certain compiler operations lead to unexpected behaviours.
+
+The use cases expanded upon below motivate why we need an atomics ABI:
+
+- The need for a baseline ABI.
+- Knowing when an implementation departs from that baseline.
+- Backwards compatibility of atomics as new mappings are added.
+- Compatibility between compilers and runtimes.
+- The need to constrain optimisations on specific atomic operations.
+- Documenting the interoperable mappings.
+- Providing a basis upon which ABI compatibility can be tested.
+
+References
+----------
+
+This document refers to, or is referred to by, the following documents.
+
+.. table::
+
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
+ | Ref | External reference or URL | Title |
+ +=============+==============================================================+===============================================================================================+
+ | ARMARM_ | DDI 0487 | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile |
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
+ | PAPER_ | CGO paper | Compiler Testing with Relaxed Memory Models |
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
+ | ATOMICS64_ | Atomics ABI | C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture |
+ +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
+
+
+
+Note: At the time of writing, C23 has not been released. ISO C17 is therefore
+considered the latest published standard.
+
+Known use-cases
+---------------
+
+
+A Baseline: Describing current implementations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ABI we provide is a baseline specification that compilers should implement.
+Compilers that implement the baseline specification are compatible across all versions
+of the Armv8 architecture. Most of the mappings in the ABI are already implemented in
+LLVM and GCC. This ABI ratifies a decade of established practice and provides
+alternatives where the current practice is incompatible.
+
+
+Sub-ABIs and ABI-islands: Departing from the baseline (or 'mainland')
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We do *not* require that compilers implement this ABI. Implementers can specify their own
+ABI, whether it is a subset of the allowed mappings of the baseline ABI (a sub-ABI), or
+uses different mappings altogether (an ABI-island). Currently, sub-ABIs and ABI-islands implicitly
+arise with each new architecture release, and implementers quickly find new candidate mappings
+that are performant on their machines. Such mappings are proposed or added to mainstream
+compilers. However, due to the lack of a baseline specification or widespread
+concurrency expertise, testing such mappings has been a challenge and concurrency bugs have been
+unintentionally introduced into compilers when new mappings are added.
+
+We need a baseline ABI in order to determine if a given sub-ABI respects or departs
+from the baseline. Adding command-line options is a logical consequence of defining such an ABI,
+and makes it possible to track ABI compatibility of concurrent programs at compile or link-time,
+rather than runtime. It is the responsibility of the sub-ABI user to ensure code built
+under their ABI does not mix with code built under the baseline. But a baseline must exist
+for sub-ABI compatibility to be decided in the first place.
+
+Where a compiler implementation departs from the baseline completely (an ABI-island),
+Arm cannot provide any statement on the compatibility of the extensions with respect
+to the baseline specification. Because an ABI-island may be knowingly incompatible
+with the baseline, users should not mix ABIs. Whether a toolchain is able to diagnose
+the incompatibility is a quality-of-implementation (QoI) matter.
+
+Further, numerous parties have asked the ABI team whether the same atomics mapping is correct.
+Writing down the known cases helps engineers answer these queries without the concurrency
+expertise required to come up with current compatible mappings. A future section of this document
+could document common queries received by the ABI team, in order to assist implementers and
+engineers with such issues.
+
+Backwards Compatibility and New Architecture Features
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Put another way, a baseline ABI helps decide whether new mappings are compatible.
+Certain instructions (such as Load/Store-Pair instructions [ARMARM_]) have different
+single-copy atomicity guarantees with respect to different architecture versions. A baseline
+decides which assembly sequences can be composed correctly (at least as far as testing can decide).
+
+
+Compatibility Between Compilers and Runtimes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The above issues also apply when ensuring object files compiled with different compilers can be mixed.
+For instance, code compiled by LLVM and GCC should be interoperable. At the time of writing we identified a number of
+places where this does not hold, both when compiling to target the same architecture version, and when mixing
+different (compatible) architecture versions. Further, the above issues are not limited to statically compiled
+code. We found one instance where proposed mappings implemented in a JIT compiler would not be interoperable
+with statically compiled code the runtime links against. Even if a JIT compiles under one set of mappings, and
+is not subject to an ABI, it may still depend on other libraries or components that do have an ABI.
+
+
+Constrain optimisations
+~~~~~~~~~~~~~~~~~~~~~~~
+
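+Some compiler optimisations on atomic operations introduce program outcomes that the source model forbids.
+The frequency of this behaviour justifies collecting these cases together to outline why they should not occur.
+For example, consider the following Concurrent Program::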
+
+ // Shared-Memory Locations
+ _Atomic int* x;
+ _Atomic int* y;
+
+ // Memory Order Parameter
+ #define relaxed memory_order_relaxed
+ #define release memory_order_release
+ #define acquire memory_order_acquire
+
+ // Threads of Execution
+ void thread_0 () {
+ atomic_store_explicit(x,1,relaxed);
+ atomic_thread_fence(release);
+ atomic_store_explicit(y,1,relaxed);
+ }
+
+ void thread_1 () {
+ atomic_exchange_explicit(y,2,release);
+ atomic_thread_fence(acquire);
+ int r0 = atomic_load_explicit(x,relaxed);
+ }
+
+
+Under ISO C, the above Concurrent Program finishes execution in one of three
+possible outcomes (a reference for this notation is found here [PAPER_])::
+
+ { thread_1:r0=0; y=1; }
+ { thread_1:r0=1; y=1; }
+ { thread_1:r0=1; y=2; }
+
+In this case the value read by the exchange on ``thread_1`` is not used, and a
+compiler is free to remove references to unused data. It is not legal according
+to this ABI for a compliant implementation to translate the program into
+the following Assembly Sequences::
+
+ thread_0:
+ MOV W9,#1
+ STR W9,[X2]
+ DMB ISH
+ STR W9,[X4]
+
+ thread_1:
+ MOV W9,#2
+ SWP W9, WZR, [X2]
+ DMB ISHLD
+ LDR W3,[X4]
+
+where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains
+the address of ``y``, ``thread_1:X2`` contains the address of ``y``, and
+``thread_1:X4`` contains the address of ``x``.
+
+The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly
+Instruction, where its destination register is the zero register ``WZR``. The
+``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly
+Instruction.
+
+Executing the compiled program on an Arm-based machine from a fixed initial
+state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
+according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
+Reference Manual [ARMARM_]::
+
+ { thread_1:r0=0; [y]=1; }
+ { thread_1:r0=0; [y]=2; } <-- Forbidden by source model, a bug!
+ { thread_1:r0=1; [y]=1; }
+ { thread_1:r0=1; [y]=2; }
+
+By comparing ``W3`` and the local variable ``r0`` of the original Concurrent
+Program we see there is one additional outcome of executing the compiled
+program that is not an outcome of executing the Concurrent Program. This is
+because the Arm Architecture Reference Manual [ARMARM_] states that
+*instructions where the destination register is WZR or XZR, are not regarded
+as doing a read for the purpose of a DMB LD barrier.*
+
+In this case the compiler introduces another outcome of Execution. To fix this
+issue, a compiler is not permitted to rewrite the destination register to be the
+zero register::
+
+ thread_0:
+ MOV W9,#1
+ STR W9,[X2]
+ DMB ISH
+ STR W9,[X4]
+
+ thread_1:
+ MOV W9,#2
+ SWP W9, W10, [X2]
+ DMB ISHLD
+ LDR W3,[X4]
+
+Executing the compiled program on an Arm-based machine from a fixed initial
+state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
+according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
+Reference Manual [ARMARM_]::
+
+ { thread_1:r0=0; [y]=1; }
+ { thread_1:r0=1; [y]=1; }
+ { thread_1:r0=1; [y]=2; }
+
+As such the unexpected outcome has disappeared. There are multiple Mappings
+that exhibit this behaviour. Assembly Sequences affected make use of ``SWP``
+and ``LD`` Assembly instructions.
+
+Documentation
+~~~~~~~~~~~~~
+
+The collective knowledge of atomics ABIs exists as numerous online discussions.
+These discussions are neither authoritative nor persistent. Some discussions
+are now inaccessible and others are out of date. This is problematic given the
+inherent complexity of relaxed memory concurrency, the difficulty of finding bugs,
+and the possibility of user error. We believe an ABI is necessary to document
+this corner of code generation.
+
+
+The Mix Testing Process
+-----------------------
+
+ABI compatibility must be testable. Concurrency is not trivial, and the ABI
+presents a simplification of part of the problem that is understandable by
+engineers. We provide a simple technique for testing ABI compatibility.
+This technique reduces the difficulty of checking compatibility from a
+problem of understanding concurrent executions, to the familiar testing
+domain of comparing program outcomes of tests. This document does not
+preclude other means of testing compatibility.
+
+We test for Compiler Bugs. A Compiler Bug is defined as an outcome of a
+compiled program execution (under the AArch64 Memory Model contained in
+§B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not
+an outcome of execution of the source Concurrent Program (under the
+ISO C memory model). Consider the hypothetical example where a source
+Concurrent Program finishes execution in one of three possible outcomes
+(a reference for this notation is found here [PAPER_])::
+
+ { thread_0:r0=0, thread_1:r0=1 }
+ { thread_0:r0=1, thread_1:r0=0 }
+ { thread_0:r0=1, thread_1:r0=1 }
+
+and one compiled program execution run has the following possible outcomes
+according to the AArch64 Memory Model contained in §B2 of the Arm
+Architecture Reference Manual [ARMARM_]::
+
+ { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug!
+ { thread_0:X3=0, thread_1:X3=1 }
+ { thread_0:X3=1, thread_1:X3=0 }
+ { thread_0:X3=1, thread_1:X3=1 }
+
+By comparing ``X3`` and the local variable ``r0`` of the original Concurrent
+Program in this example we see there is one additional outcome of executing the
+compiled program that is not an outcome of executing the source program (under
+the respective models). This suggests the Mappings under question are
+incompatible, and a compiler that implements them exhibits a Compiler Bug. To
+ensure compatibility we therefore test for the absence of such outcomes of the
+compiled programs when mixing all combinations of the above Mappings. We define
+the *Mix Testing* process as follows:
+
+#. Take an arbitrary Concurrent Program. When executed on the C/C++ memory
+ model, it will produce outcomes *S*.
+#. Split out the individual Atomic Operations from the initial concurrent
+ program into individual source files.
+#. Compile each individual source file containing an Atomic Operation
+ using each Compiler Profile under test that generates Assembly Sequences
+ under a given Mapping.
+#. Combine the Assembly Sequences from above into *multiple* possible Compiled
+ Programs.
+#. Compute the outcomes of each compiled program under the AArch64 Memory Model
+ contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. Get a
+ *set* of compiled program outcomes *C*.
+#. If any compiled program set of outcomes *c* in *C* exhibits a Compiler Bug
+ (that is, *c* is not a subset of *S*), the given Mappings are not
+ interoperable.
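The final comparison step of the process above can be sketched as a plain subset check. The encoding of an outcome as a pair of per-thread values and all names below are hypothetical, chosen to match the two-thread example earlier in this section. Computing the outcome sets themselves requires a memory-model simulator and is outside this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: an outcome is the pair of final values
   observed by thread_0 and thread_1. */
typedef struct { int t0; int t1; } outcome;

static bool same(outcome a, outcome b) {
    return a.t0 == b.t0 && a.t1 == b.t1;
}

static bool contains(const outcome *set, size_t n, outcome o) {
    for (size_t i = 0; i < n; i++)
        if (same(set[i], o)) return true;
    return false;
}

/* Returns true when every compiled-program outcome in c is also a
   source-program outcome in s, that is, c is a subset of S and no
   Compiler Bug is observed for this compiled program. */
bool outcomes_allowed(const outcome *c, size_t nc,
                      const outcome *s, size_t ns) {
    for (size_t i = 0; i < nc; i++)
        if (!contains(s, ns, c[i])) return false;
    return true;
}
```

Running the check once per compiled program in *C* and requiring it to hold for all of them gives the interoperability verdict for the Mappings under test.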
+
diff --git a/tools/common/check-rst-syntax.sh b/tools/common/check-rst-syntax.sh
index cd99217..842bec5 100755
--- a/tools/common/check-rst-syntax.sh
+++ b/tools/common/check-rst-syntax.sh
@@ -38,6 +38,9 @@ declare -a docs=(
# semihosting
"semihosting"
+
+ # atomics
+ "atomicsabi64"
)
for doc in "${docs[@]}"; do
diff --git a/tools/common/generate-release-links.sh b/tools/common/generate-release-links.sh
index db00887..2774e78 100755
--- a/tools/common/generate-release-links.sh
+++ b/tools/common/generate-release-links.sh
@@ -57,6 +57,7 @@ cat <