Minor corrections and tweaks #403

Merged
merged 6 commits on Apr 27, 2023
10 changes: 5 additions & 5 deletions adoc/chapters/architecture.adoc
@@ -599,7 +599,7 @@ a [code]#range<N># representing the number of work-items executing the kernel.

==== ND-range kernels

-Work-items can be organized into <<work-group,work groups>>, providing a more
+Work-items can be organized into <<work-group,work-groups>>, providing a more
coarse-grained decomposition of the index space. Each work-group is assigned a
unique <<work-group-id>> with the same dimensionality as the index space used for
the work-items. Work-items are each assigned a <<local-id>>, unique within the
@@ -915,7 +915,7 @@ Work-items executing in a kernel have access to three distinct address spaces
of correct results unless <<mem-fence>> and atomic operations are used.
* <<local-memory,Local-memory>> is accessible to all work-items in a single
work-group. Attempting to access local memory in one work-group from
-another work group results in undefined behavior. This memory region can be
+another work-group results in undefined behavior. This memory region can be
used to allocate variables that are shared by all work-items in a
work-group. Work-group-level visibility allows local memory to be
implemented as dedicated regions of the device memory where this is
@@ -1055,7 +1055,7 @@ device kernel results in undefined behavior. Developers are encouraged to use
include::{header_dir}/memoryScope.h[lines=4..-1]
----

-The set of <<work-item,work items>> and devices to which the memory ordering
+The set of <<work-item,work-items>> and devices to which the memory ordering
constraints of a given atomic operation apply is controlled by a
[code]#sycl::memory_scope# parameter, which can take one of the following
values:
@@ -1192,8 +1192,8 @@ sections of this specification.

=== Basic data parallel kernels

-Data-parallel <<kernel>>s that execute as
-multiple <<work-item>>s and where no local synchronization is required are enqueued
+Data-parallel <<kernel,kernels>> that execute as
+multiple <<work-item,work-items>> and where no local synchronization is required are enqueued
with the [code]#sycl::parallel_for# function parameterized by a
[code]#sycl::range# parameter. These kernels will execute the kernel
function body once for each work-item in the specified <<range>>.
6 changes: 3 additions & 3 deletions adoc/chapters/device_compiler.adoc
@@ -520,11 +520,11 @@ reqd_work_group_size(dim0, dim1, dim2)
Kernels that are decorated with this attribute may not call functions that are
defined in another translation unit via the [code]#SYCL_EXTERNAL# macro.

-Each device may have limitations on the work group sizes that it supports. If
+Each device may have limitations on the work-group sizes that it supports. If
a kernel is decorated with this attribute and then submitted to a device that
-does not support the work group size, the implementation must throw a
+does not support the work-group size, the implementation must throw a
synchronous [code]#exception# with the [code]#errc::kernel_not_supported# error
-code. If the kernel is submitted to a device that does support the work group
+code. If the kernel is submitted to a device that does support the work-group
size, but the application provides an [code]#nd_range# that does not match the
size from the attribute, then the implementation must throw a synchronous
[code]#exception# with the [code]#errc::nd_range# error code.
41 changes: 21 additions & 20 deletions adoc/chapters/glossary.adoc
@@ -199,21 +199,21 @@ object. For the full description please refer to <<subsec:buffers>>.

[[global-id]]global id::
As in OpenCL, a global ID is used to uniquely identify a <<work-item>>
-and is derived from the number of global <<work-item,work items>> specified
+and is derived from the number of global <<work-item,work-items>> specified
when executing a kernel. A global ID is a one, two or three-dimensional
value that starts at 0 per dimension.

[[global-memory]]global memory::
-Global memory is a memory region accessible to all <<work-item,work items>>
+Global memory is a memory region accessible to all <<work-item,work-items>>
executing on a <<device>>.

[[group]]group::
A group of work-items within the index space of a SYCL kernel execution,
such as a <<work-group>> or <<sub-group>>.

[[group-barrier]]group barrier::
-A synchronization function within a group of <<work-item,work items>>. All the
-<<work-item,work items>> of a group must execute the barrier construct before any
+A synchronization function within a group of <<work-item,work-items>>. All the
+<<work-item,work-items>> of a group must execute the barrier construct before any
<<work-item>> continues execution beyond the barrier. Additionally all work-items
in the group execute a release <<mem-fence>> prior to synchronizing at the
barrier, all work-items in the group execute an acquire <<mem-fence>> after
@@ -349,16 +349,16 @@ object. For the full description please refer to <<subsec:buffers>>.
[[nd-item]]nd-item::
A unique identifier representing a single <<work-item>> and
<<work-group>> within the index space of a SYCL kernel execution. Can
-be one, two or three dimensional. In the SYCL interface a <<nd-item>>
+be one, two or three dimensional. In the SYCL interface an <<nd-item>>
is represented by the [code]#nd_item# class (see
<<nditem-class>>).

[[nd-range]]nd-range::
A representation of the index space of a SYCL kernel execution, the
-distribution of <<work-item,work items>> within into <<work-group,work groups>>.
+distribution of <<work-item,work-items>> within into <<work-group,work-groups>>.
Contains a <<range>> specifying the number of global
-<<work-item,work items>>, a <<range>> specifying the number of local
-<<work-item,work items>> and a <<id>> specifying the global offset. Can be
+<<work-item,work-items>>, a <<range>> specifying the number of local
+<<work-item,work-items>> and a <<id>> specifying the global offset. Can be
one, two or three dimensional. The minimum size of <<range>>
within the <<nd-range>> is 0 per dimension; where any dimension is set to zero,
the index space in all dimensions will be zero.
@@ -399,10 +399,11 @@ object. For the full description please refer to <<subsec:buffers>>.
please refer to <<sec:interface.queue.class>>.

[[range]]range::
-A representation of a number of <<work-item,work items>> or <<work-group>>
-within the index space of a SYCL kernel execution. Can be one, two or
-three dimensional. In the SYCL interface a <<work-group>> is
-represented by the [code]#group# class (see <<group-class>>).
+A representation of a number of <<work-item,work-items>> or
+<<work-group,work-groups>> within the index space of a SYCL kernel
+execution. Can be one, two or three dimensional. In the SYCL interface a
+<<range>> is represented by the [code]#range# class
+(see <<range-class>>).

[[ranged-accessor]]ranged accessor::
A ranged accessor is a host or buffer <<accessor>> that was constructed
@@ -477,10 +478,10 @@ object. For the full description please refer to <<subsec:buffers>>.
see <<sub-group-class>>.

[[sub-group-barrier]]sub-group barrier::
-A <<group-barrier>> for all <<work-item,work items>> in a <<sub-group>>.
+A <<group-barrier>> for all <<work-item,work-items>> in a <<sub-group>>.

[[sub-group-mem-fence]]sub-group mem-fence::
-A <<mem-fence>> for all <<work-item,work items>> in a <<sub-group>>.
+A <<mem-fence>> for all <<work-item,work-items>> in a <<sub-group>>.

[[sycl-application]]SYCL application::
A SYCL application is a {cpp} application which uses the SYCL programming
@@ -542,20 +543,20 @@ See <<sec:usm>>

[[work-group]]work-group::
The SYCL work-group ([code]#sycl::group# class) is a representation
-of a collection of related <<work-item,work items>> that execute on a single
-compute unit. The <<work-item,work items>> in the group execute the same
+of a collection of related <<work-item,work-items>> that execute on a single
+compute unit. The <<work-item,work-items>> in the group execute the same
kernel-instance and <<opencl12, share local memory and work-group functions>>.
For further details for the [code]#sycl::group#
class see <<group-class>>.

[[work-group-barrier]]work-group barrier::
-A <<group-barrier>> for all <<work-item,work items>> in a <<work-group>>.
+A <<group-barrier>> for all <<work-item,work-items>> in a <<work-group>>.

[[work-group-mem-fence]]work-group mem-fence::
-A <<mem-fence>> for all <<work-item,work items>> in a <<work-group>>.
+A <<mem-fence>> for all <<work-item,work-items>> in a <<work-group>>.

[[work-group-id]]work-group id::
-As in OpenCL, SYCL kernels execute in <<work-group,work groups>>. The group ID
+As in OpenCL, SYCL kernels execute in <<work-group,work-groups>>. The group ID
is the ID of the <<work-group>> that a <<work-item>> is executing
within. A group ID is an one, two or three dimensional value that starts
at 0 per dimension.
@@ -568,7 +569,7 @@ See <<sec:usm>>
collection of parallel executions of a kernel invoked on a <<device>>
by a <<command>>. A <<opencl12, work-item>> is executed by one or more processing
elements as part of a <<work-group>> executing on a compute unit. A
-<<work-item>> is distinguished from other <<work-item,work items>> by its
+<<work-item>> is distinguished from other <<work-item,work-items>> by its
<<global-id>> or the combination of its <<work-group-id>> and its
<<local-id>> within a <<work-group>>.

2 changes: 1 addition & 1 deletion adoc/chapters/opencl_backend.adoc
@@ -1142,7 +1142,7 @@ in this section is non-normative.

The OpenCL 1.2 specification document <<opencl12, ch. 6.12.1 in Table 6.7>>
defines work-item functions that tell various information about the currently
-executing work item in an OpenCL kernel. SYCL provides equivalent
+executing work-item in an OpenCL kernel. SYCL provides equivalent
functionality through the item and group classes that are defined in
<<subsec:item.class>>, <<nditem-class>> and <<group-class>>.

48 changes: 24 additions & 24 deletions adoc/chapters/programming_interface.adoc
@@ -8079,7 +8079,7 @@ template parameters are the same.
The [code]#local_accessor# class allocates device local memory and provides
access to this memory from within a <<sycl-kernel-function>>. The
<<local-memory>> that is allocated is shared between all
-<<work-item,work items>> of a <<work-group>>. If multiple work-groups execute
+<<work-item,work-items>> of a <<work-group>>. If multiple work-groups execute
simultaneously in an implementation, each work-group receives its own
independent copy of the allocated local memory.

@@ -11818,14 +11818,14 @@ a@
----
range<Dimensions> get_group_range() const
----
-a@ Returns the number of <<work-group,work groups>> in the iteration space.
+a@ Returns the number of <<work-group,work-groups>> in the iteration space.

a@
[source]
----
size_t get_group_range(int dimension) const
----
-a@ Return the number of <<work-group,work groups>> for [code]#Dimension# in the
+a@ Return the number of <<work-group,work-groups>> for [code]#Dimension# in the
iteration space.

a@
@@ -12230,7 +12230,7 @@ a@
----
range<Dimensions> get_group_range() const
----
-a@ Return a [code]#range# representing the number of <<work-group,work groups>> in the [code]#nd_range#.
+a@ Return a [code]#range# representing the number of <<work-group,work-groups>> in the [code]#nd_range#.

a@
[source]
@@ -12268,7 +12268,7 @@ a@
----
size_t get_group_linear_range() const
----
-a@ Return the total number of <<work-group>>s in the [code]#nd_range#.
+a@ Return the total number of <<work-group,work-groups>> in the [code]#nd_range#.

a@
[source]
@@ -13287,7 +13287,7 @@ by each event in [code]#depEvents# must complete before executing this

<<kernel,Kernels>> can be invoked as [keyword]#single tasks#, basic
[keyword]#data-parallel# <<kernel,kernels>>, <<nd-range>> in
-<<work-group,work groups>>, or [keyword]#hierarchical parallelism#.
+<<work-group,work-groups>>, or [keyword]#hierarchical parallelism#.

Each function takes an optional kernel name template parameter. The user
may optionally provide a <<kernel-name>>, otherwise an implementation-defined name
@@ -13712,7 +13712,7 @@ within the range specified by the [code]#nd_range# parameter. The
[code]#nd_item# parameter makes all information about the work-item and
its position in the range available, and provides access to functions
enabling the use of a <<work-group-barrier>> to synchronize between the
-<<work-item>>s in the <<work-group>>.
+<<work-item,work-items>> in the <<work-group>>.

[NOTE]
====
@@ -13765,7 +13765,7 @@ include::{code_dir}/parallelForWithKernelHandler.cpp[lines=4..-1]
The hierarchical parallel kernel execution interface provides the same
functionality as is available from the <<nd-range>> interface, but
exposed differently. To execute the same sixty-four work-items in
-sixteen work-groups that we saw in the previous example, we execute an
+eight work-groups that we saw in a previous example, we execute an
outer [code]#parallel_for_work_group# call to create the
groups. The member function
[code]#handler::parallel_for_work_group# is parameterized by the
@@ -13781,7 +13781,7 @@ heuristic suggests that it is more efficient to do so, this code will be
executed for each work-item.

Within this region any variable declared will have the semantics of
-<<local-memory>>, shared between all <<work-item,work items>> in the
+<<local-memory>>, shared between all <<work-item,work-items>> in the
<<work-group>>. If the
device compiler can prove that an array of such variables is accessed only by
a single work-item throughout the lifetime of the work-group, for
@@ -13814,7 +13814,7 @@ a@
----
private_memory(const group<Dimensions>&)
----
-a@ Place an object of type [code]#T# in the underlying private memory of each <<work-item,work items>>.
+a@ Place an object of type [code]#T# in the underlying private memory of each <<work-item,work-items>>.
The type [code]#T# must be default constructible.
The underlying constructor will be called for each <<work-item>>.

@@ -13830,7 +13830,7 @@ a@
----
T& operator()(const h_item<Dimensions>& id)
----
-a@ Retrieve a reference to the object for the <<work-item,work items>>.
+a@ Retrieve a reference to the object for the <<work-item,work-items>>.

|====

@@ -16030,7 +16030,7 @@ _RandomFiller_, which initializes a buffer with a random number. The
random number is generated during the construction of the function object
while processing the command group. The [code]#operator()# member
function of the function object receives an [code]#item# object. This
-member function will be called for each work item of the execution range. The value
+member function will be called for each work-item of the execution range. The value
of the random number will be assigned to each element of the buffer. In this
case, the accessor and the scalar random number are members of the function
object and therefore will be arguments to the device kernel. Usual
@@ -19159,13 +19159,13 @@ There are two limits that are relevant for the [code]#stream# class. The
[code]#totalBufferSize# limit specifies the maximum size of the overall
character stream that can be output during a kernel invocation, and the
[code]#workItemBufferSize# limit specifies the maximum size of the
-character stream that can be output within a work item before a flush must be
+character stream that can be output within a work-item before a flush must be
performed. Both of these limits are specified in bytes. The
[code]#totalBufferSize# limit must be sufficient to contain the characters
output by all stream statements during execution of a kernel invocation (the
-aggregate of outputs from all work items), and the
+aggregate of outputs from all work-items), and the
[code]#workItemBufferSize# limit must be sufficient to contain the
-characters output within a work item between stream flush operations.
+characters output within a work-item between stream flush operations.

If the [code]#totalBufferSize# or [code]#workItemBufferSize#
limits are exceeded, it is implementation-defined whether the streamed
@@ -19271,10 +19271,10 @@ a@
----
flush
----
-a@ Triggers a flush operation, which synchronizes the work item stream buffer
-with the global stream buffer, and then empties the work item stream
+a@ Triggers a flush operation, which synchronizes the work-item stream buffer
+with the global stream buffer, and then empties the work-item stream
buffer. After a flush, the full [code]#workItemBufferSize# is
-available again for subsequent streaming within the work item.
+available again for subsequent streaming within the work-item.

a@
[source]
@@ -19392,7 +19392,7 @@ stream(size_t totalBufferSize, size_t workItemBufferSize, handler& cgh,
a@ Constructs a SYCL [code]#stream# instance associated with the command group
specified by [code]#cgh#, with a maximum buffer size in bytes per kernel
invocation specified by the parameter [code]#totalBufferSize#, and a maximum
stream size that can be buffered by a work item between stream flushes
stream size that can be buffered by a work-item between stream flushes
specified by the parameter [code]#workItemBufferSize#.
Zero or more properties can be provided to the constructed SYCL
[code]#stream# via an instance of [code]#property_list#.
@@ -19423,7 +19423,7 @@ a@
----
size_t get_work_item_buffer_size() const
----
-a@ Returns the buffer size per work item, in bytes.
+a@ Returns the buffer size per work-item, in bytes.

a@
[source]
@@ -19463,17 +19463,17 @@ performed are implementation-defined. For example it is valid for an implementat
[code]#printf()#.

The SYCL [code]#stream# class is required to output the content of each stream, between flushes (up to
-[code]#workItemBufferSize)#, without mixing with content from the same stream in other work items.
-There are no other output order guarantees between work items or between streams. The stream flush
+[code]#workItemBufferSize)#, without mixing with content from the same stream in other work-items.
+There are no other output order guarantees between work-items or between streams. The stream flush
operation therefore delimits the unit of output that is guaranteed to be displayed without mixing with
-other work items, with respect to a single stream.
+other work-items, with respect to a single stream.


=== Implicit flush

There is guaranteed to be an implicit flush of each stream used by a
kernel, at the end of kernel execution, from the perspective of each
-work item. There is also an implicit flush when the endl stream
+work-item. There is also an implicit flush when the endl stream
manipulator is executed. No other implicit flushes are permitted in
an implementation.
