Minor corrections and tweaks #403

Merged
merged 6 commits on Apr 27, 2023
10 changes: 5 additions & 5 deletions adoc/chapters/architecture.adoc
@@ -599,7 +599,7 @@ a [code]#range<N># representing the number of work-items executing the kernel.

==== ND-range kernels

-Work-items can be organized into <<work-group,work groups>>, providing a more
+Work-items can be organized into <<work-group,work-groups>>, providing a more
coarse-grained decomposition of the index space. Each work-group is assigned a
unique <<work-group-id>> with the same dimensionality as the index space used for
the work-items. Work-items are each assigned a <<local-id>>, unique within the
@@ -915,7 +915,7 @@ Work-items executing in a kernel have access to three distinct address spaces
of correct results unless <<mem-fence>> and atomic operations are used.
* <<local-memory,Local-memory>> is accessible to all work-items in a single
work-group. Attempting to access local memory in one work-group from
-another work group results in undefined behavior. This memory region can be
+another work-group results in undefined behavior. This memory region can be
used to allocate variables that are shared by all work-items in a
work-group. Work-group-level visibility allows local memory to be
implemented as dedicated regions of the device memory where this is
@@ -1055,7 +1055,7 @@ device kernel results in undefined behavior. Developers are encouraged to use
include::{header_dir}/memoryScope.h[lines=4..-1]
----

-The set of <<work-item,work items>> and devices to which the memory ordering
+The set of <<work-item,work-items>> and devices to which the memory ordering
constraints of a given atomic operation apply is controlled by a
[code]#sycl::memory_scope# parameter, which can take one of the following
values:
@@ -1192,8 +1192,8 @@ sections of this specification.

=== Basic data parallel kernels

-Data-parallel <<kernel>>s that execute as
-multiple <<work-item>>s and where no local synchronization is required are enqueued
+Data-parallel <<kernel,kernels>> that execute as
+multiple <<work-item,work-items>> and where no local synchronization is required are enqueued
with the [code]#sycl::parallel_for# function parameterized by a
[code]#sycl::range# parameter. These kernels will execute the kernel
function body once for each work-item in the specified <<range>>.
6 changes: 3 additions & 3 deletions adoc/chapters/device_compiler.adoc
@@ -520,11 +520,11 @@ reqd_work_group_size(dim0, dim1, dim2)
Kernels that are decorated with this attribute may not call functions that are
defined in another translation unit via the [code]#SYCL_EXTERNAL# macro.

-Each device may have limitations on the work group sizes that it supports. If
+Each device may have limitations on the work-group sizes that it supports. If
a kernel is decorated with this attribute and then submitted to a device that
-does not support the work group size, the implementation must throw a
+does not support the work-group size, the implementation must throw a
synchronous [code]#exception# with the [code]#errc::kernel_not_supported# error
-code. If the kernel is submitted to a device that does support the work group
+code. If the kernel is submitted to a device that does support the work-group
size, but the application provides an [code]#nd_range# that does not match the
size from the attribute, then the implementation must throw a synchronous
[code]#exception# with the [code]#errc::nd_range# error code.
41 changes: 21 additions & 20 deletions adoc/chapters/glossary.adoc
@@ -199,21 +199,21 @@ object. For the full description please refer to <<subsec:buffers>>.

[[global-id]]global id::
As in OpenCL, a global ID is used to uniquely identify a <<work-item>>
-and is derived from the number of global <<work-item,work items>> specified
+and is derived from the number of global <<work-item,work-items>> specified
when executing a kernel. A global ID is a one, two or three-dimensional
value that starts at 0 per dimension.

[[global-memory]]global memory::
-Global memory is a memory region accessible to all <<work-item,work items>>
+Global memory is a memory region accessible to all <<work-item,work-items>>
executing on a <<device>>.

[[group]]group::
A group of work-items within the index space of a SYCL kernel execution,
such as a <<work-group>> or <<sub-group>>.

[[group-barrier]]group barrier::
-A synchronization function within a group of <<work-item,work items>>. All the
-<<work-item,work items>> of a group must execute the barrier construct before any
+A synchronization function within a group of <<work-item,work-items>>. All the
+<<work-item,work-items>> of a group must execute the barrier construct before any
<<work-item>> continues execution beyond the barrier. Additionally all work-items
in the group execute a release <<mem-fence>> prior to synchronizing at the
barrier, all work-items in the group execute an acquire <<mem-fence>> after
@@ -349,16 +349,16 @@ object. For the full description please refer to <<subsec:buffers>>.
[[nd-item]]nd-item::
A unique identifier representing a single <<work-item>> and
<<work-group>> within the index space of a SYCL kernel execution. Can
-be one, two or three dimensional. In the SYCL interface a <<nd-item>>
+be one, two or three dimensional. In the SYCL interface an <<nd-item>>
is represented by the [code]#nd_item# class (see
<<nditem-class>>).

[[nd-range]]nd-range::
A representation of the index space of a SYCL kernel execution, the
-distribution of <<work-item,work items>> within into <<work-group,work groups>>.
+distribution of <<work-item,work-items>> within into <<work-group,work-groups>>.
Contains a <<range>> specifying the number of global
-<<work-item,work items>>, a <<range>> specifying the number of local
-<<work-item,work items>> and a <<id>> specifying the global offset. Can be
+<<work-item,work-items>>, a <<range>> specifying the number of local
+<<work-item,work-items>> and a <<id>> specifying the global offset. Can be
one, two or three dimensional. The minimum size of <<range>>
within the <<nd-range>> is 0 per dimension; where any dimension is set to zero,
the index space in all dimensions will be zero.
@@ -399,10 +399,11 @@ object. For the full description please refer to <<subsec:buffers>>.
please refer to <<sec:interface.queue.class>>.

[[range]]range::
-A representation of a number of <<work-item,work items>> or <<work-group>>
-within the index space of a SYCL kernel execution. Can be one, two or
-three dimensional. In the SYCL interface a <<work-group>> is
-represented by the [code]#group# class (see <<group-class>>).
+A representation of a number of <<work-item,work-items>> or
+<<work-group,work-groups>> within the index space of a SYCL kernel
+execution. Can be one, two or three dimensional. In the SYCL interface a
+<<range>> is represented by the [code]#range# class
+(see <<range-class>>).

[[ranged-accessor]]ranged accessor::
A ranged accessor is a host or buffer <<accessor>> that was constructed
@@ -477,10 +478,10 @@ object. For the full description please refer to <<subsec:buffers>>.
see <<sub-group-class>>.

[[sub-group-barrier]]sub-group barrier::
-A <<group-barrier>> for all <<work-item,work items>> in a <<sub-group>>.
+A <<group-barrier>> for all <<work-item,work-items>> in a <<sub-group>>.

[[sub-group-mem-fence]]sub-group mem-fence::
-A <<mem-fence>> for all <<work-item,work items>> in a <<sub-group>>.
+A <<mem-fence>> for all <<work-item,work-items>> in a <<sub-group>>.

[[sycl-application]]SYCL application::
A SYCL application is a {cpp} application which uses the SYCL programming
@@ -542,20 +543,20 @@ See <<sec:usm>>

[[work-group]]work-group::
The SYCL work-group ([code]#sycl::group# class) is a representation
-of a collection of related <<work-item,work items>> that execute on a single
-compute unit. The <<work-item,work items>> in the group execute the same
+of a collection of related <<work-item,work-items>> that execute on a single
+compute unit. The <<work-item,work-items>> in the group execute the same
kernel-instance and <<opencl12, share local memory and work-group functions>>.
For further details for the [code]#sycl::group#
class see <<group-class>>.

[[work-group-barrier]]work-group barrier::
-A <<group-barrier>> for all <<work-item,work items>> in a <<work-group>>.
+A <<group-barrier>> for all <<work-item,work-items>> in a <<work-group>>.

[[work-group-mem-fence]]work-group mem-fence::
-A <<mem-fence>> for all <<work-item,work items>> in a <<work-group>>.
+A <<mem-fence>> for all <<work-item,work-items>> in a <<work-group>>.

[[work-group-id]]work-group id::
-As in OpenCL, SYCL kernels execute in <<work-group,work groups>>. The group ID
+As in OpenCL, SYCL kernels execute in <<work-group,work-groups>>. The group ID
is the ID of the <<work-group>> that a <<work-item>> is executing
within. A group ID is an one, two or three dimensional value that starts
at 0 per dimension.
@@ -568,7 +569,7 @@ See <<sec:usm>>
collection of parallel executions of a kernel invoked on a <<device>>
by a <<command>>. A <<opencl12, work-item>> is executed by one or more processing
elements as part of a <<work-group>> executing on a compute unit. A
-<<work-item>> is distinguished from other <<work-item,work items>> by its
+<<work-item>> is distinguished from other <<work-item,work-items>> by its
<<global-id>> or the combination of its <<work-group-id>> and its
<<local-id>> within a <<work-group>>.

2 changes: 1 addition & 1 deletion adoc/chapters/opencl_backend.adoc
@@ -1142,7 +1142,7 @@ in this section is non-normative.

The OpenCL 1.2 specification document <<opencl12, ch. 6.12.1 in Table 6.7>>
defines work-item functions that tell various information about the currently
-executing work item in an OpenCL kernel. SYCL provides equivalent
+executing work-item in an OpenCL kernel. SYCL provides equivalent
functionality through the item and group classes that are defined in
<<subsec:item.class>>, <<nditem-class>> and <<group-class>>.

48 changes: 24 additions & 24 deletions adoc/chapters/programming_interface.adoc
@@ -8079,7 +8079,7 @@ template parameters are the same.
The [code]#local_accessor# class allocates device local memory and provides
access to this memory from within a <<sycl-kernel-function>>. The
<<local-memory>> that is allocated is shared between all
-<<work-item,work items>> of a <<work-group>>. If multiple work-groups execute
+<<work-item,work-items>> of a <<work-group>>. If multiple work-groups execute
simultaneously in an implementation, each work-group receives its own
independent copy of the allocated local memory.

@@ -11818,14 +11818,14 @@ a@
----
range<Dimensions> get_group_range() const
----
-a@ Returns the number of <<work-group,work groups>> in the iteration space.
+a@ Returns the number of <<work-group,work-groups>> in the iteration space.

a@
[source]
----
size_t get_group_range(int dimension) const
----
-a@ Return the number of <<work-group,work groups>> for [code]#Dimension# in the
+a@ Return the number of <<work-group,work-groups>> for [code]#Dimension# in the
iteration space.

a@
@@ -12230,7 +12230,7 @@ a@
----
range<Dimensions> get_group_range() const
----
-a@ Return a [code]#range# representing the number of <<work-group,work groups>> in the [code]#nd_range#.
+a@ Return a [code]#range# representing the number of <<work-group,work-groups>> in the [code]#nd_range#.

a@
[source]
@@ -12268,7 +12268,7 @@ a@
----
size_t get_group_linear_range() const
----
-a@ Return the total number of <<work-group>>s in the [code]#nd_range#.
+a@ Return the total number of <<work-group,work-groups>> in the [code]#nd_range#.

a@
[source]
@@ -13287,7 +13287,7 @@ by each event in [code]#depEvents# must complete before executing this

<<kernel,Kernels>> can be invoked as [keyword]#single tasks#, basic
[keyword]#data-parallel# <<kernel,kernels>>, <<nd-range>> in
-<<work-group,work groups>>, or [keyword]#hierarchical parallelism#.
+<<work-group,work-groups>>, or [keyword]#hierarchical parallelism#.

Each function takes an optional kernel name template parameter. The user
may optionally provide a <<kernel-name>>, otherwise an implementation-defined name
@@ -13712,7 +13712,7 @@ within the range specified by the [code]#nd_range# parameter. The
[code]#nd_item# parameter makes all information about the work-item and
its position in the range available, and provides access to functions
enabling the use of a <<work-group-barrier>> to synchronize between the
-<<work-item>>s in the <<work-group>>.
+<<work-item,work-items>> in the <<work-group>>.

[NOTE]
====
@@ -13765,7 +13765,7 @@ include::{code_dir}/parallelForWithKernelHandler.cpp[lines=4..-1]
The hierarchical parallel kernel execution interface provides the same
functionality as is available from the <<nd-range>> interface, but
exposed differently. To execute the same sixty-four work-items in
-sixteen work-groups that we saw in the previous example, we execute an
+eight work-groups that we saw in a previous example, we execute an
outer [code]#parallel_for_work_group# call to create the
groups. The member function
[code]#handler::parallel_for_work_group# is parameterized by the
@@ -13781,7 +13781,7 @@ heuristic suggests that it is more efficient to do so, this code will be
executed for each work-item.

Within this region any variable declared will have the semantics of
-<<local-memory>>, shared between all <<work-item,work items>> in the
+<<local-memory>>, shared between all <<work-item,work-items>> in the
<<work-group>>. If the
device compiler can prove that an array of such variables is accessed only by
a single work-item throughout the lifetime of the work-group, for
@@ -13814,7 +13814,7 @@ a@
----
private_memory(const group<Dimensions>&)
----
-a@ Place an object of type [code]#T# in the underlying private memory of each <<work-item,work items>>.
+a@ Place an object of type [code]#T# in the underlying private memory of each <<work-item,work-items>>.
The type [code]#T# must be default constructible.
The underlying constructor will be called for each <<work-item>>.

@@ -13830,7 +13830,7 @@ a@
----
T& operator()(const h_item<Dimensions>& id)
----
-a@ Retrieve a reference to the object for the <<work-item,work items>>.
+a@ Retrieve a reference to the object for the <<work-item,work-items>>.

|====

@@ -16030,7 +16030,7 @@ _RandomFiller_, which initializes a buffer with a random number. The
random number is generated during the construction of the function object
while processing the command group. The [code]#operator()# member
function of the function object receives an [code]#item# object. This
-member function will be called for each work item of the execution range. The value
+member function will be called for each work-item of the execution range. The value
of the random number will be assigned to each element of the buffer. In this
case, the accessor and the scalar random number are members of the function
object and therefore will be arguments to the device kernel. Usual
@@ -19159,13 +19159,13 @@ There are two limits that are relevant for the [code]#stream# class. The
[code]#totalBufferSize# limit specifies the maximum size of the overall
character stream that can be output during a kernel invocation, and the
[code]#workItemBufferSize# limit specifies the maximum size of the
-character stream that can be output within a work item before a flush must be
+character stream that can be output within a work-item before a flush must be
performed. Both of these limits are specified in bytes. The
[code]#totalBufferSize# limit must be sufficient to contain the characters
output by all stream statements during execution of a kernel invocation (the
-aggregate of outputs from all work items), and the
+aggregate of outputs from all work-items), and the
[code]#workItemBufferSize# limit must be sufficient to contain the
-characters output within a work item between stream flush operations.
+characters output within a work-item between stream flush operations.

If the [code]#totalBufferSize# or [code]#workItemBufferSize#
limits are exceeded, it is implementation-defined whether the streamed
@@ -19271,10 +19271,10 @@ a@
----
flush
----
-a@ Triggers a flush operation, which synchronizes the work item stream buffer
-with the global stream buffer, and then empties the work item stream
+a@ Triggers a flush operation, which synchronizes the work-item stream buffer
+with the global stream buffer, and then empties the work-item stream
buffer. After a flush, the full [code]#workItemBufferSize# is
-available again for subsequent streaming within the work item.
+available again for subsequent streaming within the work-item.

a@
[source]
@@ -19392,7 +19392,7 @@ stream(size_t totalBufferSize, size_t workItemBufferSize, handler& cgh,
a@ Constructs a SYCL [code]#stream# instance associated with the command group
specified by [code]#cgh#, with a maximum buffer size in bytes per kernel
invocation specified by the parameter [code]#totalBufferSize#, and a maximum
stream size that can be buffered by a work item between stream flushes
stream size that can be buffered by a work-item between stream flushes
specified by the parameter [code]#workItemBufferSize#.
Zero or more properties can be provided to the constructed SYCL
[code]#stream# via an instance of [code]#property_list#.
@@ -19423,7 +19423,7 @@ a@
----
size_t get_work_item_buffer_size() const
----
-a@ Returns the buffer size per work item, in bytes.
+a@ Returns the buffer size per work-item, in bytes.

a@
[source]
@@ -19463,17 +19463,17 @@ performed are implementation-defined. For example it is valid for an implementat
[code]#printf()#.

The SYCL [code]#stream# class is required to output the content of each stream, between flushes (up to
-[code]#workItemBufferSize)#, without mixing with content from the same stream in other work items.
-There are no other output order guarantees between work items or between streams. The stream flush
+[code]#workItemBufferSize)#, without mixing with content from the same stream in other work-items.
+There are no other output order guarantees between work-items or between streams. The stream flush
operation therefore delimits the unit of output that is guaranteed to be displayed without mixing with
-other work items, with respect to a single stream.
+other work-items, with respect to a single stream.


=== Implicit flush

There is guaranteed to be an implicit flush of each stream used by a
kernel, at the end of kernel execution, from the perspective of each
-work item. There is also an implicit flush when the endl stream
+work-item. There is also an implicit flush when the endl stream
manipulator is executed. No other implicit flushes are permitted in
an implementation.
