oneAPI DPC++ Compiler 2022-12
New features
SYCL Compiler
- Added support for per-object device code compilation under the option
-fno-sycl-rdc. This improves compiler performance and reduces memory usage,
but can only be used if there are no cross-object dependencies. [f884993]
- Added support for per-aspect device code split mode. [9a2c4fe]
- Extended support for the large GRF mode to non-ESIMD kernels. [9994934]
[ab2a42c]
- Implemented the sycl_ext_intel_device_architecture extension. [0e32a28]
[b59d93c] [5bd5c87] [e5de913]
- Implemented the sycl_ext_oneapi_kernel_properties experimental extension.
[332e4ee] [27454de] [70ee3d5] [430c722]
- Added support for generic address space atomic built-ins to CUDA libclc.
[d6a8fd1]
SYCL Library
- Implemented accessor member functions swap, byte_size, max_size and
empty. [f1f907a]
- Implemented SYCL 2020 default accessor constructor. [04928f9]
- Implemented SYCL 2020 accessor iterators. [5b9fd3c] [c7b1a00]
- Changed value_type of read-only accessors to const in accordance with
SYCL 2020. [227614c]
- Implemented SYCL 2020 multi_ptr and address_space_cast. [8700b76]
[483984a] [4a9e9a0]
- Implemented SYCL 2020 has_extension free functions. [7f1a6ef]
- Implemented SYCL 2020 aspect_selector. [c0a4a56]
- Implemented new SYCL 2020 style FPGA selectors. [0417651]
- Implemented SYCL 2020 default async_handler behavior. [cd93d8f]
- Implemented SYCL 2020 is_compatible free function. [67f6bba]
- Implemented queue shortcut functions with placeholder accessors. [5ee066e]
- Added support for creating a kernel bundle with descendent devices of the
passed context's members. [a782779]
- Implemented non-blocking destruction and deferred release of memory objects
without attached host memory. [894ce25]
- Implemented the sycl_ext_oneapi_queue_priority extension. [cdb09dc]
- Implemented the sycl_ext_oneapi_user_defined_reductions extension. [8311d79]
- Implemented the sycl_ext_oneapi_queue_empty extension proposal. [c493295]
- Implemented the sycl_ext_oneapi_weak_object extension. [d948427] [9297f63]
- Implemented the sycl_ext_intel_cslice extension. The old behavior that
exposed compute slices as sub-sub-devices is now deprecated. For compatibility
purposes, it can be brought back via the
SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING environment
variable. [5995c618]
- Implemented the sycl_ext_intel_queue_index extension. [d2ec964] [7179e83]
- Implemented the sycl_ext_oneapi_memcpy2d extension. [516d411]
- Implemented device ID, memory clock rate and bus width information queries
from the sycl_ext_intel_device_info extension. [1d99344] [4f7787c]
- Implemented ext::oneapi::experimental::radix_sorter from the
sycl_ext_oneapi_group_sort extension proposal. [86ba180]
- Implemented a new unified interface for the sycl_ext_oneapi_matrix
extension for CUDA. [166bbc3]
- Added support for sorting over sub-groups. [168767c]
- Added C++ API wrappers for the Intel math functions ceil, floor, rint,
sqrt, rsqrt and trunc. [1b7582b]
- Implemented a SYCL device library for bfloat16 Intel math function
utilities. [fc136d6]
- Added support for range reductions with any number of reduction variables.
[572bc50]
- Added support for reductions with kernels accepting item. [5d5e9f4]
- Enabled sub-group masks for 64-bit subgroups. [10d50ed]
- Implemented the new non-experimental API for DPAS. [55bf1a0] [1e7a8ea]
- Added 8/16-bit type support to lsc_block_load and lsc_block_store ESIMD
API. [f9d8059]
- Implemented atomic operation support in the ESIMD emulator. [a6a0dea]
- Added various trivial utility functions for the half type. [b4ce7c0]
- Added type cast functions between half and float/integer types to
libdevice. [599b1b9]
- Implemented the ONEAPI_DEVICE_SELECTOR environment variable that, in
addition to supporting SYCL_DEVICE_FILTER syntax, allows exposing GPU
sub-devices as SYCL root devices and supports negative filters.
SYCL_DEVICE_FILTER is now deprecated. [28d0cd3] [b21e74e] [77b6f34]
[6bd5f9c] [6aefd63]
- Added the SYCL_PI_LEVEL_ZERO_SINGLE_ROOT_DEVICE_BUFFER_MIGRATION
environment variable. [bd03e0d]
Documentation
- Added the sycl_ext_oneapi_device_architecture extension specification.
[7f2b17e]
- Added the sycl_ext_oneapi_memcpy2d extension specification. [296e9c3]
- Added the sycl_ext_oneapi_user_defined_reductions extension specification.
[cd4fd8c]
- Added the sycl_ext_oneapi_weak_object extension specification. [d948427]
- Added the sycl_ext_oneapi_prod extension proposal. [ed7cb4b]
- Added the sycl_ext_codeplay_kernel_fusion extension proposal. [be3dfbd]
- Added the sycl_ext_intel_queue_index extension proposal. [f5fb759]
- Added the sycl_ext_intel_cslice extension proposal. [5777e1f]
- Added the sycl_ext_oneapi_group_sort extension update proposal that
introduced sorting functions with fixed-size arrays. [c6d1caf]
- Added device ID, memory clock rate and bus width device information queries
to the sycl_ext_intel_device_info extension. [1d99344] [4f7787c]
Improvements
SYCL Compiler
- Added the InferAddressSpaces pass to the SPIR/SPIR-V compilation pipeline,
reducing the size of the generated device code. [a3ae0dd]
- Redesigned pointer handling so that it no longer decomposes kernel argument
types containing pointers. [3916d3b] [d55e9c2] [9b02506]
- Kernel lambda operator is now always inlined in the device code entry point
unless -O0 is used. [b91b732] [2359d94]
- Improved entry point handling in the sycl-post-link tool. [53d9c7b]
- The reqd_work_group_size attribute now works with 1, 2 or 3 operands.
[4ff42c3]
- Enabled using the -fcf-protection option with -fsycl, which results in it
being applied only to host code compilation and producing a warning. [b6f61f6]
- Linux based compiler driver on Windows now pulls in the sycld debug library
when msvcrtd is specified as a dependent library. [ebf6c59]
- Added /Zc:__cplusplus as a default option during host compilation with MSVC.
[e7ed860]
- Improved the ESIMDOptimizeVecArgCallConv optimization pass to cover more IR
patterns. [4926454]
- Added support for more types in ESIMD lsc functions. [d9e40ec]
- Added error diagnostics for using
sycl::ext::oneapi::experimental::annotated_arg/ptr as a nested type.
[321c733]
- The status of bfloat16 support was changed from experimental to supported.
[7b47ebb]
SYCL Library
- Updated online_compiler with Gen12 GPU support. [adfb1c1]
- get_kernel_bundle and has_kernel_bundle now check that the kernels are
compatible with the devices. [91b1515]
- Waiting for an event associated with a kernel that uses a stream now also
waits for the stream to be flushed. [1db0e81]
- Added the requested device type to the message of the exception thrown when no
such devices are found. [6b83ad7]
- Optimized operator[] of host_accessor. [01e60f7]
- Improved reduction performance on discrete GPUs. [99bdc82]
- Added invoke_simd support for functions with void return type. [3fd0850]
- The Level Zero plugin now creates every event as host-visible by default.
[f3d245d]
- Added Level Zero plugin support for global work sizes greater than
UINT32_MAX as long as they are divisible by some legal work-group size and
the resulting quotient does not exceed UINT32_MAX. [62dd13d]
- Improved native Level Zero event handling in the immediate command list mode
by removing excessive status queries. [179ffa1]
- Removed an uninitialized buffer migration copy in the Level Zero plugin.
[b0c3404]
- Implemented an optimization that reuses discarded Level Zero events in the
plugin. [b1533c5]
- The host device is now inaccessible. [5b13d5b]
- Removed deprecated make_queue API. [9fc1d68]
- Deprecated group::get_global_range(). [9533871]
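
The Level Zero global work size rule in the list above (a size greater than UINT32_MAX is accepted when some legal work-group size divides it and the quotient still fits in 32 bits) can be illustrated with a minimal sketch. The helper name pick_work_group_size and its max_wg_size parameter are hypothetical, and the largest-divisor-first search is an assumption made for illustration, not the plugin's actual algorithm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the Level Zero plugin's code): returns the largest
// work-group size wg in [1, max_wg_size] such that global_size is divisible
// by wg and the group count global_size / wg fits in 32 bits.
// Returns 0 when no such work-group size exists.
std::uint32_t pick_work_group_size(std::uint64_t global_size,
                                   std::uint32_t max_wg_size) {
    assert(max_wg_size > 0 && "at least one work-item per group is required");
    for (std::uint32_t wg = max_wg_size; wg >= 1; --wg) {
        const bool divisible = (global_size % wg) == 0;
        const bool groups_fit = (global_size / wg) <= UINT32_MAX;
        if (divisible && groups_fit) {
            return wg;
        }
    }
    return 0;  // no legal work-group size satisfies both conditions
}
```

Under these assumptions, a global size of 2^32 with a maximum work-group size of 256 is launchable, while the same global size with a maximum work-group size of 1 is not, because the group count would exceed UINT32_MAX.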
Documentation
- Updated the sycl_ext_oneapi_device_if extension proposal to allow chaining
if_device_has, else_if_device_has and else_device calls. [7f2b17e]
- Updated the sycl_ext_intel_fpga_device_selector extension to use SYCL 2020
style selectors. [0417651]
- Updated the sycl_ext_intel_fpga_kernel_interface_properties extension
proposal to allow the compiler to determine the initiation interval.
[1a1fd8d]
- Updated the sycl_ext_usm_address_spaces extension to adhere to SYCL 2020
multi_ptr. [4a9e9a0]
- Added a new matrix use parameter to joint_matrix from the
sycl_ext_oneapi_matrix extension specification. [52f34fd]
- Removed queue::size and queue::get_wait_list functions from the
sycl_ext_oneapi_queue_status_query extension due to performance overhead
implications and renamed it to sycl_ext_oneapi_queue_empty. [b540f81]
- Clarified ESIMD emulator device selection behavior in
sycl_ext_intel_esimd. [9b5f288]
- Updated the sycl_ext_oneapi_device_architecture extension to include NVIDIA
and AMD architectures. [c6091df] [e5de913]
- Updated get started guide to mention CUDA specific driver options, some of
which are necessary for making use of newer hardware features. [4e5d276]
[f48f96e]
Bug fixes
SYCL Compiler
- Fixed a crash when attempting to compile code that uses a function object
without a defined call operator as a kernel. [490ee55]
- Fixed a crash that occurred during compilation of device code with a captured
structured binding. [0e455c9]
- Fixed the work_group_size_hint attribute not being applicable to lambda
functions using non-conforming syntax. [c2a0db0]
- Fixed integration header parameter kind information for annotated types.
[b8f35cf]
- Fixed an issue with offload dependencies when using -fsycl-force-target.
[54777c0]
- Fixed debug information generation when an integration footer is present.
[83febf9]
- Fixed a __builtin_printf related error when compiling device code with
_GLIBCXX_ASSERTIONS=1. [d8fd9bc]
- Fixed a compiler error that occurred during archive generation when
using -fsycl-link for FPGA. [3159db5]
- Fixed memory corruption caused by the ESIMDOptimizeVecArgCallConv pass.
[86f709c]
- Fixed a crash during ESIMD intrinsic generation. [20b1bea]
- Fixed libclc function mangling. [e6c4c15] [f32d34f]
- Fixed an issue with specifying the offload architecture for AMD and NVIDIA
targets. [4189858]
- Fixed incorrectly issued warnings about CUDA or HIP targets missing from a
linked library. [72d9b05]
SYCL Library
- Fixed an issue where the in-order queue property was not respected when
submitting USM commands and host tasks. [067d3b3]
- Fixed a memory leak when enqueueing a barrier to a discard_events queue.
[968f9e7]
- Fixed a memory leak related to submitting host tasks without memory object
dependencies. [c44050a]
- Fixed an invalid event error when handling cross-queue no-op dependencies.
[6c9a380]
- Fixed an error when setting a specialization constant in a command group with
no kernel. [b333cee]
- Fixed an issue where submitting a kernel that explicitly depends on a host
task was a blocking call that waited for the host task. [c44050a]
- Removed noexcept from some of usm_allocator member functions to align with
the specification. [7b02697]
- Fixed ext::intel::experimental::atomic_update with the fcmpwr operation.
[52923e6]
- Fixed memory leak issues when constructing a SYCL kernel/kernel_bundle
using interoperability. [b083e05] [a32021b] [3c22764]
- Fixed an error where the native handle returned by get_native from a default
constructed event was unusable. [7202173]
- Fixed an issue where reinterpreting a buffer to a const type changed the
corresponding buffer_allocator type to const. [3aabd26]
- Fixed handler::set_arg with local_accessor. [ef792c6]
- Added the missing default template argument for
sycl::info::device::max_work_item_sizes. [3a4e797]
- Fixed an issue where some aspects could be incorrectly reported as
unsupported by a device. [f90d2b4]
- Fixed return type of scalar versions of relational functions. The fix
requires defining SYCL2020_CONFORMANT_APIS macro. [45d516c]
- Fixed an issue where device code cache was not used if the compilation was
triggered by different paths. [a0254c9] [1c77f9a]
- Fixed a use-after-move bug when caching device code built for multiple
devices. [97c0c99]
- Removed the unintended requirement of fp64 support from stream and ESIMD
float fmod implementations. [a67807a] [cc18904]
- Fixed several complex math operations failing on devices that don't support
fp64. [6b24fdc]
- Aligned host side float-to-half mantissa rounding with device side.
[4cbd459]
- Fixed float-to-half conversion of the half minimum subnormal value on host.
[514708b]
- Fixed marray math function implementation. [73a992b]
- Fixed an out-of-bounds write in the group operations implementation.
[0fa7542]
- Fixed a reduction performance regression caused by using the wrong
implementation for the float type. [97725f1]
- Fixed header deprecation warnings to work properly on Linux. [2cefad1]
- Fixed deprecation of SYCL 1.2.1 device selectors. [a6222ba]
- Fixed multiple issues in GDB xmethods scripts. [e49aa08]
- Fixed an issue with sycl-prof JSON output. [37a74c7]
- Fixed compilation errors on Windows when using the ESIMD API. [3aa48db]
[77e92ce]
- Fixed invalid calculation in the ESIMD tanh function. [cda6680]
- Fixed kernel_bundle errors when using ESIMD emulator devices. [9baa9d9]
- Fixed an issue where ESIMD emulator was picked by the default selector even in
the presence of other devices. [44d7926]
- Fixed an error when querying an ESIMD emulator device for sub-group sizes.
[acca608]
- Fixed invalid behavior of the maximum sub-group size query on some OpenCL
systems. [5998d7c]
- Fixed an issue where the OpenCL plugin checked whether a program is supported
on a device by looking up platform version/extensions rather than device
ones. [9f89247]
- Fixed the result of the free device memory query with the Level Zero backend.
[c191fb0]
- Fixed an issue with ext_oneapi_barrier not working when using the Level Zero
backend. [1f8d90f]
- Fixed a hang after submitting a barrier to a Level Zero in-order queue.
[dd5a191]
- Fixed an issue that occurred when submitting a barrier to a Level Zero queue
with no prior submissions. [5c9e543]
- Fixed a memory leak when tracking indirect access in the Level Zero plugin.
[1b79491]
- Fixed an invalid read issue that occurred during Level Zero event release.
[c49eeda]
- Fixed a synchronization issue when using device scope Level Zero events.
[9811ef2]
- Fixed an issue that occurred when using get_native on a newly constructed
Level Zero queue. [5d0d4ef]
- Fixed a segmentation fault related to events recycling in immediate command
list mode in the Level Zero plugin. [a3e93e0] [5b021a2]
- Fixed an issue where an invalid maximum of compute units was reported for
Level Zero sub-sub-devices. [b9f4919]
- Fixed a segmentation fault when using Level Zero sub-sub-devices with the
immediate command lists mode. [ed3d35c]
- Reverted the Level Zero plugin change that preferred using copy engine for
memory read/write operations due to functional regressions. [44aa363]
- Added the missing fp16 case of FMulKHR libclc function. [4372915]
- Fixed several bugs in the barrier implementation in the CUDA and HIP plugins.
[1c3d598] [ce7c594]
- Fixed get_native not working for CUDA devices. [998fd91]
- Fixed a crash when submitting a kernel with a range of 0 to a CUDA device.
[a395886]
- Fixed an issue where make_device produced duplicate CUDA devices.
[75302c5]
- Fixed an issue where the HIP backend always reported the fp64 aspect as
missing. [cd832bf]
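
To make the two float-to-half fixes in the list above more concrete, here is a minimal sketch of a host-side float-to-half conversion. It assumes IEEE 754 round-to-nearest-even, which the release notes do not state explicitly, and the function name float_to_half_bits is hypothetical. It is not the library's implementation; it only shows where mantissa rounding and the minimum subnormal value (2^-24, encoded as 0x0001) come into play.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: converts a float to IEEE 754 binary16 bits using
// round-to-nearest-even (an assumption, not the library's documented mode).
std::uint16_t float_to_half_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t magnitude = bits & 0x7FFFFFFFu;

    if (magnitude > 0x7F800000u) {  // NaN: return a quiet NaN
        return static_cast<std::uint16_t>(sign | 0x7E00u);
    }
    // Re-bias the exponent from float (127) to half (15).
    const int exponent = static_cast<int>(magnitude >> 23) - 127 + 15;
    if (exponent >= 31) {  // infinity, or too large for half
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }

    std::uint32_t mantissa = magnitude & 0x007FFFFFu;
    std::uint32_t result, remainder, halfway;
    if (exponent <= 0) {
        // The result is a half subnormal (or zero).
        if (exponent < -10) {  // below half of the minimum subnormal
            return static_cast<std::uint16_t>(sign);
        }
        mantissa |= 0x00800000u;          // make the implicit leading 1 explicit
        const int shift = 14 - exponent;  // number of discarded bits
        assert(shift >= 14 && shift <= 24);
        result = mantissa >> shift;
        remainder = mantissa & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1);
    } else {
        // The result is a half normal: keep the top 10 mantissa bits.
        result = (static_cast<std::uint32_t>(exponent) << 10) | (mantissa >> 13);
        remainder = mantissa & 0x1FFFu;
        halfway = 0x1000u;
    }
    // Round to nearest; ties go to the even result. A carry may move the
    // result into the next exponent, or from 65504 up to infinity.
    if (remainder > halfway || (remainder == halfway && (result & 1u))) {
        ++result;
    }
    return static_cast<std::uint16_t>(sign | result);
}
```

In this sketch the minimum subnormal 2^-24 converts to 0x0001, the exact tie 2^-25 rounds to zero, and 65520.0f (halfway between the largest finite half and the next representable step) rounds up to infinity.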
Known issues
- Having MESA OpenCL implementation which provides no devices on a
system may cause incorrect device discovery. As a workaround such an OpenCL
implementation can be disabled by removing /etc/OpenCL/vendor/mesa.icd.
- Compilation may fail on Windows in debug mode if a kernel uses std::array.
This happens because the debug version of std::array in Microsoft STL C++
headers calls functions that are illegal for the device code. As a workaround
the following can be done:
  - Dump compiler pipeline execution strings by passing the -### option to the
  compiler. The compiler will print the internal execution strings of
  compilation tools. The actual compilation will not happen.
  - Modify the (usually) first execution string (it should have the
  option -fsycl-is-device) by adding the options
  "-D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0" to the
  end of the string. Execute all strings one by one.
- The -fsycl-dead-args-optimization option can't help eliminate the offset of
an accessor even though it's created with no offset specified.
- SYCL 2020 barriers show worse performance than SYCL 1.2.1 do. [18c80fa]
- When using fallback assert in separate compilation flow it requires explicit
linking against lib/libsycl-fallback-cassert.o or
lib/libsycl-fallback-cassert.spv.
- Limit alignment of allocation requests at 64KB which is the only alignment
supported by Level Zero. [7dfaf3b]
- User-defined functions with the name and signature matching those of any
OpenCL C built-in function (i.e. an exact match of arguments, return type
doesn't matter) can lead to Undefined Behavior.
- A DPC++ system that has FPGAs installed does not support multi-process
execution. Creating a context opens the device associated with the context
and places a lock on it for that process. No other process may use that
device. Some queries about the device through device.get_info<>() also
open up the device and lock it to that process since the runtime needs
to query the actual device to obtain that information.
- The format of the object files produced by the compiler can change between
versions. The workaround is to rebuild the application.
- Using sycl::kernel_bundle API to refer to a kernel defined
in another translation unit leads to undefined behavior.
- Linkage errors with the following message:
error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined
can happen when a SYCL application is built using MS Visual Studio 2019
version below 16.3.0 and user specifies -std=c++14 or /std:c++14.
- Printing internal defines isn't supported on Windows. [50628db]
The bits associated with this release have not been tested or validated for quality or functionality. They are simply the nightly build associated with the release commit that the oneAPI release was based on.