oneAPI DPC++ Compiler 2022-12

@tfzhu tfzhu released this 08 Feb 07:14
· 85157 commits to sycl since this release
6977f1a

New features

SYCL Compiler

SYCL Library

  • Implemented accessor member functions swap, byte_size, max_size and
    empty. [f1f907a]
  • Implemented SYCL 2020 default accessor constructor. [04928f9]
  • Implemented SYCL 2020 accessor iterators. [5b9fd3c] [c7b1a00]
  • Changed value_type of read-only accessors to const in accordance with
    SYCL 2020. [227614c]
  • Implemented SYCL 2020 multi_ptr and address_space_cast. [8700b76]
    [483984a] [4a9e9a0]
  • Implemented SYCL 2020 has_extension free functions. [7f1a6ef]
  • Implemented SYCL 2020 aspect_selector. [c0a4a56]
  • Implemented new SYCL 2020 style FPGA selectors. [0417651]
  • Implemented SYCL 2020 default async_handler behavior. [cd93d8f]
  • Implemented SYCL 2020 is_compatible free function. [67f6bba]
  • Implemented queue shortcut functions with placeholder accessors. [5ee066e]
  • Added support for creating a kernel bundle with descendent devices of the
    passed context's members. [a782779]
  • Implemented non-blocking destruction and deferred release of memory objects
    without attached host memory. [894ce25]
  • Implemented the sycl_ext_oneapi_queue_priority
    extension. [cdb09dc]
  • Implemented the sycl_ext_oneapi_user_defined_reductions
    extension. [8311d79]
  • Implemented the sycl_ext_oneapi_queue_empty
    extension proposal. [c493295]
  • Implemented the sycl_ext_oneapi_weak_object
    extension. [d948427] [9297f63]
  • Implemented the sycl_ext_intel_cslice
    extension. The old behavior that exposed compute slices as sub-sub-devices is
    now deprecated. For compatibility purposes, it can be brought back via the
    SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING environment
    variable. [5995c618]
  • Implemented the sycl_ext_intel_queue_index
    extension. [d2ec964] [7179e83]
  • Implemented the sycl_ext_oneapi_memcpy2d
    extension. [516d411]
  • Implemented device ID, memory clock rate and bus width information queries
    from the sycl_ext_intel_device_info
    extension. [1d99344] [4f7787c]
  • Implemented ext::oneapi::experimental::radix_sorter from the
    sycl_ext_oneapi_group_sort
    extension proposal. [86ba180]
  • Implemented a new unified interface for the sycl_ext_oneapi_matrix
    extension for CUDA. [166bbc3]
  • Added support for sorting over sub-groups. [168767c]
  • Added C++ API wrappers for the Intel math functions ceil, floor, rint,
    sqrt, rsqrt and trunc. [1b7582b]
  • Implemented a SYCL device library for bfloat16 Intel math function
    utilities. [fc136d6]
  • Added support for range reductions with any number of reduction variables.
    [572bc50]
  • Added support for reductions with kernels accepting item. [5d5e9f4]
  • Enabled sub-group masks for 64-bit subgroups. [10d50ed]
  • Implemented the new non-experimental API for DPAS. [55bf1a0] [1e7a8ea]
  • Added 8/16-bit type support to lsc_block_load and lsc_block_store ESIMD
    API. [f9d8059]
  • Implemented atomic operation support in the ESIMD emulator. [a6a0dea]
  • Added various trivial utility functions for the half type. [b4ce7c0]
  • Added type cast functions between half and float/integer types to
    libdevice. [599b1b9]
  • Implemented the ONEAPI_DEVICE_SELECTOR environment variable that, in
    addition to supporting SYCL_DEVICE_FILTER syntax, allows exposing GPU
    sub-devices as SYCL root devices and supports negative filters.
    SYCL_DEVICE_FILTER is now deprecated. [28d0cd3] [b21e74e] [77b6f34]
    [6bd5f9c] [6aefd63]
  • Added the SYCL_PI_LEVEL_ZERO_SINGLE_ROOT_DEVICE_BUFFER_MIGRATION
    environment variable. [bd03e0d]

Documentation

Improvements

SYCL Compiler

  • Added the InferAddressSpaces pass to the SPIR/SPIR-V compilation pipeline,
    reducing the size of the generated device code. [a3ae0dd]
  • Redesigned pointer handling so that it no longer decomposes kernel argument
    types containing pointers. [3916d3b] [d55e9c2] [9b02506]
  • Kernel lambda operator is now always inlined in the device code entry point
    unless -O0 is used. [b91b732] [2359d94]
  • Improved entry point handling in the sycl-post-link tool. [53d9c7b]
  • The reqd_work_group_size attribute now works with 1, 2 or 3 operands.
    [4ff42c3]
  • Enabled using the -fcf-protection option with -fsycl, which results in it
    being applied only to host code compilation and producing a warning. [b6f61f6]
  • The Linux-based compiler driver on Windows now pulls in the sycld debug
    library when msvcrtd is specified as a dependent library. [ebf6c59]
  • Added /Zc:__cplusplus as a default option during host compilation with MSVC.
    [e7ed860]
  • Improved the ESIMDOptimizeVecArgCallConv optimization pass to cover more IR
    patterns. [4926454]
  • Added support for more types in ESIMD lsc functions. [d9e40ec]
  • Added error diagnostics for using
    sycl::ext::oneapi::experimental::annotated_arg/ptr as a nested type.
    [321c733]
  • The status of bfloat16 support was changed from experimental to supported.
    [7b47ebb]

SYCL Library

  • Updated online_compiler with Gen12 GPU support. [adfb1c1]
  • get_kernel_bundle and has_kernel_bundle now check that the kernels are
    compatible with the devices. [91b1515]
  • Waiting for an event associated with a kernel that uses a stream now also
    waits for the stream to be flushed. [1db0e81]
  • Added the requested device type to the message of the exception thrown when no
    such devices are found. [6b83ad7]
  • Optimized operator[] of host_accessor. [01e60f7]
  • Improved reduction performance on discrete GPUs. [99bdc82]
  • Added invoke_simd support for functions with void return type. [3fd0850]
  • The Level Zero plugin now creates every event as host-visible by default.
    [f3d245d]
  • Added Level Zero plugin support for global work sizes greater than
    UINT32_MAX as long as they are divisible by some legal work-group size and
    the resulting quotient does not exceed UINT32_MAX. [62dd13d]
  • Improved native Level Zero event handling in the immediate command list mode
    by removing excessive status queries. [179ffa1]
  • Removed an uninitialized buffer migration copy in the Level Zero plugin.
    [b0c3404]
  • Implemented an optimization that reuses discarded Level Zero events in the
    plugin. [b1533c5]
  • The host device is now inaccessible. [5b13d5b]
  • Removed deprecated make_queue API. [9fc1d68]
  • Deprecated group::get_global_range(). [9533871]
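
The Level Zero item above on global work sizes greater than UINT32_MAX
describes a divisibility condition. A minimal sketch of that check follows.
It is not the plugin's actual code: the helper name pick_group_size and the
idea of passing in a list of candidate work-group sizes are assumptions made
for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not part of the Level Zero plugin): returns the first
// work-group size from `candidates` that divides `global_size` and leaves a
// group count no larger than UINT32_MAX, or std::nullopt if none qualifies.
std::optional<std::uint64_t>
pick_group_size(std::uint64_t global_size,
                const std::vector<std::uint64_t> &candidates) {
  assert(global_size > 0 && "global size must be non-zero");
  for (std::uint64_t wg : candidates) {
    if (wg == 0)
      continue; // a zero work-group size is never legal
    if (global_size % wg != 0)
      continue; // the global size must be divisible by the work-group size
    if (global_size / wg <= UINT32_MAX)
      return wg; // the resulting group count fits in 32 bits
  }
  return std::nullopt;
}
```

Under these assumptions, a global size of 2^33 with candidates {1, 256}
rejects 1 (the quotient 2^33 exceeds UINT32_MAX) and accepts 256 (the
quotient is 2^25).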

Documentation

Bug fixes

SYCL Compiler

  • Fixed a crash when attempting to compile code that uses a function object
    without a defined call operator as a kernel. [490ee55]
  • Fixed a crash that occurred during compilation of device code with a captured
    structured binding. [0e455c9]
  • Fixed the work_group_size_hint attribute not being applicable to lambda
    functions using non-conforming syntax. [c2a0db0]
  • Fixed integration header parameter kind information for annotated types.
    [b8f35cf]
  • Fixed an issue with offload dependencies when using -fsycl-force-target.
    [54777c0]
  • Fixed debug information generation when an integration footer is present.
    [83febf9]
  • Fixed a __builtin_printf related error when compiling device code with
    _GLIBCXX_ASSERTIONS=1. [d8fd9bc]
  • Fixed a compiler error that occurred during archive generation when using
    -fsycl-link for FPGA. [3159db5]
  • Fixed memory corruption caused by the ESIMDOptimizeVecArgCallConv pass.
    [86f709c]
  • Fixed a crash during ESIMD intrinsic generation. [20b1bea]
  • Fixed libclc function mangling. [e6c4c15] [f32d34f]
  • Fixed an issue with specifying the offload architecture for AMD and NVIDIA
    targets. [4189858]
  • Fixed incorrectly issued warnings about CUDA or HIP targets missing from a
    linked library. [72d9b05]

SYCL Library

  • Fixed an issue where the in-order queue property was not respected when
    submitting USM commands and host tasks. [067d3b3]
  • Fixed a memory leak when enqueueing a barrier to a discard_events queue.
    [968f9e7]
  • Fixed a memory leak related to submitting host tasks without memory object
    dependencies. [c44050a]
  • Fixed an invalid event error when handling cross-queue no-op dependencies.
    [6c9a380]
  • Fixed an error when setting a specialization constant in a command group with
    no kernel. [b333cee]
  • Fixed an issue where submitting a kernel that explicitly depends on a host
    task was a blocking call that waited for the host task. [c44050a]
  • Removed noexcept from some usm_allocator member functions to align with
    the specification. [7b02697]
  • Fixed ext::intel::experimental::atomic_update with the fcmpwr operation.
    [52923e6]
  • Fixed memory leak issues when constructing a SYCL kernel/kernel_bundle
    using interoperability. [b083e05] [a32021b] [3c22764]
  • Fixed an error where the native handle returned by get_native from a default
    constructed event was unusable. [7202173]
  • Fixed an issue where reinterpreting a buffer to a const type changed the
    corresponding buffer_allocator type to const. [3aabd26]
  • Fixed handler::set_arg with local_accessor. [ef792c6]
  • Added the missing default template argument for
    sycl::info::device::max_work_item_sizes. [3a4e797]
  • Fixed an issue where some aspects could be incorrectly reported as
    unsupported by a device. [f90d2b4]
  • Fixed the return type of scalar versions of relational functions. The fix
    requires defining the SYCL2020_CONFORMANT_APIS macro. [45d516c]
  • Fixed an issue where device code cache was not used if the compilation was
    triggered by different paths. [a0254c9] [1c77f9a]
  • Fixed a use-after-move bug when caching device code built for multiple
    devices. [97c0c99]
  • Removed the unintended requirement of fp64 support from stream and ESIMD
    float fmod implementations. [a67807a] [cc18904]
  • Fixed several complex math operations failing on devices that don't support
    fp64. [6b24fdc]
  • Aligned host side float-to-half mantissa rounding with device side.
    [4cbd459]
  • Fixed float-to-half conversion of the half minimum subnormal value on host.
    [514708b]
  • Fixed marray math function implementation. [73a992b]
  • Fixed an out-of-bounds write in the group operations implementation.
    [0fa7542]
  • Fixed a reduction performance regression caused by using the wrong
    implementation for the float type. [97725f1]
  • Fixed header deprecation warnings to work properly on Linux. [2cefad1]
  • Fixed deprecation of SYCL 1.2.1 device selectors. [a6222ba]
  • Fixed multiple issues in GDB xmethods scripts. [e49aa08]
  • Fixed an issue with sycl-prof JSON output. [37a74c7]
  • Fixed compilation errors on Windows when using the ESIMD API. [3aa48db]
    [77e92ce]
  • Fixed invalid calculation in the ESIMD tanh function. [cda6680]
  • Fixed kernel_bundle errors when using ESIMD emulator devices. [9baa9d9]
  • Fixed an issue where the ESIMD emulator was picked by the default selector
    even in the presence of other devices. [44d7926]
  • Fixed an error when querying an ESIMD emulator device for sub-group sizes.
    [acca608]
  • Fixed invalid behavior of the maximum sub-group size query on some OpenCL
    systems. [5998d7c]
  • Fixed an issue where the OpenCL plugin checked whether a program is supported
    on a device by looking up platform version/extensions rather than device
    ones. [9f89247]
  • Fixed the result of the free device memory query with the Level Zero backend.
    [c191fb0]
  • Fixed an issue with ext_oneapi_barrier not working when using the Level Zero
    backend. [1f8d90f]
  • Fixed a hang after submitting a barrier to a Level Zero in-order queue.
    [dd5a191]
  • Fixed an issue that occurred when submitting a barrier to a Level Zero queue
    with no prior submissions. [5c9e543]
  • Fixed a memory leak when tracking indirect access in the Level Zero plugin.
    [1b79491]
  • Fixed an invalid read issue that occurred during Level Zero event release.
    [c49eeda]
  • Fixed a synchronization issue when using device scope Level Zero events.
    [9811ef2]
  • Fixed an issue that occurred when using get_native on a newly constructed
    Level Zero queue. [5d0d4ef]
  • Fixed a segmentation fault related to events recycling in immediate command
    list mode in the Level Zero plugin. [a3e93e0] [5b021a2]
  • Fixed an issue where an invalid maximum number of compute units was
    reported for Level Zero sub-sub-devices. [b9f4919]
  • Fixed a segmentation fault when using Level Zero sub-sub-devices with the
    immediate command lists mode. [ed3d35c]
  • Reverted the Level Zero plugin change that preferred using the copy engine
    for memory read/write operations due to functional regressions. [44aa363]
  • Added the missing fp16 case of FMulKHR libclc function. [4372915]
  • Fixed several bugs in the barrier implementation in the CUDA and HIP plugins.
    [1c3d598] [ce7c594]
  • Fixed get_native not working for CUDA devices. [998fd91]
  • Fixed a crash when submitting a kernel with a range of 0 to a CUDA device.
    [a395886]
  • Fixed an issue where make_device produced duplicate CUDA devices.
    [75302c5]
  • Fixed an issue where the HIP backend always reported the fp64 aspect as
    missing. [cd832bf]

Known issues

  • A MESA OpenCL implementation that provides no devices on a system may cause
    incorrect device discovery. As a workaround, such an OpenCL implementation
    can be disabled by removing /etc/OpenCL/vendor/mesa.icd.
  • Compilation may fail on Windows in debug mode if a kernel uses
    std::array. This happens because the debug version of std::array in
    the Microsoft STL C++ headers calls functions that are illegal in device
    code. As a workaround, the following can be done:
    1. Dump the compiler pipeline execution strings by passing the -### option
      to the compiler. The compiler will print the internal execution strings
      of the compilation tools. The actual compilation will not happen.
    2. Modify the (usually) first execution string (it should have the
      -fsycl-is-device option) by adding the
      -D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0 options to the
      end of the string. Execute all the strings one by one.
  • -fsycl-dead-args-optimization cannot eliminate the offset of an
    accessor even when the accessor is created with no offset specified.
  • SYCL 2020 barriers show worse performance than SYCL 1.2.1 barriers. [18c80fa]
  • Using the fallback assert in a separate compilation flow requires explicit
    linking against lib/libsycl-fallback-cassert.o or
    lib/libsycl-fallback-cassert.spv.
  • The alignment of allocation requests is limited to 64KB, which is the only
    alignment supported by Level Zero. [7dfaf3b]
  • User-defined functions with a name and signature matching those of any
    OpenCL C built-in function (i.e. an exact match of arguments; the return
    type doesn't matter) can lead to undefined behavior.
  • A DPC++ system that has FPGAs installed does not support multi-process
    execution. Creating a context opens the device associated with the context
    and places a lock on it for that process. No other process may use that
    device. Some queries about the device through device.get_info<>() also
    open up the device and lock it to that process since the runtime needs
    to query the actual device to obtain that information.
  • The format of the object files produced by the compiler can change between
    versions. The workaround is to rebuild the application.
  • Using the sycl::kernel_bundle API to refer to a kernel defined
    in another translation unit leads to undefined behavior.
  • Linkage errors with the following message:
    error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined
    can happen when a SYCL application is built using a version of MS Visual
    Studio 2019 below 16.3.0 and the user specifies -std=c++14 or /std:c++14.
  • Printing internal defines isn't supported on Windows. [50628db]

The bits associated with this release have not been tested or validated for quality or functionality. They are simply the nightly build associated with the release commit that the oneAPI release was based on.