Releases: ROCm/aomp
AOMP Release 0.7-6
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
The source code base for this release of AOMP is the clang/llvm 9.0 stable sources as of Oct 29, 2019. The llvm-project branch used to build this release is AOMP-191029. In addition to a complete source tarball, the artifacts of this release include two patches. The file llvm-project.patch shows the delta from the stable branch release/9.x of llvm-project. The file flang.patch contains the changes we made to the flang repository.
Here are the changes made in this release:
- Switched to ROCm sources for ROCm 3.0.
- Build and install llvm component compiler-rt.
- Created ppc64le Debian package for Ubuntu 18.04.
- Fixed examples and tests to work for any CPU type by using "uname -p" command.
- Added verification enhancements for openmpapps
- Added changes from rocmaster into master branch
- Increased the default number of loaded kernels from 8 to 32. Beyond 32, the env-var ATMI_MAX_KERNEL_TYPES=nnn is still needed to raise the limit.
- Allow map on reduction if -fopenmp-version=50
- Fortran target compiles add AOMP/include dir for module searches
- Modified nested parallel to not use SafeMalloc and instead preallocate loop context memory.
- Minor header changes in cloc.sh
- Adjusted raja_build.sh to avoid use of indirect function calls
- Added $ORIGIN to cmake rpath install in build scripts. This also makes the installation directory moveable. For example, one could move the compiler installation as follows: "mv /usr/lib/aomp_0.7-6 /opt/rocm/aomp".
- Added new patch system to allow for unmodified repos after build is complete
- Added support for gfx908
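Two of the items above can be illustrated with a short shell sketch: detecting the host CPU type with "uname -p", and raising the loaded-kernel limit through ATMI_MAX_KERNEL_TYPES. The fallback to "uname -m" and the value 64 are assumptions made for illustration; they do not come from the AOMP scripts.

```shell
# Detect the host CPU type with `uname -p`, as the examples and tests are
# described as doing.
cpu_type=$(uname -p 2>/dev/null || true)

# Assumption: some Linux systems print "unknown" (or nothing) for -p,
# so this sketch falls back to the machine name from `uname -m`.
if [ -z "$cpu_type" ] || [ "$cpu_type" = "unknown" ]; then
    cpu_type=$(uname -m)
fi
echo "host CPU type: $cpu_type"

# Raise the loaded-kernel limit above the new default of 32.
# The value 64 is only an illustration.
export ATMI_MAX_KERNEL_TYPES=64
echo "ATMI_MAX_KERNEL_TYPES=$ATMI_MAX_KERNEL_TYPES"
```

The exported variable only affects programs started from the same shell session afterwards.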
AOMP Release 0.7-5
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
The source code base for this release of AOMP is the clang/llvm 9.0 stable sources as of Oct 8, 2019. The llvm-project branch used to build this release is AOMP-191008, which is now locked. In addition to a complete source tarball, the artifacts of this release include three patches against the stable branch release/9.x of the mono-repo llvm-project.
Here are the changes made in this release.
- Move to ROCm 2.9 sources
- Updates to source build scripts to allow consistent patching to pristine ROCm source repositories.
- The comgr is now patched to use the old method of getting section name for llvm-9. The current comgr code assumes llvm-10 so it needed this patch until aomp moves to llvm-10.
- Backported clang support for f16 in builtins. This was needed to build ROCm 2.9 rocm-device-libs.
- Simplify hostcall detection
- Imported non-functional changes to deviceRTL from llvm master
- libm SDL now added automatically by default
- Added libm to the do not search for SDL list when linking user specified libraries during the clang-build-select-link step. This prevents a double linking of the device libm when -lm is used.
- Split aomp test repositories to a separate directory.
- Starting with this release, we will create an artifact tarball of the entire source tree. This tree includes a Makefile in the root directory used to build aomp from the release tarball. You can use spack to build AOMP from this source tarball or build manually without spack.
- Instructions to build AOMP from the release source tarball have been added to the install documentation. These instructions include the manual build and build with spack.
- Fixed HIP_DEVICE_COMPILE being active during host pass
- Added RAJA example
- Added initial -g (debug) support for target code. It can be used with the soon-to-be-released rocm-gdb.
- The upstream flang source code as of Oct 23, 2019 has been merged into this build. Some minor fixes were required. See the commits in AOMP-191023.
- Added sollve_vv to the aomp-test repositories and a script called run_sollve.sh that patches the Makefile to skip fortran tests and patches bad tests that have map clauses for reduction variables.
AOMP Release 0.7-4
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
The source code base for this release of AOMP is the clang/llvm 9.0 stable sources as of September 9, 2019. The llvm-project branch used to build this release is AOMP-190918. The patches llvm.patch, openmp.patch, and clang.patch are the diffs from release/9.x to AOMP-190918.
These are the changes included in this release.
- lld linkstep producing hsaco will now emit errors if undefined symbols are detected in the final image.
- Removed hip automatic mode. This removes the need for lib/Headers/cuda-open.
- Added aompExtractRegion script, which disassembles the AMD GPU kernel.
- Fixed the case of a barrier in a sequential code region, which is handled by the master warp. The fix extends the "barrier handshaking" between the master warp and worker warps. The 5 new deviceRTL functions it adds begin with _kmpc_amd*. Only amdgcn uses these because nvptx uses a partial hardware barrier not available in amdgcn.
- Fixed reduction across teams. We still need to limit the number of teams to a hard limit of 256.
- Fix for hip detecting the wrong version of code object, done with a patch to the ROCm 2.7 hip source code that is applied at build time. The patch will appear in ROCm 2.10.
- Fixed the build of rocminfo so the rocminfo command finds the correct version of libhsa-runtime64.so
- Fix to device math library in aomp extras
- Fix to cloc.sh to support removal of hip automatic mode
- Added support to build_aomp.sh to allow partial builds of AOMP components. Adding 'continue openmp' after calling the script will start the build process at the openmp component and continue to the end of the component list. Adding 'select libdevice extras' will only build the components specified.
- The source tarball is just for testing on this release.
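The 'continue' and 'select' modes of build_aomp.sh described above can be sketched as a small selection function. This is not the code in build_aomp.sh. The function name pick_components and the component list with its order are hypothetical; only the names openmp, libdevice, and extras appear in the notes.

```shell
# Hypothetical component list; the real list and its order live in build_aomp.sh.
COMPONENTS="project libdevice openmp extras"

# pick_components [MODE] [NAMES...]
#   (no arguments)   -> every component, in order
#   continue NAME    -> NAME and every component after it
#   select NAME...   -> only the named components, in the order given
pick_components() {
    mode=${1:-all}
    if [ $# -gt 0 ]; then
        shift
    fi
    case "$mode" in
        all)
            echo "$COMPONENTS"
            ;;
        continue)
            out=""
            started=0
            for c in $COMPONENTS; do
                if [ "$c" = "$1" ]; then
                    started=1
                fi
                if [ "$started" -eq 1 ]; then
                    out="$out $c"
                fi
            done
            # Strip the leading space added by the loop.
            echo "${out# }"
            ;;
        select)
            echo "$*"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}

pick_components continue openmp
pick_components select libdevice extras
```

With the real script, the equivalent calls would be "build_aomp.sh continue openmp" and "build_aomp.sh select libdevice extras", as stated in the notes.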
AOMP Release 0.7-3
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
The source code base for this release is the clang/llvm 9.0 development trunk as of August 2, 2019. These are the other changes included in this release.
- Fix issue with user barrier in code causing master wavefront to unintentionally wake up and continue
- Removed amdgcn_wave_barrier from named_sync and replaced with amdgcn_s_barrier
AOMP Release 0.7-2
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
The source code base for this release is the clang/llvm 9.0 development trunk as of August 2, 2019. These are the other changes included in this release.
- Fixed reduction not showing correct value on device. This was due to a full work group/block barrier being called by the worker threads which threw off synchronization between master warp and worker warps.
- Fixed compilation errors on AOMP cloc examples
AOMP Release 0.7-1
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
The source code base for this release is the clang/llvm 9.0 development trunk as of August 2, 2019. These are the other changes included in this release.
- Added logic to use the FileID and LineNum of the parent file (the includer) instead of the includee file where the target region is located. This avoids creating symbols with the same name when including a header file that has a c++ template with a target region.
- For OpenMP+HIP, hip will be on when processing host bc, so clang must be told this is IR and not HIP input.
- Fixed the HIP toolchain so that the custom linker tool build-select is not called for hip applications. It is only called for openmp. This fixes a problem where kernels are not seen when multiple source files are specified.
- Cleaned up some things to lessen the patch from upstream HIP.cpp
- Added the hip header hip_host_runtime_api.h to avoid modifications to the hip repository
- Added hipcc wrapper script with modifications to work from AOMP install directory
- Check if an archive contains device code for AMDGCN.
- Cleanup deviceRTL for amdgcn to prepare for common GPU deviceRTL
- Added rocminfo utilities to support hip.
- Deferred the issue with reductions to 0.7-2.
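The FileID and LineNum item above is easier to follow with a concrete naming sketch. It assumes target-region symbols follow the upstream clang pattern __omp_offloading_<device id>_<file id>_<parent function>_l<line>. Whether AOMP 0.7-1 used exactly this pattern is an assumption, and every ID, name, and line number below is made up.

```shell
# offload_entry_name DEVICE_ID FILE_ID PARENT_FUNC LINE
# Assumption: names follow the upstream clang pattern
# __omp_offloading_<device id>_<file id>_<parent function>_l<line>.
offload_entry_name() {
    printf '__omp_offloading_%x_%x_%s_l%d\n' "$1" "$2" "$3" "$4"
}

# All values below are hypothetical.
dev=2049
func=run_kernel      # stands in for the mangled template instantiation
header_id=1001       # the includee (a header holding the target region)
header_line=12       # line of the target region inside the header
a_id=2001            # first includer, e.g. a.cpp
a_line=3             # assumed here to be the line of its #include
b_id=2002            # second includer, e.g. b.cpp
b_line=5

# Keyed on the includee: both source files produce the same symbol.
old_a=$(offload_entry_name "$dev" "$header_id" "$func" "$header_line")
old_b=$(offload_entry_name "$dev" "$header_id" "$func" "$header_line")

# Keyed on the includer: the two symbols differ.
new_a=$(offload_entry_name "$dev" "$a_id" "$func" "$a_line")
new_b=$(offload_entry_name "$dev" "$b_id" "$func" "$b_line")

echo "includee-based: $old_a / $old_b"
echo "includer-based: $new_a / $new_b"
```

The point of the sketch is only that a name built from the header's own file ID and line cannot tell two including files apart, while a name built from the includer can.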
AOMP Release 0.7-0
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
This release is a major update from 0.6-5. The source code base for this release is the clang/llvm 9.0 development trunk as of July 15, 2019. These are the other changes included in this release.
- The package now installs in /usr/lib/aomp_0.7-X with symbolic link from /usr/lib/aomp.
- Uses build of rocm-device-libs exactly from rocm 2.6 source files.
- New untested infrastructure to eventually support fortran with flang
- Moved to the new llvm-project repository. This is the new monorepo that eliminates need for clang, llvm, lld, and openmp repositories.
- no longer build for nvptx backend, removed cuda examples
- moved utils to aomp-extras repository
- moved custom libraries from rocm-device-libs to aomp-device-libs
- hcc is now built with rocm 2.6. hcc is not in the package because we only use it to build the hip runtime.
- roct and rocr are now built from rocm 2.6 sources
- comgr is now built from the rocm 2.6 sources.
- fixes for a number of new test cases
AOMP Release 0.6-5
Like 0.6-4, this release 0.6-5 of aomp is based off the stable version of clang/llvm 8.0.
These are the changes found in 0.6-5 compared to the previous 0.6-4 release.
- Added support for archives of bundles on command line.
- Created hostcall payload on system memory instead of GPU memory. This avoids cache effects of HBM memory that gets flushed only at kernel boundaries.
- Cleaned up examples.
- Readability changes to various README files in docs.
- Added SLES-15-SP1 source install dependencies and important notes for linux support.
- Emit struct of per kernel attributes.
- Detect and warn that a target exit data clause fails, rather than abort.
- Fixed linking issue when archive files contain no BC files.
AOMP Release 0.6-4
Like 0.6-3, this release 0.6-4 of aomp is based off the stable version of clang/llvm 8.0.
These are the changes found in 0.6-4 compared to the previous 0.6-3 release.
- support for building on SLES15 SP1
- rpm package for SLES15 SP1
- do not create a host thread for GPU hostcall services if no services are used by any kernel in the application. This fixes a performance regression we saw with openmpapps in 0.6-3 because none of those apps currently use printf on the device. This still needs more study.
- Reorganized the github README and linked pages to make it less confusing and to ready support for more platforms.
- removed hip wrapper scripts such as hipcc. Users must compile hip with clang++ as demonstrated in the examples to get openmp support with hip.
- properly set amdgpu-flat-work-group-size for generic mode: add wave_size
- add -lelf to link step of libomptarget.rtl.hsa.so
- more gracefully exit when gpu arch of kernel does not match device arch
- refine LIBOMPTARGET_KERNEL_TRACE: 1=>minimal, 2=>more verbose
AOMP Release 0.6-3
Like 0.6-2, this release is based off the stable version of clang/llvm 8.0.
These are the changes since 0.6-2.
- New support for synchronous services called hostcall.
- The source to support hostcall can be found in a new repository called aomp-extras in the hostcall directory
- There are minor changes to atmi to support hostcall. These are in branch atmi-0.5-063.
- Removed printf end-of-kernel service and added to hostcall. printf is now much more reliable from the gpu.
- Enhancements to toolchain to support static device libraries
- fix to correctly pick up math functions from libm-.bc. Previously it was seeing math functions as builtins.
- Suppress calls to __kmpc_push_target_count for host code, resolves undefined reference.
- Allow -frtti flag to be honored if user requests it on command line.
- Add AOMP/include path before /usr/local/include to pick up correct header for omp.h.
- Generate Metadata for both SPMD and Generic offload targets.
- Honor OMP_TEAM_LIMIT for work groups, just like OMP_NUM_TEAMS.
- Added *_wg_size symbol to reflect compile time known thread limit for a kernel.
- Added support to openmp runtimes to support 1024 threads per team/work group.
- Reenabled SILoadStoreOptimizer pass after pulling upstream fix for scalar carry corruption.
- Fixed amdgcn noinline and alwaysinline incompatibility issue for the Parallel Data Sharing Wrapper