This program showcases an implementation of a simple matrix transpose kernel that uses a different codepath depending on the target GPU architecture.
- A number of constants are defined to control the problem details and the kernel launch parameters.
- The input matrix is set up in host memory.
- The necessary amount of device memory is allocated and input is copied to the device.
- The GPU transposition kernel is launched with the previously defined arguments.
- The kernel will have two different codepaths for its data movement, depending on the target architecture.
- The transposed matrix is copied back to the host and all device memory is freed.
- The elements of the result matrix are compared with the expected result. The result of the comparison is printed to the standard output.
This example showcases how a GPU kernel can follow two different codepaths depending on the target architecture, selected at compile time via architecture-specific compiler definitions.
This is useful, for example, when you want to use architecture-specific inline assembly when compiling for a specific architecture, without losing compatibility with other architectures (see the inline_assembly example).
These architecture-specific compiler definitions only exist within GPU kernels. If you would like to have GPU architecture-specific host-side code, you could query the stream/device information at runtime.
Device symbols:

- `threadIdx`, `blockIdx`, `blockDim`
- `__gfx1010__`, `__gfx1011__`, `__gfx1012__`, `__gfx1030__`, `__gfx1031__`, `__gfx1100__`, `__gfx1101__`, `__gfx1102__`
Host symbols:

- `hipMalloc`
- `hipMemcpy`
- `hipGetLastError`
- `hipFree`