Build dependencies include:
- Latest complete CUDA toolkit (CUDA 12.4 as of time of writing)
- GNU Compiler Collection
- Criterion for running the unit tests
Open a terminal interface and run:
make run
to run the linear buffer version of the program
make run_circ_buff
to run the circular buffer version of the program (runs slower than the linear version because the implementation tries too aggressively to conserve memory bandwidth; the author expected this version to outperform the linear buffer version, but unfortunately the circular buffer design is suited to running fast on older GPU architectures)
make run_tests
to run the kernel unit tests for the linear buffer version of the program
make run_tests_circ_buff
to run the kernel unit tests for the circular buffer version of the program
make profile
to run Nsight Compute profiling for the first invocation of the global_mem_mergesort_step kernel within the linear buffer version of the program (Nsight Compute MUST be installed for this to work)
make profile_circ_buff
to run Nsight Compute profiling for the first invocation of the global_mem_mergesort_step kernel within the circular buffer version of the program (Nsight Compute MUST be installed for this to work)
Note: only Linux distributions are supported for now on this main branch.
- Add more detailed comments in at least the .cu source code files
- Maybe add more details in this README?
- Add support for sorting 64-bit integer types as a compile-time feature. Author deems this not important, as this is essentially only a demo program.
- Add unit tests at least for the CUDA kernels. Author is finding this difficult; any outside help would be appreciated; more than willing to refactor code to make unit tests easier :) Done on May 13 2024 :)
- Prevent people from entering array sizes that are too large, based on max total VRAM (basically total VRAM minus 512 MiB). Done on May 17 2024, and didn't even have to use any special formulas :)
- Port the application over to Windows maybe?
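The array-size limit mentioned in the list above (total VRAM minus roughly 512 MiB) can be illustrated with a small host-side sketch. This is not the author's implementation, which the note says needed no special formula; it only shows the arithmetic behind such a cap. The function name `max_array_elements`, the assumption of two device buffers (input plus scratch), and the 32-bit element size are all hypothetical and chosen for illustration.

```cpp
#include <cstdint>

// Headroom left free on the device: 512 MiB, as mentioned in the list above.
constexpr std::uint64_t kReservedBytes = 512ull * 1024ull * 1024ull;

// Hypothetical helper: largest element count that fits in the usable VRAM.
// ASSUMPTIONS (not taken from the project): the sort keeps `buffer_count`
// device buffers of equal size, and each element is `elem_size` bytes
// (4 bytes for a 32-bit integer).
std::uint64_t max_array_elements(std::uint64_t total_vram_bytes,
                                 std::uint64_t buffer_count = 2,
                                 std::uint64_t elem_size = sizeof(std::int32_t)) {
    // Nothing usable if the device has no more memory than the reserve,
    // or if the parameters would cause a division by zero.
    if (total_vram_bytes <= kReservedBytes || buffer_count == 0 || elem_size == 0) {
        return 0;
    }
    const std::uint64_t usable_bytes = total_vram_bytes - kReservedBytes;
    return usable_bytes / (buffer_count * elem_size);
}
```

In a CUDA program, `total_vram_bytes` could be taken from the `totalGlobalMem` field filled in by `cudaGetDeviceProperties`, or from the total value reported by `cudaMemGetInfo`; a requested array size above the returned count would then be rejected before any allocation is attempted.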