Instructions (Linux Version of This Program)

Build dependencies include:

Open a terminal and run:

- `make run` to run the linear-buffer version of the program
- `make run_circ_buff` to run the circular-buffer version of the program. This version runs slower than the linear-buffer version because its implementation tries too aggressively to conserve memory bandwidth. The author expected it to outperform the linear-buffer version, but it turned out to be a design that runs fast mainly on older GPU architectures.
- `make run_tests` to run the kernel unit tests for the linear-buffer version of the program
- `make run_tests_circ_buff` to run the kernel unit tests for the circular-buffer version of the program
- `make profile` to run Nsight Compute profiling on the first invocation of the `global_mem_mergesort_step` kernel in the linear-buffer version of the program (Nsight Compute MUST be installed for this to work)
- `make profile_circ_buff` to run Nsight Compute profiling on the first invocation of the `global_mem_mergesort_step` kernel in the circular-buffer version of the program (Nsight Compute MUST be installed for this to work)

Note: only Linux distributions are currently supported on this main branch.

TODO

- Add more detailed comments, at least in the .cu source files
- Maybe add more details to this README?
- ~~Add support for sorting 64-bit integer types as a compile-time feature~~ The author deems this unimportant, since this is essentially only a demo program.
- ~~Add unit tests, at least for the CUDA kernels. The author is finding this difficult; any outside help would be appreciated, and the author is more than willing to refactor the code to make unit tests easier :)~~ Done on May 13, 2024 :)
- ~~Prevent users from entering array sizes that are too large for the GPU's VRAM (basically total VRAM minus 512 MiB).~~ Done on May 17, 2024, without even needing any special formulas :)
- Maybe port the application to Windows?
