One of the most common workflows in high-performance computing is to 1️⃣ prototype algorithms in Python and then 2️⃣ port them to C++ and CUDA. It's a quick way to prototype and test ideas, but configuring the build tools for such heterogeneous-code, heterogeneous-hardware projects is a pain, often amplified by the error-prone syntax of CMake. This project provides a pre-configured environment for such workflows:
- using only `setup.py` and `requirements-{cpu,gpu}.txt` to manage the build process,
- supporting OpenMP for parallelism on the CPU and CUDA on the GPU, and
- including CCCL libraries, like Thrust and CUB, to simplify the code.
As an example, the repository implements, tests, and benchmarks just two operations: array accumulation and matrix multiplication.
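To give a feel for how CCCL shortens such kernels, here is a minimal Thrust sketch of array accumulation; the function name and structure are assumptions for illustration, not the repository's actual code.

```cpp
// Hypothetical sketch of array accumulation with Thrust; not the code used
// in starter_kit.cu, just an illustration of how CCCL shortens kernels.
#include <cstddef>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

float accumulate_with_thrust(float const *data, std::size_t count) {
    // Copy the input array to device memory.
    thrust::device_vector<float> device_data(data, data + count);
    // A single Thrust call replaces a hand-written reduction kernel.
    return thrust::reduce(device_data.begin(), device_data.end(), 0.0f);
}
```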
The baseline Python + Numba implementations live in `starter_kit_baseline.py`, and the optimized CUDA and OpenMP implementations live in `starter_kit.cu`.
If no CUDA-capable device is found, `starter_kit.cu` is treated as a CPU-only C++ implementation.
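One common pattern for keeping a single source file usable on both targets, shown below as a sketch rather than the actual layout of `starter_kit.cu`, is to guard the GPU path behind the `__CUDACC__` macro and fall back to an OpenMP loop otherwise:

```cpp
// Hypothetical sketch: one translation unit compiled either by nvcc (GPU path)
// or by a host C++ compiler (OpenMP path); starter_kit.cu may be organized differently.
#include <cstddef>

float accumulate(float const *data, std::size_t count) {
#if defined(__CUDACC__)
    // Compiled by nvcc: dispatch to a GPU implementation, e.g. the Thrust sketch above.
    extern float accumulate_with_thrust(float const *, std::size_t);
    return accumulate_with_thrust(data, count);
#else
    // Compiled as plain C++: parallel accumulation with OpenMP.
    float sum = 0.0f;
#pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(count); ++i)
        sum += data[i];
    return sum;
#endif
}
```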
If VS Code is used, the `tasks.json` file is configured with debuggers for both CPU and GPU code, in both Python and C++.
The `.clang-format` file is configured with the LLVM base style, adjusted for wider screens by allowing 120 characters per line.
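For reference, a minimal `.clang-format` matching that description would look roughly like this; the file in the repository may carry additional overrides:

```yaml
# Assumed minimal configuration: LLVM base style with a 120-character line limit.
BasedOnStyle: LLVM
ColumnLimit: 120
```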
I'd recommend forking the repository for your own projects, but you can also clone it directly:
```sh
git clone https://github.com/ashvardanian/cpp-cuda-python-starter-kit.git
cd cpp-cuda-python-starter-kit
```
Once pulled down, you can build the project with:
```sh
git submodule update --init --recursive # fetch CCCL libraries
pip install -r requirements-gpu.txt     # or requirements-cpu.txt
pip install -e .                        # compile for the current platform
pytest test.py -s -x                    # test until the first failure
python bench.py                         # saves charts to disk
```
The project is designed to be as simple as possible, with the following workflow:
- Fork or download the repository.
- Implement your baseline algorithm in `starter_kit_baseline.py`.
- Implement your optimized algorithm in `starter_kit.cu`.
Beginner GPGPU:
- High-level concepts: nvidia.com
- CuPy UDFs: cupy.dev
- CUDA in Python with Numba: numba/nvidia-cuda-tutorial
- C++ STL Parallelism on GPUs: nvidia.com
Advanced GPGPU:
- CUDA math intrinsics: nvidia.com
- Troubleshooting Nvidia hardware: stas00/ml-engineering
- Nvidia ISA Generator with SM89 and SM90 codes: kuterd/nv_isa_solver
- Multi GPU examples: nvidia/multi-gpu-programming-models
Communities: