One of the most common workflows in high-performance computing is to 1️⃣ prototype algorithms in Python and then 2️⃣ port them to C++ and CUDA. It's a quick way to prototype and test ideas, but configuring the build tools for such heterogeneous-code, heterogeneous-hardware projects is a pain, often amplified by the error-prone syntax of CMake. This project provides a pre-configured environment for such workflows:
- using only `setup.py` and `requirements-{cpu,gpu}.txt` to manage the build process,
- supporting OpenMP for parallelism on the CPU and CUDA on the GPU, and
- including CCCL libraries, like Thrust and CUB, to simplify the code.
As an example, the repository implements, tests, and benchmarks just two operations: array accumulation and matrix multiplication.
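To give a feel for how CCCL shortens such kernels, here is a minimal Thrust sketch of array accumulation; the function name and structure are assumptions for illustration, not the repository's actual code.

```cpp
// Hypothetical sketch of array accumulation with Thrust; not the code used
// in starter_kit.cu, just an illustration of how CCCL shortens kernels.
#include <cstddef>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

float accumulate_with_thrust(float const *data, std::size_t count) {
    // Copy the input array to device memory.
    thrust::device_vector<float> device_data(data, data + count);
    // A single Thrust call replaces a hand-written reduction kernel.
    return thrust::reduce(device_data.begin(), device_data.end(), 0.0f);
}
```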
The baseline Python + Numba implementations live in `starter_kit_baseline.py`, and the optimized CUDA and OpenMP implementations live in `starter_kit.cu`.
If no CUDA-capable device is found, `starter_kit.cu` is treated as a CPU-only C++ implementation.
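One common pattern for keeping a single source file usable on both targets, shown below as a sketch rather than the actual layout of `starter_kit.cu`, is to guard the GPU path behind the `__CUDACC__` macro and fall back to an OpenMP loop otherwise:

```cpp
// Hypothetical sketch: one translation unit compiled either by nvcc (GPU path)
// or by a host C++ compiler (OpenMP path); starter_kit.cu may be organized differently.
#include <cstddef>

float accumulate(float const *data, std::size_t count) {
#if defined(__CUDACC__)
    // Compiled by nvcc: dispatch to a GPU implementation, e.g. the Thrust sketch above.
    extern float accumulate_with_thrust(float const *, std::size_t);
    return accumulate_with_thrust(data, count);
#else
    // Compiled as plain C++: parallel accumulation with OpenMP.
    float sum = 0.0f;
#pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(count); ++i)
        sum += data[i];
    return sum;
#endif
}
```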
If VS Code is used, the `tasks.json` file is configured with debuggers for both CPU and GPU code, in both Python and C++.
The `.clang-format` file is configured with the LLVM base style, adjusted for wider screens by allowing 120 characters per line.
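For reference, a minimal `.clang-format` matching that description would look roughly like this; the file in the repository may carry additional overrides:

```yaml
# Assumed minimal configuration: LLVM base style with a 120-character line limit.
BasedOnStyle: LLVM
ColumnLimit: 120
```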
I'd recommend forking the repository for your own projects, but you can also clone it directly:
```sh
git clone https://github.com/ashvardanian/cpp-cuda-python-starter-kit.git
cd cpp-cuda-python-starter-kit
```
Once pulled down, you can build the project with:
```sh
git submodule update --init --recursive # fetch CCCL libraries
pip install -r requirements-gpu.txt     # or requirements-cpu.txt
pip install -e .                        # compile for the current platform
pytest test.py -s -x                    # test until the first failure
python bench.py                         # saves charts to disk
```
The project is designed to be as simple as possible, with the following workflow:
- Fork or download the repository.
- Implement your baseline algorithm in `starter_kit_baseline.py`.
- Implement your optimized algorithm in `starter_kit.cu`.
Beginner GPGPU:
- High-level concepts: nvidia.com
- CuPy UDFs: cupy.dev
- CUDA in Python with Numba: numba/nvidia-cuda-tutorial
- C++ STL Parallelism on GPUs: nvidia.com
Advanced GPGPU:
- CUDA math intrinsics: nvidia.com
- Troubleshooting Nvidia hardware: stas00/ml-engineering
- Nvidia ISA Generator with SM89 and SM90 codes: kuterd/nv_isa_solver
- Multi GPU examples: nvidia/multi-gpu-programming-models
Communities: