Array4: __cuda_array_interface__ v3 #30

Merged: 9 commits, Oct 17, 2022


@ax3l ax3l commented Mar 26, 2022

Start implementing the __cuda_array_interface__ for zero-copy data exchange on Nvidia CUDA GPUs.
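
As a rough illustration of what the protocol asks a producer to expose, the sketch below builds a version 3 `__cuda_array_interface__` dictionary in pure Python. The `DeviceView` class and its arguments are hypothetical and not part of pyAMReX; the device pointer is a placeholder integer, so nothing here touches a GPU.

```python
class DeviceView:
    """Hypothetical stand-in for an object that wraps GPU memory."""

    def __init__(self, ptr, shape, typestr="<f8", readonly=False, stream=None):
        self.ptr = ptr            # device pointer as a Python int (placeholder here)
        self.shape = tuple(shape)
        self.typestr = typestr    # NumPy-style type string, "<f8" = little-endian float64
        self.readonly = readonly
        self.stream = stream      # None: no synchronization requested of the consumer

    @property
    def __cuda_array_interface__(self):
        iface = {
            "shape": self.shape,
            "typestr": self.typestr,
            "data": (self.ptr, self.readonly),
            "version": 3,
            "strides": None,      # None means C-contiguous
        }
        if self.stream is not None:
            iface["stream"] = self.stream  # optional field of version 3
        return iface


# Placeholder address, assumed for illustration only
view = DeviceView(ptr=0x7F0000000000, shape=(4, 8, 8))
print(view.__cuda_array_interface__)
```

Consumers such as CuPy or Numba look for this attribute and wrap the pointer without copying, which is why the producer has to keep the memory alive for as long as a consumer uses it.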

Optional: accessing an external __cuda_array_interface__ object in non-owning manner as AMReX Array4:
https://github.com/cupy/cupy/blob/a5b24f91d4d77fa03e6a4dd2ac954ff9a04e21f4/cupy/core/core.pyx#L2478-L2514
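
The non-owning direction could look roughly like this on the consumer side: read the dictionary from any object that provides it, check the required fields, and keep only the pointer, shape and byte strides. `parse_cuda_array_interface` and `FakeProducer` are hypothetical names for illustration; this is neither the CuPy code linked above nor the pyAMReX implementation, and it assumes plain numeric type strings such as `<f8`.

```python
import math


def parse_cuda_array_interface(obj):
    """Return (ptr, shape, strides_in_bytes, readonly) without copying data."""
    iface = obj.__cuda_array_interface__
    for key in ("shape", "typestr", "data", "version"):
        if key not in iface:
            raise ValueError(f"missing required field: {key}")

    shape = tuple(iface["shape"])
    ptr, readonly = iface["data"]
    itemsize = int(iface["typestr"][2:])  # e.g. "<f8" -> 8 bytes

    strides = iface.get("strides")
    if strides is None:
        # C-contiguous layout: the last axis is the fastest one
        strides = tuple(
            itemsize * math.prod(shape[i + 1:]) for i in range(len(shape))
        )
    return ptr, shape, tuple(strides), readonly


class FakeProducer:
    # Placeholder pointer; a real producer would hand out device memory
    __cuda_array_interface__ = {
        "shape": (4, 8, 8),
        "typestr": "<f8",
        "data": (0x7F0000000000, False),
        "version": 3,
    }


ptr, shape, strides, readonly = parse_cuda_array_interface(FakeProducer())
print(shape, strides)
```

The returned values are what a non-owning view needs; the consumer never frees the pointer, so the lifetime of the memory stays with the producer.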

In [1]: import numpy as np                                                                                                           

In [2]: x = np.array([1,2,3])                                                   

In [3]: for a in x: 
   ...:     print(a)                                                                
1
2
3

# a is still alive xD
In [4]: a                                                                       
Out[4]: 3

@ax3l ax3l added the enhancement New feature or request label Mar 26, 2022
@ax3l ax3l force-pushed the array4-cuda-array-interface branch 3 times, most recently from 0e72782 to 44eb90c Compare March 26, 2022 23:09
@ax3l ax3l force-pushed the array4-cuda-array-interface branch from 44eb90c to d2937d0 Compare April 8, 2022 07:14
@ax3l ax3l requested a review from n01r July 1, 2022 15:58
@ax3l ax3l mentioned this pull request Aug 3, 2022
@ax3l ax3l force-pushed the array4-cuda-array-interface branch 2 times, most recently from 7c0289a to f3ff788 Compare October 5, 2022 20:52

ax3l commented Oct 6, 2022

⚠️ There is an nvcc host code generation bug that we fixed with Nvidia last night. It affects CUDA Toolkit 11.4-11.8 with pybind11 (pybind/pybind11#4193).
I will ship a work-around for pybind11 (pybind/pybind11#4220) before the next CUDA release; please use an older nvcc (e.g. 11.3) in the meantime.

Update: the patch in pybind/pybind11#4220 unlocks that broken compiler range.
Add this to the CMake configure step:

cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA -DpyAMReX_pybind11_repo=https://github.com/ax3l/pybind11.git -DpyAMReX_pybind11_branch=fix-nvcc-11.4-11.8


ax3l commented Oct 6, 2022

@RemiLehe build logic from README.md is this:

So concretely:

# Python packages if not already installed as described
python3 -m pip install -U pip setuptools wheel
python3 -m pip install -U cmake pytest
python3 -m pip install -U -r requirements.txt

# depending on what you try
python3 -m pip install cupy-cuda11x
python3 -m pip install numba
python3 -m pip install torch

# configure once (unless changing backend or versions heavily)
cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA \
    -DpyAMReX_pybind11_repo=https://github.com/ax3l/pybind11.git \
    -DpyAMReX_pybind11_branch=fix-nvcc-11.4-11.8
# rinse & repeat: builds, packages & runs pip install
cmake --build build --target pip_install -j 8

and tests:

# Run all tests
python3 -m pytest tests/

# Run tests from a single file
python3 -m pytest tests/test_array4.py

# Run a single test (useful during debugging)
python3 -m pytest tests/test_array4.py::test_array4_cupy
python3 -m pytest tests/test_multifab.py::test_mfab_ops_cuda_cupy

# Run all tests, do not capture "print" output and be verbose
python3 -m pytest -s -vvvv tests/test_array4.py

and profiling with Nsight Systems:

nsys profile -f true -t cuda,nvtx,osrt python3 -m pytest -s -vvv tests/test_multifab.py::test_mfab_ops_cuda_cupy

GUI:

nsight-sys

@ax3l ax3l mentioned this pull request Oct 6, 2022

ax3l commented Oct 6, 2022

Found a tiny bug, will rebase after #77 is merged. Update: done.

Found another arena bug, will rebase after #78 is merged. Update: done.

@ax3l ax3l force-pushed the array4-cuda-array-interface branch from f3ff788 to 7fe40bd Compare October 6, 2022 21:05
@ax3l ax3l requested review from RemiLehe and removed request for n01r October 6, 2022 21:14
@ax3l ax3l force-pushed the array4-cuda-array-interface branch 3 times, most recently from d0f2dec to 5395043 Compare October 7, 2022 03:16

ax3l commented Oct 7, 2022

Screenshot from 2022-10-06 20-24-00
First cupy progress. Gotta learn how to do in-place updates on arrays in kernels...

@ax3l ax3l mentioned this pull request Oct 7, 2022
@ax3l ax3l force-pushed the array4-cuda-array-interface branch from 5395043 to df349c5 Compare October 7, 2022 18:06
@ax3l ax3l mentioned this pull request Oct 11, 2022
5 tasks

ax3l commented Oct 14, 2022

With the new MFIter::Finalize, I can also see the cudaStreamSynchronize calls at the end of the iteration :)
Screenshot from 2022-10-13 18-00-34
Screenshot from 2022-10-13 18-01-38
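
Because a Python `for` loop leaves its loop variable and iterator alive after the last iteration, cleanup cannot rely on a destructor running at loop exit. One way to sketch the idea is to run the finalize step when the iterator is exhausted, right before `StopIteration` is raised. `TileIter` and its `finalize` method are hypothetical stand-ins, not the actual `MFIter` binding, and the log entry replaces a real stream synchronization.

```python
class TileIter:
    """Hypothetical iterator that finalizes itself when exhausted."""

    def __init__(self, n_tiles, log):
        self.n_tiles = n_tiles
        self.index = -1
        self.log = log
        self.finalized = False

    def finalize(self):
        # Stand-in for a stream synchronization; safe to call more than once
        if not self.finalized:
            self.finalized = True
            self.log.append("sync")

    def __iter__(self):
        return self

    def __next__(self):
        self.index += 1
        if self.index >= self.n_tiles:
            self.finalize()  # runs at loop end, not at garbage collection
            raise StopIteration
        return self.index


log = []
it = TileIter(3, log)
for tile in it:
    log.append(f"tile {tile}")

# `it` and `tile` are still alive here, yet the sync has already happened
print(log)
```

An early `break` would skip the exhausted branch, so this sketch only covers loops that run to completion; an explicit `finalize()` call or a context manager would be needed for the other case.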

@ax3l ax3l force-pushed the array4-cuda-array-interface branch from f5138a9 to a7fc736 Compare October 14, 2022 07:12
@ax3l ax3l changed the title from "[WIP] Array4: __cuda_array_interface__ v2" to "Array4: __cuda_array_interface__ v2" Oct 14, 2022
@ax3l ax3l mentioned this pull request Oct 17, 2022
ax3l and others added 6 commits October 16, 2022 20:27
Start implementing the `__cuda_array_interface__` for zero-copy
data exchange on Nvidia CUDA GPUs.
Since `for` loops create no scope in Python, we need to trigger
finalize logic, including stream syncs, before the destructors of
`MultiFab` iterators are called.
incl. 3D kernel launch
@ax3l ax3l force-pushed the array4-cuda-array-interface branch from a6a1199 to 6eb2da4 Compare October 17, 2022 04:35
@ax3l ax3l changed the title from "Array4: __cuda_array_interface__ v2" to "Array4: __cuda_array_interface__ v3" Oct 17, 2022
A bit tricky to implement this caster as new constructor.
Not currently needed, but adds comments where to do this.

ax3l commented Oct 17, 2022

Wuup, wuup. First part done.
Larger tests and particles next :)

@ax3l ax3l enabled auto-merge (squash) October 17, 2022 07:26
@ax3l ax3l merged commit 16ce636 into AMReX-Codes:development Oct 17, 2022
@ax3l ax3l deleted the array4-cuda-array-interface branch October 17, 2022 07:28
@ax3l ax3l mentioned this pull request Oct 17, 2022
7 tasks