Releases: coreylowman/cudarc
v0.11.1
What's Changed
- CUDA 12.4 by @bitemyapp in #227
- Moving cublasHgemm link to dynamic loading by @coreylowman in #229
New Contributors
- @bitemyapp made their first contribution in #227
Full Changelog: v0.11.0...v0.11.1
v0.11.0
What's Changed
- Added result and safe functionality to support the cuFuncSetAttribute function. by @GaryMcD in #199
- Implemented LaunchAsync for raw ptr arrays by @jafioti in #204
- Safe ConvolutionND by @bloodre in #202
- Fixed Test Example 07 by @Fiend-Star-666 in #205
- Fix pipelines by @coreylowman in #207
- Adding cuda toolkit versions 12.0/12.1/12.2 of sys & feature flags by @coreylowman in #210
- support cuda toolkit version: 11.7 by @wenhaozhao in #214
- Adding cuda 12.3 bindings by @coreylowman in #215
- [Breaking] Remove Static linking. Make dynamic loading default. Add dynamic-linking feature flag by @coreylowman in #211
- Rework cuda versioning features - Users must always specify a version feature instead of having default behavior. by @coreylowman in #216
- Replace unwrap with ? in CudaDevice::new() by @coreylowman in #223
- Update README.md by @chaserileyroberts in #218
- Add CudaDevice::new_with_stream by @coreylowman in #224
- Wrapping comm_split in cfg based on cuda version by @coreylowman in #225
New Contributors
- @GaryMcD made their first contribution in #199
- @jafioti made their first contribution in #204
- @bloodre made their first contribution in #202
- @Fiend-Star-666 made their first contribution in #205
- @wenhaozhao made their first contribution in #214
- @chaserileyroberts made their first contribution in #218
Full Changelog: v0.10.0...v0.11.0
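For callers, the practical effect of #223 (replacing `unwrap` with `?` in `CudaDevice::new()`) is that an initialization failure is presumably returned as an `Err` rather than raised as a panic. The sketch below shows that pattern in plain Rust. `Device`, `InitError`, and `init_device` are hypothetical stand-ins invented for illustration, not cudarc's actual types or functions.

```rust
use std::fmt;

/// Hypothetical error type standing in for a driver error (not a cudarc type).
#[derive(Debug, PartialEq)]
pub struct InitError(pub String);

impl fmt::Display for InitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "device init failed: {}", self.0)
    }
}

impl std::error::Error for InitError {}

/// Hypothetical low-level call that can fail; here any ordinal >= 2 is
/// treated as "no such device" purely for the sake of the example.
fn init_device(ordinal: usize) -> Result<usize, InitError> {
    if ordinal < 2 {
        Ok(ordinal)
    } else {
        Err(InitError(format!("no device with ordinal {ordinal}")))
    }
}

/// Hypothetical device handle (not cudarc's `CudaDevice`).
#[derive(Debug)]
pub struct Device {
    pub ordinal: usize,
}

impl Device {
    /// Propagates the failure to the caller with `?` instead of
    /// panicking through `unwrap()`.
    pub fn new(ordinal: usize) -> Result<Self, InitError> {
        let ordinal = init_device(ordinal)?;
        Ok(Device { ordinal })
    }
}
```

With this shape, a caller can `match` on the result of `Device::new(..)` or forward it with its own `?`, and decide how to handle a missing device.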
v0.10.0
What's Changed
- feat: add support for batch matrix-matrix product in cuBLASLt by @OlivierDehaene in #186
- [Breaking] Update dtoh_sync_copy by @zjsec in #183
- [Breaking] `cublaslt::result::get_matmul_algo_heuristic` is now unsafe by @coreylowman in #189
- fix: fix cublaslt objects memory leak by @OlivierDehaene in #192
New Contributors
Full Changelog: v0.9.15...v0.10.0
v0.9.15
What's Changed
- Fixing 07-build-workflow. by @Narsil in #175
- Making the tests pass on single GPU platforms. by @Narsil in #176
- Fix typo in docs about for_num_elems of LaunchConfig by @mert-kurttutan in #177
- Minimum changes for occupancy API by @Jark5455 in #180
- feat: Add support for cublas Lt by @OlivierDehaene in #182
- Add set_offset so that curandSetGeneratorOffset() can be called. by @mneilly in #185
New Contributors
- @mert-kurttutan made their first contribution in #177
- @Jark5455 made their first contribution in #180
- @OlivierDehaene made their first contribution in #182
- @mneilly made their first contribution in #185
Full Changelog: v0.9.14...v0.9.15
v0.9.14 - fixes bug when async alloc/free isn't available
What's Changed
Full Changelog: v0.9.13...v0.9.14
v0.9.13 - nccl support, gemm<bf16>, other small updates
What's Changed
- Implement Gemm for CudaBlas. by @LaurentMazare in #167
- NCCL support by @Narsil in #164
- Adds handle method to CudaBlas by @coreylowman in #170
- Adds Ptx::to_src method by @coreylowman in #171
New Contributors
- @LaurentMazare made their first contribution in #167
Full Changelog: v0.9.12...v0.9.13
v0.9.12 - fix multi threading/multi device context management
What's Changed
- Adding RAII version for the profiler. by @Narsil in #162
- Calling bind_to_thread on all api calls by @coreylowman in #166
Full Changelog: v0.9.11...v0.9.12
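The "RAII version for the profiler" in #162 refers to the general Rust pattern of tying a start/stop pair to a value's lifetime. The sketch below shows that pattern in plain Rust. `ProfilerGuard` and its shared flag are hypothetical and only simulate a profiler; they are not the cudarc API.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical guard that "starts profiling" on creation and
/// "stops profiling" when dropped (not a cudarc type).
pub struct ProfilerGuard {
    // Shared flag that simulates the profiler's on/off state.
    active: Rc<Cell<bool>>,
}

impl ProfilerGuard {
    /// Marks the simulated profiler as running and returns the guard.
    pub fn start(active: Rc<Cell<bool>>) -> Self {
        active.set(true);
        ProfilerGuard { active }
    }
}

impl Drop for ProfilerGuard {
    /// Runs when the guard goes out of scope, including on early
    /// return, so the stop call cannot be forgotten.
    fn drop(&mut self) {
        self.active.set(false);
    }
}
```

Binding the guard to a named variable such as `_guard` keeps it alive until the end of the enclosing scope. Binding it to a bare `_` would drop it immediately.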
v0.9.11 - bump half-rs version
v0.9.10
What's Changed
- Use compute type f32 for f16 matmul by @coreylowman in #151
- Add functions to interact with external memory by @ViliamVadocz in #152
- Adds an attribute method for device and a .rustfmt.toml by @emchristiansen in #154
- Improve external memory interface by @ViliamVadocz in #155
- WIP: Add occupancy to driver::result by @tthebst in #150
New Contributors
- @emchristiansen made their first contribution in #154
Full Changelog: v0.9.9...v0.9.10
v0.9.9
What's Changed
- Add `lib` as a library candidate in build.rs by @coreylowman in #142
- Adds Conv2dDescriptor::set_group_count by @coreylowman in #145
- Adding methods to get reference to internal sys handles of CudaDevice by @coreylowman in #147
Full Changelog: v0.9.8...v0.9.9