4.1.00 (2023-06-16)
- Adding interface with execution space instance argument to support execution of BLAS on stream
- Improving BLAS level 2 support by adding native implementation and TPL for GER, HER and SYR
- Optimizing algorithms for single input data
- Adding stream support to ILUK/SPTRSV and sort/merge
- Add BsrMatrix SpMV in rocSparse TPL, rewrite BsrMatrix SpMV unit tests #1769
- sparse: Add coo2crs, crs2coo and CooMatrix #1686
- Adds team- and thread-based lower-bound and upper-bound search and predicates #1711
- Adds KokkosKernels::Impl::Iota, a view-like where iota(i) = i + offset #1710
- ODE: explicit integration methods #1754
- refactor blas3 tests to use benchmark library #1751
- batched/eti: ETI host-level interfaces #1783
- batched/dense: Add gesv DynRankView runtime checks #1850
- Add support for complex data types in MDF #1776
- Sort and merge improvements #1773
- spgemm handle: check that A,B,C graphs never change #1742
- Fix/enhance backend issues on spadd perftest #1672
- Spgemm perf test enhancements #1664
- add explicit tests of opt-in algorithms in SpMV #1712
- Added TplsVersion file and print methods #1693
- Add basis skeleton for KokkosKernels::print_configuration #1665
- Add git information to benchmark context #1722
- Test mixed scalars: more fixes related to mixed scalar tests #1694
- PERF TESTS: adding utilities and instantiation wrapper #1676
- Refactor MKL TPL for both CPU and GPU usage #1779
- MKL: support indices properly #1868
- Use rocsparse_spmv_ex for rocm >= 5.4.0 #1701
- Do not change memory spaces instantiation defaults based on Kokkos_ENABLE_CUDA_UVM #1835
- KokkosKernels: Remove TriBITS Kokkos subpackages (trilinos/Trilinos#11545) #1817
- CMakeLists.txt: Add alias to match what is exported from Trilinos #1855
- KokkosKernels: Don't list include for non-existent 'batched' build dir (trilinos/Trilinos#11966) #1867
- Remove non-existent subdir kokkos-kernels/common/common (#11921, #11863) #1854
- KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos/Trilinos#11545) #1844
- Enable sphinx werror #1856
- Update cmake option naming in docs/comments #1849
- docs/developer: Add Experimental namespace #1852
- docs: Add profiling for compile times #1843
- Ger: adding documentation stubs in apidocs #1822
- .github/workflows: Summarize github-DOCS errors and warnings #1814
- Blas1: docs update for PR #1803 #1805
- apt-get update in hosted runner docs check #1797
- scripts: Fix github-DOCS #1796
- Add --enable-docs option to cm_generate_makefile #1785
- docs: Add stubs for some sparse APIs #1768
- .github: Update to actions/checkout@v3 #1767
- docs: Include BatchedGemm #1765
- .github: Automation reminder #1726
- Allow an HTML-only docs build #1723
- SYCL CI: Specify the full path to the compiler #1670
- Add github DOCS ci check & disable Kokkos tests #1647
- Add rocsparse,rocblas, to enabled TPLs in cm_test_all_sandia when --spot-check-tpls #1841
- cm_test_all_sandia: update to add caraway queues for MI210, MI250 #1840
- Support rocSparse in rocm 5.2.0 #1833
- Add KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520 support, only enable KokkosBlas::gesv where supported #1816
- scripts: Include OMP settings #1801
- Print the patch that clang-format-8 wants to apply #1714
- Benchmark cleanup for par_ilut and spmv #1853
- SpMV: adding benchmark for spmv #1821
- New performance test for par_ilut, ginkgo::par_ilut, and spill #1799
- Include OpenMP environment variables in benchmark context #1789
- Re-enable and clean up triangle counting perf test #1752
- Include google/benchmark lib version in benchmark output #1750
- Refactor blas2 test for benchmark feature #1733
- Adds a better parilut test with gmres #1661
- Refactor blas1 test for benchmark feature #1636
- Drop outdated workarounds for backward compatibility with Kokkos #1836
- Remove dead code guarded #1834
- Remove decl ETI files #1824
- Reorganize par_ilut performance test #1818
- Deprecate Kokkos::Details::ArithTraits #1748
- Drop obsolete workaround #ifdef KOKKOS_IF_ON_HOST #1720
- Drop pre Kokkos 3.6 workaround #1653
- View::Rank -> View::rank #1703
- Prefer Kokkos::View::{R->r}ank #1679
- Call concurrency(), not impl_thread_pool_size() #1666
- Kokkos moves ALL_t out of Impl namespace #1658
- Add KokkosKernels::Impl::are_integral_v helper variable template and quit using Kokkos::Impl::are_integral trait #1652
- Kokkos 4 compatibility: modifying the preprocessor logic #1827
- blas/tpls: Fix gemm include guard typo #1848
- spmv cusparse version check modified for cuda/11.1 #1828
- Workaround for #1777 - cusparse spgemm test hang #1811
- Fix 1798 #1800
- BLAS: fixes and testing for LayoutStride #1794
- Fix 1786: check that work array is contiguous in SVD #1793
- Fix unused variable warnings #1790
- Use KOKKOS_IMPL_DO_NOT_USE_PRINTF in Test_Common_UpperBound.hpp #1784
- Batched Gesv: initializing variable to make compiler happy #1778
- perf test utils: fix device ID parsing #1739
- Fix OOB and improve comments in BsrMatrix COO constructor #1732
- batched/unit_test: Disable simd dcomplex4 test for intel > 19.05 and <= 2021. #1857
- rocsparse spmv tpl: Fix rocsparse_spmv call for rocm < 5.4.0 #1716
- compatibility with 4.0.0 #1709
- team mult: fix type issue in max_error calculation #1706
- cast Kokkos::Impl::integral_constant to int #1697
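The coo2crs conversion listed above (#1686) can be illustrated with a minimal host-only sketch. This is plain standard C++, not the Kokkos Kernels API: the `Crs` struct and `coo_to_crs` function are hypothetical names used only for illustration, and the sketch assumes duplicate entries are kept as-is rather than merged, which may differ from the library's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a CRS (compressed row storage) matrix.
struct Crs {
  std::vector<std::size_t> row_map;  // size num_rows + 1
  std::vector<int> entries;          // column index of each nonzero
  std::vector<double> values;        // value of each nonzero
};

// Convert coordinate (COO) triplets to CRS with a counting sort over rows.
// Entries within a row keep their input order; duplicates are not merged.
Crs coo_to_crs(std::size_t num_rows, const std::vector<int>& rows,
               const std::vector<int>& cols, const std::vector<double>& vals) {
  assert(rows.size() == cols.size() && rows.size() == vals.size());
  const std::size_t nnz = rows.size();
  Crs crs;
  crs.row_map.assign(num_rows + 1, 0);
  crs.entries.resize(nnz);
  crs.values.resize(nnz);

  // Count the nonzeros of each row, shifted by one slot for the prefix sum.
  for (std::size_t k = 0; k < nnz; ++k) ++crs.row_map[rows[k] + 1];
  // Running sum turns the per-row counts into row start offsets.
  for (std::size_t r = 0; r < num_rows; ++r) crs.row_map[r + 1] += crs.row_map[r];

  // Scatter each triplet into the next free slot of its row.
  std::vector<std::size_t> next(crs.row_map.begin(), crs.row_map.end() - 1);
  for (std::size_t k = 0; k < nnz; ++k) {
    const std::size_t dst = next[rows[k]]++;
    crs.entries[dst] = cols[k];
    crs.values[dst] = vals[k];
  }
  return crs;
}
```

The counting-sort approach needs only two passes over the triplets and no comparison sort; a device implementation would typically replace the serial loops with parallel counting, a parallel scan, and atomic slot reservation.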
4.0.01 (2023-04-19)
- Use the options ENABLE_PERFTEST, ENABLE_EXAMPLES #1667
- Introduce KOKKOSKERNELS_ALL_COMPONENTS_ENABLED variable #1691
- Kokkos Kernels version: need to use upper case variables #1707
- CUSPARSE_MM_ALG_DEFAULT deprecated by cuSparse 11.1 #1698
- blas1: Fix a couple documentation typos #1704
- CUDA 11.4: fixing some -Werror #1727
- Remove unused variable in KokkosSparse_spgemm_numeric_tpl_spec_decl.hpp #1734
- Reduce BatchedGemm test coverage time #1737
- Fix kk_generate_diagonally_dominant_sparse_matrix hang #1689
- Temporary spgemm workaround matching Trilinos 11663 #1757
- MDF: Minor changes to interface for ifpack2 impl #1759
- Rocm TPL support upgrade #1763
- Fix BLAS cmake check for complex types #1762
- ParIlut: Adds a better parilut test with gmres #1661
- GMRES: fixing some type issues related to memory space instantiation (partial) #1719
- ParIlut: create and destroy spgemm handle for each usage #1736
- ParIlut: remove par ilut limitations #1755
- ParIlut: make Ut_values view atomic in compute_l_u_factors #1781
4.0.0 (2023-02-21)
- ROTG: implementation of BLAS level1 rotg #1529
- ROT: adding function to rotate two vectors using Givens rotation coefficients #1581
- ROTMG: adding rotmg implementation to KokkosBlas #1560
- ROTM: adding blas 1 function for modified rotation #1583
- SWAP: adding implementation of level 1 BLAS function #1612
- Add utility KokkosSparse::removeCrsMatrixZeros #1681
- Add spgemm TPL support for cuSparse and rocSparse #1513
- Add csr2csc #1446
- Adding my weighted graph coarsening code into kokkos-kernels #1043
- VBD/VBDBIT D1 coloring: support distributed graphs #1598
- New tests for mixed-precision GEMM, some fixes for BLAS tests with non-ETI types #1615
- Spgemm non-reuse: unification layer and TPLs #1678
- Remove "slow mem space" device ETI #1619
- First phase of SpGEMM TPL refactor #1582
- Spgemm TPL refactor #1618
- cleaned messages printed at configuration time #1616
- Batched dense tests: splitting batched dense unit-tests #1608
- sparse/unit_test: Use native spmv impl in bsr unit tests #1606
- ROT* HIP: testing and improving rocBLAS support for ROT* kernels #1594
- Add main functions for batched sparse solver performance tests #1554
- Batched sparse kernels update #1546
- supernodal SpTRSV : require invert-diag option to use SpMV #1518
- Update --verbose option in D2 coloring perftest #1486
- Modular build: allowing to build components independently #1504
- Move GMRES from example to sparse experimental #1620
- Remove Experimental::BlockCrsMatrix (replaced with Experimental::BsrMatrix) #1458
- Move {Team,TeamVector}Gemv to KokkosBlas #1435
- Move SerialGEMV to KokkosBlas #1433
- CMake: export version and subversion to config file #1680
- CMake: update package COMPATIBILITY mode in anticipation of release 4.0 #1645
- FindTPLMKL.cmake: fix naming of mkl arg to FIND_PACKAGE_HANDLE_STANDARD_ARGS #1644
- KokkosKernels: Use KOKKOSKERNELS_INCLUDE_DIRECTORIES() (TriBITSPub/TriBITS#429) #1635
- Fix docs build #1569
- KokkosKernels: Remove listing of undefined TPL deps (trilinos/Trilinos#11152) #1568
- Update nightly SYCL setup #1660
- Add github DOCS ci check & disable Kokkos tests #1647
- docs: Fix RTD build #1490
- sparse/unit_test: Disable spmv_mv_heavy for all A64FX builds #1555
- ROTMG: rocblas TPL turned off #1603
- Fix HIP nightly build on ORNL Jenkins CI server #1544
- Turn on cublas and cusparse in CLANG13CUDA10 CI check #1584
- Add clang13+cuda10 PR build #1524
- .github/workflows: Fix redundant workflow triggers #1527
- Add GCC test options for C++17 and disable perftests for INTEL19 #1511
- Add INTEL19 and CUDA11 CI settings #1505
- .github/workflows: use c++17 #1484
- Workaround for array_sum_reduce if scalar is half_t and N is 3, 5 or 7 #1675
- Fix the nondeterministic issue in SPILUK numeric #1683
- Fix an error in Krylov Handle documentation #1659
- ROTMG: loosen unit-test tolerance for Host TPLs #1638
- SWAP: fixing obvious mistake in TPL layer : ( #1637
- Fix 1631: Use Kokkos::LayoutRight with CrsMatrix values_type (Trilinos compatibility) #1633
- Cuda/12 with CuSPARSE updates #1632
- Fix 1627: cusparse 11.0-11.3 spgemm symbolic wrapper #1628
- Make sure to call ExecutionSpace::concurrency() from an object #1614
- SPGEMM: fixing the rocsparse interface #1607
- Fix Trilinos issue 11033: remove compile time check to allow compilation with non-standard scalar types #1591
- SPMM: fixing cuSPARSE issue with incompatible compute type and op #1587
- ParILUT: convert two lambdas to functors #1580
- Update kk_get_free_total_memory for SYCL #1579
- SYCL: Use KOKKOS_IMPL_DO_NOT_USE_PRINTF instead of printf in kernels #1567
- Rotg fixes for issue 1577 #1578
- Rotg update: fixing the interface #1566
- Fix rotg eti #1534
- Fix to include KokkosBatched_Util.hpp #1565
- TeamGemvInternal: reintroduce 12-arg invoke method #1561
- Rename component options to avoid overloaded usage in Trilinos #1641
- Avoid the SIMD code branch if the batched size is not a multiple of the vector length #1552
- SYCL: Fix linking with ze_loader in Trilinos #1551
- ARMPL Fixes and Workarounds #1543
- Test_Graph_coarsen: replace HostMirror usage with auto #1538
- Fix spgemm cusparse #1535
- Warning fixes: Apple Clang complains about [-Werror,-Wunused-but-set-variable] #1532
- In src/batched/dense: Barrier after broadcast #1520
- Graph coarsen: fix test #1517
- KokkosGraph_CoarsenHeuristics: remove volatile qualifier from join #1510
- Replace capture #1502
- utils: implicit copy-assign deprecated in array_sum_reduce #1494
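The ROTG and ROT entries in this release (#1529, #1581) concern Givens rotations. As a rough illustration of the underlying math only, here is a standard C++ sketch; `Givens`, `make_givens` and `apply_givens` are hypothetical names, not KokkosBlas functions, and the sketch assumes real scalars and omits the sign convention and the reconstruction parameter `z` used by reference BLAS rotg.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: rotation coefficients and the resulting norm.
struct Givens {
  double c, s, r;
};

// Compute c, s, r such that [c s; -s c] * [a; b] = [r; 0].
// Simplified: r is always non-negative, unlike reference BLAS rotg.
Givens make_givens(double a, double b) {
  const double r = std::hypot(a, b);
  if (r == 0.0) return {1.0, 0.0, 0.0};  // identity rotation when a = b = 0
  return {a / r, b / r, r};
}

// Apply the rotation to the vector pair (x, y) element by element.
void apply_givens(std::vector<double>& x, std::vector<double>& y, const Givens& g) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double xi = x[i], yi = y[i];
    x[i] = g.c * xi + g.s * yi;
    y[i] = g.c * yi - g.s * xi;
  }
}
```

Applying the rotation built from (a, b) to the pair itself zeroes the second component, which is the step that QR-style updates in solvers such as GMRES rely on.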
3.7.01 (2022-12-01)
- Use CRS matrix sort, instead of Kokkos::sort on each row #1553
- Change template type for StaticCrsGraph in BsrMatrix #1531
- Remove listing of undefined TPL deps #1568
- Fix using SpGEMM with nonstandard scalar type, with MKL enabled #1591
- Move destroying dense vector descriptors out of cuSparse sptrsv handle #1590
- Fix cuda_data_type_from to return CUDA_C_64F for Kokkos::complex<double> #1604
- Disable compile-time check in cuda_data_type_from on supported scalar types for cuSPARSE #1605
- Reduce register pressure in batched dense algorithms #1588
- Use new cusparseSpSV TPL for SPTRSV when cuSPARSE is enabled with CUDA >= 11.3 #1574
3.7.00 (2022-08-18)
- Add csc2csr #1342
- csc2csr: update Kokkos_Numeric.hpp header inclusion #1449
- sparse: Remove csc2csr copy #1375
- Added https://kokkos-kernels.readthedocs.io #1451
- Restructure docs #1368
- Add cuSparse TPL files for CrsMatrix-multivector product #1427
- Add template params to forwarding calls in deprecated KokkosKernels::… #1441
- SPILUK: Move host allocations to symbolic #1480
- trsv: remove assumptions about entry order within rows #1463
- Blas serial axpy and nrm2 #1460
- Move Set/Scale unit test to KokkosBlas #1455
- Move {Serial,Team,TeamVector} Set to KokkosBlas #1454
- Move {Serial,Team,TeamVector}Scale to KokkosBlas #1448
- Common Utils: removing dependency on Sparse Utils in Common Utils #1436
- Common cleanup #1431
- Clean-up src: re-organizing the src directory #1398
- Sparse utils namespace #1439
- dot perf test: adding support for HIP and SYCL backend #1453
- Add verbosity parameter to GMRES example. Turn off for testing. #1385
- KokkosSparse_spiluk.cpp perf test: add int-int guards to cusparse codes #1369
- perf_test/blas: Check ARMPL build version #1352
- Clean-up batched block tridiag perf test #1343
- Reduce lots of macro duplication in sparse unit tests #1340
- sycl: re-enabling test now that dpcpp has made progress #1473
- Only instantiate Kokkos's default Cuda mem space #1361
- Sparse and CI updates #1411
- Newer sparse tests were not following the new testing pattern #1356
- Add ETI for D1 coloring #1401
- Add ETI to SpAdd (symbolic and numeric) #1399
- Reformat example/fenl files changed in 1382 #1464
- Change Controls::getParameter error message from stdout to stderr #1416
- Arith traits integral nan #1438
- Kokkos_ArithTraits: re-implementation using Kokkos Core #1406
- Value-initialize result of MaxLoc reduction to avoid maybe uninitialized warning #1383
- Remove volatile qualifiers in reducer join(), init(), and operator+= methods #1382
- Update Batched GMRES #1392
- GEMV: accumulate in float for scalar = bhalf_t #1360
- Restore BLAS-1 MV paths for 1 column #1354
- Minor updates to cluster Gauss-Seidel #1372
- Add unit test for BsrMatrix and BlockCrsMatrix spmv #1338
- Refactor SPGEMM MKL Impl #1244
- D1 coloring: remove unused but set variable #1403
- Minor changes for half precision paper #1429
- Add benchmarks for us-rse escience 2022 half precision paper #1422
- TPLs: adding CUBLAS in the list of dependencies #1482
- Fix MKL build errors #1478
- Fixup drop layout template param in rank-0 views #1476
- BLAS: fixing test that access results before synching #1472
- Fix D1 color ETI with both CudaSpace and UVM #1471
- Fix arithtraits warning #1468
- Fix build when double not instantiated #1467
- Fix -Werror #1466
- Fix GitHub CI failing on broken develop #1461
- HIP: fix warning from ExecSpaceUtils and GEMV #1459
- Removes a duplicate cuda_data_type_from when KOKKOS_HALF_T_IS_FLOAT #1456
- Fix incorrect function call in KokkosBatched::TeamGEMV unit test #1444
- Fix SYCL nightly test #1419
- Fix issues with cuSparse TPL availability for BsrMatrix SpMV #1418
- SpMV: fixing issues with unit-tests tolerance #1412
- Address 1409 #1410
- Fix colliding include guards (copy-paste mistake) #1408
- src/sparse: Fix & check for fence post errors #1405
- Bspgemm fixes #1396
- Fix unused parameter warnings in GEMM test. #1381
- Fixes code deprecation warnings. #1379
- Fix sign-compare warning in SPMV perf test #1371
- Minor MKL fixes #1365
- perf_test/batched: Temporarily disable tests #1359
- Fix nightly builds following promotion of the math functions in Kokkos #1339
3.6.01 (2022-05-23)
- Improve spiluk numeric phase to avoid race conditions and processing in chunks #1390
- Improve sptrsv symbolic phase performance (level scheduling) #1380
- Restore BLAS-1 MV paths for 1 column #1354
- Fix check that view has const type #1370
- Fix check that view has const type part 2 #1394
3.6.00 (2022-02-18)
- Kokkos Kernels is adding a new component to the library: batched sparse linear algebra. Similarly to the current dense batched algorithms, the new algorithms are called from the GPU and provide Team and TeamVector levels of parallelism; SpMV also provides a Serial call on GPU.
- Add Batched CG and Batched GMRES #1155
- Add Jacobi Batched preconditioner #1219
- After introducing the BsrMatrix in release 3.5.0, new algorithms now support this format. For release 3.6.0 we are adding matrix-vector (matvec) multiplication and Gauss-Seidel, as well as an implementation of matvec that leverages tensor cores on Nvidia GPUs. More kernels are expected to support the Bsr format in future releases.
- Add Spmv for BsrMatrix #1255
- Add BLAS to SpMV operations for BsrMatrix #1297
- BSR format support in block Gauss-Seidel #1232
- Experimental tensor-core SpMV for BsrMatrix #1090
- rocBLAS and rocSPARSE TPLs are now officially supported and can be enabled at configure time. Initial kernels that can call rocBLAS are GEMV, GEMM, IAMAX and SCAL, while rocSPARSE can be called for matrix-vector multiplication. Further support for TPL calls can be requested on Slack or through GitHub issues.
- Tpl rocBLAS and rocSPARSE #1153
- Add rocBLAS GEMV wrapper #1201
- Add rocBLAS wrappers for GEMM, IAMAX, and SCAL #1230
- SpMV: adding support for rocSPARSE TPL #1221
- bhalf: Unit test Batched GEMM #1251
- bhalf: Demonstrate GMRES example convergence with bhalf_t (kokkos#1300)
- Stream interface: adding stream support in GEMV and GEMM #1131
- Improve double buffering batched gemm performance #1217
- Allow choosing coloring algorithm in multicolor GS #1199
- Batched: Add armpl dgemm support #1256
- Deprecation warning: SpaceAccessibility move out of impl, see #1140 #1141
- Full Blas support on SYCL #1270
- Get sparse tests enabled and working for SYCL #1269
- Changes to make graph run on SYCL #1268
- Allow querying free/total memory for SYCL #1225
- Use KOKKOS_IMPL_DO_NOT_USE_PRINTF instead of printf in kernels #1162
- Work around hipcc size_t/int division with remainder bug #1262
- Replace std::abs with ArithTraits::abs #1312
- Batched/dense: Add Gemm_DblBuf LayoutLeft operator #1299
- KokkosKernels: adding variable that returns version as a single number #1295
- Add KOKKOSKERNELS_FORCE_SIMD macro (Fix #1040) #1290
- Rename KOKKOS_IF_{HOST,DEVICE} -> KOKKOS_IF_ON_{HOST,DEVICE} #1278
- Algo::Level{2,3}::Blocked::mb() #1265
- Batched: Use SerialOpt2 for 33 to 39 square matrices #1261
- Prune extra dependencies #1241
- Improve double buffering batched gemm perf for matrix sizes >64x64 #1239
- Improve graph color perf test #1229
- Add custom implementation for strcasecmp #1227
- Replace restrict with KOKKOS_RESTRICT #1223
- Replace array reductions in BLAS-1 MV reductions #1204
- Update MIS-2 and aggregation #1143
- perf_test/blas/blas3: Update SHAs for benchmarking #1139
- Bump ROCm version 4.2 -> 4.5 in nightly Jenkins CI build #1279
- scripts/cm_test_all_sandia: Add A64FX ci checks #1276
- github/workflows: Add osx CI #1254
- Update SYCL compiler version in CI #1247
- Do not set Kokkos variables when exporting CMake configuration #1236
- Add nightly CI check for SYCL #1190
- Update cmake minimum version to 3.16 #866
- Kokkos::Impl: removing a few more instances of throw_runtime_exception #1320
- Remove Kokkos::Impl::throw_runtime_exception from Kokkos Kernels #1294
- Remove unused memory space utility #1283
- Clean up Kokkos header includes #1282
- Remove private Kokkos header include (Cuda/Kokkos_Cuda_Half.hpp) #1281
- Avoid using #ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macro guards #1266
- Rename enumerator Impl::Exec_{PTHREADS -> THREADS} #1253
- Remove all references to the Kokkos QThreads backend #1238
- Replace more occurrences of Kokkos::Impl::is_view #1234
- Do not use Kokkos::Impl::is_view #1214
- Replace Kokkos::Impl::if_c -> std::conditional #1213
- Fix bug in spmv_mv_bsrmatrix() for Ampere GPU arch #1315
- Fix std::abs calls for rocBLAS/rocSparse #1310
- cast literal 0 to fragment scalar type #1307
- Fix 1303: maintain correct #cols on A in twostage #1304
- Add dimension checking to generic spmv interface #1301
- Add missing barriers to TeamGMRES, fix vector len #1285
- Examples: fixing some issues related to type checking #1267
- Restrict BsrMatrix specialization for AMPERE and VOLTA to CUDA #1242
- Fix compilation errors for multi-vectors in kk_print_1Dview() #1231
- src/batched: Fixes #1224 #1226
- Fix SpGEMM crashing on empty rows #1220
- Fix issue #1212 #1218
- example/gmres: Specify half_t namespace #1208
- Check that ordinal types are signed #1188
- Fixing a couple of small issues with tensor core spmv #1185
- Fix #threads setting in pcg for OpenMP #1182
- SpMV: fix catch all case to avoid compiler warnings #1179
- using namespace should be scoped to prevent name clashes #1177
- using namespace should be scoped to prevent name clashes, see issue #1170 #1171
- Fix bug with mkl impl of spgemm #1167
- Add missing $ to KOKKOS_HAS_TRILINOS in sparse_sptrsv_superlu check #1160
- Small fixes to spgemm, and plug gaps in testing #1159
- SpMV: mismatch in #ifdef check and kernel specialization #1151
- Fix values dimension for block sparse matrices #1147
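The BsrMatrix matvec support described in this release can be pictured with a small serial sketch of block sparse row (BSR) matrix-vector multiplication. This is standard C++ for illustration only: `Bsr` and `bsr_spmv` are hypothetical names, not the KokkosSparse interface, and the sketch assumes square blocks stored densely in row-major order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical BSR matrix with square blocks of size block_dim.
// Each block is stored densely, row-major, inside `values`.
struct Bsr {
  std::size_t block_dim;
  std::vector<std::size_t> row_map;  // size num_block_rows + 1
  std::vector<std::size_t> entries;  // block-column index of each block
  std::vector<double> values;        // block_dim * block_dim values per block
};

// Compute y = A * x for a BSR matrix A.
std::vector<double> bsr_spmv(const Bsr& A, const std::vector<double>& x) {
  assert(!A.row_map.empty());
  const std::size_t b = A.block_dim;
  const std::size_t num_block_rows = A.row_map.size() - 1;
  std::vector<double> y(num_block_rows * b, 0.0);
  for (std::size_t br = 0; br < num_block_rows; ++br) {
    // Loop over the blocks stored in this block row.
    for (std::size_t k = A.row_map[br]; k < A.row_map[br + 1]; ++k) {
      const double* blk = &A.values[k * b * b];
      const std::size_t col0 = A.entries[k] * b;  // first scalar column of the block
      assert(col0 + b <= x.size());
      // Dense b-by-b block times the matching slice of x.
      for (std::size_t i = 0; i < b; ++i)
        for (std::size_t j = 0; j < b; ++j)
          y[br * b + i] += blk[i * b + j] * x[col0 + j];
    }
  }
  return y;
}
```

Storing one column index per block instead of one per scalar is what reduces index traffic, and the small dense block products are the part that a tensor-core implementation would accelerate.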
3.5.00 (2021-10-19)
Features:
- Batched serial SVD #1107
- Batched: Add BatchedDblBufGemm #1095
- feature/gemv rps test -- RAJAPerf Suite Version of the BLAS2 GEMV Test #1085
- Add new bsrmatrix #1077
- Adding Kokkos GMRES example #1028
- Add fast two-level mode N GEMV (#926) #939
- Batched: Add BatchedGemm interface #935
- OpenMPTarget: adding ETI and CMake logic for OpenMPTarget backend #886
Implemented enhancements Algorithms and Archs:
- Use float as accumulator for GEMV on half_t (Fix #1081) #1082
- Supernodal SpTRSV: add option to use MAGMA TPL for TRTRI #1069
- Updates for running GMRES example with half precision #1067
- src/blas/impl: Explicitly cast to LHS type for ax #1073
- Update BatchedGemm interface to match design proposal #1054
- Move dot-based GEMM out of TPL CUBLAS #1050
- Adding ArmPL option to spmv perf_test #1038
- Add (right) preconditioning to GMRES #1078
- Supernodal SpTRSV: perform TRMM only if TPL CuBLAS is enabled #1027
- Supernodal SpTRSV: support SuperLU version < 5 #1012
- perf_test/blas/blas3: Add dgemm armpl experiment #1005
- Supernodal SpTRSV: run TRMM on device for setup #983
- Merge pull request #951 from vqd8a/move_sort_ifpack2riluk #972
- Point multicolor GS: faster handling of long/bulk rows #993
- Make CRS sorting utils work with unmanaged #963
- Add sort and make sure using host mirror on host memory in kspiluk_symbolic #951
- GEMM: call GEMV instead in certain cases #948
- SpAdd performance improvements, better perf test, fix mtx reader columns #930
Implemented enhancements BuildSystem:
- Automate documentation generation #1116
- Move the batched dense files to specific directories #1098
- cmake: Update SUPERLU tpl option for Tribits #1066
- cmake/Modules: Allow user to use MAGMA_DIR from env #1007
- Supernodal SpTRSV: update TPLs requirements #997
- cmake: Add MAGMA TPL support #982
- Host only macro: adding macro to check for any device backend #940
- Prevent redundant spmv kernel instantiations (reduce library size) #937
- unit-test: refactor infrastructure to remove most *.cpp #906
Implemented enhancements Other:
- Allow reading integer mtx files into floating-point matrices #1100
- Warnings: remove -Wunused-parameter warnings in Kokkos Kernels #962
- Clean up CrsMatrix raw pointer constructor #949
- unit_test/batched: Remove *_half fns from gemm unit tests #943
- Move sorting functionality out of Impl:: #932
Incompatibilities:
- Deprecation warning: SpaceAccessibility move out of impl #1141
- Rename CUDA_SAFE_CALL to KOKKOS_IMPL_CUDA_SAFE_CALL #1130
- Workaround error with intel #1128
- gmres: disable examples for builds with ibm/xl #1123
- CrsMatrix: deprecate constructor without ncols input #1115
- perf_test/blas/blas3: Disable simd verify for cuda/10.2.2 #1093
- Replace impl/Kokkos_Timer.hpp includes with Kokkos_Timer.hpp #1074
- Remove deprecated ViewAllocateWithoutInitializing #1058
- src/sparse: spadd resolve deprecation warnings #1053
- Give full namespace path for D2 coloring #999
- Fix -Werror=deprecated errors with c++20 standard #964
- Deprecation: a deprecated function is called in the SpADD perf_test #954
Enabled tests:
- HIP: enabling all unit tests #968
- Fix build and add CI coverage for LayoutLeft=OFF #965
- Enable SYCL tests #927
- Fixup HIP nightly builds #907
Fixed Bugs:
- Fix SpGEMM for Nvidia Turing/Ampere #1118
- Fix #1111: spmv tpl instantiations #1112
- Fix C's numCols in spadd simplified interface #1102
- Fix #1089 (failing batched UTV tests) #1096
- Blas GEMM: fix early exit logic, see issue #1088 #1091
- Fix #1048: handle mode C spmv correctly in serial/openmp #1084
- src/batched: Fix multiple definitions of singleton #1072
- Fix host accessing View in non-host space #1057
- Fix559: Intel 18 has trouble with pointer in ternary expr #1042
- Work around team size AUTO issue on kepler #1020
- Supernodal SpTrsv: fix out-of-bound error #1019
- Some fixes for MAGMA TPL and gesv #1008
- Merge pull request #981 from Tech-XCorp/4005-winllvmbuild #984
- This is a PR for 4005 vs2019build, which fixes a few things on Windows #981
- Fix build for no-ETI build #977
- Fix invalid mem accesses in new GEMV kernel #961
- Kokkos_ArithTraits.hpp: Fix isInf and isNan with complex types #936
3.4.01 (2021-05-19)
Fixed Bugs:
- Windows: Fixes for Windows #981
- Sycl: ArithTraits fixes for Sycl #959
- Sparse: Added code to allow KokkosKernels coloring to accept partial colorings #938
- Sparse: Include sorting within spiluk #972
- Sparse: Fix CrsMatrix raw pointer constructor #971
- Sparse: Fix spmv Serial beta==-1 code path #947
3.4.00 (2021-04-25)
Features:
- SYCL: adding ETI and CMake logic for SYCL backend #924
Implemented enhancements Algorithms and Archs:
- Two-stage GS: add damping factors #921
- Supernodal SpTRSV, improve symbolic performance #899
- Add MKL SpMV wrapper #895
- Serial code path for spmv #893
Implemented enhancements BuildSystem:
- Cmake: Update ArmPL support #901
- Cmake: Add ARMPL TPL support #880
- IntelClang guarding __assume_aligned with !defined(__clang__) #878
Implemented enhancements Other:
- Add static_assert/throw in batched eigendecomp #931
- Workaround using new/delete in kernel code #925
- Blas perf_test updates #892
Fixed bugs:
- Fix ctor CrsMat mirror with CrsGraph mirror #918
- Fix nrm1, removed cublas nrminf, improved blas tests #915
- Fix and testing coverage mainly in graph coarsening #910
- Fix KokkosSparse for nightly test failure #898
- Fix view types across ternary operator #894
- Make work_view_t typedef consistent #885
- Fix supernodal SpTRSV build with serial+openmp+cuda #884
- Construct SpGEMM C with correct ncols #883
- Matrix Converter: fixing issue with deallocation after Kokkos::finalize #882
- Fix >1024 team size error in sort_crs_* #872
- Fixing seg fault with empty matrix in kspiluk #871
3.3.01 (2021-01-18)
Fixed Bugs:
- With CuSparse enabled too many variants of SPMV were instantiated even if not requested. Up to 1GB executable size increase.
3.3.00 (2020-12-16)
Implemented enhancements:
- Add permanent RCM reordering interface, and a basic serial implementation #854
- Half_t explicit conversions #849
- Add batched gemm performance tests #838
- Add HIP support to src and perf_test #828
- Factor out coarsening #827
- Allow enabling/disabling components at configuration time #823
- HIP: CMake work on tests and ETI #820
- HIP: KokkosBatched - hip specialization #812
- Distance-2 maximal independent set #801
- Use batched TRTRI & TRMM for Supernode-sptrsv setup #797
- Initial support for half precision #794
Fixed bugs:
- Fix issue with HIP and Kokkos_ArithTraits #844
- HIP: fixing round of issues on AMD #840
- Throw an exception if BLAS GESV is not enabled #837
- Fixes -Werror for gcc with c++20 #836
- Add fallback condition to use spmv_native when cuSPARSE does not work #834
- Fix install testing refactor for inline builds #811
- HIP: fix ArithTraits to support HIP backend #809
- cuSPARSE 11: fix spgemm and spmv_struct_tunning compilation error #804
Incompatibilities:
- Remove pre-3.0 deprecated code #825
3.2.01 (2020-11-17)
Fixed bugs:
- Cpp14 Fixes: #790
3.2.00 (2020-08-19)
Implemented enhancements:
- Add CudaUVMSpace specializations for cuBLAS IAMAX and SCAL #758
- Add wiki examples #735
- Support complex_float, complex_double in cuSPARSE SPMV wrapper #726
- Add performance tests for trmm and trtri #711
- SpAdd requires output values to be zero-initialized, but this shouldn't be needed #694
- SpAdd doesn't merge entries correctly #685
- cusparse SpMV merge algorithm #670
- TPL support for SpMV #614
- Add two BLAS/LAPACK calls needed by: Sptrsv supernode #552 #589
- HashmapAccumulator has several unused members, misnamed parameters #508
Fixed bugs:
- Nightly test failure: spgemm unit tests failing on White (Power8) #780
- supernodal does not build with UVM enabled #633
3.1.01 (2020-05-04)
Fixed bugs:
- KokkosBatched QR PR breaking nightly tests #691
3.1.00 (2020-04-14)
Implemented enhancements:
- Two-stage & Classical Gauss-Seidel #672
- Test transpose utilities #664
- cuSPARSE spmv wrapper doesn't actually use 'mode' #650
- Distance-2 improvements #625
- FindMKL module: which mkl versions to prioritize #480
- Add SuperLU as optional CMake TPL #545
- Revamp the ETI system #460
Fixed bugs:
- 2-stage GS update breaking cuda/10+rdc build #673
- Why CrsMatrix::staticcrsgraph_type uses execution_space and not device_type? #665
- TRMM and TRTRI build failures with clang/7+cuda9+Cuda_OpenMP and gcc/5.3+OpenMP #657
- cuSPARSE spmv wrapper doesn't actually use 'mode' #650
- Block Gauss-Seidel test fails when cuSPARSE is enabled #648
- cuda uvm test failures without launch blocking - expected behavior? #636
- graph_color_d2_symmetric_double_int_int_TestExecSpace seg faults in cuda/10.1 + Volta nightly test on kokkos-dev-2 #634
- Build failures on kokkos-dev with clang/7.0.1 cuda/9.2 and blas/cublas/cusparse tpls #629
- Distance-2 improvements #625
- trsv - internal compiler error with intel/19 #607
- complex_double misalignment still breaking SPGEMM #598
- PortableNumericCHASH can't align shared memory #587
- Remove all references to Kokkos::Impl::is_same #586
- Can I run KokkosKernels spgemm with float or int32 type? #583
- Kokkos Blas: gemv segfaults #443
- Generated kokkos-kernels file names are too long and are crashing cloning Trilinos on Windows #395
3.0.00 (2020-01-27)
Implemented enhancements:
- BuildSystem: Standalone Modern CMake support #491
- Cluster GS and SGS: add cluster gauss-seidel implementation #455
- spiluk: Add sparse ILUK implementation #459
- BLAS gemm: Dot-based GEMM Cuda optimization for C = beta*C + alpha*A^T*B #490
- Sorting utilities: #461
- SGS: Support multiple rhs in SGS efficiently #488
- BLAS trsm: Add support and interface for trsm #513
- BLAS iamax: Implement iamax #87
- BLAS gesv: #449
- sptrsv supernodal: Add supernodal sparse triangular solver #552
- sptrsv: Add cusparse tpl support for sparse triangular solve, cudagraphs to fallback #555
- KokkosGraph: Output colors assigned during graph coloring #444
- MatrixReader: Full matrix market support #466
Fixed bugs:
- gemm: Fix bug for complex types in fallback impl #550
- gemv: Fix degenerate matrix cases #514
- spgemm: Fix cuda build with complex_double misaligned shared memory access #500
- spgemm: Wrong team size heuristic used for SPGEMM when Kokkos deprecated=OFF #474
- dot: Improve accuracy for float and complex_float #574
- SpMV Struct: Fix bug with intel_17_0_1 #456
- readmtx: Fix invalid read due to loop condition #453
- spgemm: Fix hashmap accumulator bug yielding crashes and wrong results #402
- KokkosGraph: Fix distance-1 graph coloring segfault #275
- UniformMemoryPool: does not re-initialize chunks that are freed #530
2.9.00 (2019-06-24)
Implemented enhancements:
- KokkosBatched: Add specialization for float2, float4 and double4 #427
- KokkosBatched: Reduce VectorLength (16 to 8) #432
- KokkosBatched: Remove experimental name space for batched blas #371
- Capability: Initial sparse triangular solve capability #435
- Capability: Add support for MAGMA GESV TPL #409
- cuBLAS: Add CudaUVMSpace specializations for GEMM #397
Fixed bugs:
2.8.00 (2019-02-05)
Implemented enhancements:
- Capability, Tests: C++14 Support and Testing #351
- Capability: Batched getrs #332
- More Kernel Labels for KokkosBlas #239
- Name all parallel kernels and regions #124
Fixed bugs:
- BLAS TPL: BLAS underscore mangling #369
- BLAS TPL, Complex: Promotion 2.7.24 broke MV unit tests in Tpetra with complex types #360
- GEMM: GEMM uses wrong function for computing shared memory allocation size #368
- BuildSystem: BLAS TPL macro not properly enabled with MKL BLAS #347
- BuildSystem: make clean - errors #353
- Compiler Workaround: Internal compiler error in KokkosBatched::Experimental::TeamGemm #349
- KokkosBlas: Some KokkosBlas kernels assume default execution space #14
2.7.24 (2018-11-04)
Implemented enhancements:
- Enhance test_all_sandia script to set scalar and ordinal types #315
- Batched getri need #305
- Deterministic Coloring #271
- MKL - guard minor version for MKL v. 18 #268
- TPL Support for all BLAS functions using cuBLAS #247
- Add L1 variant to multithreaded Gauss-Seidel #240
- Multithreaded Gauss-Seidel does not support damping #221
- Guard 1-phase SpGEMM in Intel MKL #217
- generate makefile with-spaces option #98
- Add MKL version check #7
Fixed bugs:
- Perf test failures w/ just CUDA enabled #257
- Wrong signature for axpy blas functions #329
- Failing unit tests with float - unit test error checking issue #322
- cuda.graph_graph_color* COLORING_VBD test failures with cuda/9.2 + gcc/7.2 on White #317
- KokkosBatched::Experimental::SIMD<T> does not build with T=complex<float> #316
- simple test program fails using 3rdparty Eigen library #309
- KokkosBlas::dot is broken for complex, due to incorrect assumptions about Fortran ABI #307
- strides bug in kokkos tpl interface. #292
- Failing spgemm unit test with MKL #289
- Fix the block_pcg perf-test when offsets are size_t #287
- spotcheck warnings from kokkos #284
- Linking error in tpl things #282
- Build failure with clang 3.9.0 #281
- CMake modification for TPLs. #276
- KokkosBatched warnings #259
- KokkosBatched contraction length bug #258
- Small error in KokkosBatched_Gemm_Serial_Imp.hpp with SerialGemm<Trans::Transpose,*,*> #147
2.7.00 (2018-05-24)
Implemented enhancements:
- Tests: add capability to build a unit test standalone #233
- Make KokkosKernels work without KOKKOS_ENABLE_DEPRECATED_CODE #223
- Replace KOKKOS_HAVE_* FLAGS with KOKKOS_ENABLE_* #219
- Add team-based scal, mult, update, nrm2 #214
- Add team based abs #209
- Generated CPP files moving includes inside the ifdef's #199
- Implement BlockCRS in KokkosKernels #184
- Spgemm hash promotion #171
- Batched BLAS enhancement #170
- Document & check CMAKE_CXX_USE_RESPONSE_FILE_FOR_OBJECTS=ON in CUDA build #148
Fixed bugs:
- Update drivers in perf_tests/graph to use Kokkos::initialize() #200
- unit tests failing/hanging on Volta #188
- Inner TRSM: SIMD build error; manifests in Ifpack2 #183
- d2_graph_color doesn't have a default coloring mechanism #168
- Unit tests do not build with Serial backend #154
2.6.00 (2018-03-07)
Implemented enhancements:
Fixed bugs:
- d2_graph_color doesn't have a default coloring mechanism #168
- Build error when MKL TPL is enabled #135
2.5.00 (2017-12-15)
Implemented enhancements:
- KokkosBlas: Add GEMM interface #105
- KokkosBlas: Add GEMM default Kernel #125
- KokkosBlas: Add GEMV that wraps BLAS (and cuBLAS) #16
- KokkosSparse: Make SPMV test not print GBs of output if something goes wrong. #111
- KokkosSparse: ETI SpGEMM and Gauss Seidel and take it out of Experimental namespace #74
- BuildSystem: Fix Makesystem to correctly build library after aborted install #104
- BuildSystem: Add option to generate_makefile.bash to define memoryspaces for instantiation #89
- BuildSystem: generate makefile tpl option #66
- BuildSystem: Add a simpler compilation script, README update etc #96
Fixed bugs:
- Internal Compiler Error GCC in GEMM #129
- Batched Team LU: bug for small team_size #110
- Compiler BUG in IBM XL pragma unrolling #92
- Fix Blas TPL enables build #77
- Batched Gemm Failure #73
- CUDA 7.5 (GCC 4.8.4) build errors #72
- Cuda BLAS tests fail with UVM if CUDA_LAUNCH_BLOCKING=1 is not defined on Kepler #51
- CrsMatrix: sumIntoValues and replaceValues incorrectly count the number of valid column indices. #11
- findRelOffset test assumes UVM #32
0.10.03 (2017-09-11)
Implemented enhancements:
- KokkosSparse: Fix unused variable warnings in spmv_impl_omp, spmv Test and graph color perf_test #63
- KokkosBlas: dot: Add unit test #15
- KokkosBlas: dot: Add special case for multivector * vector (or vector * multivector) #13
- BuildSystem: Make KokkosKernels build independently of Trilinos #1
- BuildSystem: Fix ETI System not to depend on Tpetra ETI #5
- BuildSystem: Change CMake to work with new ETI system #19
- BuildSystem: Fix TpetraKernels names to KokkosKernels #4
- BuildSystem: Trilinos/KokkosKernels reports no ETI in almost any circumstance #29
- General: Kokkos::ArithTraits<double>::nan() is very slow #35
- General: Design and Define New UnitTest infrastructure #28
- General: Move Tpetra::Details::OrdinalTraits to KokkosKernels #22
- General: Rename files and NameSpace to KokkosKernels #12
- General: PrepareStandalone: Get rid of Teuchos usage #2
- General: Fix warning with char being either signed or unsigned in ArithTraits #60
- Testing: Make all tests run with -Werror #68
Fixed bugs:
- SPGEMM Test Fails for Cuda when compiled through Trilinos #49
- Fix ArithTraits min for floating points #47
- Pthread ETI error #25
- Fix CMake Based ETI for Threads backend #46
- KokkosKernels_ENABLE_EXPERIMENTAL causes build error #59
- ArithTraits warnings in CUDA build #71
- Graph coloring build warnings #3
* This Change Log was automatically generated by github_changelog_generator