
knl_configure

Mehmet Deveci edited this page Apr 23, 2018 · 9 revisions

1. Set up Kokkos

You can use the latest version of Kokkos. The experiments in the paper "Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures" used Kokkos version 2.04.04.

mkdir $HOME/kokkoskernels_spgemm_benchmark
cd $HOME/kokkoskernels_spgemm_benchmark
git clone git@github.com:kokkos/kokkos.git
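To reproduce the paper's setup exactly, you can pin the clone to the 2.04.04 release instead of the latest version. The sketch below assumes that release is tagged exactly "2.04.04" in the kokkos repository (check with `git tag -l '2.04*'`); checkout_kokkos_version is a hypothetical helper, not part of Kokkos.

```bash
#!/usr/bin/env bash
# checkout_kokkos_version REPO VERSION
# Hypothetical helper: switch an existing Kokkos clone to a release tag.
# ASSUMPTION: the release is tagged exactly as VERSION (e.g. "2.04.04").
checkout_kokkos_version() {
  local repo="$1" version="$2"
  # Refuse to continue if the tag does not exist in this clone.
  if ! git -C "$repo" rev-parse -q --verify "refs/tags/$version" >/dev/null; then
    echo "error: tag '$version' not found in $repo" >&2
    return 1
  fi
  # Check out the tagged release (detached HEAD).
  git -C "$repo" checkout -q "refs/tags/$version"
}
```

Usage: `checkout_kokkos_version "$HOME/kokkoskernels_spgemm_benchmark/kokkos" 2.04.04`.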

2. Get KokkosKernels

cd $HOME/kokkoskernels_spgemm_benchmark
git clone git@github.com:kokkos/kokkos-kernels.git

Update: As of 03/07/2018, these functionalities are in the master branch, so there is no need to check out the develop branch.

(Outdated note from 12/20/2017: the KokkosKernels SpGEMM updates were not yet on the master branch, so the develop branch had to be checked out.)

cd $HOME/kokkoskernels_spgemm_benchmark/kokkos-kernels
git checkout master  # formerly "git checkout develop" (outdated as of 03/07/2018)

3. Update compileKokkosKernels.sh, located in example/buildlib.

cd $HOME/kokkoskernels_spgemm_benchmark/kokkos-kernels/example/buildlib
vi compileKokkosKernels.sh

Below is an example compileKokkosKernels.sh for KNL.

KOKKOS_PATH=${HOME}/kokkoskernels_spgemm_benchmark/kokkos  # path to the Kokkos source
KOKKOSKERNELS_SCALARS='double'  # we only need double
KOKKOSKERNELS_LAYOUTS=LayoutLeft  # layout types to instantiate (not passed to generate_makefile.bash below)
KOKKOSKERNELS_ORDINALS=int  # ordinal types to instantiate
KOKKOSKERNELS_OFFSETS=int  # offset types to instantiate
KOKKOSKERNELS_PATH=${HOME}/kokkoskernels_spgemm_benchmark/kokkos-kernels  # path to the kokkos-kernels top directory
KOKKOSKERNELS_OPTIONS=eti-only  # options for kokkos-kernels
CXXFLAGS="-Wall -pedantic -Werror -O3 -g -Wshadow -Wsign-compare -Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized"
CXX=icpc
KOKKOS_DEVICES=OpenMP  # we only need the OpenMP execution space
KOKKOS_ARCHS=KNL  # IMPORTANT: specify the target architecture for compilation
KOKKOSKERNELS_TPLS=mkl  # to also run the MKL algorithms; otherwise set to ""

../../scripts/generate_makefile.bash \
  --kokkoskernels-path="${KOKKOSKERNELS_PATH}" \
  --with-scalars="${KOKKOSKERNELS_SCALARS}" \
  --with-ordinals="${KOKKOSKERNELS_ORDINALS}" \
  --with-offsets="${KOKKOSKERNELS_OFFSETS}" \
  --kokkos-path="${KOKKOS_PATH}" \
  --with-devices="${KOKKOS_DEVICES}" \
  --arch="${KOKKOS_ARCHS}" \
  --compiler="${CXX}" \
  --with-options="${KOKKOSKERNELS_OPTIONS}" \
  --cxxflags="${CXXFLAGS}" \
  --with-tpls="${KOKKOSKERNELS_TPLS}"
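An empty or misspelled variable in this script may only surface later as a confusing generate_makefile.bash or compiler error, so it can help to check the settings first. The helper below is a hypothetical sketch, not part of kokkos-kernels; it uses bash indirect expansion to test each named variable. KOKKOSKERNELS_TPLS is deliberately left out of the usage line, since "" is a valid value for it.

```bash
#!/usr/bin/env bash
# require_vars NAME...
# Hypothetical helper (not part of kokkos-kernels): prints an error for every
# named variable that is unset or empty, and returns non-zero if any are.
require_vars() {
  local var status=0
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable named $var.
    if [ -z "${!var}" ]; then
      echo "error: $var is not set" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage, placed just before the generate_makefile.bash line: `require_vars KOKKOS_PATH KOKKOSKERNELS_PATH CXX KOKKOS_DEVICES KOKKOS_ARCHS || exit 1`.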

Load the compiler module. If you build with MKL, also export the MKL path.

module load intel/compilers/18.0.128
export MKL_PATH=/home/projects/x86-64-knl/intel/...../linux/mkl/
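A quick sanity check can confirm that the module actually put the compiler on PATH and, when MKL is requested, that MKL_PATH points to an existing directory. This is a hypothetical helper; it reads CXX, KOKKOSKERNELS_TPLS, and MKL_PATH from the environment, so the variables set in compileKokkosKernels.sh need to be exported, or the function called from inside that script.

```bash
#!/usr/bin/env bash
# check_build_env
# Hypothetical helper: checks that the compiler in $CXX (default icpc) is on
# PATH and, if $KOKKOSKERNELS_TPLS mentions mkl, that $MKL_PATH is a directory.
check_build_env() {
  local cxx="${CXX:-icpc}"
  if ! command -v "$cxx" >/dev/null 2>&1; then
    echo "error: compiler '$cxx' not found on PATH (is the module loaded?)" >&2
    return 1
  fi
  case "${KOKKOSKERNELS_TPLS:-}" in
    *mkl*)
      if [ ! -d "${MKL_PATH:-}" ]; then
        echo "error: MKL requested but MKL_PATH='${MKL_PATH:-}' is not a directory" >&2
        return 1
      fi
      ;;
  esac
  return 0
}
```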

4. Compile KokkosKernels.

cd $HOME/kokkoskernels_spgemm_benchmark/kokkos-kernels/example/buildlib
./compileKokkosKernels.sh
make build-test -j
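After the build, the benchmark executable used in the next step should exist. The check below is a hypothetical helper; it assumes the build places KokkosSparse_spgemm.exe in example/buildlib/perf_test, which is the folder the runs below are started from. Note also that `make -j` with no number places no limit on simultaneous jobs; if the node runs short of memory, `make build-test -j "$(nproc)"` caps it at the number of available processors.

```bash
#!/usr/bin/env bash
# check_spgemm_built [BUILDLIB_DIR]
# Hypothetical helper: verifies that perf_test/KokkosSparse_spgemm.exe exists
# and is executable under the given buildlib directory (default: current dir).
check_spgemm_built() {
  local buildlib="${1:-.}"
  local exe="$buildlib/perf_test/KokkosSparse_spgemm.exe"
  if [ -x "$exe" ]; then
    echo "found $exe"
  else
    echo "error: $exe is missing or not executable" >&2
    return 1
  fi
}
```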

5. Running Benchmarks.

Below we show how to run the benchmarks with the KNL in cache mode (KNL-CACHE), in which MCDRAM serves as a cache for DDR memory.

  • Allocate a node using the appropriate scheduler command.
  • Download a sparse matrix from the UFL (now SuiteSparse) Sparse Matrix Collection. This example uses audikw_1.
  • Each benchmark is run 6 times; this can be changed with the "repeat" keyword (e.g. "repeat 15" to run 15 times).
  • The first run is always discarded as a warm-up. For each algorithm below, we run with 64, 128, and 256 threads.
  • The examples below use ".bin" files for faster I/O. ".mtx" files can also be used; the correct reader is chosen based on the file suffix. For faster experimenting, you can convert .mtx files to .bin files with KokkosKernels_MatrixConverter.exe as below.
./KokkosKernels_MatrixConverter.exe --in_mtx audikw_1.mtx --out_mtx audikw_1.bin
  • Set the environment variables and go to the executables folder.
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
cd $HOME/kokkoskernels_spgemm_benchmark/kokkos-kernels/example/buildlib/perf_test
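The runs below repeat the same command at 64, 128, and 256 threads. A small driver can automate that sweep and keep each run's output in its own log file. run_sweep and SPGEMM_EXE are hypothetical names; the command-line options are the ones used in the runs below, and any extra options (such as --mklkeepout 0 --verbose or --mklsort 7) are passed through unchanged.

```bash
#!/usr/bin/env bash
# run_sweep MATRIX [ALGORITHM [EXTRA_OPTIONS...]]
# Hypothetical driver: runs the SpGEMM benchmark at 64, 128 and 256 OpenMP
# threads and writes each run's output to spgemm_<algorithm>_<threads>.log.
# An empty ALGORITHM runs the default (KKSPGEMM) without --algorithm.
run_sweep() {
  local matrix="$1" algorithm="${2:-}"
  shift $(( $# < 2 ? $# : 2 ))        # remaining "$@" = extra options
  local exe="${SPGEMM_EXE:-./KokkosSparse_spgemm.exe}"
  local threads log
  export OMP_PROC_BIND=spread OMP_PLACES=threads
  for threads in 64 128 256; do
    local -a args=(--openmp "$threads" --amtx "$matrix")
    if [ -n "$algorithm" ]; then
      args+=(--algorithm "$algorithm")
    fi
    log="spgemm_${algorithm:-default}_${threads}.log"
    OMP_NUM_THREADS="$threads" "$exe" "${args[@]}" "$@" > "$log" 2>&1 || return 1
    echo "$log"
  done
}
```

Usage from the perf_test folder: `run_sweep audikw_1.bin`, `run_sweep audikw_1.bin kkdense`, or `run_sweep audikw_1.bin mkl2 --verbose --mklsort 7`.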

Running default algorithm: KKSPGEMM. Best Runtime: ~1.37 seconds
bash-4.2$ OMP_NUM_THREADS=64 ./KokkosSparse_spgemm.exe --openmp 64 --amtx  audikw_1.bin 
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 64 x 1 ]
Using A matrix for B as well
mm_time:2.48323 symbolic_time:0.32976 numeric_time:2.15347
mm_time:2.47841 symbolic_time:0.323874 numeric_time:2.15454
mm_time:2.47767 symbolic_time:0.322463 numeric_time:2.15521
mm_time:2.47479 symbolic_time:0.322538 numeric_time:2.15225
mm_time:2.47393 symbolic_time:0.322712 numeric_time:2.15122
mm_time:2.47576 symbolic_time:0.322544 numeric_time:2.15321

bash-4.2$ OMP_NUM_THREADS=128 ./KokkosSparse_spgemm.exe --openmp 128 --amtx  audikw_1.bin 
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 128 x 1 ]
Using A matrix for B as well
mm_time:1.59176 symbolic_time:0.21535 numeric_time:1.37641
mm_time:1.58634 symbolic_time:0.211642 numeric_time:1.37469
mm_time:1.58279 symbolic_time:0.209543 numeric_time:1.37325
mm_time:1.59248 symbolic_time:0.210353 numeric_time:1.38213
mm_time:1.59114 symbolic_time:0.210148 numeric_time:1.38099
mm_time:1.59152 symbolic_time:0.209913 numeric_time:1.38161

bash-4.2$ OMP_NUM_THREADS=256 ./KokkosSparse_spgemm.exe --openmp 256 --amtx  audikw_1.bin 
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 256 x 1 ]
Using A matrix for B as well
mm_time:1.38192 symbolic_time:0.204276 numeric_time:1.17764
mm_time:1.38627 symbolic_time:0.212812 numeric_time:1.17346
mm_time:1.36824 symbolic_time:0.196392 numeric_time:1.17185
mm_time:1.36981 symbolic_time:0.198274 numeric_time:1.17153
mm_time:1.37078 symbolic_time:0.196811 numeric_time:1.17397
mm_time:1.36552 symbolic_time:0.196207 numeric_time:1.16932
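The "Best Runtime" figures for the Kokkos algorithms appear to be the smallest mm_time over all thread counts, after discarding the warm-up run. Assuming each invocation's output is saved to its own log file, a short awk sketch can extract that value; best_mm_time is a hypothetical helper. Applied to the three KKSPGEMM logs above it would report 1.36552 (from the 256-thread run), which rounds to the ~1.37 seconds quoted.

```bash
#!/usr/bin/env bash
# best_mm_time LOG...
# Hypothetical helper: prints the smallest mm_time found in the given logs,
# skipping the first (warm-up) mm_time of each benchmark invocation.
best_mm_time() {
  awk '
    # A new file or a new Kokkos::OpenMP banner marks a new invocation.
    FNR == 1 || /thread_pool_topology/ { seen = 0 }
    /^mm_time:/ {
      split($1, kv, ":")              # $1 looks like "mm_time:1.36552"
      if (seen++ == 0) next           # discard the warm-up run
      t = kv[2] + 0
      if (!have || t < best) { best = t; have = 1 }
    }
    END { if (have) printf "%.5f\n", best }
  ' "$@"
}
```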

Running KKMEM. Best Runtime: ~1.42 seconds
bash-4.2$ OMP_NUM_THREADS=64 ./KokkosSparse_spgemm.exe --openmp 64 --amtx  audikw_1.bin  --algorithm kkmem
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 64 x 1 ]
Using A matrix for B as well
mm_time:2.61748 symbolic_time:0.469393 numeric_time:2.14808
mm_time:2.61755 symbolic_time:0.469112 numeric_time:2.14844
mm_time:2.61899 symbolic_time:0.469484 numeric_time:2.14951
mm_time:2.61537 symbolic_time:0.466099 numeric_time:2.14927
mm_time:2.61856 symbolic_time:0.470032 numeric_time:2.14853
mm_time:2.61432 symbolic_time:0.466054 numeric_time:2.14827

bash-4.2$ OMP_NUM_THREADS=128 ./KokkosSparse_spgemm.exe --openmp 128 --amtx  audikw_1.bin  --algorithm kkmem
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 128 x 1 ]
Using A matrix for B as well
mm_time:1.69112 symbolic_time:0.310673 numeric_time:1.38045
mm_time:1.69136 symbolic_time:0.307821 numeric_time:1.38353
mm_time:1.68288 symbolic_time:0.3067 numeric_time:1.37618
mm_time:1.69087 symbolic_time:0.306877 numeric_time:1.38399
mm_time:1.68863 symbolic_time:0.306124 numeric_time:1.3825
mm_time:1.68767 symbolic_time:0.305851 numeric_time:1.38182

bash-4.2$ OMP_NUM_THREADS=256 ./KokkosSparse_spgemm.exe --openmp 256 --amtx  audikw_1.bin  --algorithm kkmem
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 256 x 1 ]
Using A matrix for B as well
mm_time:1.43602 symbolic_time:0.265443 numeric_time:1.17058
mm_time:1.42847 symbolic_time:0.257696 numeric_time:1.17078
mm_time:1.42843 symbolic_time:0.2532 numeric_time:1.17524
mm_time:1.42418 symbolic_time:0.253221 numeric_time:1.17096
mm_time:1.42461 symbolic_time:0.253872 numeric_time:1.17074
mm_time:1.42362 symbolic_time:0.252966 numeric_time:1.17066

Running KKDENSE. Best Runtime: ~1.09 seconds
bash-4.2$ OMP_NUM_THREADS=64 ./KokkosSparse_spgemm.exe --openmp 64 --amtx  audikw_1.bin  --algorithm kkdense 
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 64 x 1 ]
Using A matrix for B as well
mm_time:1.70767 symbolic_time:0.321831 numeric_time:1.38584
mm_time:1.70334 symbolic_time:0.318716 numeric_time:1.38462
mm_time:1.70252 symbolic_time:0.318125 numeric_time:1.3844
mm_time:1.70231 symbolic_time:0.317844 numeric_time:1.38446
mm_time:1.70372 symbolic_time:0.318226 numeric_time:1.3855
mm_time:1.70266 symbolic_time:0.318109 numeric_time:1.38455

bash-4.2$ OMP_NUM_THREADS=128 ./KokkosSparse_spgemm.exe --openmp 128 --amtx  audikw_1.bin  --algorithm kkdense 
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 128 x 1 ]
Using A matrix for B as well
mm_time:1.16144 symbolic_time:0.215078 numeric_time:0.946365
mm_time:1.15385 symbolic_time:0.211481 numeric_time:0.942369
mm_time:1.15591 symbolic_time:0.21047 numeric_time:0.945443
mm_time:1.15433 symbolic_time:0.209749 numeric_time:0.944576
mm_time:1.15591 symbolic_time:0.209757 numeric_time:0.946151
mm_time:1.15422 symbolic_time:0.209543 numeric_time:0.944674

bash-4.2$ OMP_NUM_THREADS=256 ./KokkosSparse_spgemm.exe --openmp 256 --amtx  audikw_1.bin  --algorithm kkdense 
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 256 x 1 ]
Using A matrix for B as well
mm_time:1.09092 symbolic_time:0.206483 numeric_time:0.884432
mm_time:1.09683 symbolic_time:0.213442 numeric_time:0.883391
mm_time:1.09282 symbolic_time:0.210769 numeric_time:0.882047
mm_time:1.0984 symbolic_time:0.212993 numeric_time:0.885408
mm_time:1.09628 symbolic_time:0.212053 numeric_time:0.884225
mm_time:1.09678 symbolic_time:0.208386 numeric_time:0.888396

Running MKL-INSPECTOR. Best Runtime: ~1.50 seconds

To match the way SpGEMM is used in Trilinos, mkl-inspector is called twice, once in the symbolic phase and once in the numeric phase, and some post-processing follows. To benchmark its runtime, we exclude all of this post-processing by passing "--verbose --mklkeepout 0" to the executable. The timing we take into account is "Actual DOUBLE MKL SPMM Time Without Free", rather than mm_time, symbolic_time, and numeric_time as before. Note that the first call is much more expensive than the rest, so we exclude it.
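For these runs, the relevant numbers are the "Actual DOUBLE MKL SPMM Time Without Free" lines with the first (expensive) call dropped. The awk sketch below (mkl_ie_stats is a hypothetical helper, assuming one invocation per log file) reports the minimum and mean of the remaining calls for each file. On the 256-thread log below, the mean of the remaining 11 calls is about 1.4995 seconds, consistent with the ~1.50 seconds quoted above, while the minimum is 1.47959.

```bash
#!/usr/bin/env bash
# mkl_ie_stats LOG...
# Hypothetical helper: for each log file, drops the first
# "Actual DOUBLE MKL SPMM Time Without Free" value and prints the minimum
# and mean of the remaining values.
mkl_ie_stats() {
  awk '
    function report() {
      if (n) printf "%s min %.5f mean %.5f (%d calls)\n", file, best, sum / n, n
    }
    FNR == 1 { report(); file = FILENAME; seen = 0; n = 0; sum = 0 }
    /^Actual DOUBLE MKL SPMM Time Without Free:/ {
      t = $0; sub(/.*:/, "", t); t += 0   # keep the number after the last colon
      if (seen++ == 0) next               # drop the first, much more expensive call
      n++; sum += t
      if (n == 1 || t < best) best = t
    }
    END { report() }
  ' "$@"
}
```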

bash-4.2$ OMP_NUM_THREADS=64 ./KokkosSparse_spgemm.exe --openmp 64 --amtx  audikw_1.bin  --algorithm mkl --mklkeepout 0 --verbose
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 64 x 1 ]
Using A matrix for B as well
m:943695 n:943695 k:943695
Actual DOUBLE MKL SPMM Time Without Free:3.74993
Actual DOUBLE MKL SPMM Time:3.75037
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:2.72329
Actual DOUBLE MKL SPMM Time:2.72369
mm_time:6.75529 symbolic_time:3.98355 numeric_time:2.77174
Actual DOUBLE MKL SPMM Time Without Free:2.72345
Actual DOUBLE MKL SPMM Time:2.7239
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:2.72306
Actual DOUBLE MKL SPMM Time:2.7235
mm_time:5.71684 symbolic_time:2.94666 numeric_time:2.77018
Actual DOUBLE MKL SPMM Time Without Free:2.72385
Actual DOUBLE MKL SPMM Time:2.72433
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:2.72334
Actual DOUBLE MKL SPMM Time:2.72379
mm_time:5.71811 symbolic_time:2.94657 numeric_time:2.77154
Actual DOUBLE MKL SPMM Time Without Free:2.72288
Actual DOUBLE MKL SPMM Time:2.72333
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:2.72343
Actual DOUBLE MKL SPMM Time:2.72389
mm_time:5.71555 symbolic_time:2.94453 numeric_time:2.77102
Actual DOUBLE MKL SPMM Time Without Free:2.72238
Actual DOUBLE MKL SPMM Time:2.72283
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:2.72248
Actual DOUBLE MKL SPMM Time:2.72295
mm_time:5.71422 symbolic_time:2.94366 numeric_time:2.77056
Actual DOUBLE MKL SPMM Time Without Free:2.72335
Actual DOUBLE MKL SPMM Time:2.72379
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:2.72312
Actual DOUBLE MKL SPMM Time:2.72357
mm_time:5.71585 symbolic_time:2.94496 numeric_time:2.77089

bash-4.2$ OMP_NUM_THREADS=128 ./KokkosSparse_spgemm.exe --openmp 128 --amtx  audikw_1.bin  --algorithm mkl --mklkeepout 0 --verbose
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 128 x 1 ]
Using A matrix for B as well
m:943695 n:943695 k:943695
Actual DOUBLE MKL SPMM Time Without Free:1.95746
Actual DOUBLE MKL SPMM Time:1.95805
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.69292
Actual DOUBLE MKL SPMM Time:1.69333
mm_time:3.98053 symbolic_time:2.22713 numeric_time:1.7534
Actual DOUBLE MKL SPMM Time Without Free:1.69162
Actual DOUBLE MKL SPMM Time:1.69203
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.69158
Actual DOUBLE MKL SPMM Time:1.69204
mm_time:3.71019 symbolic_time:1.95521 numeric_time:1.75497
Actual DOUBLE MKL SPMM Time Without Free:1.68733
Actual DOUBLE MKL SPMM Time:1.68773
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.69338
Actual DOUBLE MKL SPMM Time:1.69379
mm_time:3.70548 symbolic_time:1.95155 numeric_time:1.75392
Actual DOUBLE MKL SPMM Time Without Free:1.68911
Actual DOUBLE MKL SPMM Time:1.68953
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.6801
Actual DOUBLE MKL SPMM Time:1.68051
mm_time:3.69032 symbolic_time:1.95066 numeric_time:1.73966
Actual DOUBLE MKL SPMM Time Without Free:1.6984
Actual DOUBLE MKL SPMM Time:1.69883
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.68243
Actual DOUBLE MKL SPMM Time:1.68283
mm_time:3.70284 symbolic_time:1.95989 numeric_time:1.74294
Actual DOUBLE MKL SPMM Time Without Free:1.68301
Actual DOUBLE MKL SPMM Time:1.68343
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.67717
Actual DOUBLE MKL SPMM Time:1.67759
mm_time:3.68133 symbolic_time:1.94262 numeric_time:1.73872

bash-4.2$ OMP_NUM_THREADS=256 ./KokkosSparse_spgemm.exe --openmp 256 --amtx  audikw_1.bin  --algorithm mkl --mklkeepout 0 --verbose
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 256 x 1 ]
Using A matrix for B as well
m:943695 n:943695 k:943695
Actual DOUBLE MKL SPMM Time Without Free:2.23312
Actual DOUBLE MKL SPMM Time:2.23403
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.48377
Actual DOUBLE MKL SPMM Time:1.48443
mm_time:4.11499 symbolic_time:2.56608 numeric_time:1.54891
Actual DOUBLE MKL SPMM Time Without Free:1.49798
Actual DOUBLE MKL SPMM Time:1.49866
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.47959
Actual DOUBLE MKL SPMM Time:1.48028
mm_time:3.3549 symbolic_time:1.81261 numeric_time:1.54229
Actual DOUBLE MKL SPMM Time Without Free:1.51514
Actual DOUBLE MKL SPMM Time:1.51578
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.51419
Actual DOUBLE MKL SPMM Time:1.51482
mm_time:3.41489 symbolic_time:1.83635 numeric_time:1.57855
Actual DOUBLE MKL SPMM Time Without Free:1.4894
Actual DOUBLE MKL SPMM Time:1.49008
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.51534
Actual DOUBLE MKL SPMM Time:1.51602
mm_time:3.38422 symbolic_time:1.80693 numeric_time:1.57729
Actual DOUBLE MKL SPMM Time Without Free:1.49757
Actual DOUBLE MKL SPMM Time:1.49819
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.4856
Actual DOUBLE MKL SPMM Time:1.48628
mm_time:3.36753 symbolic_time:1.81743 numeric_time:1.5501
Actual DOUBLE MKL SPMM Time Without Free:1.49857
Actual DOUBLE MKL SPMM Time:1.49925
C SIZE:0
Actual DOUBLE MKL SPMM Time Without Free:1.51736
Actual DOUBLE MKL SPMM Time:1.51806
mm_time:3.40202 symbolic_time:1.81891 numeric_time:1.58311

Running 2-phase MKL with sort option 7 (no sorting). Best Runtime: 3.89 seconds

This method requires some pre- and post-processing because the MKL routines work on 1-based (Fortran-style) indices. The symbolic time is printed as "Actual MKL2 Symbolic Time" and the numeric time as "Actual MKL2 Numeric Time"; these time only the MKL SpGEMM calls themselves and exclude the pre-/post-processing.
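The best time for this method appears to be the sum of the two "Actual" times within a single run. A sketch that pairs each symbolic time with the numeric time that follows it, skips the warm-up run, and reports the smallest sum might look like this (mkl2_best is a hypothetical helper). On the 128-thread log below it would report 3.89677.

```bash
#!/usr/bin/env bash
# mkl2_best LOG...
# Hypothetical helper: for each run, adds "Actual MKL2 Symbolic Time" to the
# following "Actual MKL2 Numeric Time", skips the first (warm-up) run of each
# invocation, and prints the smallest sum over all given logs.
mkl2_best() {
  awk '
    FNR == 1 || /thread_pool_topology/ { runs = 0 }
    /Actual MKL2 Symbolic Time:/ {
      s = $0; sub(/.*:/, "", s); sym = s + 0
    }
    /Actual MKL2 Numeric Time:/ {
      v = $0; sub(/.*:/, "", v)
      if (runs++ == 0) next           # discard the warm-up run
      total = sym + v
      if (!have || total < best) { best = total; have = 1 }
    }
    END { if (have) printf "%.5f\n", best }
  ' "$@"
}
```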

bash-4.2$ OMP_NUM_THREADS=64 ./KokkosSparse_spgemm.exe --openmp 64 --amtx  audikw_1.bin  --algorithm mkl2 --verbose --mklsort 7
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 64 x 1 ]
Using A matrix for B as well
m:943695 n:943695 k:943695
Sort:7 Actual MKL2 Symbolic Time:1.49217
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.61316
mm_time:4.20866 symbolic_time:1.52024 numeric_time:2.68843
Sort:7 Actual MKL2 Symbolic Time:1.46548
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.62388
mm_time:4.19539 symbolic_time:1.49614 numeric_time:2.69926
Sort:7 Actual MKL2 Symbolic Time:1.45934
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.6158
mm_time:4.17825 symbolic_time:1.48679 numeric_time:2.69145
Sort:7 Actual MKL2 Symbolic Time:1.45455
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.61726
mm_time:4.17625 symbolic_time:1.48371 numeric_time:2.69254
Sort:7 Actual MKL2 Symbolic Time:1.45365
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.61739
mm_time:4.17523 symbolic_time:1.48239 numeric_time:2.69284
Sort:7 Actual MKL2 Symbolic Time:1.45837
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.61674
mm_time:4.179 symbolic_time:1.48681 numeric_time:2.69219

bash-4.2$ OMP_NUM_THREADS=128 ./KokkosSparse_spgemm.exe --openmp 128 --amtx  audikw_1.bin  --algorithm mkl2 --verbose --mklsort 7
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 128 x 1 ]
Using A matrix for B as well
m:943695 n:943695 k:943695
Sort:7 Actual MKL2 Symbolic Time:1.49747
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.50591
mm_time:4.08896 symbolic_time:1.52182 numeric_time:2.56713
Sort:7 Actual MKL2 Symbolic Time:1.39741
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.50679
mm_time:3.99857 symbolic_time:1.41981 numeric_time:2.57876
Sort:7 Actual MKL2 Symbolic Time:1.39001
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.50741
mm_time:3.97166 symbolic_time:1.41301 numeric_time:2.55865
Sort:7 Actual MKL2 Symbolic Time:1.3902
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.50657
mm_time:3.97159 symbolic_time:1.41346 numeric_time:2.55813
Sort:7 Actual MKL2 Symbolic Time:1.39268
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.5075
mm_time:3.97499 symbolic_time:1.41702 numeric_time:2.55797
Sort:7 Actual MKL2 Symbolic Time:1.38985
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.5079
mm_time:3.97231 symbolic_time:1.41345 numeric_time:2.55886

bash-4.2$ OMP_NUM_THREADS=256 ./KokkosSparse_spgemm.exe --openmp 256 --amtx  audikw_1.bin  --algorithm mkl2 --verbose --mklsort 7
B is not provided. Multiplying AxA.
Kokkos::OpenMP thread_pool_topology[ 1 x 256 x 1 ]
Using A matrix for B as well
m:943695 n:943695 k:943695
Sort:7 Actual MKL2 Symbolic Time:1.46614
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.5107
mm_time:4.05512 symbolic_time:1.49686 numeric_time:2.55826
Sort:7 Actual MKL2 Symbolic Time:1.39502
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.51181
mm_time:4.0016 symbolic_time:1.42182 numeric_time:2.57978
Sort:7 Actual MKL2 Symbolic Time:1.38827
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.51616
mm_time:3.97749 symbolic_time:1.41435 numeric_time:2.56314
Sort:7 Actual MKL2 Symbolic Time:1.38636
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.51356
mm_time:3.99414 symbolic_time:1.41294 numeric_time:2.5812
Sort:7 Actual MKL2 Symbolic Time:1.38773
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.51418
mm_time:3.97602 symbolic_time:1.41369 numeric_time:2.56233
Sort:7 Actual MKL2 Symbolic Time:1.38804
C SIZE:662878935
Sort:7 Actual MKL2 Numeric Time:2.51376
mm_time:3.99734 symbolic_time:1.41556 numeric_time:2.58178
