Submit benchmark files here #8

Open
IanButterworth opened this issue May 10, 2020 · 90 comments
Labels
help wanted Extra attention is needed

Comments

@IanButterworth
Owner

IanButterworth commented May 10, 2020

It would be great to be able to compare performance across the many platforms being used.
If you're happy to share your benchmark results for comparison, please submit them here and they'll be added to the repo.

  1. Run
pkg> up             # make sure to use the latest version of SystemBenchmark
julia> using SystemBenchmark
julia> res = runbenchmark();
julia> savebenchmark("result.txt", res)
  2. Drag the file onto a new comment on this thread, and GitHub will upload it and provide a link

result.txt

  3. Feel free to also paste the full table, to make browsing results easier
julia> show(res, allcols=true, allrows=true)
25×3 DataFrame
│ Row │ cat         │ testname          │ res                                      │
│     │ String      │ String            │ Any                                      │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                    │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                    │
│ 3   │ info        │ OS                │ macOS (x86_64-apple-darwin18.7.0)        │
...

thanks!

Edit: You can now collect all the results posted in this issue by running

getsubmittedbenchmarks()

The results are also periodically updated in this spreadsheet: https://docs.google.com/spreadsheets/d/15Ldyq4n9cflXPDR63CQe6QwJCWedjvo2vaYJ0w2hhYo/edit#gid=0
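
For anyone who wants to work with a table pasted in this thread rather than the uploaded file, here is a minimal sketch in plain Julia (no packages) that pulls the data rows out of the pasted text. The name `parse_rows` is hypothetical and not part of SystemBenchmark. It assumes the three-column `cat` / `testname` / `res` layout shown above and keeps every value as a string.

```julia
# Hypothetical helper, not part of SystemBenchmark: extract the data rows
# from a pasted three-column results table (cat, testname, res).
function parse_rows(text::AbstractString)
    rows = NamedTuple{(:cat, :testname, :res),Tuple{String,String,String}}[]
    for line in split(text, '\n')
        cells = strip.(split(line, '│'; keepempty=false))
        # Data rows have four cells (row number plus three columns).
        length(cells) == 4 || continue
        # Header and type rows do not start with an integer row number.
        tryparse(Int, cells[1]) === nothing && continue
        push!(rows, (cat=String(cells[2]), testname=String(cells[3]), res=String(cells[4])))
    end
    return rows
end

sample = """
│ Row │ cat         │ testname          │ res        │
│     │ String      │ String            │ Any        │
├─────┼─────────────┼───────────────────┼────────────┤
│ 9   │ cpu         │ FloatMul          │ 1.798e-6   │
│ 16  │ cpu         │ peakflops         │ 2.92354e10 │
"""

for r in parse_rows(sample)
    println(r.testname, " => ", r.res)
end
```

The six-column comparison tables posted further down (with `ref_res`, `test_res` and `factor`) would need the cell count and field names adjusted.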

@nilshg

nilshg commented May 10, 2020

result.txt

│ Row │ cat         │ testname          │ res                                      │
│     │ String      │ String            │ Any                                      │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                    │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                    │
│ 3   │ info        │ OS                │ Linux (x86_64-pc-linux-gnu)              │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                       │
│ 6   │ info        │ LIBM              │ libopenlibm                              │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, sandybridge)      │
│ 8   │ info        │ GPU               │ missing                                  │
│ 9   │ cpu         │ FloatMul          │ 1.798e-6                                 │
│ 10  │ cpu         │ FusedMulAdd       │ 2.7e-8                                   │
│ 11  │ cpu         │ FloatSin          │ 5.706e-6                                 │
│ 12  │ cpu         │ VecMulBroad       │ 5.0198e-5                                │
│ 13  │ cpu         │ CPUMatMul         │ 0.065571                                 │
│ 14  │ cpu         │ MatMulBroad       │ 0.0054505                                │
│ 15  │ cpu         │ 3DMulBroad        │ 0.0017365                                │
│ 16  │ cpu         │ peakflops         │ 2.92354e10                               │
│ 17  │ cpu         │ FFMPEGH264Write   │ 226.593                                  │
│ 18  │ mem         │ DeepCopy          │ 0.000245867                              │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.067218                                 │
│ 20  │ diskio      │ DiskWrite1MB      │ 3.3915                                   │
│ 21  │ diskio      │ DiskRead1KB       │ 0.015667                                 │
│ 22  │ diskio      │ DiskRead1MB       │ 0.199129                                 │
│ 23  │ loading     │ JuliaLoad         │ 154.651                                  │
│ 24  │ compilation │ compilecache      │ 393.196                                  │
│ 25  │ compilation │ create_expr_cache │ 1.80841                                  │

@CarloLucibello

laptop: MSI Prestige 15
OS: Manjaro Linux

results.txt

│ Row │ cat         │ testname          │ res                                       │
│     │ String      │ String            │ Any                                       │
├─────┼─────────────┼───────────────────┼───────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                     │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                     │
│ 3   │ info        │ OS                │ Linux (x86_64-pc-linux-gnu)               │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                        │
│ 6   │ info        │ LIBM              │ libopenlibm                               │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)           │
│ 8   │ info        │ GPU               │ GeForce GTX 1650 with Max-Q Design        │
│ 9   │ cpu         │ FloatMul          │ 1.54e-6                                   │
│ 10  │ cpu         │ FusedMulAdd       │ 2.1e-8                                    │
│ 11  │ cpu         │ FloatSin          │ 5.352e-6                                  │
│ 12  │ cpu         │ VecMulBroad       │ 3.4502e-5                                 │
│ 13  │ cpu         │ CPUMatMul         │ 0.0471405                                 │
│ 14  │ cpu         │ MatMulBroad       │ 0.0042535                                 │
│ 15  │ cpu         │ 3DMulBroad        │ 0.0012537                                 │
│ 16  │ cpu         │ peakflops         │ 4.3538e10                                 │
│ 17  │ cpu         │ FFMPEGH264Write   │ 128.135                                   │
│ 18  │ gpu         │ GPUMatMul         │ 0.00536025                                │
│ 19  │ mem         │ DeepCopy          │ 0.000182196                               │
│ 20  │ diskio      │ DiskWrite1KB      │ 0.032851                                  │
│ 21  │ diskio      │ DiskWrite1MB      │ 1.64405                                   │
│ 22  │ diskio      │ DiskRead1KB       │ 0.00666683                                │
│ 23  │ diskio      │ DiskRead1MB       │ 0.150346                                  │
│ 24  │ loading     │ JuliaLoad         │ 98.9018                                   │
│ 25  │ compilation │ compilecache      │ 357.893                                   │
│ 26  │ compilation │ create_expr_cache │ 1.08519                                   │

@yakir12

yakir12 commented May 10, 2020

result.txt

│ Row │ cat         │ testname          │ res                                            │
│     │ String      │ String            │ Any                                            │
├─────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                          │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                          │
│ 3   │ info        │ OS                │ Linux (x86_64-linux-gnu)                       │
│ 4   │ info        │ CPU               │ AMD Ryzen Threadripper 2950X 16-Core Processor │
│ 5   │ info        │ WORD_SIZE         │ 64                                             │
│ 6   │ info        │ LIBM              │ libopenlibm                                    │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, znver1)                 │
│ 8   │ info        │ GPU               │ missing                                        │
│ 9   │ cpu         │ FloatMul          │ 1.163e-6                                       │
│ 10  │ cpu         │ FusedMulAdd       │ 2.0e-8                                         │
│ 11  │ cpu         │ FloatSin          │ 3.066e-6                                       │
│ 12  │ cpu         │ VecMulBroad       │ 2.94643e-5                                     │
│ 13  │ cpu         │ CPUMatMul         │ 0.054506                                       │
│ 14  │ cpu         │ MatMulBroad       │ 0.00396512                                     │
│ 15  │ cpu         │ 3DMulBroad        │ 0.0013856                                      │
│ 16  │ cpu         │ peakflops         │ 1.90993e11                                     │
│ 17  │ cpu         │ FFMPEGH264Write   │ 160.665                                        │
│ 18  │ mem         │ DeepCopy          │ 0.000176566                                    │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.0259                                         │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.869498                                       │
│ 21  │ diskio      │ DiskRead1KB       │ 0.00591286                                     │
│ 22  │ diskio      │ DiskRead1MB       │ 0.113459                                       │
│ 23  │ loading     │ JuliaLoad         │ 135.932                                        │
│ 24  │ compilation │ compilecache      │ 248.265                                        │
│ 25  │ compilation │ create_expr_cache │ 1.34587                                        │

@samuelpowell

result.txt

│ Row │ cat         │ testname          │ res                                       │
│     │ String      │ String            │ Any                                       │
├─────┼─────────────┼───────────────────┼───────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                     │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                     │
│ 3   │ info        │ OS                │ macOS (x86_64-apple-darwin18.7.0)         │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                        │
│ 6   │ info        │ LIBM              │ libopenlibm                               │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)           │
│ 8   │ info        │ GPU               │ missing                                   │
│ 9   │ cpu         │ FloatMul          │ 1.908e-6                                  │
│ 10  │ cpu         │ FusedMulAdd       │ 3.5e-8                                    │
│ 11  │ cpu         │ FloatSin          │ 5.507e-6                                  │
│ 12  │ cpu         │ VecMulBroad       │ 5.08291e-5                                │
│ 13  │ cpu         │ CPUMatMul         │ 0.0395035                                 │
│ 14  │ cpu         │ MatMulBroad       │ 0.0233609                                 │
│ 15  │ cpu         │ 3DMulBroad        │ 0.00169095                                │
│ 16  │ cpu         │ peakflops         │ 1.41646e11                                │
│ 17  │ cpu         │ FFMPEGH264Write   │ 312.755                                   │
│ 18  │ mem         │ DeepCopy          │ 0.00021407                                │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.204504                                  │
│ 20  │ diskio      │ DiskWrite1MB      │ 2.51347                                   │
│ 21  │ diskio      │ DiskRead1KB       │ 0.945095                                  │
│ 22  │ diskio      │ DiskRead1MB       │ 2.06309                                   │
│ 23  │ loading     │ JuliaLoad         │ 242.645                                   │
│ 24  │ compilation │ compilecache      │ 435.073                                   │
│ 25  │ compilation │ create_expr_cache │ 11.3237                                   │

@samuelpowell

result.txt

│ Row │ cat         │ testname          │ res                                   │
│     │ String      │ String            │ Any                                   │
├─────┼─────────────┼───────────────────┼───────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                 │
│ 2   │ info        │ JuliaVer          │ 1.4.0                                 │
│ 3   │ info        │ OS                │ Windows (x86_64-w64-mingw32)          │
│ 4   │ info        │ CPU               │ Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                    │
│ 6   │ info        │ LIBM              │ libopenlibm                           │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)       │
│ 8   │ info        │ GPU               │ GeForce RTX 2060                      │
│ 9   │ cpu         │ FloatMul          │ 1.4e-6                                │
│ 10  │ cpu         │ FusedMulAdd       │ 1.0e-9                                │
│ 11  │ cpu         │ FloatSin          │ 4.699e-6                              │
│ 12  │ cpu         │ VecMulBroad       │ 3.86099e-5                            │
│ 13  │ cpu         │ CPUMatMul         │ 0.040299                              │
│ 14  │ cpu         │ MatMulBroad       │ 0.00515729                            │
│ 15  │ cpu         │ 3DMulBroad        │ 0.0011501                             │
│ 16  │ cpu         │ peakflops         │ 1.56923e11                            │
│ 17  │ cpu         │ FFMPEGH264Write   │ 145.829                               │
│ 18  │ gpu         │ GPUMatMul         │ 0.00399375                            │
│ 19  │ mem         │ DeepCopy          │ 0.000195275                           │
│ 20  │ diskio      │ DiskWrite1KB      │ 0.285601                              │
│ 21  │ diskio      │ DiskWrite1MB      │ 0.621399                              │
│ 22  │ diskio      │ DiskRead1KB       │ 0.1719                                │
│ 23  │ diskio      │ DiskRead1MB       │ 0.364                                 │
│ 24  │ loading     │ JuliaLoad         │ 201.747                               │
│ 25  │ compilation │ compilecache      │ 347.45                                │
│ 26  │ compilation │ create_expr_cache │ 5.8203                                │

@samuelpowell

@ianshmean a bit of AArch64 goodness for you (CM3+)

result.txt

│ Row │ cat         │ testname          │ res                                │
│     │ String      │ String            │ Any                                │
├─────┼─────────────┼───────────────────┼────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                              │
│ 2   │ info        │ JuliaVer          │ 1.4.1                              │
│ 3   │ info        │ OS                │ Linux (aarch64-unknown-linux-gnu)  │
│ 4   │ info        │ CPU               │ unknown                            │
│ 5   │ info        │ WORD_SIZE         │ 64                                 │
│ 6   │ info        │ LIBM              │ libopenlibm                        │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, cortex-a53) │
│ 8   │ info        │ GPU               │ missing                            │
│ 9   │ cpu         │ FloatMul          │ 5.989e-6                           │
│ 10  │ cpu         │ FusedMulAdd       │ 1.04e-7                            │
│ 11  │ cpu         │ FloatSin          │ 6.43187e-5                         │
│ 12  │ cpu         │ VecMulBroad       │ 0.000393305                        │
│ 13  │ cpu         │ CPUMatMul         │ 0.233069                           │
│ 14  │ cpu         │ MatMulBroad       │ 0.030989                           │
│ 15  │ cpu         │ 3DMulBroad        │ 0.010833                           │
│ 16  │ cpu         │ peakflops         │ 3.63517e9                          │
│ 17  │ cpu         │ FFMPEGH264Write   │ 1088.86                            │
│ 18  │ mem         │ DeepCopy          │ 0.0018072                          │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.556815                           │
│ 20  │ diskio      │ DiskWrite1MB      │ 59.0121                            │
│ 21  │ diskio      │ DiskRead1KB       │ 0.118331                           │
│ 22  │ diskio      │ DiskRead1MB       │ 2.17862                            │
│ 23  │ loading     │ JuliaLoad         │ 780.837                            │
│ 24  │ compilation │ compilecache      │ 1865.61                            │
│ 25  │ compilation │ create_expr_cache │ 80.5051                            │

@knuesel

knuesel commented May 10, 2020

result.txt

│ Row │ cat         │ testname          │ ref_res                                  │ test_res                        │ factor    │
│     │ String      │ String            │ Any                                      │ Any                             │ Any       │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┼─────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                    │ 0.2.0                           │ Equal     │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                    │ 1.4.0                           │ Not Equal │
│ 3   │ info        │ OS                │ Linux (x86_64-pc-linux-gnu)              │ Linux (x86_64-pc-linux-gnu)     │ Equal     │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz │ 06/8e                           │ Not Equal │
│ 5   │ info        │ WORD_SIZE         │ 64                                       │ 64                              │ Equal     │
│ 6   │ info        │ LIBM              │ libopenlibm                              │ libopenlibm                     │ Equal     │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)          │ libLLVM-8.0.1 (ORCJIT, skylake) │ Equal     │
│ 8   │ info        │ GPU               │ GeForce GTX 1650 with Max-Q Design       │ missing                         │ Not Equal │
│ 9   │ cpu         │ FloatMul          │ 1.133e-6                                 │ 3.505e-6                        │ 3.09356   │
│ 10  │ cpu         │ FusedMulAdd       │ 1.6e-8                                   │ 1.506e-6                        │ 94.125    │
│ 11  │ cpu         │ FloatSin          │ 3.616e-6                                 │ 8.71271e-6                      │ 2.40949   │
│ 12  │ cpu         │ VecMulBroad       │ 2.9462311557788944e-5                    │ 5.35548e-5                      │ 1.81774   │
│ 13  │ cpu         │ CPUMatMul         │ 0.024109                                 │ 0.0378585                       │ 1.57031   │
│ 14  │ cpu         │ MatMulBroad       │ 0.004257                                 │ 0.00486671                      │ 1.14323   │
│ 15  │ cpu         │ 3DMulBroad        │ 0.001154                                 │ 0.00212495                      │ 1.84138   │
│ 16  │ cpu         │ peakflops         │ 9.865361310009984e10                     │ 6.78874e10                      │ 0.688139  │
│ 17  │ cpu         │ FFMPEGH264Write   │ 135.685863                               │ 252.592                         │ 1.86159   │
│ 18  │ mem         │ DeepCopy          │ 0.0006343389553862894                    │ 0.000278083                     │ 0.438382  │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.031644                                 │ 0.03119                         │ 0.985653  │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.855818                                 │ 213.207                         │ 249.127   │
│ 21  │ diskio      │ DiskRead1KB       │ 0.006407583333333334                     │ 0.0191815                       │ 2.99356   │
│ 22  │ diskio      │ DiskRead1MB       │ 0.143273                                 │ 0.669384                        │ 4.67209   │
│ 23  │ loading     │ JuliaLoad         │ 100.1638935                              │ 156.32                          │ 1.56064   │
│ 24  │ compilation │ compilecache      │ 266.788588                               │ 349.425                         │ 1.30974   │
│ 25  │ compilation │ create_expr_cache │ 1.0525595                                │ 6.50128                         │ 6.17664   │

Regarding the bad DiskWrite1MB performance: this is on a Pixelbook with eMMC storage, and the Linux VM in Chrome OS uses Btrfs, which performs very badly on fsync. Maybe that explains it.

PS: the readme says "writeBenchmark" instead of "useBenchmark"
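
On reading the comparison table above: judging from the numbers, the `factor` column appears to be `test_res` divided by `ref_res`. This is inferred from the posted values, not taken from the package source. A quick check in plain Julia, with three rows copied from the table:

```julia
# (testname, ref_res, test_res, factor as printed), copied from the table above.
rows = [
    ("FloatMul",     1.133e-6,             3.505e-6,   3.09356),
    ("DiskWrite1MB", 0.855818,             213.207,    249.127),
    ("peakflops",    9.865361310009984e10, 6.78874e10, 0.688139),
]

for (name, ref, test, printed) in rows
    factor = test / ref
    # The printed test_res is rounded, so compare with a loose tolerance.
    ok = isapprox(factor, printed; rtol=1e-4)
    println(rpad(name, 14), factor, "  matches printed value: ", ok)
end
```

If that reading is right, a factor above 1 on the timing rows means the test machine took longer than the reference, while for `peakflops` (a throughput) a factor below 1 is the slower result.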

@rsrock

rsrock commented May 10, 2020

results.txt

│ Row │ cat         │ testname          │ ref_res                                  │ test_res                                 │ factor    │
│     │ String      │ String            │ Any                                      │ Any                                      │ Any       │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┼──────────────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                    │ 0.2.0                                    │ Equal     │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                    │ 1.4.2-pre.0                              │ Not Equal │
│ 3   │ info        │ OS                │ Linux (x86_64-pc-linux-gnu)              │ macOS (x86_64-apple-darwin19.4.0)        │ Not Equal │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz │ Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz │ Not Equal │
│ 5   │ info        │ WORD_SIZE         │ 64                                       │ 64                                       │ Equal     │
│ 6   │ info        │ LIBM              │ libopenlibm                              │ libopenlibm                              │ Equal     │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)          │ libLLVM-8.0.1 (ORCJIT, skylake)          │ Equal     │
│ 8   │ info        │ GPU               │ GeForce GTX 1650 with Max-Q Design       │ missing                                  │ Not Equal │
│ 9   │ cpu         │ FloatMul          │ 1.133e-6                                 │ 1.717e-6                                 │ 1.51545   │
│ 10  │ cpu         │ FusedMulAdd       │ 1.6e-8                                   │ 3.9e-8                                   │ 2.4375    │
│ 11  │ cpu         │ FloatSin          │ 3.616e-6                                 │ 5.706e-6                                 │ 1.57799   │
│ 12  │ cpu         │ VecMulBroad       │ 2.9462311557788944e-5                    │ 3.83967e-5                               │ 1.30325   │
│ 13  │ cpu         │ CPUMatMul         │ 0.024109                                 │ 0.016659                                 │ 0.690987  │
│ 14  │ cpu         │ MatMulBroad       │ 0.004257                                 │ 0.00257483                               │ 0.604847  │
│ 15  │ cpu         │ 3DMulBroad        │ 0.001154                                 │ 0.00159441                               │ 1.38164   │
│ 16  │ cpu         │ peakflops         │ 9.865361310009984e10                     │ 2.26222e11                               │ 2.29309   │
│ 17  │ cpu         │ FFMPEGH264Write   │ 135.685863                               │ 215.933                                  │ 1.59142   │
│ 18  │ mem         │ DeepCopy          │ 0.0006343389553862894                    │ 0.000168586                              │ 0.265766  │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.031644                                 │ 0.120015                                 │ 3.79268   │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.855818                                 │ 0.696444                                 │ 0.813776  │
│ 21  │ diskio      │ DiskRead1KB       │ 0.006407583333333334                     │ 0.072804                                 │ 11.3622   │
│ 22  │ diskio      │ DiskRead1MB       │ 0.143273                                 │ 0.889994                                 │ 6.21188   │
│ 23  │ loading     │ JuliaLoad         │ 100.1638935                              │ 214.356                                  │ 2.14005   │
│ 24  │ compilation │ compilecache      │ 266.788588                               │ 349.208                                  │ 1.30893   │
│ 25  │ compilation │ create_expr_cache │ 1.0525595                                │ 10.6762                                  │ 10.1431   │

@IanButterworth
Owner Author

Thanks for all the submissions. You can now pull them all in simply by running:

using SystemBenchmark
getsubmittedbenchmarks()

Also, I fixed the formatting of some function names

@rsrock

rsrock commented May 10, 2020

So something is up with writing on macOS. On that same system, if I run

❯ time dd if=/dev/zero bs=1k of=tstfile count=1
1+0 records in
1+0 records out
1024 bytes transferred in 0.000035 secs (29217465 bytes/sec)
dd if=/dev/zero bs=1k of=tstfile count=1  0.00s user 0.00s system 78% cpu 0.004 total

❯ time dd if=/dev/zero bs=1024k of=tstfile count=1
1+0 records in
1+0 records out
1048576 bytes transferred in 0.000657 secs (1596387118 bytes/sec)
dd if=/dev/zero bs=1024k of=tstfile count=1  0.00s user 0.00s system 74% cpu 0.004 total

So 1 KB takes about 35 µs, and 1 MB takes about 660 µs. Looks like there is some fixed overhead in creating small files somewhere.
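
To put a rough number on that overhead, here is a small sketch that fits a two-point model `t = overhead + per_byte * n` to the two `dd` timings above. The linear model is an assumption made for illustration, and two data points only give a crude estimate.

```julia
# Sizes (bytes) and transfer times (seconds) reported by dd above.
n_small, t_small = 1024, 0.000035
n_large, t_large = 1048576, 0.000657

# Two-point fit of t = overhead + per_byte * n (assumed linear model).
per_byte = (t_large - t_small) / (n_large - n_small)
overhead = t_small - per_byte * n_small

println("per-byte cost:  ", per_byte, " s")
println("fixed overhead: ", overhead, " s")
println("overhead share of the 1 KB write: ", overhead / t_small)
```

Under this model almost all of the 1 KB time (roughly 98%) is fixed cost rather than data transfer.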

@IanButterworth
Owner Author

@rsrock Indeed, but the read tests seem to be the most affected, so it's likely a file open or close issue as well. I've opened a dedicated issue for discussion: #11
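
One way to probe the open/close hypothesis is to time opening and closing a file separately from a full read. The sketch below uses only Base Julia. The helper names `best_time` and `open_vs_read` are hypothetical, and this is not necessarily how SystemBenchmark measures DiskRead1KB or DiskRead1MB.

```julia
# Best-of-N wall time for a zero-argument function, in seconds.
best_time(f; reps=50) = minimum(@elapsed(f()) for _ in 1:reps)

# Hypothetical helper: compare open+close alone against a full read.
function open_vs_read(nbytes)
    mktempdir() do dir
        path = joinpath(dir, "tstfile")
        write(path, rand(UInt8, nbytes))
        t_open = best_time(() -> close(open(path)))  # open + close only
        t_read = best_time(() -> read(path))         # open + read all + close
        (open_close = t_open, read_all = t_read)
    end
end

for n in (1024, 1024^2)
    r = open_vs_read(n)
    println(n, " bytes: open/close ", r.open_close, " s, full read ", r.read_all, " s")
end
```

If the open/close time is close to the full-read time for the 1 KB file, that points at per-file overhead rather than transfer speed. Repeated reads will mostly come from the OS page cache, so this says little about the raw disk.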

@natgeo-wong

natgeo-wong commented May 10, 2020

│ Row │ cat         │ testname          │ res                                       │
│     │ String      │ String            │ Any                                       │
├─────┼─────────────┼───────────────────┼───────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                     │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                     │
│ 3   │ info        │ OS                │ macOS (x86_64-apple-darwin18.7.0)         │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                        │
│ 6   │ info        │ LIBM              │ libopenlibm                               │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)           │
│ 8   │ info        │ GPU               │ missing                                   │
│ 9   │ cpu         │ FloatMul          │ 1.664e-6                                  │
│ 10  │ cpu         │ FusedMulAdd       │ 4.2e-8                                    │
│ 11  │ cpu         │ FloatSin          │ 4.766e-6                                  │
│ 12  │ cpu         │ VecMulBroad       │ 4.66572e-5                                │
│ 13  │ cpu         │ CPUMatMul         │ 0.0344175                                 │
│ 14  │ cpu         │ MatMulBroad       │ 0.0194464                                 │
│ 15  │ cpu         │ 3DMulBroad        │ 0.00161                                   │
│ 16  │ cpu         │ peakflops         │ 1.22617e11                                │
│ 17  │ cpu         │ FFMPEGH264Write   │ 160.024                                   │
│ 18  │ mem         │ DeepCopy          │ 0.000189145                               │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.21866                                   │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.373484                                  │
│ 21  │ diskio      │ DiskRead1KB       │ 0.597162                                  │
│ 22  │ diskio      │ DiskRead1MB       │ 1.75296                                   │
│ 23  │ loading     │ JuliaLoad         │ 169.336                                   │
│ 24  │ compilation │ compilecache      │ 316.72                                    │
│ 25  │ compilation │ create_expr_cache │ 7.97103                                   │

result.txt

@mcabbott

mcabbott commented May 10, 2020

Macbook pro: result_laptop.txt

25×3 DataFrames.DataFrame
│ Row │ cat         │ testname          │ res                                      │
│     │ String      │ String            │ Any                                      │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                    │
│ 2   │ info        │ JuliaVer          │ 1.4.0                                    │
│ 3   │ info        │ OS                │ macOS (x86_64-apple-darwin18.6.0)        │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                       │
│ 6   │ info        │ LIBM              │ libopenlibm                              │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)          │
│ 8   │ info        │ GPU               │ missing                                  │
│ 9   │ cpu         │ FloatMul          │ 1.429e-6                                 │
│ 10  │ cpu         │ FusedMulAdd       │ 3.3e-8                                   │
│ 11  │ cpu         │ FloatSin          │ 5.055e-6                                 │
│ 12  │ cpu         │ VecMulBroad       │ 4.49343e-5                               │
│ 13  │ cpu         │ CPUMatMul         │ 0.041218                                 │
│ 14  │ cpu         │ MatMulBroad       │ 0.0185765                                │
│ 15  │ cpu         │ 3DMulBroad        │ 0.0016012                                │
│ 16  │ cpu         │ peakflops         │ 8.87194e10                               │
│ 17  │ cpu         │ FFMPEGH264Write   │ 175.483                                  │
│ 18  │ mem         │ DeepCopy          │ 0.000173413                              │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.118836                                 │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.652163                                 │
│ 21  │ diskio      │ DiskRead1KB       │ 0.216321                                 │
│ 22  │ diskio      │ DiskRead1MB       │ 0.735228                                 │
│ 23  │ loading     │ JuliaLoad         │ 276.418                                  │
│ 24  │ compilation │ compilecache      │ 341.886                                  │
│ 25  │ compilation │ create_expr_cache │ 10.53                                    │

Desktop: result.txt

julia> show(res, allrows=true)
26×3 DataFrames.DataFrame
│ Row │ cat         │ testname          │ res                                     │
│     │ String      │ String            │ Any                                     │
├─────┼─────────────┼───────────────────┼─────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                   │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                   │
│ 3   │ info        │ OS                │ macOS (x86_64-apple-darwin18.7.0)       │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                      │
│ 6   │ info        │ LIBM              │ libopenlibm                             │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)         │
│ 8   │ info        │ GPU               │ GeForce GTX 650 Ti                      │
│ 9   │ cpu         │ FloatMul          │ 1.139e-6                                │
│ 10  │ cpu         │ FusedMulAdd       │ 2.4e-8                                  │
│ 11  │ cpu         │ FloatSin          │ 3.6e-6                                  │
│ 12  │ cpu         │ VecMulBroad       │ 3.69079e-5                              │
│ 13  │ cpu         │ CPUMatMul         │ 0.018225                                │
│ 14  │ cpu         │ MatMulBroad       │ 0.00436661                              │
│ 15  │ cpu         │ 3DMulBroad        │ 0.00129644                              │
│ 16  │ cpu         │ peakflops         │ 2.65532e11                              │
│ 17  │ cpu         │ FFMPEGH264Write   │ 144.52                                  │
│ 18  │ gpu         │ GPUMatMul         │ 0.0284571                               │
│ 19  │ mem         │ DeepCopy          │ 0.000149745                             │
│ 20  │ diskio      │ DiskWrite1KB      │ 0.100387                                │
│ 21  │ diskio      │ DiskWrite1MB      │ 1.88479                                 │
│ 22  │ diskio      │ DiskRead1KB       │ 0.030504                                │
│ 23  │ diskio      │ DiskRead1MB       │ 2.95043                                 │
│ 24  │ loading     │ JuliaLoad         │ 191.332                                 │
│ 25  │ compilation │ compilecache      │ 293.941                                 │
│ 26  │ compilation │ create_expr_cache │ 7.05235                                 │

@sairus7

sairus7 commented May 10, 2020

There were 3 warnings during the test:
Cannot write cache file "C:\Users\user\AppData\Local\Temp\jl_XMbvcu\compiled\v1.4\ExampleModule.ji".

result.txt

│ Row │ cat         │ testname          │ ref_res                                  │ test_res                                 │ factor    │
│     │ String      │ String            │ Any                                      │ Any                                      │ Any       │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┼──────────────────────────────────────────┼───────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                    │ 0.2.0                                    │ Equal     │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                    │ 1.4.0                                    │ Not Equal │
│ 3   │ info        │ OS                │ Linux (x86_64-pc-linux-gnu)              │ Windows (x86_64-w64-mingw32)             │ Not Equal │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz │ Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz │ Not Equal │
│ 5   │ info        │ WORD_SIZE         │ 64                                       │ 64                                       │ Equal     │
│ 6   │ info        │ LIBM              │ libopenlibm                              │ libopenlibm                              │ Equal     │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)          │ libLLVM-8.0.1 (ORCJIT, skylake)          │ Equal     │
│ 8   │ info        │ GPU               │ GeForce GTX 1650 with Max-Q Design       │ missing                                  │ Not Equal │
│ 9   │ cpu         │ FloatMul          │ 1.133e-6                                 │ 1.201e-6                                 │ 1.06002   │
│ 10  │ cpu         │ FusedMulAdd       │ 1.6e-8                                   │ 1.0e-9                                   │ 0.0625    │
│ 11  │ cpu         │ FloatSin          │ 3.616e-6                                 │ 3.7e-6                                   │ 1.02323   │
│ 12  │ cpu         │ VecMulBroad       │ 2.9462311557788944e-5                    │ 3.41046e-5                               │ 1.15757   │
│ 13  │ cpu         │ CPUMatMul         │ 0.024109                                 │ 0.0403                                   │ 1.67157   │
│ 14  │ cpu         │ MatMulBroad       │ 0.004257                                 │ 0.00508333                               │ 1.19411   │
│ 15  │ cpu         │ 3DMulBroad        │ 0.001154                                 │ 0.00118                                  │ 1.02253   │
│ 16  │ cpu         │ peakflops         │ 9.865361310009984e10                     │ 1.8362e11                                │ 1.86126   │
│ 17  │ cpu         │ FFMPEGH264Write   │ 135.685863                               │ 149.871                                  │ 1.10454   │
│ 18  │ mem         │ DeepCopy          │ 0.0006343389553862894                    │ 0.000158159                              │ 0.249328  │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.031644                                 │ 3.2736                                   │ 103.451   │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.855818                                 │ 4.08205                                  │ 4.76976   │
│ 21  │ diskio      │ DiskRead1KB       │ 0.006407583333333334                     │ 0.0783                                   │ 12.2199   │
│ 22  │ diskio      │ DiskRead1MB       │ 0.143273                                 │ 0.2639                                   │ 1.84194   │
│ 23  │ loading     │ JuliaLoad         │ 100.1638935                              │ 165.691                                  │ 1.6542    │
│ 24  │ compilation │ compilecache      │ 266.788588                               │ 304.961                                  │ 1.14308   │
│ 25  │ compilation │ create_expr_cache │ 1.0525595                                │ 3.1209                                   │ 2.96506   │
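The `factor` column in the comparison above appears to be the test result divided by the reference result (for example, 1.201e-6 / 1.133e-6 ≈ 1.06002 for FloatMul). A minimal sketch of that arithmetic, assuming a plain ratio; `result_factor` is a hypothetical helper name, not part of SystemBenchmark:

```julia
# Hypothetical helper (not part of SystemBenchmark): ratio of test result to reference result.
result_factor(ref, test) = test / ref

# A few ref_res / test_res pairs copied from the table above
# (FloatMul, FusedMulAdd, CPUMatMul).
ref_res  = [1.133e-6, 1.6e-8, 0.024109]
test_res = [1.201e-6, 1.0e-9, 0.0403]

# Broadcast the helper over both vectors to get one factor per test.
factors = result_factor.(ref_res, test_res)
println(factors)
```

For the timing rows a factor above 1 means the test machine was slower than the reference. The same ratio is applied to peakflops, where a larger value is better, so that row reads the other way round.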

@IanButterworth added the "help wanted" label May 10, 2020
@IanButterworth
Owner Author

Nvidia Jetson Xavier NX
results.txt

│ Row │ cat         │ testname          │ res                            │
│     │ String      │ String            │ Any                            │
├─────┼─────────────┼───────────────────┼────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                          │
│ 2   │ info        │ JuliaVer          │ 1.5.0-DEV.823                  │
│ 3   │ info        │ OS                │ Linux (aarch64-linux-gnu)      │
│ 4   │ info        │ CPU               │ unknown                        │
│ 5   │ info        │ WORD_SIZE         │ 64                             │
│ 6   │ info        │ LIBM              │ libopenlibm                    │
│ 7   │ info        │ LLVM              │ libLLVM-9.0.1 (ORCJIT, carmel) │
│ 8   │ info        │ GPU               │ Xavier                         │
│ 9   │ cpu         │ FloatMul          │ 6.08e-7                        │
│ 10  │ cpu         │ FusedMulAdd       │ 3.2e-8                         │
│ 11  │ cpu         │ FloatSin          │ 7.95988e-6                     │
│ 12  │ cpu         │ VecMulBroad       │ 5.69106e-5                     │
│ 13  │ cpu         │ CPUMatMul         │ 0.089729                       │
│ 14  │ cpu         │ MatMulBroad       │ 0.00517867                     │
│ 15  │ cpu         │ 3DMulBroad        │ 0.0014017                      │
│ 16  │ cpu         │ peakflops         │ 1.37614e10                     │
│ 17  │ cpu         │ FFMPEGH264Write   │ 465.808                        │
│ 18  │ gpu         │ GPUMatMul         │ 0.035744                       │
│ 19  │ mem         │ DeepCopy          │ 0.00040008                     │
│ 20  │ diskio      │ DiskWrite1KB      │ 0.081408                       │
│ 21  │ diskio      │ DiskWrite1MB      │ 1.74305                        │
│ 22  │ diskio      │ DiskRead1KB       │ 0.038048                       │
│ 23  │ diskio      │ DiskRead1MB       │ 0.579461                       │
│ 24  │ loading     │ JuliaLoad         │ 254.962                        │
│ 25  │ compilation │ compilecache      │ 793.205                        │
│ 26  │ compilation │ create_expr_cache │ 55.8754                        │

@giordano
Collaborator

giordano commented May 11, 2020

MacBook Pro (Retina, 15-inch, Mid 2015)

ArchLinux: results_linux.txt

│ Row │ cat         │ testname          │ res                                       │
│     │ String      │ String            │ Any                                       │
├─────┼─────────────┼───────────────────┼───────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                     │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                     │
│ 3   │ info        │ OS                │ Linux (x86_64-pc-linux-gnu)               │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                        │
│ 6   │ info        │ LIBM              │ libopenlibm                               │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, haswell)           │
│ 8   │ info        │ GPU               │ missing                                   │
│ 9   │ cpu         │ FloatMul          │ 1.376e-6                                  │
│ 10  │ cpu         │ FusedMulAdd       │ 1.4e-8                                    │
│ 11  │ cpu         │ FloatSin          │ 5.18e-6                                   │
│ 12  │ cpu         │ VecMulBroad       │ 3.67429e-5                                │
│ 13  │ cpu         │ CPUMatMul         │ 0.022055                                  │
│ 14  │ cpu         │ MatMulBroad       │ 0.003428                                  │
│ 15  │ cpu         │ 3DMulBroad        │ 0.0013702                                 │
│ 16  │ cpu         │ peakflops         │ 1.53026e11                                │
│ 17  │ cpu         │ FFMPEGH264Write   │ 151.23                                    │
│ 18  │ mem         │ DeepCopy          │ 0.000168949                               │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.0297705                                 │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.871518                                  │
│ 21  │ diskio      │ DiskRead1KB       │ 0.00951925                                │
│ 22  │ diskio      │ DiskRead1MB       │ 0.127287                                  │
│ 23  │ loading     │ JuliaLoad         │ 119.903                                   │
│ 24  │ compilation │ compilecache      │ 320.395                                   │
│ 25  │ compilation │ create_expr_cache │ 1.05338                                   │

macOS 10.14.3 (18D42): results-macos.txt

│ Row │ cat         │ testname          │ res                                       │
│     │ String      │ String            │ Any                                       │
├─────┼─────────────┼───────────────────┼───────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                     │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                     │
│ 3   │ info        │ OS                │ macOS (x86_64-apple-darwin18.7.0)         │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                        │
│ 6   │ info        │ LIBM              │ libopenlibm                               │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, haswell)           │
│ 8   │ info        │ GPU               │ missing                                   │
│ 9   │ cpu         │ FloatMul          │ 1.388e-6                                  │
│ 10  │ cpu         │ FusedMulAdd       │ 3.1e-8                                    │
│ 11  │ cpu         │ FloatSin          │ 5.191e-6                                  │
│ 12  │ cpu         │ VecMulBroad       │ 5.50395e-5                                │
│ 13  │ cpu         │ CPUMatMul         │ 0.0228935                                 │
│ 14  │ cpu         │ MatMulBroad       │ 0.00510037                                │
│ 15  │ cpu         │ 3DMulBroad        │ 0.0015412                                 │
│ 16  │ cpu         │ peakflops         │ 1.48113e11                                │
│ 17  │ cpu         │ FFMPEGH264Write   │ 161.079                                   │
│ 18  │ mem         │ DeepCopy          │ 0.000171807                               │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.11039                                   │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.328327                                  │
│ 21  │ diskio      │ DiskRead1KB       │ 0.0330435                                 │
│ 22  │ diskio      │ DiskRead1MB       │ 1.04854                                   │
│ 23  │ loading     │ JuliaLoad         │ 180.142                                   │
│ 24  │ compilation │ compilecache      │ 344.544                                   │
│ 25  │ compilation │ create_expr_cache │ 7.86924                                   │

@IanButterworth
Owner Author

Nvidia Xavier NX (on julia 1.4.1 official binary this time)
result1.4.1.txt

│ Row │ cat         │ testname          │ res                               │
│     │ String      │ String            │ Any                               │
├─────┼─────────────┼───────────────────┼───────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                             │
│ 2   │ info        │ JuliaVer          │ 1.4.1                             │
│ 3   │ info        │ OS                │ Linux (aarch64-unknown-linux-gnu) │
│ 4   │ info        │ CPU               │ unknown                           │
│ 5   │ info        │ WORD_SIZE         │ 64                                │
│ 6   │ info        │ LIBM              │ libopenlibm                       │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, generic)   │
│ 8   │ info        │ GPU               │ Xavier                            │
│ 9   │ cpu         │ FloatMul          │ 6.08e-7                           │
│ 10  │ cpu         │ FusedMulAdd       │ 3.2e-8                            │
│ 11  │ cpu         │ FloatSin          │ 5.04418e-6                        │
│ 12  │ cpu         │ VecMulBroad       │ 5.63197e-5                        │
│ 13  │ cpu         │ CPUMatMul         │ 0.084098                          │
│ 14  │ cpu         │ MatMulBroad       │ 0.0049815                         │
│ 15  │ cpu         │ 3DMulBroad        │ 0.0016096                         │
│ 16  │ cpu         │ peakflops         │ 1.46532e10                        │
│ 17  │ cpu         │ FFMPEGH264Write   │ 447.75                            │
│ 18  │ gpu         │ GPUMatMul         │ 0.042337                          │
│ 19  │ mem         │ DeepCopy          │ 0.00040177                        │
│ 20  │ diskio      │ DiskWrite1KB      │ 0.078305                          │
│ 21  │ diskio      │ DiskWrite1MB      │ 1.86118                           │
│ 22  │ diskio      │ DiskRead1KB       │ 0.033569                          │
│ 23  │ diskio      │ DiskRead1MB       │ 0.481467                          │
│ 24  │ loading     │ JuliaLoad         │ 296.809                           │
│ 25  │ compilation │ compilecache      │ 820.541                           │
│ 26  │ compilation │ create_expr_cache │ 64.79                             │

@imciner2

Here are 3 benchmarks from the same machine (an Apple Mac Mini 2018) with different OSs:

macOS 10.14: result_OSX10.14.txt

│ Row │ cat         │ testname          │ res                                      │
│     │ String      │ String            │ Any                                      │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                    │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                    │
│ 3   │ info        │ OS                │ macOS (x86_64-apple-darwin18.7.0)        │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                       │
│ 6   │ info        │ LIBM              │ libopenlibm                              │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)          │
│ 8   │ info        │ GPU               │ missing                                  │
│ 9   │ cpu         │ FloatMul          │ 1.194e-6                                 │
│ 10  │ cpu         │ FusedMulAdd       │ 2.6e-8                                   │
│ 11  │ cpu         │ FloatSin          │ 3.775e-6                                 │
│ 12  │ cpu         │ VecMulBroad       │ 3.8e-5                                   │
│ 13  │ cpu         │ CPUMatMul         │ 0.0186195                                │
│ 14  │ cpu         │ MatMulBroad       │ 0.00273989                               │
│ 15  │ cpu         │ 3DMulBroad        │ 0.00133396                               │
│ 16  │ cpu         │ peakflops         │ 2.69492e11                               │
│ 17  │ cpu         │ FFMPEGH264Write   │ 125.65                                   │
│ 18  │ mem         │ DeepCopy          │ 0.000151372                              │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.114933                                 │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.429121                                 │
│ 21  │ diskio      │ DiskRead1KB       │ 0.063297                                 │
│ 22  │ diskio      │ DiskRead1MB       │ 0.733374                                 │
│ 23  │ loading     │ JuliaLoad         │ 136.777                                  │
│ 24  │ compilation │ compilecache      │ 252.246                                  │
│ 25  │ compilation │ create_expr_cache │ 7.92002                                  │

macOS 10.15: result_OSX10.15.txt

│ Row │ cat         │ testname          │ res                                      │
│     │ String      │ String            │ Any                                      │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                    │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                    │
│ 3   │ info        │ OS                │ macOS (x86_64-apple-darwin18.7.0)        │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                       │
│ 6   │ info        │ LIBM              │ libopenlibm                              │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)          │
│ 8   │ info        │ GPU               │ missing                                  │
│ 9   │ cpu         │ FloatMul          │ 1.117e-6                                 │
│ 10  │ cpu         │ FusedMulAdd       │ 2.5e-8                                   │
│ 11  │ cpu         │ FloatSin          │ 3.531e-6                                 │
│ 12  │ cpu         │ VecMulBroad       │ 3.56067e-5                               │
│ 13  │ cpu         │ CPUMatMul         │ 0.018218                                 │
│ 14  │ cpu         │ MatMulBroad       │ 0.00249989                               │
│ 15  │ cpu         │ 3DMulBroad        │ 0.00128777                               │
│ 16  │ cpu         │ peakflops         │ 2.67204e11                               │
│ 17  │ cpu         │ FFMPEGH264Write   │ 129.856                                  │
│ 18  │ mem         │ DeepCopy          │ 0.000142776                              │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.10736                                  │
│ 20  │ diskio      │ DiskWrite1MB      │ 0.693981                                 │
│ 21  │ diskio      │ DiskRead1KB       │ 0.0640125                                │
│ 22  │ diskio      │ DiskRead1MB       │ 0.771016                                 │
│ 23  │ loading     │ JuliaLoad         │ 171.787                                  │
│ 24  │ compilation │ compilecache      │ 286.137                                  │
│ 25  │ compilation │ create_expr_cache │ 9.40953                                  │

Windows 10: result_Win10.txt

│ Row │ cat         │ testname          │ res                                      │
│     │ String      │ String            │ Any                                      │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.0                                    │
│ 2   │ info        │ JuliaVer          │ 1.4.1                                    │
│ 3   │ info        │ OS                │ Windows (x86_64-w64-mingw32)             │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                       │
│ 6   │ info        │ LIBM              │ libopenlibm                              │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, skylake)          │
│ 8   │ info        │ GPU               │ missing                                  │
│ 9   │ cpu         │ FloatMul          │ 1.201e-6                                 │
│ 10  │ cpu         │ FusedMulAdd       │ 1.0e-9                                   │
│ 11  │ cpu         │ FloatSin          │ 3.399e-6                                 │
│ 12  │ cpu         │ VecMulBroad       │ 3.36016e-5                               │
│ 13  │ cpu         │ CPUMatMul         │ 0.171301                                 │
│ 14  │ cpu         │ MatMulBroad       │ 0.00358763                               │
│ 15  │ cpu         │ 3DMulBroad        │ 0.00103947                               │
│ 16  │ cpu         │ peakflops         │ 1.95075e11                               │
│ 17  │ cpu         │ FFMPEGH264Write   │ 132.488                                  │
│ 18  │ mem         │ DeepCopy          │ 0.000151418                              │
│ 19  │ diskio      │ DiskWrite1KB      │ 2.3068                                   │
│ 20  │ diskio      │ DiskWrite1MB      │ 2.7551                                   │
│ 21  │ diskio      │ DiskRead1KB       │ 0.062899                                 │
│ 22  │ diskio      │ DiskRead1MB       │ 0.170099                                 │
│ 23  │ loading     │ JuliaLoad         │ 165.164                                  │
│ 24  │ compilation │ compilecache      │ 298.44                                   │
│ 25  │ compilation │ create_expr_cache │ 2.6279                                   │

@gdkrmr

gdkrmr commented May 11, 2020

Results from an older computer: results.txt

25×3 DataFrames.DataFrame
│ Row │ cat         │ testname          │ res                                      │
│     │ String      │ String            │ Any                                      │
├─────┼─────────────┼───────────────────┼──────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer       │ 0.2.1                                    │
│ 2   │ info        │ JuliaVer          │ 1.4.2-pre.0                              │
│ 3   │ info        │ OS                │ Linux (x86_64-linux-gnu)                 │
│ 4   │ info        │ CPU               │ Intel(R) Core(TM) i5-3470S CPU @ 2.90GHz │
│ 5   │ info        │ WORD_SIZE         │ 64                                       │
│ 6   │ info        │ LIBM              │ libopenlibm                              │
│ 7   │ info        │ LLVM              │ libLLVM-8.0.1 (ORCJIT, ivybridge)        │
│ 8   │ info        │ GPU               │ missing                                  │
│ 9   │ cpu         │ FloatMul          │ 1.69e-6                                  │
│ 10  │ cpu         │ FusedMulAdd       │ 1.741e-6                                 │
│ 11  │ cpu         │ FloatSin          │ 5.473e-6                                 │
│ 12  │ cpu         │ VecMulBroad       │ 6.97487e-5                               │
│ 13  │ cpu         │ CPUMatMul         │ 0.033443                                 │
│ 14  │ cpu         │ MatMulBroad       │ 0.00637                                  │
│ 15  │ cpu         │ 3DMulBroad        │ 0.001696                                 │
│ 16  │ cpu         │ peakflops         │ 7.3205e10                                │
│ 17  │ cpu         │ FFMPEGH264Write   │ 170.126                                  │
│ 18  │ mem         │ DeepCopy          │ 0.000282038                              │
│ 19  │ diskio      │ DiskWrite1KB      │ 0.049571                                 │
│ 20  │ diskio      │ DiskWrite1MB      │ 2.37895                                  │
│ 21  │ diskio      │ DiskRead1KB       │ 0.0126095                                │
│ 22  │ diskio      │ DiskRead1MB       │ 0.128504                                 │
│ 23  │ loading     │ JuliaLoad         │ 147.424                                  │
│ 24  │ compilation │ compilecache      │ 337.415                                  │
│ 25  │ compilation │ create_expr_cache │ 1.26394                                  │

@IanButterworth pinned this issue May 11, 2020
@IanButterworth
Owner Author

I've added a few more compilation tests and fixed the FusedMulAdd test, which wasn't interpolating variables correctly.

MacBook Pro 2018
result1.4.1.txt

│ Row │ cat         │ testname                  │ res                                      │
│     │ String      │ String                    │ Any                                      │
├─────┼─────────────┼───────────────────────────┼──────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                                    │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                                    │
│ 3   │ info        │ OS                        │ macOS (x86_64-apple-darwin18.7.0)        │
│ 4   │ info        │ CPU                       │ Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz │
│ 5   │ info        │ WORD_SIZE                 │ 64                                       │
│ 6   │ info        │ LIBM                      │ libopenlibm                              │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, skylake)          │
│ 8   │ info        │ GPU                       │ missing                                  │
│ 9   │ cpu         │ FloatMul                  │ 1.72e-6                                  │
│ 10  │ cpu         │ FusedMulAdd               │ 1.726e-6                                 │
│ 11  │ cpu         │ FloatSin                  │ 5.695e-6                                 │
│ 12  │ cpu         │ VecMulBroad               │ 4.67409e-5                               │
│ 13  │ cpu         │ CPUMatMul                 │ 0.0376125                                │
│ 14  │ cpu         │ MatMulBroad               │ 0.0201384                                │
│ 15  │ cpu         │ 3DMulBroad                │ 0.00168935                               │
│ 16  │ cpu         │ peakflops                 │ 1.91094e11                               │
│ 17  │ cpu         │ FFMPEGH264Write           │ 148.132                                  │
│ 18  │ mem         │ DeepCopy                  │ 0.000204265                              │
│ 19  │ diskio      │ DiskWrite1KB              │ 0.131597                                 │
│ 20  │ diskio      │ DiskWrite1MB              │ 0.373042                                 │
│ 21  │ diskio      │ DiskRead1KB               │ 0.0684265                                │
│ 22  │ diskio      │ DiskRead1MB               │ 1.05078                                  │
│ 23  │ loading     │ JuliaLoad                 │ 215.742                                  │
│ 24  │ compilation │ compilecache              │ 334.659                                  │
│ 25  │ compilation │ success_create_expr_cache │ 334.699                                  │
│ 26  │ compilation │ create_expr_cache         │ 11.2549                                  │
│ 27  │ compilation │ output-ji-substart        │ 36.9634                                  │

@IanButterworth
Owner Author

Nvidia Xavier NX
result1.4.1.txt

│ Row │ cat         │ testname                  │ res                               │
│     │ String      │ String                    │ Any                               │
├─────┼─────────────┼───────────────────────────┼───────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                             │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                             │
│ 3   │ info        │ OS                        │ Linux (aarch64-unknown-linux-gnu) │
│ 4   │ info        │ CPU                       │ unknown                           │
│ 5   │ info        │ WORD_SIZE                 │ 64                                │
│ 6   │ info        │ LIBM                      │ libopenlibm                       │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, generic)   │
│ 8   │ info        │ GPU                       │ Xavier                            │
│ 9   │ cpu         │ FloatMul                  │ 5.76e-7                           │
│ 10  │ cpu         │ FusedMulAdd               │ 6.72e-7                           │
│ 11  │ cpu         │ FloatSin                  │ 8.51406e-6                        │
│ 12  │ cpu         │ VecMulBroad               │ 5.67847e-5                        │
│ 13  │ cpu         │ CPUMatMul                 │ 0.090784                          │
│ 14  │ cpu         │ MatMulBroad               │ 0.00533333                        │
│ 15  │ cpu         │ 3DMulBroad                │ 0.0014752                         │
│ 16  │ cpu         │ peakflops                 │ 1.35671e10                        │
│ 17  │ cpu         │ FFMPEGH264Write           │ 437.655                           │
│ 18  │ gpu         │ GPUMatMul                 │ 0.03568                           │
│ 19  │ mem         │ DeepCopy                  │ 0.000371598                       │
│ 20  │ diskio      │ DiskWrite1KB              │ 0.077408                          │
│ 21  │ diskio      │ DiskWrite1MB              │ 1.85937                           │
│ 22  │ diskio      │ DiskRead1KB               │ 0.033536                          │
│ 23  │ diskio      │ DiskRead1MB               │ 0.340241                          │
│ 24  │ loading     │ JuliaLoad                 │ 311.75                            │
│ 25  │ compilation │ compilecache              │ 806.104                           │
│ 26  │ compilation │ success_create_expr_cache │ 766.175                           │
│ 27  │ compilation │ create_expr_cache         │ 60.752                            │
│ 28  │ compilation │ output-ji-substart        │ 116.229                           │

@giordano
Collaborator

My usual system, with ArchLinux:
results-linux.txt

│ Row │ cat         │ testname                  │ res                                       │
│     │ String      │ String                    │ Any                                       │
├─────┼─────────────┼───────────────────────────┼───────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                                     │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                                     │
│ 3   │ info        │ OS                        │ Linux (x86_64-pc-linux-gnu)               │
│ 4   │ info        │ CPU                       │ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz │
│ 5   │ info        │ WORD_SIZE                 │ 64                                        │
│ 6   │ info        │ LIBM                      │ libopenlibm                               │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, haswell)           │
│ 8   │ info        │ GPU                       │ missing                                   │
│ 9   │ cpu         │ FloatMul                  │ 1.377e-6                                  │
│ 10  │ cpu         │ FusedMulAdd               │ 1.696e-6                                  │
│ 11  │ cpu         │ FloatSin                  │ 5.327e-6                                  │
│ 12  │ cpu         │ VecMulBroad               │ 3.65035e-5                                │
│ 13  │ cpu         │ CPUMatMul                 │ 0.0221045                                 │
│ 14  │ cpu         │ MatMulBroad               │ 0.00340225                                │
│ 15  │ cpu         │ 3DMulBroad                │ 0.001339                                  │
│ 16  │ cpu         │ peakflops                 │ 1.54865e11                                │
│ 17  │ cpu         │ FFMPEGH264Write           │ 158.537                                   │
│ 18  │ mem         │ DeepCopy                  │ 0.000166856                               │
│ 19  │ diskio      │ DiskWrite1KB              │ 0.028936                                  │
│ 20  │ diskio      │ DiskWrite1MB              │ 0.786169                                  │
│ 21  │ diskio      │ DiskRead1KB               │ 0.0091367                                 │
│ 22  │ diskio      │ DiskRead1MB               │ 0.11161                                   │
│ 23  │ loading     │ JuliaLoad                 │ 121.015                                   │
│ 24  │ compilation │ compilecache              │ 321.51                                    │
│ 25  │ compilation │ success_create_expr_cache │ 323.298                                   │
│ 26  │ compilation │ create_expr_cache         │ 1.12413                                   │
│ 27  │ compilation │ output-ji-substart        │ 75.8457                                   │

@singularitti

singularitti commented May 12, 2020

│ Row │ cat         │ testname                  │ ref_res                                  │ test_res                               │
│     │ String      │ String                    │ Any                                      │ Any                                    │
├─────┼─────────────┼───────────────────────────┼──────────────────────────────────────────┼────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                                    │ 0.2.1                                  │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                                    │ 1.4.1                                  │
│ 3   │ info        │ OS                        │ Linux (x86_64-pc-linux-gnu)              │ macOS (x86_64-apple-darwin18.7.0)      │
│ 4   │ info        │ CPU                       │ Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz │ Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz │
│ 5   │ info        │ WORD_SIZE                 │ 64                                       │ 64                                     │
│ 6   │ info        │ LIBM                      │ libopenlibm                              │ libopenlibm                            │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, skylake)          │ libLLVM-8.0.1 (ORCJIT, skylake)        │
│ 8   │ info        │ GPU                       │ GeForce GTX 1650 with Max-Q Design       │ missing                                │
│ 9   │ cpu         │ FloatMul                  │ 1.133e-6                                 │ 1.317e-6                               │
│ 10  │ cpu         │ FusedMulAdd               │ 1.1339999999999999e-6                    │ 1.298e-6                               │
│ 11  │ cpu         │ FloatSin                  │ 4.051e-6                                 │ 4.354e-6                               │
│ 12  │ cpu         │ VecMulBroad               │ 2.9825125628140703e-5                    │ 4.1623e-5                              │
│ 13  │ cpu         │ CPUMatMul                 │ 0.018817                                 │ 0.018766                               │
│ 14  │ cpu         │ MatMulBroad               │ 0.0042099                                │ 0.00365494                             │
│ 15  │ cpu         │ 3DMulBroad                │ 0.0010415                                │ 0.0015231                              │
│ 16  │ cpu         │ peakflops                 │ 1.7780621689435773e11                    │ 3.3468e11                              │
│ 17  │ cpu         │ FFMPEGH264Write           │ 107.928719                               │ 146.754                                │
│ 18  │ mem         │ DeepCopy                  │ 0.000186327721661055                     │ 0.000183125                            │
│ 19  │ diskio      │ DiskWrite1KB              │ 0.0319535                                │ 0.072157                               │
│ 20  │ diskio      │ DiskWrite1MB              │ 0.848287                                 │ 0.462047                               │
│ 21  │ diskio      │ DiskRead1KB               │ 0.00654375                               │ 0.0210955                              │
│ 22  │ diskio      │ DiskRead1MB               │ 0.144209                                 │ 0.477009                               │
│ 23  │ loading     │ JuliaLoad                 │ 91.776181                                │ 200.402                                │
│ 24  │ compilation │ compilecache              │ 208.8682705                              │ 343.543                                │
│ 25  │ compilation │ success_create_expr_cache │ 236.5010175                              │ 335.856                                │
│ 26  │ compilation │ create_expr_cache         │ 1.129475                                 │ 10.8718                                │
│ 27  │ compilation │ output-ji-substart        │ 32.713674999999995                       │ 52.8724                                │
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

@maleadt

maleadt commented May 12, 2020

Benchmarked a bunch of systems :-)

results.txt: my dev system, with a 5-year-old CPU overclocked to 4.5 GHz, NVMe storage, and a high-end GPU:

│ Row │ cat         │ testname                  │ res                                      │
│     │ String      │ String                    │ Any                                      │
├─────┼─────────────┼───────────────────────────┼──────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                                    │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                                    │
│ 3   │ info        │ OS                        │ Linux (x86_64-pc-linux-gnu)              │
│ 4   │ info        │ CPU                       │ Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz │
│ 5   │ info        │ WORD_SIZE                 │ 64                                       │
│ 6   │ info        │ LIBM                      │ libopenlibm                              │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, skylake)          │
│ 8   │ info        │ GPU                       │ Quadro RTX 5000                          │
│ 9   │ cpu         │ FloatMul                  │ 1.129e-6                                 │
│ 10  │ cpu         │ FusedMulAdd               │ 1.129e-6                                 │
│ 11  │ cpu         │ FloatSin                  │ 3.605e-6                                 │
│ 12  │ cpu         │ VecMulBroad               │ 2.99819e-5                               │
│ 13  │ cpu         │ CPUMatMul                 │ 0.015621                                 │
│ 14  │ cpu         │ MatMulBroad               │ 0.00320244                               │
│ 15  │ cpu         │ 3DMulBroad                │ 0.0011218                                │
│ 16  │ cpu         │ peakflops                 │ 1.74984e11                               │
│ 17  │ cpu         │ FFMPEGH264Write           │ 109.264                                  │
│ 18  │ gpu         │ GPUMatMul                 │ 0.00464625                               │
│ 19  │ mem         │ DeepCopy                  │ 0.000186422                              │
│ 20  │ diskio      │ DiskWrite1KB              │ 0.027325                                 │
│ 21  │ diskio      │ DiskWrite1MB              │ 0.756474                                 │
│ 22  │ diskio      │ DiskRead1KB               │ 0.00498319                               │
│ 23  │ diskio      │ DiskRead1MB               │ 0.107739                                 │
│ 24  │ loading     │ JuliaLoad                 │ 115.011                                  │
│ 25  │ compilation │ compilecache              │ 214.99                                   │
│ 26  │ compilation │ success_create_expr_cache │ 243.291                                  │
│ 27  │ compilation │ create_expr_cache         │ 0.638935                                 │
│ 28  │ compilation │ output-ji-substart        │ 32.0302                                  │

results.txt: the main JuliaGPU CI system. Lower IPC, but this is a dual-CPU system with 4 cores/8 threads per CPU. That isn't reflected in these results, though:

│ Row │ cat         │ testname                  │ res                                       │
│     │ String      │ String                    │ Any                                       │
├─────┼─────────────┼───────────────────────────┼───────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                                     │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                                     │
│ 3   │ info        │ OS                        │ Linux (x86_64-pc-linux-gnu)               │
│ 4   │ info        │ CPU                       │ Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz │
│ 5   │ info        │ WORD_SIZE                 │ 64                                        │
│ 6   │ info        │ LIBM                      │ libopenlibm                               │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, ivybridge)         │
│ 8   │ info        │ GPU                       │ GeForce RTX 2080 Ti                       │
│ 9   │ cpu         │ FloatMul                  │ 1.591e-6                                  │
│ 10  │ cpu         │ FusedMulAdd               │ 1.591e-6                                  │
│ 11  │ cpu         │ FloatSin                  │ 5.529e-6                                  │
│ 12  │ cpu         │ VecMulBroad               │ 5.55736e-5                                │
│ 13  │ cpu         │ CPUMatMul                 │ 0.0347335                                 │
│ 14  │ cpu         │ MatMulBroad               │ 0.0081348                                 │
│ 15  │ cpu         │ 3DMulBroad                │ 0.00293255                                │
│ 16  │ cpu         │ peakflops                 │ 2.0002e11                                 │
│ 17  │ cpu         │ FFMPEGH264Write           │ 154.11                                    │
│ 18  │ gpu         │ GPUMatMul                 │ 0.00887367                                │
│ 19  │ mem         │ DeepCopy                  │ 0.000359481                               │
│ 20  │ diskio      │ DiskWrite1KB              │ 0.056811                                  │
│ 21  │ diskio      │ DiskWrite1MB              │ 4.22511                                   │
│ 22  │ diskio      │ DiskRead1KB               │ 0.014873                                  │
│ 23  │ diskio      │ DiskRead1MB               │ 0.22326                                   │
│ 24  │ loading     │ JuliaLoad                 │ 193.066                                   │
│ 25  │ compilation │ compilecache              │ 372.924                                   │
│ 26  │ compilation │ success_create_expr_cache │ 373.622                                   │
│ 27  │ compilation │ create_expr_cache         │ 1.3371                                    │
│ 28  │ compilation │ output-ji-substart        │ 76.5526                                   │

results.txt: a high-end Ryzen Threadripper (previous-gen), but with surprisingly poor results for some benchmarks...

│ Row │ cat         │ testname                  │ res                                             │
│     │ String      │ String                    │ Any                                             │
├─────┼─────────────┼───────────────────────────┼─────────────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                                           │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                                           │
│ 3   │ info        │ OS                        │ Linux (x86_64-pc-linux-gnu)                     │
│ 4   │ info        │ CPU                       │ AMD Ryzen Threadripper 2990WX 32-Core Processor │
│ 5   │ info        │ WORD_SIZE                 │ 64                                              │
│ 6   │ info        │ LIBM                      │ libopenlibm                                     │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, znver1)                  │
│ 8   │ info        │ GPU                       │ GeForce GTX TITAN                               │
│ 9   │ cpu         │ FloatMul                  │ 1.222e-6                                        │
│ 10  │ cpu         │ FusedMulAdd               │ 1.222e-6                                        │
│ 11  │ cpu         │ FloatSin                  │ 3.166e-6                                        │
│ 12  │ cpu         │ VecMulBroad               │ 3.0007e-5                                       │
│ 13  │ cpu         │ CPUMatMul                 │ 0.054804                                        │
│ 14  │ cpu         │ MatMulBroad               │ 0.00329633                                      │
│ 15  │ cpu         │ 3DMulBroad                │ 0.0013445                                       │
│ 16  │ cpu         │ peakflops                 │ 2.04618e11                                      │
│ 17  │ cpu         │ FFMPEGH264Write           │ 163.401                                         │
│ 18  │ gpu         │ GPUMatMul                 │ 0.0170386                                       │
│ 19  │ mem         │ DeepCopy                  │ 0.000169991                                     │
│ 20  │ diskio      │ DiskWrite1KB              │ 0.028234                                        │
│ 21  │ diskio      │ DiskWrite1MB              │ 0.753011                                        │
│ 22  │ diskio      │ DiskRead1KB               │ 0.00633033                                      │
│ 23  │ diskio      │ DiskRead1MB               │ 0.125308                                        │
│ 24  │ loading     │ JuliaLoad                 │ 165.767                                         │
│ 25  │ compilation │ compilecache              │ 293.01                                          │
│ 26  │ compilation │ success_create_expr_cache │ 295.795                                         │
│ 27  │ compilation │ create_expr_cache         │ 2.36585                                         │
│ 28  │ compilation │ output-ji-substart        │ 43.0505                                         │

results.txt: Jetson AGX Xavier devkit. As expected, poor numbers due to ARM, even though this is a pretty powerful system:

│ Row │ cat         │ testname                  │ res                               │
│     │ String      │ String                    │ Any                               │
├─────┼─────────────┼───────────────────────────┼───────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                             │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                             │
│ 3   │ info        │ OS                        │ Linux (aarch64-unknown-linux-gnu) │
│ 4   │ info        │ CPU                       │ unknown                           │
│ 5   │ info        │ WORD_SIZE                 │ 64                                │
│ 6   │ info        │ LIBM                      │ libopenlibm                       │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, generic)   │
│ 8   │ info        │ GPU                       │ missing                           │
│ 9   │ cpu         │ FloatMul                  │ 5.12e-7                           │
│ 10  │ cpu         │ FusedMulAdd               │ 5.76e-7                           │
│ 11  │ cpu         │ FloatSin                  │ 4.65396e-6                        │
│ 12  │ cpu         │ VecMulBroad               │ 4.2884e-5                         │
│ 13  │ cpu         │ CPUMatMul                 │ 2.06124                           │
│ 14  │ cpu         │ MatMulBroad               │ 0.00463557                        │
│ 15  │ cpu         │ 3DMulBroad                │ 0.0025025                         │
│ 16  │ cpu         │ peakflops                 │ 6.86379e10                        │
│ 17  │ cpu         │ FFMPEGH264Write           │ 274.624                           │
│ 18  │ mem         │ DeepCopy                  │ 0.000297005                       │
│ 19  │ diskio      │ DiskWrite1KB              │ 1.09742                           │
│ 20  │ diskio      │ DiskWrite1MB              │ 31.9208                           │
│ 21  │ diskio      │ DiskRead1KB               │ 0.045282                          │
│ 22  │ diskio      │ DiskRead1MB               │ 0.875157                          │
│ 23  │ loading     │ JuliaLoad                 │ 381.715                           │
│ 24  │ compilation │ compilecache              │ 708.207                           │
│ 25  │ compilation │ success_create_expr_cache │ 687.974                           │
│ 26  │ compilation │ create_expr_cache         │ 57.6076                           │
│ 27  │ compilation │ output-ji-substart        │ 82.9881                           │

results.txt: a Jetson Nano devkit, a much lower-power ARM system with SD-card storage:

│ Row │ cat         │ testname                  │ res                                │
│     │ String      │ String                    │ Any                                │
├─────┼─────────────┼───────────────────────────┼────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                              │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                              │
│ 3   │ info        │ OS                        │ Linux (aarch64-unknown-linux-gnu)  │
│ 4   │ info        │ CPU                       │ unknown                            │
│ 5   │ info        │ WORD_SIZE                 │ 64                                 │
│ 6   │ info        │ LIBM                      │ libopenlibm                        │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, cortex-a57) │
│ 8   │ info        │ GPU                       │ NVIDIA Tegra X1                    │
│ 9   │ cpu         │ FloatMul                  │ 4.323e-6                           │
│ 10  │ cpu         │ FusedMulAdd               │ 4.323e-6                           │
│ 11  │ cpu         │ FloatSin                  │ 2.63042e-5                         │
│ 12  │ cpu         │ VecMulBroad               │ 0.00014074                         │
│ 13  │ cpu         │ CPUMatMul                 │ 0.099428                           │
│ 14  │ cpu         │ MatMulBroad               │ 0.010677                           │
│ 15  │ cpu         │ 3DMulBroad                │ 0.00511914                         │
│ 16  │ cpu         │ peakflops                 │ 1.57406e10                         │
│ 17  │ cpu         │ FFMPEGH264Write           │ 541.78                             │
│ 18  │ gpu         │ GPUMatMul                 │ 0.03974                            │
│ 19  │ mem         │ DeepCopy                  │ 0.000575973                        │
│ 20  │ diskio      │ DiskWrite1KB              │ 1.26632                            │
│ 21  │ diskio      │ DiskWrite1MB              │ 43.2608                            │
│ 22  │ diskio      │ DiskRead1KB               │ 0.147138                           │
│ 23  │ diskio      │ DiskRead1MB               │ 1.07072                            │
│ 24  │ loading     │ JuliaLoad                 │ 595.649                            │
│ 25  │ compilation │ compilecache              │ 1092.95                            │
│ 26  │ compilation │ success_create_expr_cache │ 1082.66                            │
│ 27  │ compilation │ create_expr_cache         │ 30.2996                            │
│ 28  │ compilation │ output-ji-substart        │ 140.495                            │

@oschulz

oschulz commented May 12, 2020

result.txt: Epyc-2 compute server, Tesla V100 GPU, Intel Optane NVMe storage:

│ Row │ cat         │ testname                  │ res                              │
│     │ String      │ String                    │ Any                              │
├─────┼─────────────┼───────────────────────────┼──────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                            │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                            │
│ 3   │ info        │ OS                        │ Linux (x86_64-pc-linux-gnu)      │
│ 4   │ info        │ CPU                       │ AMD EPYC 7702P 64-Core Processor │
│ 5   │ info        │ WORD_SIZE                 │ 64                               │
│ 6   │ info        │ LIBM                      │ libopenlibm                      │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, znver1)   │
│ 8   │ info        │ GPU                       │ Tesla V100-PCIE-32GB             │
│ 9   │ cpu         │ FloatMul                  │ 1.52e-6                          │
│ 10  │ cpu         │ FusedMulAdd               │ 1.52e-6                          │
│ 11  │ cpu         │ FloatSin                  │ 3.93e-6                          │
│ 12  │ cpu         │ VecMulBroad               │ 3.39135e-5                       │
│ 13  │ cpu         │ CPUMatMul                 │ 0.041221                         │
│ 14  │ cpu         │ MatMulBroad               │ 0.00330125                       │
│ 15  │ cpu         │ 3DMulBroad                │ 0.001392                         │
│ 16  │ cpu         │ peakflops                 │ 6.71855e11                       │
│ 17  │ cpu         │ FFMPEGH264Write           │ 137.059                          │
│ 18  │ gpu         │ GPUMatMul                 │ 0.00651833                       │
│ 19  │ mem         │ DeepCopy                  │ 0.0002087                        │
│ 20  │ diskio      │ DiskWrite1KB              │ 0.022                            │
│ 21  │ diskio      │ DiskWrite1MB              │ 0.744473                         │
│ 22  │ diskio      │ DiskRead1KB               │ 0.00652842                       │
│ 23  │ diskio      │ DiskRead1MB               │ 0.131426                         │
│ 24  │ loading     │ JuliaLoad                 │ 180.288                          │
│ 25  │ compilation │ compilecache              │ 302.753                          │
│ 26  │ compilation │ success_create_expr_cache │ 297.745                          │
│ 27  │ compilation │ create_expr_cache         │ 1.88856                          │
│ 28  │ compilation │ output-ji-substart        │ 38.5499                          │

@v-i-s-h

v-i-s-h commented May 15, 2020

Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

result.txt

27×3 DataFrames.DataFrame
│ Row │ cat         │ testname                  │ res                                     │
│     │ String      │ String                    │ Any                                     │
├─────┼─────────────┼───────────────────────────┼─────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                                   │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                                   │
│ 3   │ info        │ OS                        │ Linux (x86_64-pc-linux-gnu)             │
│ 4   │ info        │ CPU                       │ Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz │
│ 5   │ info        │ WORD_SIZE                 │ 64                                      │
│ 6   │ info        │ LIBM                      │ libopenlibm                             │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, skylake)         │
│ 8   │ info        │ GPU                       │ missing                                 │
│ 9   │ cpu         │ FloatMul                  │ 1.215e-6                                │
│ 10  │ cpu         │ FusedMulAdd               │ 1.215e-6                                │
│ 11  │ cpu         │ FloatSin                  │ 3.874e-6                                │
│ 12  │ cpu         │ VecMulBroad               │ 3.16553e-5                              │
│ 13  │ cpu         │ CPUMatMul                 │ 0.017469                                │
│ 14  │ cpu         │ MatMulBroad               │ 0.00307844                              │
│ 15  │ cpu         │ 3DMulBroad                │ 0.0011962                               │
│ 16  │ cpu         │ peakflops                 │ 1.9438e11                               │
│ 17  │ cpu         │ FFMPEGH264Write           │ 114.998                                 │
│ 18  │ mem         │ DeepCopy                  │ 0.000161646                             │
│ 19  │ diskio      │ DiskWrite1KB              │ 0.087031                                │
│ 20  │ diskio      │ DiskWrite1MB              │ 6.85173                                 │
│ 21  │ diskio      │ DiskRead1KB               │ 0.00606192                              │
│ 22  │ diskio      │ DiskRead1MB               │ 0.245249                                │
│ 23  │ loading     │ JuliaLoad                 │ 100.483                                 │
│ 24  │ compilation │ compilecache              │ 238.315                                 │
│ 25  │ compilation │ success_create_expr_cache │ 260.107                                 │
│ 26  │ compilation │ create_expr_cache         │ 0.887955                                │
│ 27  │ compilation │ output-ji-substart        │ 42.9264                                 │

@v-i-s-h

v-i-s-h commented May 15, 2020

Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)

GPU: NVIDIA GTX TITAN X
results.txt

28×3 DataFrames.DataFrame
│ Row │ cat         │ testname                  │ res                                       │
│     │ String      │ String                    │ Any                                       │
├─────┼─────────────┼───────────────────────────┼───────────────────────────────────────────┤
│ 1   │ info        │ SysBenchVer               │ 0.2.1                                     │
│ 2   │ info        │ JuliaVer                  │ 1.4.1                                     │
│ 3   │ info        │ OS                        │ Linux (x86_64-pc-linux-gnu)               │
│ 4   │ info        │ CPU                       │ Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz │
│ 5   │ info        │ WORD_SIZE                 │ 64                                        │
│ 6   │ info        │ LIBM                      │ libopenlibm                               │
│ 7   │ info        │ LLVM                      │ libLLVM-8.0.1 (ORCJIT, haswell)           │
│ 8   │ info        │ GPU                       │ GeForce GTX TITAN X                       │
│ 9   │ cpu         │ FloatMul                  │ 1.587e-6                                  │
│ 10  │ cpu         │ FusedMulAdd               │ 1.589e-6                                  │
│ 11  │ cpu         │ FloatSin                  │ 5.97e-6                                   │
│ 12  │ cpu         │ VecMulBroad               │ 5.23962e-5                                │
│ 13  │ cpu         │ CPUMatMul                 │ 0.0354835                                 │
│ 14  │ cpu         │ MatMulBroad               │ 0.00680414                                │
│ 15  │ cpu         │ 3DMulBroad                │ 0.00223535                                │
│ 16  │ cpu         │ peakflops                 │ 2.61333e11                                │
│ 17  │ cpu         │ FFMPEGH264Write           │ 209.559                                   │
│ 18  │ gpu         │ GPUMatMul                 │ 0.00530058                                │
│ 19  │ mem         │ DeepCopy                  │ 0.000321111                               │
│ 20  │ diskio      │ DiskWrite1KB              │ 0.489467                                  │
│ 21  │ diskio      │ DiskWrite1MB              │ 24.8961                                   │
│ 22  │ diskio      │ DiskRead1KB               │ 0.0140801                                 │
│ 23  │ diskio      │ DiskRead1MB               │ 0.363181                                  │
│ 24  │ loading     │ JuliaLoad                 │ 182.393                                   │
│ 25  │ compilation │ compilecache              │ 353.991                                   │
│ 26  │ compilation │ success_create_expr_cache │ 354.424                                   │
│ 27  │ compilation │ create_expr_cache         │ 2.0589                                    │
│ 28  │ compilation │ output-ji-substart        │ 45.8285                                   │

@LilithHafner

result.txt

 Row │ cat          testname         units   res                               
     │ String       String           String  Any                               
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.6.4
   3 │ info         OS                       macOS (x86_64-apple-darwin19.5.0)
   4 │ info         CPU                      Intel(R) Core(TM) i5-8210Y CPU @…
   5 │ info         CPU_THREADS              4
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-11.0.1 (ORCJIT, skylake)
   9 │ info         BLAS                     missing                           
  10 │ info         BLAS_threads             2
  11 │ info         GPU                      missing                           
  12 │ cpu          FloatMul         ms      1.9995e-6
  13 │ cpu          FusedMulAdd      ms      1.987e-6
  14 │ cpu          FloatSin         ms      5.94394e-6
  15 │ cpu          VecMulBroad      ms      5.39929e-5
  16 │ cpu          CPUMatMul        ms      0.066139
  17 │ cpu          MatMulBroad      ms      0.0236049
  18 │ cpu          3DMulBroad       ms      0.002931
  19 │ cpu          peakflops        flops   4.7348e10
  20 │ cpu          FFMPEGH264Write  ms      680.68
  21 │ mem          DeepCopy         ms      0.000295927
  22 │ mem          Bandwidth10kB    MiB/s   88061.1
  23 │ mem          Bandwidth100kB   MiB/s   42889.6
  24 │ mem          Bandwidth1MB     MiB/s   18038.8
  25 │ mem          Bandwidth10MB    MiB/s   8651.02
  26 │ mem          Bandwidth100MB   MiB/s   7471.81
  27 │ diskio       DiskWrite1KB     ms      0.201853
  28 │ diskio       DiskWrite1MB     ms      0.673183
  29 │ diskio       DiskRead1KB      ms      0.1285
  30 │ diskio       DiskRead1MB      ms      1.40853
  31 │ loading      JuliaLoad        ms      290.808
  32 │ compilation  compilecache     ms      404.246

@giordano
Collaborator

This is Fugaku:

32×4 DataFrame
 Row │ cat          testname         units   res                               
     │ String       String           String  Any                               
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.7.2
   3 │ info         OS                       Linux (aarch64-unknown-linux-gnu)
   4 │ info         CPU                      unknown
   5 │ info         CPU_THREADS              50
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-12.0.1 (ORCJIT, a64fx)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      missing                           
  12 │ cpu          FloatMul         ms      3.22e-6
  13 │ cpu          FusedMulAdd      ms      3.6e-6
  14 │ cpu          FloatSin         ms      2.66533e-5
  15 │ cpu          VecMulBroad      ms      0.000181122
  16 │ cpu          CPUMatMul        ms      0.11291
  17 │ cpu          MatMulBroad      ms      0.0063798
  18 │ cpu          3DMulBroad       ms      0.00415875
  19 │ cpu          peakflops        flops   2.60564e10
  20 │ cpu          FFMPEGH264Write  ms      1410.66
  21 │ mem          DeepCopy         ms      0.000669933
  22 │ mem          Bandwidth10kB    MiB/s   17080.9
  23 │ mem          Bandwidth100kB   MiB/s   12835.5
  24 │ mem          Bandwidth1MB     MiB/s   13163.2
  25 │ mem          Bandwidth10MB    MiB/s   11866.4
  26 │ mem          Bandwidth100MB   MiB/s   11638.9
  27 │ diskio       DiskWrite1KB     ms      12.4292
  28 │ diskio       DiskWrite1MB     ms      17.4228
  29 │ diskio       DiskRead1KB      ms      0.88321
  30 │ diskio       DiskRead1MB      ms      0.982015
  31 │ loading      JuliaLoad        ms      805.339
  32 │ compilation  compilecache     ms      1418.16

Similarly disappointing results to the other a64fx system I benchmarked above.

results.txt

@oschulz

oschulz commented Feb 15, 2022

Any idea why CPUMatMul and the like do so much worse on a64fx compared to M1?

@giordano
Collaborator

giordano commented Mar 25, 2022

Same machine as #8 (comment), but with Asahi Linux

32×4 DataFrame
 Row │ cat          testname         units   res                               
     │ String       String           String  Any                               
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.9.0-DEV.247
   3 │ info         OS                       Linux (aarch64-unknown-linux-gnu)
   4 │ info         CPU                      8 × unknown
   5 │ info         CPU_THREADS              8
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-13.0.1 (ORCJIT, apple-m1)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      missing                           
  12 │ cpu          FloatMul         ms      2.375e-6
  13 │ cpu          FusedMulAdd      ms      2.375e-6
  14 │ cpu          FloatSin         ms      5.042e-6
  15 │ cpu          VecMulBroad      ms      2.54357e-5
  16 │ cpu          CPUMatMul        ms      0.041749
  17 │ cpu          MatMulBroad      ms      0.0015541
  18 │ cpu          3DMulBroad       ms      0.000805336
  19 │ cpu          peakflops        flops   1.06142e11
  20 │ cpu          FFMPEGH264Write  ms      590.739
  21 │ mem          DeepCopy         ms      0.000147061
  22 │ mem          Bandwidth10kB    MiB/s   75898.9
  23 │ mem          Bandwidth100kB   MiB/s   48085.2
  24 │ mem          Bandwidth1MB     MiB/s   38083.0
  25 │ mem          Bandwidth10MB    MiB/s   31526.6
  26 │ mem          Bandwidth100MB   MiB/s   27164.8
  27 │ diskio       DiskWrite1KB     ms      0.016208
  28 │ diskio       DiskWrite1MB     ms      0.317458
  29 │ diskio       DiskRead1KB      ms      0.00283789
  30 │ diskio       DiskRead1MB      ms      0.238375
  31 │ loading      JuliaLoad        ms      77.1987
  32 │ compilation  compilecache     ms      137.577

results.txt


For a fairer comparison, I reran the benchmark on macOS with the same version of Julia and SystemBenchmark:

32×4 DataFrame
 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.9.0-DEV.247
   3 │ info         OS                       macOS (arm64-apple-darwin21.4.0)
   4 │ info         CPU                      8 × Apple M1
   5 │ info         CPU_THREADS              4
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-13.0.1 (ORCJIT, apple-m1)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      missing
  12 │ cpu          FloatMul         ms      1.583e-6
  13 │ cpu          FusedMulAdd      ms      1.583e-6
  14 │ cpu          FloatSin         ms      3.75e-6
  15 │ cpu          VecMulBroad      ms      2.35944e-5
  16 │ cpu          CPUMatMul        ms      0.06125
  17 │ cpu          MatMulBroad      ms      0.0033667
  18 │ cpu          3DMulBroad       ms      0.0010459
  19 │ cpu          peakflops        flops   1.21064e11
  20 │ cpu          FFMPEGH264Write  ms      187.329
  21 │ mem          DeepCopy         ms      0.000106642
  22 │ mem          Bandwidth10kB    MiB/s   81526.6
  23 │ mem          Bandwidth100kB   MiB/s   83839.5
  24 │ mem          Bandwidth1MB     MiB/s   45322.4
  25 │ mem          Bandwidth10MB    MiB/s   33219.5
  26 │ mem          Bandwidth100MB   MiB/s   27981.7
  27 │ diskio       DiskWrite1KB     ms      0.042792
  28 │ diskio       DiskWrite1MB     ms      0.26
  29 │ diskio       DiskRead1KB      ms      0.00675692
  30 │ diskio       DiskRead1MB      ms      0.360333
  31 │ loading      JuliaLoad        ms      80.34
  32 │ compilation  compilecache     ms      152.64

results-macos.txt

Comparison macOS vs Linux:

julia> compare(macos, linux)
32×6 DataFrame
 Row │ cat          testname         units    ref_res                            test_res                           factor
     │ String       String           String?  Any                                Any                                Any
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer      missing  0.4.1                              0.4.1                              Equal
   2 │ info         JuliaVer         missing  1.9.0-DEV.247                      1.9.0-DEV.247                      Equal
   3 │ info         OS               missing  macOS (arm64-apple-darwin21.4.0)   Linux (aarch64-unknown-linux-gnu)  Not Equal
   4 │ info         CPU              missing  8 × Apple M1                       8 × unknown                        Not Equal
   5 │ info         CPU_THREADS      missing  4                                  8                                  Not Equal
   6 │ info         WORD_SIZE        missing  64                                 64                                 Equal
   7 │ info         LIBM             missing  libopenlibm                        libopenlibm                        Equal
   8 │ info         LLVM             missing  libLLVM-13.0.1 (ORCJIT, apple-m1)  libLLVM-13.0.1 (ORCJIT, apple-m1)  Equal
   9 │ info         BLAS             missing  libopenblas64_                     libopenblas64_                     Equal
  10 │ info         BLAS_threads     missing  8                                  8                                  Equal
  11 │ info         GPU              missing  missing                            missing                            Not Equal
  12 │ cpu          FloatMul         ms       1.583e-6                           2.375e-6                           1.50032
  13 │ cpu          FusedMulAdd      ms       1.583e-6                           2.375e-6                           1.50032
  14 │ cpu          FloatSin         ms       3.75e-6                            5.042e-6                           1.34453
  15 │ cpu          VecMulBroad      ms       2.359437751004016e-5               2.5435742971887548e-5              1.07804
  16 │ cpu          CPUMatMul        ms       0.06125                            0.041749                           0.681616
  17 │ cpu          MatMulBroad      ms       0.0033667                          0.0015540999999999999              0.461609
  18 │ cpu          3DMulBroad       ms       0.00104590395480226                0.0008053358778625955              0.76999
  19 │ cpu          peakflops        flops    1.2106381070680473e11              1.0614193490591101e11              0.876744
  20 │ cpu          FFMPEGH264Write  ms       187.329                            590.738945                         3.15348
  21 │ mem          DeepCopy         ms       0.00010664248159831755             0.00014706105263157895             1.37901
  22 │ mem          Bandwidth10kB    MiB/s    81526.63516515732                  75898.86461543928                  0.93097
  23 │ mem          Bandwidth100kB   MiB/s    83839.5003434066                   48085.22746968436                  0.573539
  24 │ mem          Bandwidth1MB     MiB/s    45322.417850311285                 38082.99322762759                  0.840268
  25 │ mem          Bandwidth10MB    MiB/s    33219.463235588664                 31526.632123394204                 0.949041
  26 │ mem          Bandwidth100MB   MiB/s    27981.693500110618                 27164.814178103377                 0.970807
  27 │ diskio       DiskWrite1KB     ms       0.042792                           0.016208                           0.378762
  28 │ diskio       DiskWrite1MB     ms       0.26                               0.317458                           1.22099
  29 │ diskio       DiskRead1KB      ms       0.006756916666666666               0.0028378888888888885              0.419998
  30 │ diskio       DiskRead1MB      ms       0.360333                           0.238375                           0.661541
  31 │ loading      JuliaLoad        ms       80.34                              77.198681                          0.9609
  32 │ compilation  compilecache     ms       152.63975                          137.57717                          0.901319
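
For anyone reading the `factor` column: judging from the numbers in the table, it appears to be `test_res / ref_res` (here Linux divided by macOS). That is inferred from the values, not from the package source. The sketch below recomputes a few rows under that assumption; `factor` and `verdict` are hypothetical helper names, not part of SystemBenchmark.

```julia
# Values copied from the comparison table above.
# Tuples are (testname, units, ref_res, test_res), with macOS as ref and Linux as test.
rows = [
    ("FloatMul",        "ms",    1.583e-6,         2.375e-6),
    ("CPUMatMul",       "ms",    0.06125,          0.041749),
    ("FFMPEGH264Write", "ms",    187.329,          590.738945),
    ("Bandwidth100kB",  "MiB/s", 83839.5003434066, 48085.22746968436),
]

# Assumed definition, inferred from the table: factor = test_res / ref_res.
factor(ref, test) = test / ref

# Hypothetical helper: for timings (ms) a factor below 1 means the test
# system is faster; for throughput units (MiB/s, flops) a factor above 1 does.
function verdict(units, f)
    lower_is_better = units == "ms"
    test_wins = lower_is_better ? f < 1 : f > 1
    return test_wins ? "test faster" : "test slower or equal"
end

for (name, units, ref, test) in rows
    f = factor(ref, test)
    println(rpad(name, 17), rpad(units, 7), round(f, sigdigits=6), "  ", verdict(units, f))
end
```

The point of the helper is that the same factor reads in opposite directions depending on the units: the 3.15 for FFMPEGH264Write means Linux is slower there, while the 0.57 for Bandwidth100kB also means Linux is slower, because bandwidth is higher-is-better.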

@mrs504aa

mrs504aa commented Apr 11, 2022

My own laptop. result.txt

 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.7.1
   3 │ info         OS                       Windows (x86_64-w64-mingw32)
   4 │ info         CPU                      AMD Ryzen 7 5800H with Radeon Gr…
   5 │ info         CPU_THREADS              16
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-12.0.1 (ORCJIT, znver3)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      NVIDIA GeForce RTX 3060 Laptop G…
  12 │ cpu          FloatMul         ms      1.2e-6
  13 │ cpu          FusedMulAdd      ms      1.2e-6
  14 │ cpu          FloatSin         ms      3.3e-6
  15 │ cpu          VecMulBroad      ms      3.11245e-5
  16 │ cpu          CPUMatMul        ms      0.1599
  17 │ cpu          MatMulBroad      ms      0.012875
  18 │ cpu          3DMulBroad       ms      0.00111705
  19 │ cpu          peakflops        flops   1.74267e11
  20 │ cpu          FFMPEGH264Write  ms      224.313
  21 │ gpu          GPUMatMul        ms      0.0248
  22 │ mem          DeepCopy         ms      0.000194413
  23 │ mem          Bandwidth10kB    MiB/s   1.26119e5
  24 │ mem          Bandwidth100kB   MiB/s   65320.2
  25 │ mem          Bandwidth1MB     MiB/s   36124.0
  26 │ mem          Bandwidth10MB    MiB/s   10011.3
  27 │ mem          Bandwidth100MB   MiB/s   8284.82
  28 │ diskio       DiskWrite1KB     ms      0.3006
  29 │ diskio       DiskWrite1MB     ms      3.1672
  30 │ diskio       DiskRead1KB      ms      2.5632
  31 │ diskio       DiskRead1MB      ms      3.7722
  32 │ loading      JuliaLoad        ms      148.985
  33 │ compilation  compilecache     ms      247.792

Server in Quantum X center. result.txt

 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.7.2
   3 │ info         OS                       Linux (x86_64-pc-linux-gnu)
   4 │ info         CPU                      Intel(R) Xeon(R) Gold 6248 CPU @…
   5 │ info         CPU_THREADS              160
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-12.0.1 (ORCJIT, cascadel…
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      Tesla V100-PCIE-32GB
  12 │ cpu          FloatMul         ms      2.263e-6
  13 │ cpu          FusedMulAdd      ms      2.291e-6
  14 │ cpu          FloatSin         ms      5.38739e-6
  15 │ cpu          VecMulBroad      ms      8.82558e-5
  16 │ cpu          CPUMatMul        ms      0.0634355
  17 │ cpu          MatMulBroad      ms      0.00703592
  18 │ cpu          3DMulBroad       ms      0.00262922
  19 │ cpu          peakflops        flops   5.00479e11
  20 │ cpu          FFMPEGH264Write  ms      425.497
  21 │ gpu          GPUMatMul        ms      0.015861
  22 │ mem          DeepCopy         ms      0.000443211
  23 │ mem          Bandwidth10kB    MiB/s   1.02107e5
  24 │ mem          Bandwidth100kB   MiB/s   35179.4
  25 │ mem          Bandwidth1MB     MiB/s   8326.12
  26 │ mem          Bandwidth10MB    MiB/s   7858.06
  27 │ mem          Bandwidth100MB   MiB/s   5237.15
  28 │ diskio       DiskWrite1KB     ms      0.067679
  29 │ diskio       DiskWrite1MB     ms      0.694269
  30 │ diskio       DiskRead1KB      ms      0.00865664
  31 │ diskio       DiskRead1MB      ms      0.360752
  32 │ loading      JuliaLoad        ms      205.956
  33 │ compilation  compilecache     ms      317.523

@tbeason

tbeason commented Apr 11, 2022

results.txt

 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.7.2
   3 │ info         OS                       Windows (x86_64-w64-mingw32)
   4 │ info         CPU                      AMD Ryzen 9 5950X 16-Core Proces…
   5 │ info         CPU_THREADS              32
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-12.0.1 (ORCJIT, znver3)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      NVIDIA GeForce RTX 2080 SUPER
  12 │ cpu          FloatMul         ms      1.1e-6
  13 │ cpu          FusedMulAdd      ms      1.1e-6
  14 │ cpu          FloatSin         ms      3.0e-6
  15 │ cpu          VecMulBroad      ms      2.50752e-5
  16 │ cpu          CPUMatMul        ms      0.0397
  17 │ cpu          MatMulBroad      ms      0.00933
  18 │ cpu          3DMulBroad       ms      0.000818978
  19 │ cpu          peakflops        flops   3.80514e11
  20 │ cpu          FFMPEGH264Write  ms      243.274
  21 │ gpu          GPUMatMul        ms      0.00608
  22 │ mem          DeepCopy         ms      0.000125
  23 │ mem          Bandwidth10kB    MiB/s   1.43491e5
  24 │ mem          Bandwidth100kB   MiB/s   74505.8
  25 │ mem          Bandwidth1MB     MiB/s   64005.0
  26 │ mem          Bandwidth10MB    MiB/s   29066.6
  27 │ mem          Bandwidth100MB   MiB/s   11503.8
  28 │ diskio       DiskWrite1KB     ms      0.1883
  29 │ diskio       DiskWrite1MB     ms      2.16425
  30 │ diskio       DiskRead1KB      ms      1.7753
  31 │ diskio       DiskRead1MB      ms      2.5384
  32 │ loading      JuliaLoad        ms      125.954
  33 │ compilation  compilecache     ms      191.284

@Drvi

Drvi commented May 16, 2022

result.txt

 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.7.2
   3 │ info         OS                       macOS (x86_64-apple-darwin19.5.0)
   4 │ info         CPU                      Apple M1 Max
   5 │ info         CPU_THREADS              10
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-12.0.1 (ORCJIT, westmere)
   9 │ info         BLAS                     libopenblas64_.0.3.13
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      missing
  12 │ cpu          FloatMul         ms      1.583e-6
  13 │ cpu          FusedMulAdd      ms      1.583e-6
  14 │ cpu          FloatSin         ms      6.542e-6
  15 │ cpu          VecMulBroad      ms      3.37022e-5
  16 │ cpu          CPUMatMul        ms      0.123917
  17 │ cpu          MatMulBroad      ms      0.0021125
  18 │ cpu          3DMulBroad       ms      0.00114029
  19 │ cpu          peakflops        flops   1.48237e11
  20 │ cpu          FFMPEGH264Write  ms      340.882
  21 │ mem          DeepCopy         ms      0.000189476
  22 │ mem          Bandwidth10kB    MiB/s   71627.9
  23 │ mem          Bandwidth100kB   MiB/s   42648.8
  24 │ mem          Bandwidth1MB     MiB/s   41614.3
  25 │ mem          Bandwidth10MB    MiB/s   33258.0
  26 │ mem          Bandwidth100MB   MiB/s   28390.2
  27 │ diskio       DiskWrite1KB     ms      0.049584
  28 │ diskio       DiskWrite1MB     ms      0.166958
  29 │ diskio       DiskRead1KB      ms      0.0102188
  30 │ diskio       DiskRead1MB      ms      0.216708
  31 │ loading      JuliaLoad        ms      184.611
  32 │ compilation  compilecache     ms      257.398

@LilithHafner

results.txt

 Row │ cat          testname         units   res                               
     │ String       String           String  Any                               
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.9.0-DEV.1053
   3 │ info         OS                       macOS (x86_64-apple-darwin21.5.0)
   4 │ info         CPU                      4 × Intel(R) Core(TM) i5-8210Y C…
   5 │ info         CPU_THREADS              4
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-14.0.5 (ORCJIT, skylake)
   9 │ info         BLAS                     libopenblas64_.0.3.20
  10 │ info         BLAS_threads             2
  11 │ info         GPU                      missing                           
  12 │ cpu          FloatMul         ms      3.5605e-6
  13 │ cpu          FusedMulAdd      ms      3.3115e-6
  14 │ cpu          FloatSin         ms      9.9105e-6
  15 │ cpu          VecMulBroad      ms      8.96273e-5
  16 │ cpu          CPUMatMul        ms      0.268553
  17 │ cpu          MatMulBroad      ms      0.027973
  18 │ cpu          3DMulBroad       ms      0.0024923
  19 │ cpu          peakflops        flops   2.88603e10
  20 │ cpu          FFMPEGH264Write  ms      818.48
  21 │ mem          DeepCopy         ms      0.000353534
  22 │ mem          Bandwidth10kB    MiB/s   93164.2
  23 │ mem          Bandwidth100kB   MiB/s   27693.6
  24 │ mem          Bandwidth1MB     MiB/s   21713.4
  25 │ mem          Bandwidth10MB    MiB/s   9410.86
  26 │ mem          Bandwidth100MB   MiB/s   8546.43
  27 │ diskio       DiskWrite1KB     ms      0.273521
  28 │ diskio       DiskWrite1MB     ms      0.7204
  29 │ diskio       DiskRead1KB      ms      0.165685
  30 │ diskio       DiskRead1MB      ms      1.28638
  31 │ loading      JuliaLoad        ms      306.817
  32 │ compilation  compilecache     ms      459.195

@sverek

sverek commented Oct 8, 2022

result.txt

 Row │ cat          testname         units   res                               ⋯
     │ String       String           String  Any                               ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1                             ⋯
   2 │ info         JuliaVer                 1.8.2
   3 │ info         OS                       macOS (arm64-apple-darwin21.3.0)
   4 │ info         CPU                      8 × Apple M2
   5 │ info         CPU_THREADS              4                                 ⋯
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-13.0.1 (ORCJIT, apple-m1)
   9 │ info         BLAS                     libopenblas64_                    ⋯
  10 │ info         BLAS_threads             4
  11 │ info         GPU                      missing
  12 │ cpu          FloatMul         ms      2.042e-6
  13 │ cpu          FusedMulAdd      ms      2.0e-6                            ⋯
  14 │ cpu          FloatSin         ms      4.417e-6
  15 │ cpu          VecMulBroad      ms      2.66219e-5
  16 │ cpu          CPUMatMul        ms      0.028917
  17 │ cpu          MatMulBroad      ms      0.003175                          ⋯
  18 │ cpu          3DMulBroad       ms      0.000998086
  19 │ cpu          peakflops        flops   1.83735e11
  20 │ cpu          FFMPEGH264Write  ms      166.83
  21 │ mem          DeepCopy         ms      0.000164592                       ⋯
  22 │ mem          Bandwidth10kB    MiB/s   85541.7
  23 │ mem          Bandwidth100kB   MiB/s   89412.6
  24 │ mem          Bandwidth1MB     MiB/s   69570.6
  25 │ mem          Bandwidth10MB    MiB/s   40661.1                           ⋯
  26 │ mem          Bandwidth100MB   MiB/s   36535.7
  27 │ diskio       DiskWrite1KB     ms      0.045292
  28 │ diskio       DiskWrite1MB     ms      0.234167
  29 │ diskio       DiskRead1KB      ms      0.00834375                        ⋯
  30 │ diskio       DiskRead1MB      ms      0.332541
  31 │ loading      JuliaLoad        ms      74.7802
  32 │ compilation  compilecache     ms      117.742

@etatara

etatara commented Nov 6, 2022

results.txt

 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.8.0
   3 │ info         OS                       Linux (x86_64-linux-gnu)
   4 │ info         CPU                      24 × AMD Ryzen 9 5900X 12-Core P…
   5 │ info         CPU_THREADS              24
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-13.0.1 (ORCJIT, znver3)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             12
  11 │ info         GPU                      NVIDIA GeForce GTX 1660 SUPER
  12 │ cpu          FloatMul         ms      1.93e-6
  13 │ cpu          FusedMulAdd      ms      1.99e-6
  14 │ cpu          FloatSin         ms      3.67e-6
  15 │ cpu          VecMulBroad      ms      4.00702e-5
  16 │ cpu          CPUMatMul        ms      0.0659
  17 │ cpu          MatMulBroad      ms      0.002292
  18 │ cpu          3DMulBroad       ms      0.000850092
  19 │ cpu          peakflops        flops   3.36684e11
  20 │ cpu          FFMPEGH264Write  ms      229.91
  21 │ gpu          GPUMatMul        ms      0.00594667
  22 │ mem          DeepCopy         ms      0.000140317
  23 │ mem          Bandwidth10kB    MiB/s   1.26436e5
  24 │ mem          Bandwidth100kB   MiB/s   72578.0
  25 │ mem          Bandwidth1MB     MiB/s   39375.5
  26 │ mem          Bandwidth10MB    MiB/s   25572.5
  27 │ mem          Bandwidth100MB   MiB/s   22887.9
  28 │ diskio       DiskWrite1KB     ms      0.11576
  29 │ diskio       DiskWrite1MB     ms      1.10894
  30 │ diskio       DiskRead1KB      ms      0.00301333
  31 │ diskio       DiskRead1MB      ms      0.06603
  32 │ loading      JuliaLoad        ms      89.3183
  33 │ compilation  compilecache     ms      168.83

@alepensato

results.txt

@IanButterworth
Owner Author

IanButterworth commented Oct 26, 2023

I was interested to see how GitHub CI runners compare, so the CI on this package now saves results files:

Linux_1.10-nightly_results.txt
Linux_1_results.txt
Windows_1.10-nightly_results.txt
Windows_1_results.txt
macOS_1.10-nightly_results.txt
macOS_1_results.txt

These are the reports for those CI results alone:

summary_report
memory_report

And compared to all crowd-sourced data to date:
summary_report
memory_report

Note: I think the analysis can be greatly improved here.
Contributions are very welcome; the package is due an overhaul.

@bjarthur

osx-arm64.csv

 Row │ cat          testname         units   res                               
     │ String       String           String  Any                               
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.9.4
   3 │ info         OS                       macOS (arm64-apple-darwin22.4.0)
   4 │ info         CPU                      12 × Apple M2 Max
   5 │ info         CPU_THREADS              8
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-14.0.6 (ORCJIT, apple-m1)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      missing                           
  12 │ cpu          FloatMul         ms      1.5e-6
  13 │ cpu          FusedMulAdd      ms      1.5e-6
  14 │ cpu          FloatSin         ms      3.542e-6
  15 │ cpu          VecMulBroad      ms      2.15236e-5
  16 │ cpu          CPUMatMul        ms      0.103417
  17 │ cpu          MatMulBroad      ms      0.0031
  18 │ cpu          3DMulBroad       ms      0.000879412
  19 │ cpu          peakflops        flops   3.18455e11
  20 │ cpu          FFMPEGH264Write  ms      223.756
  21 │ mem          DeepCopy         ms      0.000104596
  22 │ mem          Bandwidth10kB    MiB/s   85842.0
  23 │ mem          Bandwidth100kB   MiB/s   91550.5
  24 │ mem          Bandwidth1MB     MiB/s   69994.4
  25 │ mem          Bandwidth10MB    MiB/s   50226.4
  26 │ mem          Bandwidth100MB   MiB/s   52317.0
  27 │ diskio       DiskWrite1KB     ms      0.051167
  28 │ diskio       DiskWrite1MB     ms      0.146417
  29 │ diskio       DiskRead1KB      ms      0.019083
  30 │ diskio       DiskRead1MB      ms      0.234292
  31 │ loading      JuliaLoad        ms      111.554
  32 │ compilation  compilecache     ms      231.132

@bjarthur

bjarthur commented Dec 19, 2023

Here is a direct comparison of MS Windows 10 and WSL2 Ubuntu 22.04 on the same machine:

win10.csv

33×4 DataFrame
 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.9.4
   3 │ info         OS                       Windows (x86_64-w64-mingw32)
   4 │ info         CPU                      32 × Intel(R) Xeon(R) Silver 411…
   5 │ info         CPU_THREADS              32
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-14.0.6 (ORCJIT, skylake-…
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             16
  11 │ info         GPU                      NVIDIA TITAN RTX
  12 │ cpu          FloatMul         ms      3.1e-6
  13 │ cpu          FusedMulAdd      ms      3.1e-6
  14 │ cpu          FloatSin         ms      7.8e-6
  15 │ cpu          VecMulBroad      ms      5.9787e-5
  16 │ cpu          CPUMatMul        ms      0.0489
  17 │ cpu          MatMulBroad      ms      0.00708333
  18 │ cpu          3DMulBroad       ms      0.0016
  19 │ cpu          peakflops        flops   1.19055e11
  20 │ cpu          FFMPEGH264Write  ms      548.557
  21 │ gpu          GPUMatMul        ms      0.0233
  22 │ mem          DeepCopy         ms      0.000318815
  23 │ mem          Bandwidth10kB    MiB/s   41483.9
  24 │ mem          Bandwidth100kB   MiB/s   28681.9
  25 │ mem          Bandwidth1MB     MiB/s   4396.84
  26 │ mem          Bandwidth10MB    MiB/s   3662.06
  27 │ mem          Bandwidth100MB   MiB/s   3403.9
  28 │ diskio       DiskWrite1KB     ms      0.9758
  29 │ diskio       DiskWrite1MB     ms      4.0889
  30 │ diskio       DiskRead1KB      ms      0.2168
  31 │ diskio       DiskRead1MB      ms      0.71975
  32 │ loading      JuliaLoad        ms      520.717
  33 │ compilation  compilecache     ms      865.807

wsl2.csv

33×4 DataFrame
 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.9.4
   3 │ info         OS                       Linux (x86_64-linux-gnu)
   4 │ info         CPU                      32 × Intel(R) Xeon(R) Silver 411…
   5 │ info         CPU_THREADS              32
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-14.0.6 (ORCJIT, skylake-…
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             16
  11 │ info         GPU                      NVIDIA TITAN RTX
  12 │ cpu          FloatMul         ms      3.047e-6
  13 │ cpu          FusedMulAdd      ms      3.381e-6
  14 │ cpu          FloatSin         ms      7.07407e-6
  15 │ cpu          VecMulBroad      ms      6.59909e-5
  16 │ cpu          CPUMatMul        ms      0.0387035
  17 │ cpu          MatMulBroad      ms      0.00497664
  18 │ cpu          3DMulBroad       ms      0.0017847
  19 │ cpu          peakflops        flops   1.49769e11
  20 │ cpu          FFMPEGH264Write  ms      591.083
  21 │ gpu          GPUMatMul        ms      0.028873
  22 │ mem          DeepCopy         ms      0.000296031
  23 │ mem          Bandwidth10kB    MiB/s   1.11624e5
  24 │ mem          Bandwidth100kB   MiB/s   37986.6
  25 │ mem          Bandwidth1MB     MiB/s   6019.11
  26 │ mem          Bandwidth10MB    MiB/s   6048.19
  27 │ mem          Bandwidth100MB   MiB/s   5013.63
  28 │ diskio       DiskWrite1KB     ms      0.479531
  29 │ diskio       DiskWrite1MB     ms      19.7946
  30 │ diskio       DiskRead1KB      ms      0.0134912
  31 │ diskio       DiskRead1MB      ms      0.512911
  32 │ loading      JuliaLoad        ms      262.244
  33 │ compilation  compilecache     ms      554.391

and the side-by-side comparison:

33×6 DataFrame
 Row │ cat          testname         units    ref_res                            test_res                           factor
     │ String       String           String?  Any                                Any                                Any
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer               0.4.1                              0.4.1                              Equal
   2 │ info         JuliaVer                  1.9.4                              1.9.4                              Equal
   3 │ info         OS                        Windows (x86_64-w64-mingw32)       Linux (x86_64-linux-gnu)           Not Equal
   4 │ info         CPU                       32 × Intel(R) Xeon(R) Silver 411…  32 × Intel(R) Xeon(R) Silver 411…  Equal
   5 │ info         CPU_THREADS               32                                 32                                 Not Equal
   6 │ info         WORD_SIZE                 64                                 64                                 Not Equal
   7 │ info         LIBM                      libopenlibm                        libopenlibm                        Equal
   8 │ info         LLVM                      libLLVM-14.0.6 (ORCJIT, skylake-…  libLLVM-14.0.6 (ORCJIT, skylake-…  Equal
   9 │ info         BLAS                      libopenblas64_                     libopenblas64_                     Equal
  10 │ info         BLAS_threads              16                                 16                                 Equal
  11 │ info         GPU                       NVIDIA TITAN RTX                   NVIDIA TITAN RTX                   Equal
  12 │ cpu          FloatMul         ms       3.1e-6                             3.047e-6                           0.982903
  13 │ cpu          FusedMulAdd      ms       3.1e-6                             3.381e-6                           1.09065
  14 │ cpu          FloatSin         ms       7.8e-6                             7.07407e-6                         0.906933
  15 │ cpu          VecMulBroad      ms       5.9787e-5                          6.59909e-5                         1.10377
  16 │ cpu          CPUMatMul        ms       0.0489                             0.0387035                          0.791483
  17 │ cpu          MatMulBroad      ms       0.00708333                         0.00497664                         0.702585
  18 │ cpu          3DMulBroad       ms       0.0016                             0.0017847                          1.11544
  19 │ cpu          peakflops        flops    1.19055e11                         1.49769e11                         1.25799
  20 │ cpu          FFMPEGH264Write  ms       548.557                            591.083                            1.07752
  21 │ gpu          GPUMatMul        ms       0.0233                             0.028873                           1.23918
  22 │ mem          DeepCopy         ms       0.000318815                        0.000296031                        0.928534
  23 │ mem          Bandwidth10kB    MiB/s    41483.9                            1.11624e5                          2.69077
  24 │ mem          Bandwidth100kB   MiB/s    28681.9                            37986.6                            1.32441
  25 │ mem          Bandwidth1MB     MiB/s    4396.84                            6019.11                            1.36896
  26 │ mem          Bandwidth10MB    MiB/s    3662.06                            6048.19                            1.65158
  27 │ mem          Bandwidth100MB   MiB/s    3403.9                             5013.63                            1.47291
  28 │ diskio       DiskWrite1KB     ms       0.9758                             0.479531                           0.491424
  29 │ diskio       DiskWrite1MB     ms       4.0889                             19.7946                            4.84106
  30 │ diskio       DiskRead1KB      ms       0.2168                             0.0134912                          0.062229
  31 │ diskio       DiskRead1MB      ms       0.71975                            0.512911                           0.712625
  32 │ loading      JuliaLoad        ms       520.717                            262.244                            0.503621
  33 │ compilation  compilecache     ms       865.807                            554.391                            0.640317

as a suggestion: the last column in compare would be easier to interpret if a larger number (or a smaller one, take your pick) were always better. as it stands now, for ms a smaller factor is better, but for flops and MiB/s a bigger one is better. perhaps invert the factor for ms? or at least offer that as an option via a keyword argument.
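
To make the suggestion above concrete, here is a minimal sketch of a direction-normalized factor. It assumes `factor` is `test_res / ref_res`, which matches the numbers in the table. The helper name `speedup` and its `lower_is_better` keyword are hypothetical and not part of SystemBenchmark.

```julia
# Hypothetical helper, not part of SystemBenchmark: turn a raw `factor`
# (test_res / ref_res) into a speedup where a value > 1 always means the
# test system did better than the reference.
function speedup(factor::Real, units::AbstractString; lower_is_better=("ms",))
    # Timings improve when they shrink, so invert them; rates are kept as-is.
    return units in lower_is_better ? inv(factor) : float(factor)
end

# A few rows copied from the Windows vs WSL2 comparison above.
rows = [
    ("CPUMatMul",     "ms",    0.791483),
    ("peakflops",     "flops", 1.25799),
    ("Bandwidth10MB", "MiB/s", 1.65158),
    ("DiskWrite1MB",  "ms",    4.84106),
]

for (name, units, factor) in rows
    println(rpad(name, 14), round(speedup(factor, units); digits=3))
end
```

With this convention every value above 1 reads as "WSL2 is faster", so DiskWrite1MB is the only one of these four rows that comes out below 1.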

also, why in this chart are the CPU_THREADS and WORD_SIZE not considered equal?

and wow, on the same machine i'm surprised that there is such a large and consistent difference in both memory and disk bandwidth in favor of WSL2!

@bjarthur

another direct comparison of MS Windows 10 and WSL2 Ubuntu 22.04 on the same machine (different from the one in the immediately preceding post):

win10.csv

 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.9.4
   3 │ info         OS                       Windows (x86_64-w64-mingw32)
   4 │ info         CPU                      8 × Intel(R) Core(TM) i7-2600 CP…
   5 │ info         CPU_THREADS              8
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-14.0.6 (ORCJIT, sandybri…
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             4
  11 │ info         GPU                      NVIDIA GeForce GTX 1080 Ti
  12 │ cpu          FloatMul         ms      3.0e-6
  13 │ cpu          FusedMulAdd      ms      3.0e-6
  14 │ cpu          FloatSin         ms      7.0e-6
  15 │ cpu          VecMulBroad      ms      5.52525e-5
  16 │ cpu          CPUMatMul        ms      0.1157
  17 │ cpu          MatMulBroad      ms      0.0065
  18 │ cpu          3DMulBroad       ms      0.00125
  19 │ cpu          peakflops        flops   9.49836e10
  20 │ cpu          FFMPEGH264Write  ms      491.677
  21 │ gpu          GPUMatMul        ms      0.0494
  22 │ mem          DeepCopy         ms      0.000247036
  23 │ mem          Bandwidth10kB    MiB/s   52516.8
  24 │ mem          Bandwidth100kB   MiB/s   26308.3
  25 │ mem          Bandwidth1MB     MiB/s   9117.35
  26 │ mem          Bandwidth10MB    MiB/s   6874.32
  27 │ mem          Bandwidth100MB   MiB/s   7917.99
  28 │ diskio       DiskWrite1KB     ms      5.65355
  29 │ diskio       DiskWrite1MB     ms      6.43355
  30 │ diskio       DiskRead1KB      ms      5.6689
  31 │ diskio       DiskRead1MB      ms      6.9831
  32 │ loading      JuliaLoad        ms      286.929
  33 │ compilation  compilecache     ms      580.596

wsl2.csv

33×4 DataFrame
 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.9.4
   3 │ info         OS                       Linux (x86_64-linux-gnu)
   4 │ info         CPU                      8 × Intel(R) Core(TM) i7-2600 CP…
   5 │ info         CPU_THREADS              8
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-14.0.6 (ORCJIT, sandybri…
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             4
  11 │ info         GPU                      NVIDIA GeForce GTX 1080 Ti
  12 │ cpu          FloatMul         ms      2.9e-6
  13 │ cpu          FusedMulAdd      ms      2.9e-6
  14 │ cpu          FloatSin         ms      6.4e-6
  15 │ cpu          VecMulBroad      ms      5.09586e-5
  16 │ cpu          CPUMatMul        ms      0.1437
  17 │ cpu          MatMulBroad      ms      0.00534286
  18 │ cpu          3DMulBroad       ms      0.00154
  19 │ cpu          peakflops        flops   9.13678e10
  20 │ cpu          FFMPEGH264Write  ms      478.415
  21 │ gpu          GPUMatMul        ms      0.0503
  22 │ mem          DeepCopy         ms      0.000228833
  23 │ mem          Bandwidth10kB    MiB/s   53066.6
  24 │ mem          Bandwidth100kB   MiB/s   26959.0
  25 │ mem          Bandwidth1MB     MiB/s   10145.6
  26 │ mem          Bandwidth10MB    MiB/s   6240.52
  27 │ mem          Bandwidth100MB   MiB/s   7526.56
  28 │ diskio       DiskWrite1KB     ms      0.276599
  29 │ diskio       DiskWrite1MB     ms      4.786
  30 │ diskio       DiskRead1KB      ms      0.01095
  31 │ diskio       DiskRead1MB      ms      0.1353
  32 │ loading      JuliaLoad        ms      189.399
  33 │ compilation  compilecache     ms      448.462

and the side-by-side comparison:

 Row │ cat          testname         units    ref_res                            test_res                           factor
     │ String       String           String?  Any                                Any                                Any
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer               0.4.1                              0.4.1                              Equal
   2 │ info         JuliaVer                  1.9.4                              1.9.4                              Equal
   3 │ info         OS                        Windows (x86_64-w64-mingw32)       Linux (x86_64-linux-gnu)           Not Equal
   4 │ info         CPU                       8 × Intel(R) Core(TM) i7-2600 CP…  8 × Intel(R) Core(TM) i7-2600 CP…  Equal
   5 │ info         CPU_THREADS               8                                  8                                  Not Equal
   6 │ info         WORD_SIZE                 64                                 64                                 Not Equal
   7 │ info         LIBM                      libopenlibm                        libopenlibm                        Equal
   8 │ info         LLVM                      libLLVM-14.0.6 (ORCJIT, sandybri…  libLLVM-14.0.6 (ORCJIT, sandybri…  Equal
   9 │ info         BLAS                      libopenblas64_                     libopenblas64_                     Equal
  10 │ info         BLAS_threads              4                                  4                                  Equal
  11 │ info         GPU                       NVIDIA GeForce GTX 1080 Ti         NVIDIA GeForce GTX 1080 Ti         Equal
  12 │ cpu          FloatMul         ms       3.0e-6                             2.9e-6                             0.966667
  13 │ cpu          FusedMulAdd      ms       3.0e-6                             2.9e-6                             0.966667
  14 │ cpu          FloatSin         ms       7.0e-6                             6.4e-6                             0.914286
  15 │ cpu          VecMulBroad      ms       5.52525e-5                         5.09586e-5                         0.922286
  16 │ cpu          CPUMatMul        ms       0.1157                             0.1437                             1.24201
  17 │ cpu          MatMulBroad      ms       0.0065                             0.00534286                         0.821978
  18 │ cpu          3DMulBroad       ms       0.00125                            0.00154                            1.232
  19 │ cpu          peakflops        flops    9.49836e10                         9.13678e10                         0.961933
  20 │ cpu          FFMPEGH264Write  ms       491.677                            478.415                            0.973026
  21 │ gpu          GPUMatMul        ms       0.0494                             0.0503                             1.01822
  22 │ mem          DeepCopy         ms       0.000247036                        0.000228833                        0.926317
  23 │ mem          Bandwidth10kB    MiB/s    52516.8                            53066.6                            1.01047
  24 │ mem          Bandwidth100kB   MiB/s    26308.3                            26959.0                            1.02473
  25 │ mem          Bandwidth1MB     MiB/s    9117.35                            10145.6                            1.11278
  26 │ mem          Bandwidth10MB    MiB/s    6874.32                            6240.52                            0.907802
  27 │ mem          Bandwidth100MB   MiB/s    7917.99                            7526.56                            0.950564
  28 │ diskio       DiskWrite1KB     ms       5.65355                            0.276599                           0.0489248
  29 │ diskio       DiskWrite1MB     ms       6.43355                            4.786                              0.743912
  30 │ diskio       DiskRead1KB      ms       5.6689                             0.01095                            0.00193159
  31 │ diskio       DiskRead1MB      ms       6.9831                             0.1353                             0.0193753
  32 │ loading      JuliaLoad        ms       286.929                            189.399                            0.660088
  33 │ compilation  compilecache     ms       580.596                            448.462                            0.772417

WSL2 again is usually faster but not by as much, except for disk i/o where 3 of 4 tests are >10x faster! not sure how to explain the DiskWrite1MB outlier in WSL2.
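
The "3 of 4" claim can be checked directly from the timings in the two tables above. This is only a sketch of the arithmetic, with the values copied by hand; for timings the speedup is taken as reference time divided by test time.

```julia
# Disk I/O timings in ms as (test name, Windows 10, WSL2), copied from the tables above.
disk = [
    ("DiskWrite1KB", 5.65355, 0.276599),
    ("DiskWrite1MB", 6.43355, 4.786),
    ("DiskRead1KB",  5.6689,  0.01095),
    ("DiskRead1MB",  6.9831,  0.1353),
]

# For timings, speedup = reference time / test time.
speedups = [(name, win / wsl) for (name, win, wsl) in disk]

for (name, s) in speedups
    println(rpad(name, 14), round(s; digits=1), "x")
end

# Count how many tests are more than 10x faster under WSL2.
n_fast = count(s -> s[2] > 10, speedups)
println(n_fast, " of ", length(speedups), " tests are more than 10x faster")
```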

@giordano
Collaborator

Nvidia Grace Hopper GH200:

33×4 DataFrame
 Row │ cat          testname         units   res
     │ String       String           String  Any
─────┼────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.10.0-rc3
   3 │ info         OS                       Linux (aarch64-linux-gnu)
   4 │ info         CPU                      72 × unknown
   5 │ info         CPU_THREADS              72
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-15.0.7 (ORCJIT, generic)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             36
  11 │ info         GPU                      GH200 480GB
  12 │ cpu          FloatMul         ms      1.568e-6
  13 │ cpu          FusedMulAdd      ms      1.568e-6
  14 │ cpu          FloatSin         ms      4.224e-6
  15 │ cpu          VecMulBroad      ms      2.41285e-5
  16 │ cpu          CPUMatMul        ms      0.121153
  17 │ cpu          MatMulBroad      ms      0.0015137
  18 │ cpu          3DMulBroad       ms      0.000580823
  19 │ cpu          peakflops        flops   9.10277e11
  20 │ cpu          FFMPEGH264Write  ms      241.032
  21 │ gpu          GPUMatMul        ms      0.022593
  22 │ mem          DeepCopy         ms      9.90535e-5
  23 │ mem          Bandwidth10kB    MiB/s   72855.0
  24 │ mem          Bandwidth100kB   MiB/s   73224.4
  25 │ mem          Bandwidth1MB     MiB/s   41798.5
  26 │ mem          Bandwidth10MB    MiB/s   30563.1
  27 │ mem          Bandwidth100MB   MiB/s   17787.9
  28 │ diskio       DiskWrite1KB     ms      0.050177
  29 │ diskio       DiskWrite1MB     ms      0.430916
  30 │ diskio       DiskRead1KB      ms      0.006544
  31 │ diskio       DiskRead1MB      ms      0.05584
  32 │ loading      JuliaLoad        ms      57.7533
  33 │ compilation  compilecache     ms      155.209

gh200-result.txt

@oschulz

oschulz commented Dec 22, 2023

@giordano, any idea why mem-IO seems worse on GH200 than on OS-X-arm64 (post by @bjarthur)? Both should be LPDDR6 with very similar bandwidth, right?

@giordano
Collaborator

I don't know what the M2 has, but my GH200 has 480 GB of LPDDR5X with a peak memory bandwidth of 384 GB/s (reference: https://docs.nvidia.com/gh200-benchmarking-guide.pdf)

@oschulz

oschulz commented Dec 22, 2023

According to Wikipedia, an M2 Max (tested above) has 400 GB/s LPDDR5 (I was wrong about LPDDR6). So they should be pretty similar - what really confuses me is why the 100MB-chunk throughput seems so low on the GH200.
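
The benchmark reports MiB/s while the peak figures quoted here are in GB/s, so a unit conversion helps when comparing them. A minimal sketch follows. The helper name `mib_to_gb` is made up for illustration, and whether the Bandwidth tests exercise one thread or the whole memory system is not confirmed anywhere in this thread.

```julia
# Convert a result in MiB/s to GB/s (1 MiB = 2^20 bytes, 1 GB = 10^9 bytes).
mib_to_gb(mib_per_s) = mib_per_s * 2^20 / 1e9

# Bandwidth100MB result for the GH200 posted above, and the peak figure
# quoted from the NVIDIA guide in this thread.
gh200_measured = mib_to_gb(17787.9)   # GB/s
gh200_peak     = 384.0                # GB/s

println("GH200 100MB test: ", round(gh200_measured; digits=1), " GB/s, ",
        round(100 * gh200_measured / gh200_peak; digits=1), "% of the quoted peak")
```

The measured value works out to roughly 19 GB/s, about 5% of the quoted peak, so the test is probably not saturating the memory system, and the peak figures alone may not predict how the two machines rank here.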

@oschulz

oschulz commented Dec 22, 2023

I think the single-core math performance on the GH200 is very encouraging though. The Apple M-series is good, and given the lower CPU clock speed of the GH200 (correct?), those numbers look quite competitive, I'd say.

@oschulz

oschulz commented Dec 22, 2023

@bjarthur, could you maybe re-run the M2 Max benchmarks on Julia v1.10.0-rc3, for a closer comparison?

@giordano
Collaborator

> and given the lower CPU clock speed of the GH200 (correct?)

According to https://docs.nvidia.com/gh200-benchmarking-guide.pdf the Grace CPU has a clock rate of 3.1 GHz, but in lscpu I see

Vendor ID:              ARM
  Model name:           Neoverse-V2
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 72
    Socket(s):          1
    Stepping:           r0p0
    Frequency boost:    disabled
    CPU max MHz:        3474.0000
    CPU min MHz:        81.0000
    BogoMIPS:           2000.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm
                         svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh

Apple is notoriously secretive about the clock speed of the Apple Silicon chips. According to https://apple.techable.com/specs/bto-cto-macbook-pro-m2-max-12-core-cpu-38-core-gpu-14-inch-2023 the M2 Max should have a clock rate of ~3.68 GHz; they probably ran some benchmarks to figure that number out.

But yeah, it'd appear the Grace CPU is slightly lower powered than the M2 Max.

@bjarthur

sure. here you go. a couple mem/io results are slower now but still all faster than GH200:

osx-arm64-pluggedin.csv

 Row │ cat          testname         units   res                               
     │ String       String           String  Any                               
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.10.0-rc3
   3 │ info         OS                       macOS (arm64-apple-darwin22.4.0)
   4 │ info         CPU                      12 × Apple M2 Max
   5 │ info         CPU_THREADS              8
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-15.0.7 (ORCJIT, apple-m1)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      missing                           
  12 │ cpu          FloatMul         ms      1.458e-6
  13 │ cpu          FusedMulAdd      ms      1.458e-6
  14 │ cpu          FloatSin         ms      3.458e-6
  15 │ cpu          VecMulBroad      ms      1.87465e-5
  16 │ cpu          CPUMatMul        ms      0.019208
  17 │ cpu          MatMulBroad      ms      0.0029
  18 │ cpu          3DMulBroad       ms      0.000709799
  19 │ cpu          peakflops        flops   3.70989e11
  20 │ cpu          FFMPEGH264Write  ms      228.307
  21 │ mem          DeepCopy         ms      8.69494e-5
  22 │ mem          Bandwidth10kB    MiB/s   85818.8
  23 │ mem          Bandwidth100kB   MiB/s   92078.9
  24 │ mem          Bandwidth1MB     MiB/s   58836.1
  25 │ mem          Bandwidth10MB    MiB/s   38300.2
  26 │ mem          Bandwidth100MB   MiB/s   52516.3
  27 │ diskio       DiskWrite1KB     ms      0.051542
  28 │ diskio       DiskWrite1MB     ms      0.147313
  29 │ diskio       DiskRead1KB      ms      0.020042
  30 │ diskio       DiskRead1MB      ms      0.235334
  31 │ loading      JuliaLoad        ms      114.082
  32 │ compilation  compilecache     ms      212.076

and for comparison, i was curious how much slower low-power mode was:

osx-arm64-unplugged.csv

 Row │ cat          testname         units   res                               
     │ String       String           String  Any                               
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer              0.4.1
   2 │ info         JuliaVer                 1.10.0-rc3
   3 │ info         OS                       macOS (arm64-apple-darwin22.4.0)
   4 │ info         CPU                      12 × Apple M2 Max
   5 │ info         CPU_THREADS              8
   6 │ info         WORD_SIZE                64
   7 │ info         LIBM                     libopenlibm
   8 │ info         LLVM                     libLLVM-15.0.7 (ORCJIT, apple-m1)
   9 │ info         BLAS                     libopenblas64_
  10 │ info         BLAS_threads             8
  11 │ info         GPU                      missing                           
  12 │ cpu          FloatMul         ms      2.125e-6
  13 │ cpu          FusedMulAdd      ms      2.125e-6
  14 │ cpu          FloatSin         ms      5.042e-6
  15 │ cpu          VecMulBroad      ms      2.75271e-5
  16 │ cpu          CPUMatMul        ms      0.024667
  17 │ cpu          MatMulBroad      ms      0.0054792
  18 │ cpu          3DMulBroad       ms      0.00127308
  19 │ cpu          peakflops        flops   2.62707e11
  20 │ cpu          FFMPEGH264Write  ms      318.503
  21 │ mem          DeepCopy         ms      0.000162001
  22 │ mem          Bandwidth10kB    MiB/s   60240.1
  23 │ mem          Bandwidth100kB   MiB/s   62878.2
  24 │ mem          Bandwidth1MB     MiB/s   48491.1
  25 │ mem          Bandwidth10MB    MiB/s   36192.6
  26 │ mem          Bandwidth100MB   MiB/s   26127.5
  27 │ diskio       DiskWrite1KB     ms      0.066667
  28 │ diskio       DiskWrite1MB     ms      0.163312
  29 │ diskio       DiskRead1KB      ms      0.018459
  30 │ diskio       DiskRead1MB      ms      0.348458
  31 │ loading      JuliaLoad        ms      175.931
  32 │ compilation  compilecache     ms      353.942

and the side-by-side with pluggedin as the reference/left column:

 Row │ cat          testname         units    ref_res                            test_res                           factor    
     │ String       String           String?  Any                                Any                                Any       
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ info         SysBenchVer               0.4.1                              0.4.1                              Equal
   2 │ info         JuliaVer                  1.10.0-rc3                         1.10.0-rc3                         Equal
   3 │ info         OS                        macOS (arm64-apple-darwin22.4.0)   macOS (arm64-apple-darwin22.4.0)   Equal
   4 │ info         CPU                       12 × Apple M2 Max                  12 × Apple M2 Max                  Equal
   5 │ info         CPU_THREADS               8                                  8                                  Equal
   6 │ info         WORD_SIZE                 64                                 64                                 Equal
   7 │ info         LIBM                      libopenlibm                        libopenlibm                        Equal
   8 │ info         LLVM                      libLLVM-15.0.7 (ORCJIT, apple-m1)  libLLVM-15.0.7 (ORCJIT, apple-m1)  Equal
   9 │ info         BLAS                      libopenblas64_                     libopenblas64_                     Equal
  10 │ info         BLAS_threads              8                                  8                                  Equal
  11 │ info         GPU                       missing                            missing                            Not Equal
  12 │ cpu          FloatMul         ms       1.458e-6                           2.125e-6                           1.45748
  13 │ cpu          FusedMulAdd      ms       1.458e-6                           2.125e-6                           1.45748
  14 │ cpu          FloatSin         ms       3.458e-6                           5.042e-6                           1.45807
  15 │ cpu          VecMulBroad      ms       1.87465e-5                         2.75271e-5                         1.46839
  16 │ cpu          CPUMatMul        ms       0.019208                           0.024667                           1.2842
  17 │ cpu          MatMulBroad      ms       0.0029                             0.0054792                          1.88938
  18 │ cpu          3DMulBroad       ms       0.000709799                        0.00127308                         1.79358
  19 │ cpu          peakflops        flops    3.70989e11                         2.62707e11                         0.708124
  20 │ cpu          FFMPEGH264Write  ms       228.307                            318.503                            1.39507
  21 │ mem          DeepCopy         ms       8.69494e-5                         0.000162001                        1.86317
  22 │ mem          Bandwidth10kB    MiB/s    85818.8                            60240.1                            0.701945
  23 │ mem          Bandwidth100kB   MiB/s    92078.9                            62878.2                            0.682874
  24 │ mem          Bandwidth1MB     MiB/s    58836.1                            48491.1                            0.824172
  25 │ mem          Bandwidth10MB    MiB/s    38300.2                            36192.6                            0.944972
  26 │ mem          Bandwidth100MB   MiB/s    52516.3                            26127.5                            0.497511
  27 │ diskio       DiskWrite1KB     ms       0.051542                           0.066667                           1.29345
  28 │ diskio       DiskWrite1MB     ms       0.147313                           0.163312                           1.10861
  29 │ diskio       DiskRead1KB      ms       0.020042                           0.018459                           0.921016
  30 │ diskio       DiskRead1MB      ms       0.235334                           0.348458                           1.4807
  31 │ loading      JuliaLoad        ms       114.082                            175.931                            1.54214
  32 │ compilation  compilecache     ms       212.076                            353.942                            1.66894

not sure exactly what apple does when unplugged: lower the clock speed and/or use an efficiency core. would be nice if SystemBenchmark.jl had a way to pin the thread to a specific core.

@oschulz

oschulz commented Dec 22, 2023

> But yeah, it'd appear the Grace CPU is slightly lower powered than the M2 Max.

Yes, though it may be more or less on par per clock, which is very nice to see.
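
A rough way to check the per-clock comparison is to multiply the time per operation by the clock rate. The sketch below uses the FloatMul results posted above and the clock figures quoted in this thread. Those are nominal or maximum values, not frequencies measured during the runs, so the output is only indicative. The helper name `cycles` is made up for illustration.

```julia
# Time per operation in ms times clock rate in GHz gives clock cycles per
# operation: ms * 1e6 = ns, and ns * GHz = cycles.
cycles(ms, ghz) = ms * 1e6 * ghz

# FloatMul timings from the tables above, clock rates as quoted in this thread.
gh200_nominal = cycles(1.568e-6, 3.1)     # figure from the NVIDIA guide
gh200_max     = cycles(1.568e-6, 3.474)   # "CPU max MHz" reported by lscpu
m2max         = cycles(1.458e-6, 3.68)    # third-party estimate for the M2 Max

for (label, c) in (("GH200 @ 3.1 GHz",   gh200_nominal),
                   ("GH200 @ 3.474 GHz", gh200_max),
                   ("M2 Max @ 3.68 GHz", m2max))
    println(rpad(label, 20), round(c; digits=2), " cycles per FloatMul")
end
```

Under these assumptions both chips land at around 5 cycles per FloatMul, which is consistent with "more or less on par per clock".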

@oschulz

oschulz commented Dec 22, 2023

The bad CPUMatMul result on the GH200 is confusing, though.

@Moelf

Moelf commented Jan 31, 2024

result.txt
