Submit benchmark files here #8
Comments
|
laptop: MSI Prestige 15
|
│ Row │ cat │ testname │ res │
│ │ String │ String │ Any │
├─────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ 1 │ info │ SysBenchVer │ 0.2.0 │
│ 2 │ info │ JuliaVer │ 1.4.1 │
│ 3 │ info │ OS │ Linux (x86_64-linux-gnu) │
│ 4 │ info │ CPU │ AMD Ryzen Threadripper 2950X 16-Core Processor │
│ 5 │ info │ WORD_SIZE │ 64 │
│ 6 │ info │ LIBM │ libopenlibm │
│ 7 │ info │ LLVM │ libLLVM-8.0.1 (ORCJIT, znver1) │
│ 8 │ info │ GPU │ missing │
│ 9 │ cpu │ FloatMul │ 1.163e-6 │
│ 10 │ cpu │ FusedMulAdd │ 2.0e-8 │
│ 11 │ cpu │ FloatSin │ 3.066e-6 │
│ 12 │ cpu │ VecMulBroad │ 2.94643e-5 │
│ 13 │ cpu │ CPUMatMul │ 0.054506 │
│ 14 │ cpu │ MatMulBroad │ 0.00396512 │
│ 15 │ cpu │ 3DMulBroad │ 0.0013856 │
│ 16 │ cpu │ peakflops │ 1.90993e11 │
│ 17 │ cpu │ FFMPEGH264Write │ 160.665 │
│ 18 │ mem │ DeepCopy │ 0.000176566 │
│ 19 │ diskio │ DiskWrite1KB │ 0.0259 │
│ 20 │ diskio │ DiskWrite1MB │ 0.869498 │
│ 21 │ diskio │ DiskRead1KB │ 0.00591286 │
│ 22 │ diskio │ DiskRead1MB │ 0.113459 │
│ 23 │ loading │ JuliaLoad │ 135.932 │
│ 24 │ compilation │ compilecache │ 248.265 │
│ 25 │ compilation │ create_expr_cache │ 1.34587 │ |
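For anyone comparing submissions by hand, a table printed like the one above can be read into a dictionary with a few lines of Python. This is only a sketch under stated assumptions: the function name `parse_results_table` is hypothetical, the sample rows are copied from the table above, and it is not part of SystemBenchmark or how the package itself loads results.

```python
def parse_results_table(text):
    """Parse a box-drawn results table into {testname: (category, value_string)}."""
    results = {}
    for line in text.splitlines():
        # Data rows use the vertical box character as the column separator.
        cells = [c.strip() for c in line.split("│")]
        # A data row splits into: "", row number, cat, testname, res, trailing text.
        if len(cells) < 6 or not cells[1].isdigit():
            continue  # skips the header row, the type row, and the rule line
        _, _row, cat, testname, res = cells[:5]
        results[testname] = (cat, res)
    return results


# Sample rows copied from the table above.
sample = """\
│ Row │ cat │ testname │ res │
│ │ String │ String │ Any │
├─────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ 16 │ cpu │ peakflops │ 1.90993e11 │
│ 20 │ diskio │ DiskWrite1MB │ 0.869498 │
"""

table = parse_results_table(sample)
peak = float(table["peakflops"][1])  # values stay strings until converted
print(table["DiskWrite1MB"], peak)
```

Values are kept as strings because the `res` column mixes numbers with text such as the CPU name.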
|
|
@ianshmean a bit of AArch64 goodness for you (CM3+)
|
Regarding the bad DiskWrite1MB performance: this is on a Pixelbook with eMMC hard drive, and the Linux VM in Chrome OS uses Btrfs which has very bad performance on
PS: the readme says "writeBenchmark" instead of "useBenchmark" |
|
Thanks for all the submissions. You can now pull them all in simply by running:
Also, I fixed the formatting of some function names |
So something is up with writing on macOS. On that same system, if I run
So 1 KB takes about 30 µs, and 1 MB takes 660 µs. Looks like some overhead with creating small files somewhere. |
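The two timings above are enough for a rough estimate of the fixed per-file cost. The sketch below assumes a simple linear model (time = fixed overhead + bytes × per-byte cost) and binary sizes (1 KB = 1024 bytes, 1 MB = 1024² bytes); both are assumptions for illustration, not something measured in this thread.

```python
# Two timings quoted in the comment above, in microseconds.
t_small_us = 30.0    # writing 1 KB
t_large_us = 660.0   # writing 1 MB

# Assumption: binary sizes.
n_small = 1024
n_large = 1024 ** 2

# Assumed model: time = overhead + bytes * per_byte_cost.
# Two points determine the two unknowns.
per_byte_us = (t_large_us - t_small_us) / (n_large - n_small)
overhead_us = t_small_us - per_byte_us * n_small

# Share of the 1 KB write that is fixed cost under this model.
overhead_share_small = overhead_us / t_small_us

print(f"fixed overhead ~ {overhead_us:.1f} us")
print(f"per-byte cost  ~ {per_byte_us * 1e3:.2f} ns/byte")
print(f"fixed share of the 1 KB write ~ {overhead_share_small:.0%}")
```

Under that model nearly all of the 1 KB time is fixed cost, which is consistent with the overhead sitting in file creation rather than in the data transfer.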
|
Macbook pro: result_laptop.txt
Desktop: result.txt
|
There were 3 warnings during test:
|
Nvidia Jetson Xavier NX
|
MacBook Pro (Retina, 15-inch, Mid 2015) ArchLinux: results_linux.txt
macOS 10.14.3 (18D42): results-macos.txt
|
Nvidia Xavier NX (on julia 1.4.1 official binary this time)
|
Here are 3 benchmarks from the same machine (an Apple Mac Mini 2018) with different OSs: macOS 10.14: result_OSX10.14.txt
macOS 10.15: result_OSX10.15.txt
Windows 10: result_Win10.txt
|
An older computer results.txt:
|
I've added a few more compilation tests and fixed the Macbook Pro 2018
|
Nvidia xavier nx
|
My usual system, with ArchLinux:
|
|
Benchmarked a bunch of systems :-) results.txt: my dev system, 5-yo CPU but OCd to 4.5 GHz, and with NVME storage and a high-end GPU:
results.txt: main JuliaGPU CI system. lower IPC, but this is a dual-CPU system with 4c/8t each. that's not reflected in these results though:
results.txt: high-end Ryzen Threadripper (previous-gen), but surprisingly low numbers for some benchmarks...
results.txt: Jetson AGX Xavier devkit. as expected, bad numbers due to ARM, even though this is a pretty powerful system:
results.txt: a Jetson Nano devkit, much lower-power ARM system with sd-card storage:
|
result.txt: Epyc-2 compute server, Tesla V100 GPU, Intel Optane NVMe storage:
|
|
GPU : Nvidia GTX TITAN X
|
|
This is Fugaku:
Similarly disappointing results to the other a64fx system I benchmarked above. |
Any idea why CPUMatMul and the like do so much worse on a64fx compared to M1? |
Same machine as #8 (comment), but with Asahi Linux
For a more fair comparison, I reran the benchmark on macOS with the same version of Julia and SystemBenchmark:
Comparison macOS vs Linux:
|
My own laptop. result.txt
Server in Quantum X center. result.txt
|
|
|
|
|
|
I was interested to see how GitHub CI runners compare, so the CI on this package now saves results files: Linux_1.10-nightly_results.txt. These are those results alone, compared to all crowd data to date. Note: I think the analysis can be greatly improved here. |
|
here is a direct comparison of MS Windows 10 and WSL2 Ubuntu 22.04 on the same machine:
and the side-by-side comparison:
as a suggestion: the last column in also, why in this chart are the and wow, on the same machine i'm surprised that there is such a large and consistent difference in both memory and disk bandwidth in favor of WSL2! |
another direct comparison of MS Windows 10 and WSL2 Ubuntu 22.04 on the same machine (different from the one in the immediately preceding post):
and the side-by-side comparison:
WSL2 again is usually faster but not by as much, except for disk i/o where 3 of 4 tests are >10x faster! not sure how to explain the DiskWrite1MB outlier in WSL2. |
Nvidia Grace Hopper GH200:
|
I don't know what the M2 has, but my GH200 has 480 GB of LPDDR5X with a peak memory bandwidth of 384 GB/s (reference: https://docs.nvidia.com/gh200-benchmarking-guide.pdf) |
According to Wikipedia, an M2 Max (tested above) has 400 GB/s LPDDR5 (I was wrong about LPDDR6). So they should be pretty similar - what really confuses me is why the 100MB-chunk throughput seems so low on the GH200. |
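To put the two peak figures side by side, here is a back-of-the-envelope sketch of the best-case time for one 100 MB copy. The assumptions are mine: the chunk is taken as 100 × 10⁶ bytes, the operation is treated as a plain copy that reads and writes every byte once, and a single thread is assumed to be able to reach the full peak bandwidth (which it usually cannot). I do not know what the benchmark actually does internally.

```python
# Peak bandwidth figures quoted in the thread, in GB/s (decimal gigabytes).
peak_gb_per_s = {"GH200": 384.0, "M2 Max": 400.0}

# Assumption: a "100 MB chunk" is 100 * 10**6 bytes, and a copy
# reads and writes every byte once, so it moves twice the chunk size.
chunk_bytes = 100 * 10**6
traffic_bytes = 2 * chunk_bytes


def ideal_copy_ms(bandwidth_gb_per_s):
    """Best-case copy time in milliseconds at the given peak bandwidth."""
    return traffic_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


ideal = {name: ideal_copy_ms(bw) for name, bw in peak_gb_per_s.items()}
ratio = peak_gb_per_s["GH200"] / peak_gb_per_s["M2 Max"]

for name, ms in ideal.items():
    print(f"{name}: ideal 100 MB copy ~ {ms:.3f} ms")
print(f"GH200 / M2 Max peak bandwidth ratio: {ratio:.2f}")
```

On paper the two differ by only a few percent, so a large gap in the measured numbers would have to come from something other than peak memory bandwidth.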
I think the single-core math performance on the GH200 is very encouraging though. The Apple M-series is good, and given the lower CPU clock speed of the GH200 (correct?), those numbers look quite competitive, I'd say. |
@bjarthur, could you maybe re-run the M2 Max benchmarks on Julia v1.10.0-rc3, for a closer comparison? |
According to https://docs.nvidia.com/gh200-benchmarking-guide.pdf the Grace CPU has a clock rate of 3.1 GHz, but in
Apple is notoriously secretive about the clock speed of the Apple Silicon chips. According to https://apple.techable.com/specs/bto-cto-macbook-pro-m2-max-12-core-cpu-38-core-gpu-14-inch-2023 the M2 Max should have a clock rate of ~3.68 GHz; they probably ran some benchmarks to figure that number out. But yeah, it'd appear the Grace CPU is clocked slightly lower than the M2 Max. |
sure. here you go. a couple mem/io results are slower now but still all faster than GH200:
and for comparison, i was curious how much slower low-power mode was:
and the side-by-side with plugged-in as the reference/left column:
not sure exactly what apple does when unplugged: lower clock speed and/or use an efficiency core. would be nice if SystemBenchmark.jl had a way to pin the thread to a specific core. |
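As an illustration of what core pinning looks like where the operating system allows it, here is a minimal sketch using Python's standard library on Linux. It is not a SystemBenchmark feature, and the helper name `pin_to_core` is hypothetical. `os.sched_setaffinity` is not available on macOS, so this does not help with the efficiency-core question there.

```python
import os


def pin_to_core(core):
    """Pin the current process to one core.

    Returns the resulting CPU set, or None where the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS, where this call does not exist
    allowed = os.sched_getaffinity(0)
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    original = os.sched_getaffinity(0)      # remember the starting CPU set
    pinned = pin_to_core(min(original))     # pin to the lowest allowed core
    # ... a timing run would go here ...
    os.sched_setaffinity(0, original)       # restore the original CPU set
    restored = os.sched_getaffinity(0)
else:
    original = pinned = restored = None

print("pinned to:", pinned)
```

Pinning like this only controls placement; it does not control clock speed, so results on battery power could still differ.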
Yes, though it may be more or less on par per clock, which is very nice to see. |
The bad |
It would be great to be able to compare performance across the many platforms being used.
If you're happy to share your benchmark information for comparison, please submit it here and it'll be added to the repo.
result.txt
thanks!
Edit: You can now collect all the results posted in this issue by running
Also they are periodically updated here https://docs.google.com/spreadsheets/d/15Ldyq4n9cflXPDR63CQe6QwJCWedjvo2vaYJ0w2hhYo/edit#gid=0