Benchmark suite (ethcore-util) doesn't compile on ARM #596
Comments
https://github.com/ethcore/parity/pull/601 |
Alright, here's a result from a Cortex-A5 @ 1.7 GHz (Odroid C1, roughly equivalent to an RPi 3 in 32-bit mode (benchmarked here and here)). How much CPU power does the client need to operate efficiently?
running 7 tests
test u128_mul ... bench: 15,931,475 ns/iter (+/- 10,350)
test u256_add ... bench: 1,398,006 ns/iter (+/- 3,820)
test u256_full_mul ... bench: 212,997,800 ns/iter (+/- 124,790)
test u256_mul ... bench: 55,641,661 ns/iter (+/- 13,930)
test u256_sub ... bench: 5,301,924 ns/iter (+/- 2,965)
test u512_add ... bench: 2,744,213 ns/iter (+/- 3,001)
test u512_sub ... bench: 25,910,821 ns/iter (+/- 11,590)
running 7 tests
test bench_decode_nested_empty_lists ... bench: 3,315 ns/iter (+/- 28)
test bench_decode_u256_value ... bench: 802 ns/iter (+/- 4)
test bench_decode_u64_value ... bench: 423 ns/iter (+/- 1)
test bench_stream_1000_empty_lists ... bench: 87,578 ns/iter (+/- 627)
test bench_stream_nested_empty_lists ... bench: 4,355 ns/iter (+/- 64)
test bench_stream_u256_value ... bench: 6,594 ns/iter (+/- 48)
test bench_stream_u64_value ... bench: 3,019 ns/iter (+/- 54)
running 13 tests
test sha3x10000 ... bench: 162,216,662 ns/iter (+/- 8,040)
test trie_insertions_32_mir_1k ... bench: 253,371,890 ns/iter (+/- 1,008,384)
test trie_insertions_32_ran_1k ... bench: 251,711,482 ns/iter (+/- 1,307,956)
test trie_insertions_random_mid ... bench: 226,539,564 ns/iter (+/- 808,284)
test trie_insertions_six_high ... bench: 224,861,356 ns/iter (+/- 1,644,908)
test trie_insertions_six_low ... bench: 347,735,233 ns/iter (+/- 2,389,891)
test trie_insertions_six_mid ... bench: 250,915,378 ns/iter (+/- 1,386,046)
test triehash_insertions_32_mir_1k ... bench: 35,852,568 ns/iter (+/- 61,470)
test triehash_insertions_32_ran_1k ... bench: 35,423,266 ns/iter (+/- 35,120)
test triehash_insertions_random_mid ... bench: 19,411,191 ns/iter (+/- 20,780)
test triehash_insertions_six_high ... bench: 24,293,714 ns/iter (+/- 18,900)
test triehash_insertions_six_low ... bench: 35,667,967 ns/iter (+/- 23,910)
test triehash_insertions_six_mid ... bench: 27,426,428 ns/iter (+/- 25,440)
Thanks to https://github.com/ethcore/parity/pull/629, integer benchmarks got a huge boost:
test u128_mul ... bench: 501,857 ns/iter (+/- 1,995)
test u256_add ... bench: 666,109 ns/iter (+/- 2,285)
test u256_full_mul ... bench: 21,314,517 ns/iter (+/- 138,552)
test u256_mul ... bench: 1,035,615 ns/iter (+/- 3,510)
test u256_sub ... bench: 666,109 ns/iter (+/- 1,700)
test u512_add ... bench: 1,264,418 ns/iter (+/- 3,510)
test u512_sub ... bench: 1,295,819 ns/iter (+/- 4,310) |
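To put a number on that boost, here is a minimal sketch that parses the two integer benchmark runs quoted above and prints the before/after ratio per test. The regex and the helper names (`parse_bench`, `speedups`) are made up for illustration and are not part of Parity or of Rust's test harness; the figures are copied from this thread.

```python
import re

# Matches one line of benchmark output as pasted in this thread, e.g.
# "test u128_mul ... bench: 15,931,475 ns/iter (+/- 10,350)"
BENCH_LINE = re.compile(
    r"test\s+(?P<name>\S+)\s+\.\.\.\s+bench:\s+(?P<ns>[\d,]+)\s+ns/iter"
)

def parse_bench(output):
    """Return {benchmark name: ns per iteration} for every bench line found."""
    return {
        m.group("name"): int(m.group("ns").replace(",", ""))
        for m in BENCH_LINE.finditer(output)
    }

def speedups(before, after):
    """Return {name: before_ns / after_ns} for benchmarks present in both runs."""
    return {name: before[name] / after[name] for name in before if name in after}

# Cortex-A5 integer benchmarks before PR 629 (copied from the first run above)
BEFORE = """
test u128_mul ... bench: 15,931,475 ns/iter (+/- 10,350)
test u256_add ... bench: 1,398,006 ns/iter (+/- 3,820)
test u256_full_mul ... bench: 212,997,800 ns/iter (+/- 124,790)
test u256_mul ... bench: 55,641,661 ns/iter (+/- 13,930)
test u256_sub ... bench: 5,301,924 ns/iter (+/- 2,965)
test u512_add ... bench: 2,744,213 ns/iter (+/- 3,001)
test u512_sub ... bench: 25,910,821 ns/iter (+/- 11,590)
"""

# Same board after PR 629 (copied from the second run above)
AFTER = """
test u128_mul ... bench: 501,857 ns/iter (+/- 1,995)
test u256_add ... bench: 666,109 ns/iter (+/- 2,285)
test u256_full_mul ... bench: 21,314,517 ns/iter (+/- 138,552)
test u256_mul ... bench: 1,035,615 ns/iter (+/- 3,510)
test u256_sub ... bench: 666,109 ns/iter (+/- 1,700)
test u512_add ... bench: 1,264,418 ns/iter (+/- 3,510)
test u512_sub ... bench: 1,295,819 ns/iter (+/- 4,310)
"""

if __name__ == "__main__":
    result = speedups(parse_bench(BEFORE), parse_bench(AFTER))
    for name, factor in sorted(result.items()):
        print(f"{name:<16}{factor:6.1f}x")
```

On these numbers the gain is uneven: roughly 2x for `u256_add` and `u512_add`, but more than 50x for `u256_mul`.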
It's about 15x-30x slower than a modern i7. |
It's running rather well, using the 4 available cores. However, compared to a 2.2 GHz Core2 system on the same 20 Mbit link, it seems bottlenecked by computing power. The Core2 result (3-4x faster) for reference:
running 7 tests
test u128_mul ... bench: 4,809,948 ns/iter (+/- 31,625)
test u256_add ... bench: 385,760 ns/iter (+/- 3,026)
test u256_full_mul ... bench: 49,414,181 ns/iter (+/- 1,102,323)
test u256_mul ... bench: 14,072,564 ns/iter (+/- 591,316)
test u256_sub ... bench: 1,470,511 ns/iter (+/- 39,705)
test u512_add ... bench: 510,531 ns/iter (+/- 8,871)
test u512_sub ... bench: 5,726,031 ns/iter (+/- 186,089)
running 7 tests
test bench_decode_nested_empty_lists ... bench: 1,086 ns/iter (+/- 17)
test bench_decode_u256_value ... bench: 266 ns/iter (+/- 6)
test bench_decode_u64_value ... bench: 108 ns/iter (+/- 1)
test bench_stream_1000_empty_lists ... bench: 24,281 ns/iter (+/- 1,013)
test bench_stream_nested_empty_lists ... bench: 1,641 ns/iter (+/- 25)
test bench_stream_u256_value ... bench: 2,095 ns/iter (+/- 10)
test bench_stream_u64_value ... bench: 991 ns/iter (+/- 8)
running 13 tests
test sha3x10000 ... bench: 56,082,878 ns/iter (+/- 18,015)
test trie_insertions_32_mir_1k ... bench: 83,536,621 ns/iter (+/- 378,935)
test trie_insertions_32_ran_1k ... bench: 83,057,168 ns/iter (+/- 412,792)
test trie_insertions_random_mid ... bench: 74,751,062 ns/iter (+/- 361,463)
test trie_insertions_six_high ... bench: 74,554,606 ns/iter (+/- 259,211)
test trie_insertions_six_low ... bench: 113,039,744 ns/iter (+/- 568,131)
test trie_insertions_six_mid ... bench: 83,514,020 ns/iter (+/- 348,722)
test triehash_insertions_32_mir_1k ... bench: 12,188,945 ns/iter (+/- 20,512)
test triehash_insertions_32_ran_1k ... bench: 12,091,118 ns/iter (+/- 20,645)
test triehash_insertions_random_mid ... bench: 6,762,954 ns/iter (+/- 15,392)
test triehash_insertions_six_high ... bench: 8,583,235 ns/iter (+/- 13,449)
test triehash_insertions_six_low ... bench: 12,384,996 ns/iter (+/- 19,236)
test triehash_insertions_six_mid ... bench: 9,634,772 ns/iter (+/- 16,575)
The ARM system is using just 3W, I think we have a clear winner. |
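As a rough check of the "3-4x faster" figure, the sketch below takes a handful of benchmarks from the two runs (the first Cortex-A5 run and the Core2 run) and summarises the per-test ratios with a geometric mean. The choice of benchmarks is arbitrary and the helper `geometric_mean` is written here for illustration only; the ns/iter values are copied from this thread.

```python
import math

# ns/iter copied from the runs above: name -> (Cortex-A5, Core2)
PAIRS = {
    "u128_mul": (15_931_475, 4_809_948),
    "u256_add": (1_398_006, 385_760),
    "bench_decode_u64_value": (423, 108),
    "bench_stream_u64_value": (3_019, 991),
    "sha3x10000": (162_216_662, 56_082_878),
    "trie_insertions_32_mir_1k": (253_371_890, 83_536_621),
    "triehash_insertions_32_mir_1k": (35_852_568, 12_188_945),
}

def geometric_mean(values):
    """Geometric mean, the usual way to average ratios across benchmarks."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# How many times slower the ARM board is than the Core2, per benchmark
ratios = {name: arm / core2 for name, (arm, core2) in PAIRS.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name:<32}{ratio:5.2f}x")
    print(f"{'geometric mean':<32}{geometric_mean(ratios.values()):5.2f}x")
```

On this subset the ratios fall between roughly 2.9x and 3.9x, with a geometric mean a little above 3x. The thread gives no power figure for the Core2, so a per-watt comparison can't be computed from these numbers alone.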
I've got another result, this time from an
running 7 tests
test u128_mul ... bench: 312,769 ns/iter (+/- 461)
test u256_add ... bench: 120,151 ns/iter (+/- 850)
test u256_full_mul ... bench: 11,261,102 ns/iter (+/- 88,151)
test u256_mul ... bench: 466,504 ns/iter (+/- 28,125)
test u256_sub ... bench: 135,008 ns/iter (+/- 3,092)
test u512_add ... bench: 139,867 ns/iter (+/- 552)
test u512_sub ... bench: 140,992 ns/iter (+/- 6,470)
running 7 tests
test bench_decode_nested_empty_lists ... bench: 2,374 ns/iter (+/- 26)
test bench_decode_u256_value ... bench: 455 ns/iter (+/- 1)
test bench_decode_u64_value ... bench: 243 ns/iter (+/- 2)
test bench_stream_1000_empty_lists ... bench: 76,738 ns/iter (+/- 138)
test bench_stream_nested_empty_lists ... bench: 2,242 ns/iter (+/- 5)
test bench_stream_u256_value ... bench: 3,689 ns/iter (+/- 10)
test bench_stream_u64_value ... bench: 1,275 ns/iter (+/- 6)
running 13 tests
test sha3x10000 ... bench: 14,207,328 ns/iter (+/- 5,440)
test trie_insertions_32_mir_1k ... bench: 54,919,297 ns/iter (+/- 1,198,320)
test trie_insertions_32_ran_1k ... bench: 54,391,393 ns/iter (+/- 1,182,720)
test trie_insertions_random_mid ... bench: 58,116,126 ns/iter (+/- 805,007)
test trie_insertions_six_high ... bench: 48,579,540 ns/iter (+/- 88,940)
test trie_insertions_six_low ... bench: 98,693,094 ns/iter (+/- 299,943)
test trie_insertions_six_mid ... bench: 67,177,608 ns/iter (+/- 522,415)
test triehash_insertions_32_mir_1k ... bench: 7,846,771 ns/iter (+/- 105,636)
test triehash_insertions_32_ran_1k ... bench: 7,735,220 ns/iter (+/- 46,381)
test triehash_insertions_random_mid ... bench: 5,213,247 ns/iter (+/- 25,770)
test triehash_insertions_six_high ... bench: 6,262,506 ns/iter (+/- 43,525)
test triehash_insertions_six_low ... bench: 8,637,428 ns/iter (+/- 18,110)
test triehash_insertions_six_mid ... bench: 6,935,012 ns/iter (+/- 37,696)
Probably not exactly comparable, as some results are relatively too good, but here's the current performance on 64-bit ARM. |
Looks like an import problem. Parity itself compiles fine.
I was using Rust 1.8 from Feb 28th.