Benchmark for addressing tiny elementwise dispatch issues #10216

Closed
hanhanW wants to merge 1 commit


hanhanW commented Aug 25, 2022

No description provided.


iree-github-actions-bot commented Aug 25, 2022

Abbreviated Benchmark Summary

@ commit 3956a1ebdef8291accd5dcce93e8ee31a92206d2 (vs. base acb7355b1e80495fc63dd8d2358ca5b821f5eb8a)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad [fp16] (TFLite) full-inference,experimental-flags with IREE-Vulkan @ Pixel-6-Pro (GPU-Mali-G78) | 154.318 (vs. 130.492, 18.26%↑) | 154.731 | 3.240 |
| MobileNetV3Small [fp32,imagenet] (TFLite) 4-thread,big-core,full-inference,experimental-flags with IREE-LLVM-CPU @ Pixel-4 (CPU-ARMv8.2-A) | 10.870 (vs. 9.638, 12.78%↑) | 10.873 | 0.028 |
| MobileBertSquad [fp16] (TFLite) full-inference,default-flags with IREE-Vulkan @ Pixel-6-Pro (GPU-Mali-G78) | 139.878 (vs. 130.053, 7.55%↑) | 138.348 | 5.972 |

[Top 3 out of 6 results shown]

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad [int8] (TFLite) 4-thread,big-core,full-inference,experimental-flags with IREE-LLVM-CPU @ Pixel-4 (CPU-ARMv8.2-A) | 199.307 (vs. 236.307, 15.66%↓) | 199.285 | 0.406 |
| MobileBertSquad [fp32] (TFLite) 4-thread,big-core,full-inference,experimental-flags with IREE-LLVM-CPU @ Pixel-6-Pro (CPU-ARMv8.2-A) | 312.317 (vs. 370.020, 15.59%↓) | 297.569 | 37.892 |
| MobileNetV2 [fp32,imagenet] (TFLite) big-core,full-inference,default-flags with IREE-LLVM-CPU-Sync @ Pixel-4 (CPU-ARMv8.2-A) | 38.029 (vs. 42.592, 10.71%↓) | 35.570 | 3.277 |

[Top 3 out of 9 results shown]

No improved or regressed compilation metrics 🏖️
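
The percentages in the tables above appear to be the relative change of the new average against the base average. The bot's actual implementation is not shown in this thread, so the following is only a minimal sketch under that assumption; `percent_change` and `format_change` are hypothetical helper names, not part of IREE's tooling.

```python
def percent_change(new: float, base: float) -> float:
    """Relative change of `new` against `base`, in percent.

    Assumption: this is how the summary derives its percentages;
    the formula is inferred from the reported numbers, not from the bot's source.
    """
    return (new - base) / base * 100.0


def format_change(new: float, base: float) -> str:
    """Render a change in the same shape as the summary cells (hypothetical helper)."""
    pct = percent_change(new, base)
    arrow = "↑" if pct > 0 else "↓"
    return f"{new} (vs. {base}, {abs(pct):.2f}%{arrow})"


# MobileBertSquad [fp16] on Pixel-6-Pro, experimental flags: 154.318 ms vs. 130.492 ms
print(format_change(154.318, 130.492))
```

Under this reading, a positive value is a regression for latency metrics and a negative value is an improvement.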


@iree-github-actions-bot

Abbreviated Linux Benchmark Summary

@ commit 3956a1ebdef8291accd5dcce93e8ee31a92206d2 (vs. base 094ec6d183e769893ae6e9ee98e668712a19dd24)

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| PoseNet [fp32] (TFLite) 8-thread,full-inference,default-flags with IREE-LLVM-CPU @ GCP-c2-standard-16 (CPU-x86_64-CascadeLake) | 6.268 (vs. 8.382, 25.22%↓) | 6.277 | 0.034 |
| PoseNet [fp32] (TFLite) 4-thread,full-inference,default-flags with IREE-LLVM-CPU @ GCP-c2-standard-16 (CPU-x86_64-CascadeLake) | 7.976 (vs. 10.196, 21.78%↓) | 7.970 | 0.038 |
| PersonDetect [int8] (TFLite) 8-thread,full-inference,default-flags with IREE-LLVM-CPU @ GCP-c2-standard-16 (CPU-x86_64-CascadeLake) | 1.647 (vs. 1.972, 16.47%↓) | 1.647 | 0.004 |

[Top 3 out of 7 results shown]

Regressed Compilation Times 🚩

| Benchmark Name | Compilation Time (ms) |
| --- | --- |
| MobileBertSquad [int8] (TFLite) CPU-RV32-Generic full-inference,default-flags | 10999718 (vs. 1369401, 703.25%↑) |
| MobileBertSquad [int8] (TFLite) CPU-RV64-Generic full-inference,default-flags | 395836 (vs. 146289, 170.58%↑) |
| EfficientNet [int8] (TFLite) CPU-RV32-Generic full-inference,default-flags | 17652 (vs. 15420, 14.47%↑) |

[Top 3 out of 14 results shown]

Improved Compilation Times 🎉

| Benchmark Name | Compilation Time (ms) |
| --- | --- |
| PoseNet [fp32] (TFLite) CPU-x86_64-CascadeLake 8-thread,full-inference,default-flags | 8767 (vs. 10332, 15.15%↓) |
| PoseNet [fp32] (TFLite) CPU-x86_64-CascadeLake full-inference,default-flags | 8767 (vs. 10332, 15.15%↓) |
| PoseNet [fp32] (TFLite) CPU-x86_64-CascadeLake 4-thread,full-inference,default-flags | 8767 (vs. 10332, 15.15%↓) |

[Top 3 out of 12 results shown]

Regressed Total Dispatch Sizes 🚩

| Benchmark Name | Total Dispatch Size (bytes) |
| --- | --- |
| MobileBertSquad [int8] (TFLite) CPU-RV32-Generic full-inference,default-flags | 84601208 (vs. 22515640, 275.74%↑) |
| MobileBertSquad [int8] (TFLite) CPU-RV64-Generic full-inference,default-flags | 11305392 (vs. 4133184, 173.53%↑) |
| EfficientNet [int8] (TFLite) CPU-RV32-Generic full-inference,default-flags | 749980 (vs. 610652, 22.82%↑) |

[Top 3 out of 9 results shown]

Labels
(deprecated) buildkite:benchmark-android Deprecated. Please use benchmarks:android-*