whisper.swiftui : add model download list & bench methods #2546

Merged (10 commits), Nov 13, 2024

jhen0409
Contributor

@jhen0409 jhen0409 commented Nov 10, 2024

This PR fixes the build of whisper.swiftui, then adds a model download list (from the ggerganov/whisper.cpp HF repo) and bench methods.
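As a rough illustration of how a download list like this can be assembled, the sketch below maps model names to download URLs. The URL layout (`.../resolve/main/ggml-<name>.bin`) and the helper names `model_url` and `build_download_list` are assumptions made for illustration; they are not taken from the Swift code in this PR.

```python
# Assumed base URL of the ggerganov/whisper.cpp Hugging Face repo.
# The exact layout is an assumption, not something this PR states.
HF_BASE = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"


def model_url(name: str) -> str:
    """Return the assumed download URL for a model such as 'tiny.en-q5_1'."""
    return f"{HF_BASE}/ggml-{name}.bin"


def build_download_list(names):
    """Pair each model name with its assumed URL, preserving input order."""
    return [(name, model_url(name)) for name in names]


# A few of the model names that appear in the benchmark results below.
DOWNLOAD_LIST = build_download_list(["tiny", "base.en-q5_1", "large-v3-turbo-q8_0"])
```

Keeping the list as plain (name, URL) pairs makes it easy to render in a list view and to check which entries already exist on disk.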

The bench output follows the format used in scripts/bench-all.sh:

printf "| %6s | %6s | %16s | %13s | %3s | %3s | %7s | %7s | %7s | %7s | %7s |\n" "CPU" "OS" "Config" "Model" "Th" "FA" "Enc." "Dec." "Bch5" "PP" "Commit"
printf "| %6s | %6s | %16s | %13s | %3s | %3s | %7s | %7s | %7s | %7s | %7s |\n" "---" "---" "---" "---" "---" "---" "---" "---" "---" "---" "---"
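The same column layout can be reproduced outside the shell script. The following is a minimal sketch that reuses the format string from the `printf` lines above; the names `ROW_FMT`, `format_row`, and `format_table` are hypothetical and do not come from the PR.

```python
# Same column widths as the printf format in scripts/bench-all.sh.
ROW_FMT = "| %6s | %6s | %16s | %13s | %3s | %3s | %7s | %7s | %7s | %7s | %7s |"
HEADERS = ("CPU", "OS", "Config", "Model", "Th", "FA",
           "Enc.", "Dec.", "Bch5", "PP", "Commit")


def format_row(*cells):
    """Format one table row; values are right-aligned, never truncated."""
    if len(cells) != len(HEADERS):
        raise ValueError(f"expected {len(HEADERS)} cells, got {len(cells)}")
    return ROW_FMT % tuple(str(c) for c in cells)


def format_table(rows):
    """Return the header, the '---' separator row, and one line per result."""
    lines = [format_row(*HEADERS), format_row(*["---"] * len(HEADERS))]
    lines.extend(format_row(*row) for row in rows)
    return "\n".join(lines)


# Example row taken from the results reported in this PR.
table = format_table([
    ("A17 Pro", "iOS", "NEON METAL", "tiny", 4, 1, "50.00", "4.47", "1.17", "0.07", "31aea56"),
])
```

Because `%13s` only sets a minimum width, long model names such as `large-v3-turbo-q5_0` widen their row instead of being cut off.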

New Buttons:

  • View Models: navigate to the model list, where models can be downloaded and downloaded models can be loaded
  • Bench: runs bench_full for the currently loaded model
  • Bench All: runs bench_full for all downloaded models
  • Copy Logs

The memcpy and ggml_mul_mat methods are added but commented out.

Demo on an iPhone 15 Pro:

Benchmark results on iPhone 15 Pro (A17 Pro):

memcpy and ggml_mul_mat:

Running memcpy benchmark
memcpy:   14.97 GB/s (heat-up)
memcpy:   21.46 GB/s ( 1 thread)
memcpy:   20.66 GB/s ( 1 thread)
memcpy:   22.52 GB/s ( 2 thread)
memcpy:   22.57 GB/s ( 3 thread)
memcpy:   22.46 GB/s ( 4 thread)
sum:    -3072001183.000000
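For readers unfamiliar with how a GB/s figure like the ones above is obtained, here is a minimal sketch: total bytes copied divided by elapsed time. It assumes decimal gigabytes (1e9 bytes); the helper names are hypothetical, and the Python copy loop only stands in for the native memcpy benchmark, so its numbers are not comparable to the log.

```python
import time


def bandwidth_gbs(total_bytes, seconds):
    """Throughput in GB/s, assuming 1 GB = 1e9 bytes."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return total_bytes / (seconds * 1e9)


def bench_copy(size=64 * 1024 * 1024, runs=8):
    """Time `runs` full copies of a `size`-byte buffer and return GB/s."""
    src = bytearray(size)
    start = time.perf_counter()
    for _ in range(runs):
        dst = bytes(src)  # one full copy of the buffer
    elapsed = time.perf_counter() - start
    return bandwidth_gbs(size * runs, elapsed)


def format_line(gbs, label):
    """Render a result in the same shape as the log lines above."""
    return "memcpy: %7.2f GB/s (%s)" % (gbs, label)
```

A warm-up pass before the timed runs, like the "heat-up" line in the log, keeps one-time costs such as page faults out of the measurement.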

Running ggml_mul_mat benchmark with 4 threads
  64 x   64: Q4_0     4.1 GFLOPS (128 runs) | Q4_1     3.7 GFLOPS (128 runs)
  64 x   64: Q5_0     4.5 GFLOPS (128 runs) | Q5_1     4.6 GFLOPS (128 runs) | Q8_0     4.0 GFLOPS (128 runs)
  64 x   64: F16      4.4 GFLOPS (128 runs) | F32      4.7 GFLOPS (128 runs)
 128 x  128: Q4_0    28.2 GFLOPS (128 runs) | Q4_1    25.8 GFLOPS (128 runs)
 128 x  128: Q5_0    20.4 GFLOPS (128 runs) | Q5_1    26.4 GFLOPS (128 runs) | Q8_0    28.9 GFLOPS (128 runs)
 128 x  128: F16     16.7 GFLOPS (128 runs) | F32     31.5 GFLOPS (128 runs)
 256 x  256: Q4_0    89.1 GFLOPS (128 runs) | Q4_1    84.1 GFLOPS (128 runs)
 256 x  256: Q5_0    70.1 GFLOPS (128 runs) | Q5_1    62.5 GFLOPS (128 runs) | Q8_0   103.2 GFLOPS (128 runs)
 256 x  256: F16     66.6 GFLOPS (128 runs) | F32     73.7 GFLOPS (128 runs)
 512 x  512: Q4_0   108.0 GFLOPS (128 runs) | Q4_1   101.0 GFLOPS (128 runs)
 512 x  512: Q5_0    83.1 GFLOPS (128 runs) | Q5_1    74.1 GFLOPS (128 runs) | Q8_0   130.6 GFLOPS (128 runs)
 512 x  512: F16     73.1 GFLOPS (128 runs) | F32     83.2 GFLOPS (128 runs)
1024 x 1024: Q4_0   116.6 GFLOPS ( 55 runs) | Q4_1   107.2 GFLOPS ( 50 runs)
1024 x 1024: Q5_0    87.2 GFLOPS ( 41 runs) | Q5_1    78.7 GFLOPS ( 37 runs) | Q8_0   139.8 GFLOPS ( 66 runs)
1024 x 1024: F16     74.2 GFLOPS ( 35 runs) | F32     76.5 GFLOPS ( 36 runs)
2048 x 2048: Q4_0   120.5 GFLOPS (  8 runs) | Q4_1   110.1 GFLOPS (  7 runs)
2048 x 2048: Q5_0    89.0 GFLOPS (  6 runs) | Q5_1    80.4 GFLOPS (  5 runs) | Q8_0   143.2 GFLOPS (  9 runs)
2048 x 2048: F16     74.0 GFLOPS (  5 runs) | F32     67.5 GFLOPS (  4 runs)
4096 x 4096: Q4_0   119.0 GFLOPS (  3 runs) | Q4_1   107.6 GFLOPS (  3 runs)
4096 x 4096: Q5_0    88.2 GFLOPS (  3 runs) | Q5_1    78.7 GFLOPS (  3 runs) | Q8_0   138.9 GFLOPS (  3 runs)
4096 x 4096: F16     69.1 GFLOPS (  3 runs) | F32     56.1 GFLOPS (  3 runs)
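To relate the GFLOPS figures above to wall-clock time, the sketch below assumes the usual operation count of 2·N³ floating-point operations for one N x N by N x N matrix multiplication. That count, and the helper names, are assumptions for illustration and are not taken from the benchmark source.

```python
def matmul_flops(n):
    """Assumed operation count for one N x N matmul: N^3 multiply-adds = 2*N^3 flops."""
    return 2.0 * n * n * n


def gflops(n, runs, seconds):
    """Throughput in GFLOPS for `runs` matmuls of size n finished in `seconds`."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return matmul_flops(n) * runs / seconds * 1e-9


def seconds_for(n, runs, reported_gflops):
    """Invert gflops(): total time implied by a reported GFLOPS figure."""
    return matmul_flops(n) * runs / (reported_gflops * 1e9)


# Under the 2*N^3 assumption, the 4096 x 4096 Q4_0 line (119.0 GFLOPS, 3 runs)
# implies a few seconds of total compute for that row.
implied = seconds_for(4096, 3, 119.0)
```

This also suggests why the run count drops for larger sizes: each 4096 x 4096 multiplication costs 64 times as many operations as a 1024 x 1024 one.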

full:

| CPU | OS | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A17 Pro | iOS | NEON METAL | tiny | 4 | 1 | 50.00 | 4.47 | 1.17 | 0.07 | 31aea56 |
| A17 Pro | iOS | NEON METAL | tiny-q5_1 | 4 | 1 | 52.75 | 3.88 | 1.34 | 0.07 | 31aea56 |
| A17 Pro | iOS | NEON METAL | tiny-q8_0 | 4 | 1 | 50.81 | 3.79 | 1.13 | 0.07 | 31aea56 |
| A17 Pro | iOS | NEON METAL | tiny.en | 4 | 1 | 50.48 | 4.69 | 1.64 | 0.07 | 31aea56 |
| A17 Pro | iOS | NEON METAL | tiny.en-q5_1 | 4 | 1 | 52.39 | 3.88 | 1.55 | 0.07 | 31aea56 |
| A17 Pro | iOS | NEON METAL | tiny.en-q8_0 | 4 | 1 | 51.07 | 3.78 | 1.08 | 0.07 | 31aea56 |
| A17 Pro | iOS | NEON METAL | base | 4 | 1 | 103.23 | 6.06 | 1.32 | 0.11 | 31aea56 |
| A17 Pro | iOS | NEON METAL | base-q5_1 | 4 | 1 | 108.43 | 4.71 | 1.72 | 0.11 | 31aea56 |
| A17 Pro | iOS | NEON METAL | base-q8_0 | 4 | 1 | 104.34 | 5.46 | 1.36 | 0.11 | 31aea56 |
| A17 Pro | iOS | NEON METAL | base.en | 4 | 1 | 102.70 | 6.00 | 1.39 | 0.11 | 31aea56 |
| A17 Pro | iOS | NEON METAL | base.en-q5_1 | 4 | 1 | 109.11 | 5.22 | 1.44 | 0.11 | 31aea56 |
| A17 Pro | iOS | NEON METAL | base.en-q8_0 | 4 | 1 | 104.17 | 5.13 | 1.55 | 0.11 | 31aea56 |
| A17 Pro | iOS | NEON METAL | small | 4 | 1 | 377.67 | 11.65 | 2.78 | 0.41 | 31aea56 |
| A17 Pro | iOS | NEON METAL | small-q5_1 | 4 | 1 | 366.79 | 8.56 | 2.83 | 0.33 | 31aea56 |
| A17 Pro | iOS | NEON METAL | small-q8_0 | 4 | 1 | 354.77 | 8.90 | 2.61 | 0.32 | 31aea56 |
| A17 Pro | iOS | NEON METAL | small.en | 4 | 1 | 351.71 | 11.74 | 2.78 | 0.32 | 31aea56 |
| A17 Pro | iOS | NEON METAL | small.en-q5_1 | 4 | 1 | 391.11 | 8.34 | 2.74 | 0.36 | 31aea56 |
| A17 Pro | iOS | NEON METAL | small.en-q8_0 | 4 | 1 | 411.22 | 8.96 | 2.65 | 0.43 | 31aea56 |
| A17 Pro | iOS | NEON METAL | medium | 4 | 1 | 1236.87 | 29.46 | 8.70 | 1.18 | 31aea56 |
| A17 Pro | iOS | NEON METAL | medium-q5_0 | 4 | 1 | 1400.28 | 19.67 | 8.47 | 1.31 | 31aea56 |
| A17 Pro | iOS | NEON METAL | medium-q8_0 | 4 | 1 | 1347.07 | 22.92 | 9.75 | 1.25 | 31aea56 |
| A17 Pro | iOS | NEON METAL | medium.en | 4 | 1 | 1307.12 | 30.41 | 9.87 | 1.23 | 31aea56 |
| A17 Pro | iOS | NEON METAL | medium.en-q5_0 | 4 | 1 | 1426.46 | 20.24 | 10.36 | 1.32 | 31aea56 |
| A17 Pro | iOS | NEON METAL | medium.en-q8_0 | 4 | 1 | 1359.09 | 22.92 | 10.04 | 1.24 | 31aea56 |
| A17 Pro | iOS | NEON METAL | large-v3-turbo | 4 | 1 | 2752.62 | 10.63 | 2.60 | 0.54 | 31aea56 |
| A17 Pro | iOS | NEON METAL | large-v3-turbo-q5_0 | 4 | 1 | 2617.94 | 6.44 | 2.64 | 0.60 | 31aea56 |
| A17 Pro | iOS | NEON METAL | large-v3-turbo-q8_0 | 4 | 1 | 2494.79 | 7.20 | 2.63 | 0.69 | 31aea56 |

Downloads of large-v1 through large-v3 were skipped, so those models are not benchmarked.

@ggerganov ggerganov merged commit 5f8a086 into ggerganov:master Nov 13, 2024
45 checks passed
@jhen0409 jhen0409 deleted the swiftui-bench branch November 14, 2024 02:21
@ggerganov
Owner

@jhen0409 I think this change breaks the builds on macOS:

[image]

Can you propose a fix?

@jhen0409
Contributor Author

> @jhen0409 I think this change breaks the builds on macOS:
>
> [image]
>
> Can you propose a fix?

We can change the macOS destination to Mac Catalyst or Designed for iPad.

PR: #2562
