
Add "-mcpu=native" when building for aarch64 #532

Merged
merged 1 commit into from
Feb 27, 2023

Conversation

FlippFuzz (Contributor)

Performance testing was done in #89 (comment) and #89 (comment).

  1. While the test was done only on an Ampere A1 on Oracle Cloud, ARM's own guidance recommends simply setting -mcpu=native, so we might as well do it for all ARM CPUs:
    https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu

  2. On the Ampere A1 on Oracle Cloud, FP16 will be enabled with -mcpu=native, which results in large performance gains.

  3. If it's not acceptable to do this for all ARM CPUs, I can add an ifdef WHISPER_AMPERE_A1 check before enabling -mcpu=native. (A sketch of what the flag changes at compile time follows this list.)
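To make the effect concrete: -mcpu=native tells the compiler to target the host core, and on a CPU with half-precision support it defines the standard ACLE feature macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC, which is (as far as I can tell) what the FP16_VA flag in system_info reflects. The probe below is an illustrative sketch, not code from this PR, and the WHISPER_AMPERE_A1 name above is likewise only a proposed name:

```c
/* fp16_probe.c — prints whether the compiler enabled FP16 vector
 * arithmetic for the chosen target.
 *
 *   gcc fp16_probe.c -o probe && ./probe                # baseline
 *   gcc -mcpu=native fp16_probe.c -o probe && ./probe   # with the flag
 */
#include <stdio.h>

int main(void) {
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
    puts("FP16 vector arithmetic: enabled");
#else
    puts("FP16 vector arithmetic: not enabled");
#endif
    return 0;
}
```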

ARM CPUs aren't very good at reporting their names, so the exact model isn't easy to identify:

```
cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1
```
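For reference, CPU implementer 0x41 is Arm Ltd. and CPU part 0xd0c is the Neoverse N1, the core used in these Ampere A1 instances; the fphp/asimdhp entries in Features are the scalar and SIMD half-precision capabilities. The same bits can also be queried at runtime through the hwcap auxiliary vector; a minimal sketch (illustrative only, assumes aarch64 Linux):

```c
/* hwcap_probe.c — runtime check for the FP16 features listed above. */
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_FPHP, HWCAP_ASIMDHP (aarch64) */

int main(void) {
    unsigned long caps = getauxval(AT_HWCAP);
    /* "fphp" = scalar FP16 arithmetic, "asimdhp" = SIMD (NEON) FP16 */
    printf("fphp:    %s\n", (caps & HWCAP_FPHP)    ? "yes" : "no");
    printf("asimdhp: %s\n", (caps & HWCAP_ASIMDHP) ? "yes" : "no");
    return 0;
}
```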

@jaybinks (Contributor) commented Feb 25, 2023

Can confirm the same on my Ampere A1 in Oracle Cloud.

Unmodified checkout:

```
ubuntu@instance-20230225-1302:~/src/whisper.cpp$ ./main -m models/ggml-medium.bin -f samples/jfk.wav
whisper_init_from_file: loading model from 'models/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 4
whisper_model_load: mem required = 1720.00 MB (+ 43.00 MB per decoder)
whisper_model_load: kv self size = 42.00 MB
whisper_model_load: kv cross size = 140.62 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 1462.35 MB
whisper_model_load: model size = 1462.12 MB

system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:11.000] And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: load time = 917.15 ms
whisper_print_timings: mel time = 78.78 ms
whisper_print_timings: sample time = 21.20 ms / 28 runs ( 0.76 ms per run)
whisper_print_timings: encode time = 60665.24 ms / 1 runs (60665.24 ms per run)
whisper_print_timings: decode time = 2004.89 ms / 28 runs ( 71.60 ms per run)
whisper_print_timings: total time = 63755.75 ms
```

Using -mcpu=native:

```
ubuntu@instance-20230225-1302:~/src/whisper.cpp$ ./main -m models/ggml-medium.bin -f samples/jfk.wav
whisper_init_from_file: loading model from 'models/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 4
whisper_model_load: mem required = 1720.00 MB (+ 43.00 MB per decoder)
whisper_model_load: kv self size = 42.00 MB
whisper_model_load: kv cross size = 140.62 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 1462.35 MB
whisper_model_load: model size = 1462.12 MB

system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:11.000] And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: load time = 836.11 ms
whisper_print_timings: mel time = 79.38 ms
whisper_print_timings: sample time = 21.25 ms / 28 runs ( 0.76 ms per run)
whisper_print_timings: encode time = 23294.16 ms / 1 runs (23294.16 ms per run)
whisper_print_timings: decode time = 1188.38 ms / 28 runs ( 42.44 ms per run)
whisper_print_timings: total time = 25482.80 ms
```
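Side by side, the only system_info difference is FP16_VA flipping from 0 to 1, and the effect matches what FlippFuzz reported: encode time drops from 60665.24 ms to 23294.16 ms (about 2.6x) and total time from 63755.75 ms to 25482.80 ms (about 2.5x).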

@ggerganov merged commit f420de1 into ggerganov:master on Feb 27, 2023.