PoC: clean-room implementation of real-time AI subtitle for English online-TV (OTT TV) #64

Closed
zhouwg opened this issue Feb 22, 2024 · 20 comments

@zhouwg (Owner) commented Feb 22, 2024

whisper.cpp is an open-source, powerful on-device AI framework/library/model for ASR (Automatic Speech Recognition, a subfield of AI).

I want to integrate the great and powerful whisper.cpp into KanTV to provide real-time English subtitles for English online TV on a Xiaomi 14.

It would look like the following screenshots, which were produced by the Xiaomi 14's powerful proprietary 6B on-device AI model (aka XiaoAI, Chinese "小爱") plus the Xiaomi 14's powerful mobile SoC, the Qualcomm SM8650-AB Snapdragon 8 Gen 3 (4 nm).

[screenshots omitted]

@zhouwg zhouwg changed the title integrate the great and powerful whisper.cpp to KanTV for purpose of real-time subtitle with online TV integrate whisper.cpp to KanTV for purpose of real-time subtitle with online TV Feb 22, 2024
@zhouwg zhouwg changed the title integrate whisper.cpp to KanTV for purpose of real-time subtitle with online TV PoC:integrate whisper.cpp to KanTV for purpose of real-time subtitle with online TV Feb 26, 2024

@zhouwg (Owner, Author) commented Mar 4, 2024

I will start integrating the excellent whisper.cpp into project KanTV on March 5, 2024, after v1.2.9 was released on March 4, 2024. Before that, I had spent about two weeks (since Feb 22, 2024) migrating some local personal projects to GitHub.

background study:

GGML is a C library for machine learning (ML): https://github.com/rustformers/llm/blob/main/crates/ggml/README.md

Roadmap and FAQ: ggerganov/whisper.cpp#126

Android example app: ggerganov/whisper.cpp#283

whisper.cpp should support NNAPI on Android: ggerganov/whisper.cpp#1249

Android Inference is too slow: ggerganov/whisper.cpp#1070

Use Android NNAPI to accelerate inference on Android Devices: ggerganov/ggml#88

NPU support in whisper.cpp: ggerganov/whisper.cpp#1557

Support for realtime audio input: ggerganov/whisper.cpp#10

The Whisper model processes the audio in chunks of 30 seconds - this is a hard constraint of the architecture.

However, what seems to work is you can take for example 5 seconds of audio and pad it with 25 seconds of silence. This way you can process shorter chunks.
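A minimal sketch of that padding approach (assuming 16 kHz mono float PCM, which is what whisper_full() consumes; the helper name below is mine, not part of whisper.cpp):

    // Pad a short audio chunk with trailing silence up to Whisper's 30-second window.
    // Assumes 16 kHz mono float PCM samples in [-1, 1].
    #include <vector>

    static const int kSampleRate    = 16000;             // WHISPER_SAMPLE_RATE
    static const int kWindowSamples = 30 * kSampleRate;  // samples in a 30-second window

    std::vector<float> pad_to_window(const std::vector<float> & chunk) {
        std::vector<float> padded(chunk);
        if ((int) padded.size() < kWindowSamples) {
            padded.resize(kWindowSamples, 0.0f);          // append silence
        }
        return padded;
    }

    // usage: whisper_full(ctx, params, padded.data(), (int) padded.size());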

silence removal for transcription implemented: ggerganov/whisper.cpp#1649

Can real-time transcription be achieved?: ggerganov/whisper.cpp#1653

How to increase speech to text speed when using whisper cpp?: ggerganov/whisper.cpp#1635

Benchmark results: ggerganov/whisper.cpp#89

Whisper model files in custom ggml format: https://github.com/ggerganov/whisper.cpp/blob/master/models/README.md

GGUF file format specification:

Google Highway: https://github.com/google/highway

Tencent ncnn: https://github.com/Tencent/ncnn

updated on 03-13-2024:

SIGFPE on certain audio files:
ggerganov/whisper.cpp#39

Real-time identification of microphone has no result:
sandrohanea/whisper.net#155

How to handle real-time sound streams:
sandrohanea/whisper.net#25

updated on 03-20-2024:
Finetuning models for audio_ctx support (VERY important, but it brings a side effect: 4cd35dd)
ggerganov/whisper.cpp#1951

Here are some strategies from the original author to reduce repetition and hallucinations:
ggerganov/whisper.cpp#1507
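As a rough sketch of how the audio_ctx parameter from the 03-20 update above might be used to speed up short chunks (the concrete values for n_threads and audio_ctx are assumptions, not tuned numbers; the calls are the public whisper.cpp C API):

    // Sketch: shrink the encoder context for short audio to reduce latency.
    // audio_ctx = 0 keeps the default (the full 30-second window); smaller values
    // trade accuracy for speed, as discussed in ggerganov/whisper.cpp#1951.
    #include <stdio.h>
    #include "whisper.h"

    void transcribe_chunk(struct whisper_context * ctx, const float * pcm, int n_samples) {
        struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
        params.language       = "en";
        params.n_threads      = 4;     // assumption: see the thread-count discussion later in this issue
        params.single_segment = true;
        params.no_context     = true;
        params.audio_ctx      = 768;   // assumption: roughly half of the default context

        if (whisper_full(ctx, params, pcm, n_samples) == 0) {
            for (int i = 0; i < whisper_full_n_segments(ctx); i++) {
                printf("%s\n", whisper_full_get_segment_text(ctx, i));
            }
        }
    }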

@zhouwg (Owner, Author) commented Mar 5, 2024

Integrate whisper.cpp into KanTV, step 1 (just migrate the original Android sample from the official whisper.cpp to KanTV and study it accordingly):

[screenshot omitted]

How to practise/play with this branch:

adb logcat | grep KANTV
(Logs from the Java layer / JNI layer / native layer are displayed with the same prefix, which is helpful for troubleshooting and tracing the whisper.cpp source code.)
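For reference, a hypothetical sketch of how a shared log prefix can be wired up on the native side (the macro and tag names here are illustrative, not necessarily the ones KanTV actually uses):

    // Native-side logging with a fixed tag so all layers show up under one grep.
    // Java code would use the same tag, e.g. android.util.Log.d("KANTV", "...").
    #include <android/log.h>

    #define KANTV_LOG_TAG "KANTV"
    #define KANTV_LOGD(...) __android_log_print(ANDROID_LOG_DEBUG, KANTV_LOG_TAG, __VA_ARGS__)

    // usage inside the JNI / whisper.cpp glue code:
    // KANTV_LOGD("whisper_full took %d ms", elapsed_ms);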

Screenshot from 2024-03-05 21-56-25

@zhouwg (Owner, Author) commented Mar 5, 2024

Moved to #62 to avoid misunderstanding.

@zhouwg (Owner, Author) commented Mar 6, 2024

I suddenly got an idea for how to implement PoC stage-1 after studying the source code in examples/bench/bench.cpp and examples/main/main.cpp.

    PoC stage-1
  • background study
  • high-level study of whisper.cpp (structure of the source code, build system, ...)
  • migrate the original Android example to KanTV
  • reuse code in kantv-core and add a simple event-handling framework for ASRResearchFragment.java to make further work easier

If it works as expected, I'll move to PoC stage-2 (whisper.cpp inference with a pre-loaded audio file using another method, referencing the original Android sample and examples/main/main.cpp).

    PoC stage-2
  • re-write (or, more accurately, "refine") the Java and JNI parts of the whisper.cpp JNI layer, referencing the original Android sample
  • re-implement the native part of the whisper.cpp JNI layer with a new method based on PoC stage-1, with some code referencing examples/main/main.cpp
  • reuse code in kantv-core and spend some time integrating the customized FFmpeg inside kantv-core with whisper.cpp; it might be heavily used and very helpful in the future

If it works as expected, I'll merge the previous work into the master branch and create a new branch/baseline accordingly, then move to PoC stage-3 (ASR with a live stream, i.e. online English TV). PoC stage-3 will be a real challenge for me, so I'll break it down into PoC-S31/PoC-S32/PoC-S33/... accordingly.

    PoC stage-3
  • PoC-S31: investigate the performance of mulmat and inference (study the internals of whisper.cpp)
  • PoC-S32: high-level design (HLD) of real-time subtitles with whisper.cpp in KanTV
  • PoC-S33: coding work on the data path: UI <----> JNI <----> whisper.cpp <----> kantv-play (originally named libkantv-ffmpeg.so; renamed to libkantv-play.so since v1.3.0 because the real code of the customized FFmpeg is statically linked into libkantv-core.so, so the rename avoids confusion) <----> kantv-core. This step is just like designing/coding a pure virtual function and its concrete override in C++ (see the sketch after this list), and every node in the data path should work as expected
  • PoC-S34: reuse code in kantv-core and implement an audio-only record mode
  • PoC-S35: whisper.cpp inference/prediction with a live stream (online English TV), based on PoC-S33 & PoC-S34
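A hypothetical illustration of the "pure virtual function + concrete override" idea behind the data path (the class and method names are mine, not from the KanTV code base):

    // Abstract node in the audio data path: the player/recorder pushes PCM into it.
    #include <cstddef>

    class AudioSink {
    public:
        virtual ~AudioSink() = default;
        virtual void onAudio(const float * pcm, size_t n_samples) = 0;  // pure virtual
    };

    // Concrete node: forwards the PCM to the whisper.cpp JNI/native layer.
    class WhisperSink : public AudioSink {
    public:
        void onAudio(const float * pcm, size_t n_samples) override {
            // hand the samples to whisper.cpp here (e.g. via the JNI bridge)
            (void) pcm;
            (void) n_samples;
        }
    };

Each node in the UI <----> JNI <----> whisper.cpp <----> kantv-play <----> kantv-core chain can then be tested in isolation against the abstract interface.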

If it works as expected, I'll move to PoC stage-4 (performance analysis and optimization on Android phones).

    PoC stage-4
  • build optimization
  • code optimization
  • arch/synthesis optimization
  • algorithm optimization
  • assembly optimization
  • bug fixes
  • pre-alpha release (updated on 03-17-2024: done on the master branch, or referenced in branch v1.3.2)
  • alpha release

PoC stage-3 and PoC stage-4 might take place simultaneously.

The final goal is to implement real-time English subtitles for English online TV with KanTV + a customized whisper.cpp, and I'll demo it on a Xiaomi 14 (because the Xiaomi 14 contains a very powerful mobile SoC and I personally purchased one for software development).

Of course, the source code of the customized whisper.cpp can be found in this Android turn-key project. If it is well received and accepted by upstream whisper.cpp, I'll submit a PR accordingly.

@zhouwg (Owner, Author) commented Mar 7, 2024

It works as expected (PoC stage-1 is finished).

[screenshots omitted]

@dinvlad commented Mar 7, 2024

How's the performance? Can this be used for real-time transcription on a reasonably old device?

@zhouwg (Owner, Author) commented Mar 8, 2024

How's the performance? Can this be used for real-time transcription on a reasonably old device?

The benchmark of whisper.cpp/GGML's mulmat (matrix multiplication) does not look good on low-end Android phones.

But the performance of whisper.cpp/GGML on iOS is very good, because the original author of whisper.cpp/GGML (the great Georgi Gerganov) spent much time optimizing them with Apple's dedicated machine learning framework (similar in spirit to SSE2/SSE3/AVX optimizations on the x86 architecture).

I think whisper.cpp could be used for real-time subtitles with online TV on a high-end Android phone (such as the Xiaomi 14; I will demo it after finishing this PoC successfully), and that is the goal of this PoC (this open issue).

whisper.cpp/GGML might not be reasonable for old devices, because the heavy math requires a powerful SoC or highly optimized code (just like what Georgi did for the iOS/Mac platform).

@liam-mceneaney has provided an Android example to demo transcription.

BTW, the following link provides more helpful information:

ggerganov/whisper.cpp#283

@zhouwg (Owner, Author) commented Mar 8, 2024

It works as expected (PoC stage-2 is finished).

[screenshots omitted]

The above screenshots can't really illustrate the exciting progress in this commit. I'd like to say this is a big milestone and express my sincere thanks to the great whisper.cpp/GGML again: the more familiar I become with whisper.cpp, the more I feel we should all be thankful for the great whisper.cpp/GGML.

Or build the APK from source code (branch kantv-poc-with-whispercpp) with the Android Studio IDE.

@zhouwg (Owner, Author) commented Mar 10, 2024

ASR/transcription performance on the Xiaomi 14 is about 5x-20x better than on other Android phones (low-end phones from vivo and Huawei's Honor, which is now a standalone company), but it's still not enough for real-time subtitles with online TV.

[screenshots omitted]

Transcription performance improves by about 1-3 seconds when OpenBLAS is enabled (1-3 depending on OS load, process scheduling, ...).

Performance in the mulmat benchmark seems to improve significantly when OpenBLAS is enabled.

So I guess Apple's dedicated machine learning acceleration library might be very important for performance on iOS/Mac, just like Georgi Gerganov said before, and we should study Qualcomm's dedicated/proprietary machine learning acceleration library accordingly.

Screenshot from 2024-03-10 21-57-20

@zhouwg zhouwg mentioned this issue Mar 10, 2024

@zhouwg (Owner, Author) commented Mar 10, 2024

updated on 03-10-2024(2024-03-10, 23:41 Beijing Time / GMT + 8):

Screenshot from 2024-03-10 23-17-28

[screenshot omitted]

From 21 seconds to 3 seconds, thanks to the powerful Xiaomi 14 / Qualcomm Snapdragon 8 Gen 3, and thanks to the powerful modern compiler from Google. I'd like to say once again: we should all thank the great GGML; the open-source C/C++ whisper.cpp & llama.cpp have really changed our world.

I think I got the point, although ASR performance is still not enough for real-time subtitles with online TV. We should maximize use of the AI engine in Qualcomm's Snapdragon 8 Gen 3 to improve ASR performance further.

Screenshot from 2024-03-10 21-57-20

@zhouwg (Owner, Author) commented Mar 11, 2024

updated on 03-11-2024(2024-03-11,10:40) Beijing Time / GMT + 8

[screenshot omitted]

less than 2 seconds for the first time.

commit could be found here.

This is exactly one item in the breakdown task list: it is 2024 (not 1994), and we should trust the powerful modern compilers from Google and Linaro, built by top talent on our planet.

@zhouwg (Owner, Author) commented Mar 11, 2024

The next step is the coding work of PoC-S33. ASR performance is still not enough for real-time subtitles (the AI engine in Qualcomm's mobile SoC is not utilized yet), but it has improved a lot.

Sincere thanks for the key code snippets showing how to transcribe a single chunk of audio data with whisper.cpp

from @liam-mceneaney:

https://github.com/ggerganov/whisper.cpp/blob/19b8436ef11bd05201d650c8e08193009ec6bb3c/examples/whisper.android/lib/src/main/jni/whisper/jni.c#L197


 // The below adapted from the Objective-C iOS sample
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.print_realtime = true;
    params.print_progress = false;
    params.print_timestamps = true;
    params.print_special = false;
    params.translate = false;
    params.language = "en";
    params.n_threads = num_threads; //how many threads can I use on an S23?
    //potentially use an initial prompt for custom vocabularies?
    // initial_prompt: Optional[str]
    //        Optional text to provide as a prompt for the first window. This can be used to provide, or
    //        "prompt-engineer" a context for transcription, e.g. custom vocabularies or proper nouns
    //        to make it more likely to predict those word correctly.
    //params.initial_prompt = "Transcription of Tactical Combat Casualty Drugs such as Fentanyl, Ibuprofen, Amoxicillin, Epinephrine, TXA, Hextend, Ketamine, Oral Transmucosal Fentanyl Citrate. ";
    params.offset_ms = 0;
    params.no_context = true;
    params.single_segment   = true; //hard code for true, objc example has it based on a button press
    params.no_timestamps    = params.single_segment; //from streaming objc example

    whisper_reset_timings(context);

    LOGI("About to run whisper_full");
    if (whisper_full(context, params, audio_data_arr, audio_data_length) != 0) {
        LOGI("Failed to run the model");
    } else {
        whisper_print_timings(context);
    }

Or from the original author of whisper.cpp, @ggerganov:

https://github.com/ggerganov/whisper.cpp/blob/master/examples/whisper.objc/whisper.objc/ViewController.m#L186

// dispatch the model to a background thread
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        // process captured audio
        // convert I16 to F32
        for (int i = 0; i < self->stateInp.n_samples; i++) {
            self->stateInp.audioBufferF32[i] = (float)self->stateInp.audioBufferI16[i] / 32768.0f;
        }

        // run the model
        struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

        // get maximum number of threads on this device (max 8)
        const int max_threads = MIN(8, (int)[[NSProcessInfo processInfo] processorCount]);

        params.print_realtime   = true;
        params.print_progress   = false;
        params.print_timestamps = true;
        params.print_special    = false;
        params.translate        = false;
        params.language         = "en";
        params.n_threads        = max_threads;
        params.offset_ms        = 0;
        params.no_context       = true;
        params.single_segment   = self->stateInp.isRealtime;
        params.no_timestamps    = params.single_segment;

        CFTimeInterval startTime = CACurrentMediaTime();

        whisper_reset_timings(self->stateInp.ctx);

        if (whisper_full(self->stateInp.ctx, params, self->stateInp.audioBufferF32, self->stateInp.n_samples) != 0) {
            NSLog(@"Failed to run the model");
            self->_textviewResult.text = @"Failed to run the model";

            return;
        }

        whisper_print_timings(self->stateInp.ctx);

        CFTimeInterval endTime = CACurrentMediaTime();

        NSLog(@"\nProcessing time: %5.3f, on %d threads", endTime - startTime, params.n_threads);

@zhouwg (Owner, Author) commented Mar 11, 2024

Clarification of why I have said many times that we (programmers) should all thank the great GGML:
  • AI scientists are far away from programmers
  • it is similar to FFmpeg: there is no doubt that FFmpeg is a great open-source project for any programmer in the video/audio/streaming-media field
  • the root cause of the huge ASR performance improvement on the Xiaomi 14 is simply the highly elegant C/C++ implementation of whisper.cpp by Georgi Gerganov (a powerful modern compiler doesn't work wonders on ordinary code); I did not do any substantial/hardcore work (I mean I did not make a real contribution to the source code of ggml.c / ggml-quants.c / whisper.cpp, because I know very little about hardcore AI tech, don't understand the internals of ggml.c / ggml-quants.c / whisper.cpp, and did not touch any core part of ggml/whisper.cpp)
  • I never knew of Georgi Gerganov until I heard about whisper.cpp recently, and I have never overpraised whisper.cpp; I speak as an Android system software programmer
  • what I said is just my personal feeling about the excellent whisper.cpp. Of course, if anyone dislikes what I said about whisper.cpp, I'd like to see another similar open-source AI project (I tried PaddleSpeech but gave up in the end; I even think the deprecated Mozilla DeepSpeech is more programmer-friendly than PaddleSpeech)

@zhouwg (Owner, Author) commented Mar 11, 2024

updated on 03-12-2024(2024-03-12,00:51)

Screenshot from 2024-03-12 00-46-14

I will submit the new optimization method (which only works on the Xiaomi 14) in the next commit.

I don't understand why performance with 4 threads is about 2x better than with 8 threads under the same optimization method (intuitively, 8 threads should be about 2x faster than 4). What happens between 4 threads and 8 threads? What's the detail?
I also really don't know where these models came from or what the differences between them are. AI is really a magical technology. Thanks for the great whisper.cpp again.
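One plausible explanation (an assumption on my part, not something measured here) is the heterogeneous core layout of the Snapdragon 8 Gen 3 (1 prime + 5 performance + 2 efficiency cores): with 8 threads, ggml's per-operation synchronization has to wait for the slowest efficiency cores, so capping the thread count at the number of fast cores can be faster overall. A hypothetical sketch of such a cap (the constant 4 simply mirrors the measurement above):

    // Cap whisper.cpp's thread count instead of using every hardware thread.
    // The cap of 4 is only an assumption based on the 4-vs-8 observation above.
    #include <algorithm>
    #include <thread>

    int pick_n_threads() {
        const int hw  = (int) std::thread::hardware_concurrency();
        const int cap = 4;   // assumed sweet spot on Snapdragon 8 Gen 3
        return std::max(1, std::min(hw, cap));
    }

    // usage: params.n_threads = pick_n_threads();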

updated on 03-12-2024(2024-03-12,21:01, Beijing Time / GMT +8):

The new optimization method for the Xiaomi 14 (ASR performance less than 1 second; the root cause is the highly elegant/handcrafted C/C++ implementation of whisper.cpp, and of course Google's NDK r26c is powerful and the Xiaomi 14 / Qualcomm SM8650-AB Snapdragon 8 Gen 3 is also powerful) can be found in this file or this commit. I think I have finished the coding work of PoC-S33: the data path UI <----> JNI <----> whisper.cpp <----> kantv-play <----> kantv-core.

Here are some snapshots of the demo in PoC-S33 (the third step in PoC stage-3): coding work on the data path UI <----> JNI <----> whisper.cpp <----> kantv-play <----> kantv-core. Or build the APK from source code with this branch manually; the generated APK should work fine on any mainstream Android phone (because the special optimization for the Xiaomi 14 is disabled by default; of course it can be enabled manually in this file).

[screenshots omitted]

We are getting closer and closer to the final goal of this PoC. Once again, I'd like to express my sincere thanks for the great whisper.cpp, which is really helpful for C/C++ programmers who know very little about AI tech.

@zhouwg (Owner, Author) commented Mar 15, 2024

updated on 03-15-2024(2024-03-15,11:59, Beijing Time / GMT +8):

I have spent about 10 days (10+ hours/day, self-motivated) since 03-05-2024 to achieve the following goal (plus many other minor improvements to this project). It does NOT work perfectly yet, but it is getting closer and closer to the final goal of this PoC.

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6726

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/kantv/src/main/java/com/cdeos/kantv/ui/fragment/ASRResearchFragment.java

It should have been finished in ONE WEEK or less according to my initial estimation/planning (because I had already spent many days investigating online TV recording and implemented it well at the end of 2023, and I could reuse much of that code for this PoC). I'm sorry for this. The reasons for the delay:

  • I shouldn't have wasted time on a standalone local branch, which delayed development progress by about 1-2 days
  • GFW: I have to say that the GFW brought many troubles to development activity, and network access to sites outside of China (such as Google) is very unstable. The GFW may have delayed development progress by about 1-3 days; I was a little unlucky because China was in a highly sensitive period during the last 2-3 weeks.

Updated on 03-22-2024, 19:26: anyway, I paid the price, and I really have NO negative thoughts about my great country, because I'm familiar with the history of the Ming dynasty and I know that running a large and complex country is NOT easy. BUT, at the same time, I respect the fact that, as an ordinary programmer, I have spent about RMB 10,000 (USD 1,500-1,600) to fix network issues caused by the GFW, so I will NOT delete the sentence above.

@zhouwg (Owner, Author) commented Mar 16, 2024

updated on 03-16-2024(2024-03-16,13:28, Beijing Time / GMT + 8)

Finally, I did it (although it is NOT a true "real-time subtitle" yet, and bug fixes are still required), after solving a technical problem in the source code of the customized whisper.cpp.

Parts of the latest source code can be found at (the master branch is preferred for R&D activity):

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6727

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java

This PoC only works on the Xiaomi 14 currently. The reason is that the Xiaomi 14 contains a very powerful mobile SoC (Qualcomm SM8650-AB Snapdragon 8 Gen 3, 4 nm), and the special build optimization method only works on the Xiaomi 14.

Unexpected behavior such as ANR (Application Not Responding) or app crashes may occur on other Android phones driven by low-end Qualcomm mobile SoCs.

[photos omitted]

kantv-realtime-subtitle-demo-with-whispercpp.mp4

The benchmark of ASR performance on the Xiaomi 14 with the special build optimization is less than 1 second (about 700-800 milliseconds), but performance is about 5-7 seconds in a complicated real scenario where TV playback, TV recording, and ASR audio recording all run at the same time.

Screenshot from 2024-03-16 17-12-24

@ggerganov, the hardware AI engine in the Snapdragon 8 Gen 3 should/might/could be utilized in GGML to achieve truly "real-time" English subtitles.

Screenshot from 2024-03-10 21-57-20

I don't know why, but today the network is stable and Google is accessible, and Google search was really helpful for this breakthrough progress. Anyway, thanks so much. BTW, miniwav (found via the great Google) was also really helpful during troubleshooting for this breakthrough. @mhroth, thanks a lot.

Lastly, I'd like to express my sincere thanks to the great open-source AI project whisper.cpp once again: without the strength and power of the excellent whisper.cpp, the scenario above in this PoC/project could not have been achieved.

@zhouwg (Owner, Author) commented Mar 16, 2024

Updated on 03-16-2024, 21:19: got better whisper.cpp inference performance (from 6 secs to 0.7 sec) in a complicated real scenario where online-TV playback, online-TV transcription, and online-TV audio recording all work at the same time, after fine-tuning for the Xiaomi 14 (the commit can be found here).

Screenshot from 2024-03-16 21-18-24


Updated on 03-16-2024, 22:36 (Beijing Time / GMT+8): here is a video of whisper.cpp running on a Xiaomi 14 device - fully offline, on-device (no client-server).

realtime-subtitle-by-whispercpp-demo-on-xiaomi14.mp4

Updated on 03-17-2024,11:19 (Beijing Time / GMT + 8)

FYI:

Parts of latest source codes of this PoC could be found at:

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6727

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c

https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java

The final outcome of this PoC can be found in the kantv-poc-with-whispercpp branch.

The master branch has been preferred for AI experts' or programmers' R&D activity since 03-17-2024.

BTW:

  • This code (a new and very concise Android example of whisper.cpp, plus the key code for implementing real-time subtitles on an Android device, ...) could/MIGHT not be merged upstream, because it depends heavily on FFmpeg and MIGHT bring the side effect of code pollution into upstream whisper.cpp; but it can be referenced for standalone R&D activity on Android-based devices.

  • A Xiaomi 14 (Qualcomm Snapdragon 8 Gen 3) or another powerful Android phone (the Qualcomm Snapdragon 8 Gen 4 should be available in Oct 2024) is strongly recommended for this PoC or for R&D activity that keeps in sync with the updated master branch.

Roadmap:

  • merge the latest code from upstream whisper.cpp to validate inference performance on Android devices (updated on 03-18-2024, 10:34: done)
  • remove the customized ExoPlayer, clean up the code, and then turn this project into Project Whispercpp-Android: an open-source streaming-media + device-side AI project based on the great FFmpeg + the great whisper.cpp, used to study or practice state-of-the-art AI technology in a real application / real complicated scenario (updated on 03-18-2024, 10:34: done)
  • study the internals of ggml/whisper.cpp and the AI engine in Android devices for truly "real-time" subtitles in a real complicated scenario (online-TV playback, online-TV transcription (real-time subtitles), online-TV language translation, and online-TV video & audio recording all working at the same time)
  • ...

@zhouwg zhouwg changed the title PoC:integrate whisper.cpp to KanTV for purpose of real-time subtitle with online TV PoC:integrate whisper.cpp to KanTV for purpose of real-time English subtitle with English online-TV(OTT TV) Mar 17, 2024
zhouwg added a commit that referenced this issue Mar 18, 2024
switch to Project Whispercpp-Android successfully according to roadmap after finished PoC #64

this is the new baseline for the new Project KanTV (aka Project Whispercpp-Android)
@zhouwg zhouwg closed this as completed Mar 22, 2024

@darcyg commented Mar 23, 2024

Congratulations to you

@zhouwg (Owner, Author) commented Mar 23, 2024

Congratulations to you

😄 Thanks. Have fun with the great whisper.cpp (backed by the great OpenAI's Whisper model); this is truly amazing AI technology brought to us by the great genius programmer @ggerganov.

@zhouwg zhouwg reopened this Apr 14, 2024
@zhouwg zhouwg removed the good first issue Good for newcomers label Apr 14, 2024
@zhouwg zhouwg closed this as completed Apr 15, 2024
@zhouwg zhouwg reopened this Apr 15, 2024
@zhouwg zhouwg changed the title PoC:integrate whisper.cpp to KanTV for purpose of real-time English subtitle with English online-TV(OTT TV) PoC:integrate whisper.cpp to KanTV for purpose of implementation of real-time AI subtitle with English online-TV(OTT TV) Apr 15, 2024
@zhouwg zhouwg changed the title PoC:integrate whisper.cpp to KanTV for purpose of implementation of real-time AI subtitle with English online-TV(OTT TV) PoC:Integrate whisper.cpp to KanTV for purpose of clean-room implementation of real-time AI subtitle with English online-TV(OTT TV) Apr 15, 2024
@zhouwg zhouwg self-assigned this Apr 23, 2024
@zhouwg zhouwg closed this as completed May 26, 2024
@zhouwg zhouwg changed the title PoC:Integrate whisper.cpp to KanTV for purpose of clean-room implementation of real-time AI subtitle with English online-TV(OTT TV) PoC:clean-room implementation of real-time AI subtitle with English online-TV(OTT TV) May 27, 2024
@zhouwg zhouwg changed the title PoC:clean-room implementation of real-time AI subtitle with English online-TV(OTT TV) PoC:clean-room implementation of real-time AI subtitle for English online-TV(OTT TV) May 27, 2024