
Less hacky integration with whisper.cpp #15

Open
shun-liang opened this issue Oct 15, 2024 · 1 comment
Labels
help wanted (Extra attention is needed)

Comments

@shun-liang
Owner

On Apple Silicon, whisper.cpp runs much faster than faster-whisper, as whisper.cpp accelerates inference on the Apple GPU through CoreML, while faster-whisper only supports running on the CPU on Apple Silicon.

faster-whisper relies on CTranslate2 for Transformer inference, and there does not seem to be any hope that CoreML will be supported by that project (see OpenNMT/CTranslate2#1607 and OpenNMT/CTranslate2#1586).

Right now, yt2doc uses faster-whisper as the default transcription backend, but it also supports whisper.cpp. The whisper.cpp support, however, is somewhat hacky (see here) and cumbersome, as it requires users to install or compile whisper.cpp on their devices themselves.
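For context, the current integration amounts to shelling out to the user's locally built whisper.cpp CLI. A minimal sketch of that pattern (the binary name, flags, and JSON layout follow whisper.cpp's stock CLI, but the paths are illustrative and this is not yt2doc's actual code):

```python
import json
import subprocess
from pathlib import Path

def transcribe_with_whisper_cpp(audio_path: Path, model_path: Path) -> str:
    # Invoke the user's locally compiled whisper.cpp CLI ("main" in the
    # upstream repo; the name and location depend on how the user built it).
    subprocess.run(
        [
            "./main",
            "-m", str(model_path),
            "-f", str(audio_path),
            "-oj",  # ask whisper.cpp to write <audio_path>.json next to the input
        ],
        check=True,
    )
    # Parse the JSON output and join the segment texts.
    result = json.loads(Path(f"{audio_path}.json").read_text())
    return "".join(seg["text"] for seg in result["transcription"])
```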

It's possible to use whisper.cpp through one of its Python bindings. However, among them, only pywhispercpp is actively maintained and claims to support CoreML. The CoreML support of pywhispercpp, however, requires cloning the repository and building the project locally with an environment variable that feeds into the build process. I have not found a way to declare pywhispercpp, built with that environment variable, as a dependency in this project's pyproject.toml, which is essential for distributing yt2doc through PyPI.
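To be clear, the Python-side usage itself is straightforward once a user has done the local build (something along the lines of setting a CoreML build flag before `pip install` from a clone; the exact variable name is per pywhispercpp's README and may change). A rough sketch of what yt2doc would call:

```python
from pywhispercpp.model import Model

# Model name and API per pywhispercpp's README. If the package was built
# locally with CoreML enabled, whisper.cpp picks up the Core ML encoder
# automatically when a matching .mlmodelc sits next to the ggml model.
model = Model("base.en")
segments = model.transcribe("episode.wav")
for segment in segments:
    print(segment.text)
```

The problem is purely the packaging step before this code runs, not the binding's API.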

Any ideas or solutions for this are much appreciated.

@shun-liang added the help wanted label on Oct 15, 2024
@shun-liang
Owner Author

Alternatively, we may be able to drop both faster-whisper and whisper.cpp if running Whisper models on Hugging Face's transformers gives good Apple GPU support without faff. Need to investigate that too.
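A quick sanity check for that route might look like the following, using the standard transformers pipeline on PyTorch's MPS backend (assumes a torch build with MPS; whether throughput actually competes with whisper.cpp's CoreML path is the open question):

```python
import torch
from transformers import pipeline

# Run Whisper through transformers on the Apple GPU via MPS, falling back
# to CPU when MPS is unavailable.
device = "mps" if torch.backends.mps.is_available() else "cpu"
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base",
    device=device,
)

# return_timestamps=True enables long-form (>30s) transcription.
result = asr("episode.wav", return_timestamps=True)
print(result["text"])
```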
