
whisper : add context param for disable gpu #1293

Merged: 17 commits, Nov 6, 2023

Conversation

@jhen0409 (Contributor) commented on Sep 15, 2023

Currently the Metal backend uses some SIMD operations that are only supported on Apple7+ family devices (ref: Metal-Feature-Set-Tables.pdf). To allow most older devices to run whisper.cpp normally, we can provide a param like use_gpu.
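For context, a minimal Objective-C sketch of how an application can probe that capability at runtime with standard Metal APIs (the helper name is mine, not part of this PR):

#import <Metal/Metal.h>

// Hypothetical helper: true when the default Metal device supports the
// Apple7 GPU family, i.e. the SIMD-group operations the backend relies on.
// -supportsFamily: needs iOS 13 / macOS 10.15; MTLGPUFamilyApple7 needs a newer SDK.
static BOOL device_supports_metal_simd(void) {
    id<MTLDevice> device = MTLCreateSystemDefaultDevice();
    if (device == nil) {
        return NO;
    }
    return [device supportsFamily:MTLGPUFamilyApple7];
}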

I think this will be helpful for whisper.cpp bindings that enable Metal at build time but still need to support old devices, or that simply don't want to use GPU resources in some cases.

In this PR, I've added a new struct whisper_context_params { use_gpu = true } for this. I think it could also hold params like use_mmap in the future.
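A minimal sketch of what the new API looks like in use, based on the names here and in the commit list (whisper_context_default_params, whisper_init_from_file_with_params); the model path is a placeholder:

#include "whisper.h"

int main(void) {
    // Start from the library defaults, then opt out of GPU usage.
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = false;

    // New init entry point that takes the context params.
    struct whisper_context * ctx =
        whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
    if (ctx == NULL) {
        return 1;
    }

    // ... run whisper_full(...) as usual ...

    whisper_free(ctx);
    return 0;
}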

@ggerganov please let me know whether you think this is a good idea; if so, I will update the other backends and all the bindings.

@bobqianic (Collaborator)

It seems that this modification could introduce some backward compatibility issues, which would necessitate refactoring many of the APIs. I recall the last time I attempted to introduce a debug-mode option in whisper.cpp. It proved challenging to implement without disrupting the existing API. Generally speaking, I believe the current API design lacks elegance. A well-designed API should ensure both forward and backward compatibility, so that minor changes don't inconvenience or disrupt the users.

@jhen0409 (Contributor Author)

> It seems that this modification could introduce some backward compatibility issues, which would necessitate refactoring many of the APIs. I recall the last time I attempted to introduce a debug-mode option in whisper.cpp. It proved challenging to implement without disrupting the existing API. Generally speaking, I believe the current API design lacks elegance. A well-designed API should ensure both forward and backward compatibility, so that minor changes don't inconvenience or disrupt the users.

We could consider adding new methods and marking the old ones as deprecated; that would retain some flexibility for users.
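For illustration, the usual shape of that pattern in a C header and implementation; a sketch assuming a WHISPER_DEPRECATED helper macro along the lines of what the merged commits introduce:

// whisper.h: the old entry point stays, but warns at compile time.
WHISPER_DEPRECATED(
    WHISPER_API struct whisper_context * whisper_init_from_file(const char * path_model),
    "use whisper_init_from_file_with_params instead"
);

// whisper.cpp: forward to the params-taking variant with the defaults.
struct whisper_context * whisper_init_from_file(const char * path_model) {
    return whisper_init_from_file_with_params(path_model, whisper_context_default_params());
}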

(Some issues commenting from my phone made me delete the previous comment; sorry for the double notifications.)

@bobqianic (Collaborator)

@ggerganov ping

@ggerganov (Owner)

Will take a look this week - been travelling for a few days

@ggerganov (Owner)

@jhen0409 Yes, this is a good idea

@jhen0409 (Contributor Author) commented on Oct 5, 2023

I'm looking for a way to disable the CUDA backend (and OpenCL) via the param. Should we just check GGML_BACKEND_GPU, or do we need to add a param to ggml?

I found that none of the tensors have their backend set to GGML_BACKEND_GPU, and only ggml_cuda_mul_mat is used (when can_mul_mat(...) is true). Compared to llama.cpp, I think setup such as a call to ggml_cuda_assign_buffers is missing, but I'm still not familiar with it. It also seems related to #1179.
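One possible shape for the gating itself (the helper below is hypothetical, not code from this PR): consult the context param before taking the CUDA path at all, so no extra ggml flag is needed.

#include <stdbool.h>

// Hypothetical sketch: skip the CUDA mul_mat fast path when the user
// disabled the GPU, falling back to the regular CPU implementation.
static bool whisper_can_use_cuda(const struct whisper_context_params * cparams) {
#if defined(GGML_USE_CUBLAS)
    return cparams->use_gpu;
#else
    (void) cparams;
    return false;
#endif
}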

ggml-metal.m (outdated) · review comment on lines 156 to 178
#ifdef GGML_SWIFT
        bundle = SWIFTPM_MODULE_BUNDLE;
#else
        UNUSED(msl_library_source);
        bundle = [NSBundle bundleForClass:[GGMLMetalClass class]];
#endif

        // read the source from "ggml-metal.metal" into a string and use newLibraryWithSource
        {
            NSError * error = nil;
            NSString * libPath = [bundle pathForResource:@"default" ofType:@"metallib"];
            if (libPath != nil) {
                NSURL * libURL = [NSURL fileURLWithPath:libPath];
                metal_printf("%s: loading '%s'\n", __func__, [libPath UTF8String]);
                ctx->library = [ctx->device newLibraryWithURL:libURL error:&error];
            } else {
                metal_printf("%s: default.metallib not found, loading from source\n", __func__);

                NSString * sourcePath = [bundle pathForResource:@"ggml-metal" ofType:@"metal"];
                metal_printf("%s: loading '%s'\n", __func__, [sourcePath UTF8String]);
                NSString * src = [NSString stringWithContentsOfFile:sourcePath encoding:NSUTF8StringEncoding error:&error];
                if (error) {
                    metal_printf("%s: error: %s\n", __func__, [[error description] UTF8String]);
                    return NULL;
                }
@jhen0409 (Contributor Author) commented on Oct 6, 2023

I enabled Metal to test whisper.swiftui, and the project needs to load the compiled default.metallib, so I made this change. @bachittle I think this should also help with ggerganov/llama.cpp#3284.

For GGML_SWIFT I use SWIFTPM_MODULE_BUNDLE instead and reuse the default.metallib loading code. I think it should also work in the llama.cpp Swift package (needs some tests later).

@jhen0409 (Contributor Author) commented on Oct 6, 2023

Aside from the GPU backend questions, I think the other things are ready to review.

jhen0409 marked this pull request as ready for review on October 6, 2023, 06:33
@paulocoutinhox

Does this solve #1386?

@bobqianic (Collaborator)

> I found that none of the tensors have their backend set to GGML_BACKEND_GPU, and only ggml_cuda_mul_mat is used (when can_mul_mat(...) is true). Compared to llama.cpp, I think setup such as a call to ggml_cuda_assign_buffers is missing, but I'm still not familiar with it. It also seems related to #1179.

I'm in a similar situation. I'm completely lost in the llama.cpp code. Can't figure out how to offload the other operations to the GPU.

@ggerganov (Owner)

> Aside from the GPU backend questions, I think the other things are ready to review.

I will implement full GPU offloading in the following days.
Turning off CUDA is currently being discussed in ggerganov/llama.cpp#3946

@ggerganov (Owner) left a comment

@jhen0409 This should be good to merge, correct?

@jhen0409 (Contributor Author) commented on Nov 5, 2023

> @jhen0409 This should be good to merge, correct?

Yes, it should be ready.

ggerganov merged commit 0463028 into ggerganov:master on Nov 6, 2023
39 checks passed
vonstring pushed a commit to vonstring/whisper.cpp that referenced this pull request Nov 7, 2023
* whisper : check state->ctx_metal not null

* whisper : add whisper_context_params { use_gpu }

* whisper : new API with params & deprecate old API

* examples : use no-gpu param && whisper_init_from_file_with_params

* whisper.objc : enable metal & disable on simulator

* whisper.swiftui, metal : enable metal & support load default.metallib

* whisper.android : use new API

* bindings : use new API

* addon.node : fix build & test

* bindings : update java binding

* bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java

* metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load

* metal : move bundle var into block

* metal : use SWIFT_PACKAGE instead of GGML_SWIFT

* style : minor updates

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
felrock pushed a commit to felrock/whisper.cpp that referenced this pull request on Nov 18, 2023 (same commit list as above)
landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request on Dec 16, 2023 (same commit list as above)
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request on Sep 23, 2024 (same commit list as above)
Labels: need feedback (Testing and feedback with results are needed)
4 participants