Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose Llava as a shared library for downstream projects #3613

Merged
merged 34 commits into from
Nov 6, 2023

Conversation

damian0815
Copy link
Contributor

@damian0815 damian0815 commented Oct 13, 2023

  • convert llava example into struct and init/process/free prompts
  • populate llava-test.cpp with tests
  • load image from base64 string embedded in the prompt
  • decouple image loading from model loading so a single llava_context instance can serve multiple prompt requests
  • whatever else is necessary to enable llama-cpp-python bindings ..?

@FSSRepo
Copy link
Collaborator

FSSRepo commented Oct 13, 2023

This PR will be very helpful for implementing llava on the server.

@damian0815
Copy link
Contributor Author

damian0815 commented Oct 13, 2023

base64 support is working.

../llama.cpp/build/bin/llava -m ./ggml-model-q5_k.gguf --mmproj mmproj-model-f16.gguf --temp 0.1 \ 
-p '<img src=""> describe the visual content of this image' --verbose-prompt

base64 bytes are this image:
overfitting_lc

output:
The image shows a graph with a line that starts at the top left corner and goes down to the bottom right corner. The line is labeled "loss" and has a negative slope. The graph also has a line labeled "validation" that starts at the top right corner and goes down to the bottom left corner. This line has a positive slope. The graph is likely showing the results of a machine learning model's performance, with the "loss" line representing the model's error and the "validation" line representing the model's accuracy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this should go to common.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


bool clip_image_load_from_bytes(const unsigned char * bytes, int bytes_length, clip_image_u8 * img) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool clip_image_load_from_bytes(const unsigned char * bytes, int bytes_length, clip_image_u8 * img) {
bool clip_image_load_from_bytes(const unsigned char * bytes, size_t size, clip_image_u8 * img) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


const char * clip_path = params.mmproj.c_str();
const char * img_path = params.image.c_str();
static void find_image_tag_in_prompt(const std::string& prompt, size_t& begin_out, size_t& end_out) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to have such functions in llava-utils.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 done

auto base64_bytes_start = img_base64_str_start + strlen(IMG_BASE64_TAG_BEGIN);
auto base64_bytes_count = img_base64_str_end - base64_bytes_start;
auto base64_str = prompt.substr(base64_bytes_start, base64_bytes_count );
printf("base64_str: '%s'\n", base64_str.c_str());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not print this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was debugging/wip - done.


if (!clip_image_preprocess(ctx_clip, &img, &img_res, /*pad2square =*/ true)) {
fprintf(stderr, "%s: unable to preprocess %s\n", __func__, img_path);
struct llava_context * llava_init(gpt_params * params) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image loading and inference parts should be stripped off this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

free(ctx_llava->image_embd);
}

void llava_process_prompt(struct llava_context * ctx_llava, gpt_params * params, const char * prompt) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid such god-like functions for now --it kills hackability in early stages of development --it brings no benefit over functions in llava-utils.h. Better to have single-responsibility functions as much as possible to enhance the development speed and flexibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 sure

@FSSRepo
Copy link
Collaborator

FSSRepo commented Oct 13, 2023

Server now support LlaVA #3589

@damian0815
Copy link
Contributor Author

damian0815 commented Oct 14, 2023

@monatis thanks for the review, but do please note this PR is still in "draft" state. i addressed all your comments, and actually many of them anticipated what i had planned to do anyway.

so. i'm not sure if the following is correct/desired or in fitting with the way this project is organised, but i went ahead and did it anyway. essentially, i have refactored the existing llava demo to a llava library and a llava cli/MVP demo:

  • the demo executable now has its own cpp file llava-cli.cpp. building now emits llava-cli executable rather than llava. documentation has been updated to reflect this.
  • this leaves llava.h and llava.cpp as a standalone library. i've updated the CMakeLists.txt to roll these along with clip.h and clip.cpp into a libllava.a output.

as mentioned in the top post my primary goal here is to enable lmql support, which means enabling llava code paths with llama-cpp-python bindings. am i headed in the right direction to do this, or is there something more appropriate i could be doing? i note there's a server project as well, but i'm not sure if/how this plays with llama-cpp-python.

@damian0815
Copy link
Contributor Author

actually now that i look over llama-cpp-python this might all be overengineered. all that's really needed i guess is

  1. a mechanism to load the multimodal projector model
  2. some way of generating image embeds and injecting them into the llama context

@damian0815
Copy link
Contributor Author

damian0815 commented Oct 15, 2023

this is in a much better state now.

the public api is basically:

/** load mmproj model */
LLAMA_API struct clip_ctx * clip_model_load(const char * fname, const int verbosity);
/** free mmproj model */
LLAMA_API void clip_free(struct clip_ctx * ctx);

/** sanity check for clip <-> llava embed size match */
LLAMA_API bool llava_validate_embed_size(const llama_context * ctx_llama, const clip_ctx * ctx_clip);

/** build an image embed from image file bytes */
LLAMA_API struct llava_image_embed * llava_image_embed_make_with_bytes(struct clip_ctx * ctx_clip, int n_threads, const unsigned char * image_bytes, int image_bytes_length);
/** build an image embed from a path to an image filename */
LLAMA_API struct llava_image_embed * llava_image_embed_make_with_filename(struct clip_ctx * ctx_clip, int n_threads, const char * image_path);
/** free an embedding made with llava_image_embed_make_* */
LLAMA_API void llava_image_embed_free(struct llava_image_embed * embed);

/** write the image represented by embed into the llama context with batch size n_batch, 
 * starting at context pos n_past. on completion, n_past points to the next position in the context after the image embed. */
LLAMA_API bool llava_eval_image_embed(struct llama_context * ctx_llama, const struct llava_image_embed * embed, int n_batch, int * n_past);

@slaren
Copy link
Collaborator

slaren commented Oct 15, 2023

Some minor suggestions about the API:

  • Move n_threads to the context as llama does
  • Do not ask for n_batch, use the value from llama_context::cparams. You can add a function uint32_t llama_n_batch(llama_context * ctx) to obtain it if needed.

Copy link
Owner

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The llava API is not a good idea.

The only thing a 3rd party project needs in order to implement LLaVA is:

  • CLIP API
  • LLaMA API

Everything else are helper functions that do not belong neither to llama.cpp nor clip.cpp. If we want to reuse LLaVA helpers across llama.cpp's examples (e.g. llava, server, etc.) these should go into common/llava.h/.cpp

@monatis
Copy link
Collaborator

monatis commented Oct 15, 2023

LLAMA_API struct llava_image_embed * llava_image_embed_make_with_filename(struct clip_ctx * ctx_clip, int n_threads, const char * image_path);

Some first-impression suggestions from my side:

  • Better to keep such structs / functions in clip_ prefix because:
    1. Having two different prefixes is confusing.
    2. It's not future-proof --I'll implement other SOTA multimodal models very soon starting with Idefics and re-use these functions to encode images.
  • Use the verb encode instead of make --the latter implies a constructor.
  • Use size_t instead of int to type sizes.

I'll make a thorough review tomorrow.

@FSSRepo
Copy link
Collaborator

FSSRepo commented Oct 15, 2023

The llava API is not a good idea.

The only thing a 3rd party project needs in order to implement LLaVA is:

  • CLIP API
  • LLaMA API

Everything else are helper functions that do not belong neither to llama.cpp nor clip.cpp. If we want to reuse LLaVA helpers across llama.cpp's examples (e.g. llava, server, etc.) these should go into common/llava.h/.cpp

See my implementation of llava in server example

CMakeLists.txt Outdated Show resolved Hide resolved
@monatis monatis marked this pull request as ready for review November 6, 2023 00:39
@monatis monatis marked this pull request as draft November 6, 2023 01:31
Copy link
Collaborator

@monatis monatis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows+CUDA still fails.

@monatis monatis marked this pull request as ready for review November 6, 2023 02:01
Copy link
Owner

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very close to what I had in mind, though some things on the API level can be improved. I've added a few comments regarding that.

build-info.h Outdated Show resolved Hide resolved
common/CMakeLists.txt Outdated Show resolved Hide resolved
examples/llava/llava-utils.h Outdated Show resolved Hide resolved
examples/llava/llava-utils.h Outdated Show resolved Hide resolved
examples/llava/llava.cpp Outdated Show resolved Hide resolved
examples/llava/llava.h Outdated Show resolved Hide resolved
examples/llava/llava.h Outdated Show resolved Hide resolved
examples/llava/clip.h Outdated Show resolved Hide resolved
target_link_libraries(${TARGET} PRIVATE common ggml ${CMAKE_THREAD_LIBS_INIT})
set(TARGET llava)

add_library(${TARGET} STATIC llava.cpp llava.h clip.cpp clip.h)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STATIC should be dropped here to allow building llava as a shared library.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will update it to build a shared library as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, from my tests that looks like the only change I need to integrate this with llama-cpp-python.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, now it also builds as a shared lib.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works, thank you!

Ommiting the STATIC / OBJECT from target_link_libraries also seemed to work with just the llava target without the need for the llava_shared target but if having different targets is preffered I don't mind it.

@FSSRepo
Copy link
Collaborator

FSSRepo commented Nov 6, 2023

It seems to me that a good way to check if this PR is achieving its goals is to implement the Llava API in the server example. If it proves to be a better option than how Llava currently functions on the server, it's easy to deduce that we're heading in the right direction since we can eliminate a lot of duplicate code from the server. This is just my opinion to help; I don't intend to come across as a know-it-all.

@monatis monatis changed the title refactor Llava into a servable object/API Expose Llava as a shared library for downstream projects Nov 6, 2023
@monatis monatis merged commit 381efbf into ggerganov:master Nov 6, 2023
32 checks passed
@damian0815
Copy link
Contributor Author

glad this made it in, thanks folks!

monatis added a commit that referenced this pull request Nov 13, 2023
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
…#3613)

* wip llava python bindings compatibility

* add external llava API

* add base64 in-prompt image support

* wip refactor image loading

* refactor image load out of llava init

* cleanup

* further cleanup; move llava-cli into its own file and rename

* move base64.hpp into common/

* collapse clip and llava libraries

* move llava into its own subdir

* wip

* fix bug where base64 string was not removed from the prompt

* get libllava to output in the right place

* expose llava methods in libllama.dylib

* cleanup memory usage around clip_image_*

* cleanup and refactor *again*

* update headerdoc

* build with cmake, not tested (WIP)

* Editorconfig

* Editorconfig

* Build with make

* Build with make

* Fix cyclical depts on Windows

* attempt to fix build on Windows

* attempt to fix build on Windows

* Upd TODOs

* attempt to fix build on Windows+CUDA

* Revert changes in cmake

* Fix according to review comments

* Support building as a shared library

* address review comments

---------

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants