sync : ggml (ggml-backend) #3548
Conversation
ggml-ci
Hello, I have some general questions about backends (they apply to any project, not just LLM inference). I have noticed that ggml-cuda and the other backends in llama.cpp differ from the implementation in the ggml repository; there are some kernels specific to llama.cpp. I am planning to implement the CUDA backend in stable-diffusion.cpp, but I have a lot of doubts. Could you please take some time to clarify them for me?
@ggerganov yes, I think we should merge this here to avoid conflicts in the future. I already tested the changes.
@FSSRepo most of the development of the backends happens in this repository, but the changes to ggml are regularly synced between the ggml and llama.cpp repositories. I am not sure if that answers your question; if you have other doubts, just ask.
sync : ggml (ggml-backend) (ggerganov#3548)
This change broke offloading Falcon-7B with ngl=34 (v cache on GPU but not k cache). I'm getting this assertion failure:
@cebtenzzre I think it was already broken before; this change just made it assert rather than generate nonsense. The source of the problem is that the packed wqkv weight doesn't play well with CUDA offloading: the Kcur portion is needed on the CPU, but currently I don't see a good way to copy a view back to the CPU. This should be fixed in the future with ggml-backend.
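To illustrate the situation, here is a simplified, hypothetical sketch (not the actual Falcon graph code; ctx, wqkv, inp, n_q and n_kv are placeholders, and the offsets are illustrative rather than the real layout). The point is that Q, K and V are views of one packed tensor, so the K portion cannot easily live on a different device than the rest:
struct ggml_tensor * cur  = ggml_mul_mat(ctx, wqkv, inp); // packed Q/K/V output, placed wherever wqkv is offloaded
struct ggml_tensor * Qcur = ggml_view_1d(ctx, cur, n_q,  0);
struct ggml_tensor * Kcur = ggml_view_1d(ctx, cur, n_kv, n_q          * ggml_element_size(cur));
struct ggml_tensor * Vcur = ggml_view_1d(ctx, cur, n_kv, (n_q + n_kv) * ggml_element_size(cur));
// if cur ends up on the GPU, Kcur is a view of GPU memory; with the partial-offload
// scheme at the time there was no clean way to copy just that view back to the CPU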
Maybe Falcon was already broken (don't remember if I checked), but LLaMA 7B with ngl=34 was definitely producing correct output before this PR: #3820 |
 // check if a tensor is allocated by this buffer
 static bool ggml_allocr_is_own(struct ggml_allocr * alloc, const struct ggml_tensor * tensor) {
-    void * ptr = tensor->data;
-    return ptr >= alloc->data && (char *)ptr < (char *)alloc->data + alloc->max_size;
+    return tensor->buffer == alloc->buffer;
 }
In train & finetune I create a new allocator, allocate tensors, free the allocator, and then repeat that for different groups of tensors.
Before this change the actual data pointer was used to check ownership. This pointer differed between successive allocators because each one used a different data buffer.
But the buffer member used here may not differ, because it is malloc'ed by ggml_backend_buffer_init. It can actually happen that the previous memory address is reused, because the previous allocator along with its buffer was freed.
So with this change ggml_allocr_is_own does not work correctly with the use case of a temporary allocator. The allocator must be kept around as long as the tensors themselves.
Small example:
std::vector<char> buf0;
std::vector<char> buf1;
buf0.resize(...);
buf1.resize(...);
struct ggml_tensor * a = ...;
struct ggml_tensor * b = ...;
struct ggml_allocr * alloc;
alloc = ggml_allocr_new(buf0.data(), buf0.size(), alignment);
ggml_allocr_alloc(alloc, a);
ggml_allocr_free(alloc);
alloc = ggml_allocr_new(buf1.data(), buf1.size(), alignment);
ggml_allocr_alloc(alloc, b);
// Now ggml_allocr_is_own(alloc, a) can be unexpectedly `true`
ggml_allocr_free(alloc);
This results in the allocator freeing tensors that are not actually owned by it.
I wrote that it 'can' happen, but I actually observe this behaviour in train & finetune; whether it manifests probably depends on malloc.
I can change the train & finetune programs to not free the allocators, which would solve the problem. But in the above example it is still a bit odd that ggml_allocr_is_own(alloc, a) can unexpectedly return true after the previous allocator was freed.
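A minimal sketch of that workaround, reusing buf0/buf1, a/b and alignment from the example above: keep both allocators alive until the tensors are no longer needed, and only free them at the end.
struct ggml_allocr * alloc0 = ggml_allocr_new(buf0.data(), buf0.size(), alignment);
ggml_allocr_alloc(alloc0, a);
struct ggml_allocr * alloc1 = ggml_allocr_new(buf1.data(), buf1.size(), alignment);
ggml_allocr_alloc(alloc1, b);
// ... use a and b ...
// both allocators (and their backing buffers) stay alive, so a and b remain valid
// and the buffer-based ownership check cannot collide with a freed buffer's address
ggml_allocr_free(alloc1);
ggml_allocr_free(alloc0);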
Maybe I missed other cases, but this should have been fixed in #4486.
The problem is that checking the addresses doesn't really work with measure allocators. The previous implementation used virtual memory as a stopgap, but this has other problems. Ultimately, the issue is that ggml_allocr_new creates a temporary ggml_backend_buffer that is freed when the allocator is freed, and from that point the tensors cannot be used anymore since they point to a freed buffer. When using ggml-backend it is possible to free the allocator if you supply your own buffer, but otherwise the ggml-alloc should be kept alive for the lifetime of the tensors.
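As a rough sketch of the "supply your own buffer" variant, assuming the ggml-backend/ggml-alloc API as it looked around this time (ggml_backend_alloc_buffer, ggml_allocr_new_from_buffer) and reusing the tensors a and b from the earlier example; buf_size is a hypothetical size:
ggml_backend_t backend = ggml_backend_cpu_init();
// the buffer owns the tensor memory; the allocator only assigns offsets into it
ggml_backend_buffer_t buffer = ggml_backend_alloc_buffer(backend, buf_size);
struct ggml_allocr * alloc = ggml_allocr_new_from_buffer(buffer);
ggml_allocr_alloc(alloc, a);
ggml_allocr_alloc(alloc, b);
ggml_allocr_free(alloc); // fine here: the data lives in `buffer`, which the allocator does not own
// ... use a and b ...
ggml_backend_buffer_free(buffer); // free the tensor data only once a and b are no longer needed
ggml_backend_free(backend);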
Indeed, this solves the issues somewhat for finetune, but there are some more cases. I will push a PR after I come back from visiting family for the holidays.
@slaren Should we merge what we already have for ggml-backend? If there isn't any breaking change, we might want to do so in order to keep things in sync more easily.