ggml : cgraph export/import/eval example + GPU support #108

ggerganov · 2023-04-24T18:54:14Z

This is the first step towards full GPU and custom hardware inference support (see ggerganov/llama.cpp#915)

The idea is to be able to export the ggml computation graphs (ggml_cgraph) into standalone .ggml files.
These files can be later imported by a separate application and evaluated based on the available hardware / framework (CUDA, Metal, WebGPU, etc.). The computation graph contains everything necessary to perform the inference:

model weights
operations
work buffers
sizes + offsets
memory layout

As an example, we export the MNIST computation graph from the mnist example into the file mnist.ggml:

$ ./bin/mnist ./models/mnist/ggml-model-f32.bin ../examples/mnist/models/mnist/t10k-images.idx3-ubyte

Next, using the mnist-cpu tool, we load the graph and re-evaluate it on the CPU using ggml_graph_compute():

./bin/mnist-cpu ./mnist.ggml ../examples/mnist/models/mnist/t10k-images.idx3-ubyte

Or we can run it on the Apple Silicon GPU using Metal:

./bin/mnist-mtl ./mnist.ggml ../examples/mnist/models/mnist/t10k-images.idx3-ubyte

Here is a sample run:

$ ./bin/mnist ./models/mnist/ggml-model-f32.bin ../examples/mnist/models/mnist/t10k-images.idx3-ubyte

mnist_model_load: loading model from './models/mnist/ggml-model-f32.bin'
mnist_model_load: ggml ctx size =   1.52 MB
main: loaded model in     1.02 ms
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ * * * * * * _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ * * * * * * * * _ * * * _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ * _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ * * * _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ * * * * * * * _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ * * * _ _ _ _ * * * _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ * * * * _ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ * * * _ _ _ _ _ _ _ _ * * * _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ * * * * * * * * * _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ggml_graph_dump_dot: dot -Tpng mnist.dot -o mnist.dot.png && open mnist.dot.png

magic            67676d6c
version                 1
leafs                   5
nodes                   6
eval                 6120

TYPE   OP              NDIMS      NE0      NE1      NE2      NE3              NB0              NB1              NB2              NB3             DATA             NAME
f32    NONE                2      500       10        1        1                4             2000            20000            20000      0x1201877d0       fc2_weight
f32    NONE                2      784      500        1        1                4             3136          1568000          1568000      0x120008100       fc1_weight
f32    NONE                1      784        1        1        1                4             3136             3136             3136      0x11e809f00            input
f32    NONE                1      500        1        1        1                4             2000             2000             2000      0x120186f00         fc1_bias
f32    NONE                1       10        1        1        1                4               40               40               40      0x12018c6f0         fc2_bias

ARG    TYPE   OP              NDIMS      NE0      NE1      NE2      NE3              NB0              NB1              NB2              NB3   NTASKS             DATA             NAME
DST    f32    MUL_MAT             1      500        1        1        1                4             2000             2000             2000        1      0x11e80ac40           node_0
SRC0   f32    NONE                2      784      500        1        1                4             3136          1568000          1568000        0      0x120008100       fc1_weight
SRC1   f32    NONE                1      784        1        1        1                4             3136             3136             3136        0      0x11e809f00            input

DST    f32    ADD                 1      500        1        1        1                4             2000             2000             2000        1      0x11e80b510           node_1
SRC0   f32    MUL_MAT             1      500        1        1        1                4             2000             2000             2000        1      0x11e80ac40           node_0
SRC1   f32    NONE                1      500        1        1        1                4             2000             2000             2000        0      0x120186f00         fc1_bias

DST    f32    RELU                1      500        1        1        1                4             2000             2000             2000        1      0x11e80bde0           node_2
SRC0   f32    ADD                 1      500        1        1        1                4             2000             2000             2000        1      0x11e80b510           node_1

DST    f32    MUL_MAT             1       10        1        1        1                4               40               40               40        1      0x11e80c6b0           node_3
SRC0   f32    NONE                2      500       10        1        1                4             2000            20000            20000        0      0x1201877d0       fc2_weight
SRC1   f32    RELU                1      500        1        1        1                4             2000             2000             2000        1      0x11e80bde0           node_2

DST    f32    ADD                 1       10        1        1        1                4               40               40               40        1      0x11e80c7e0           node_4
SRC0   f32    MUL_MAT             1       10        1        1        1                4               40               40               40        1      0x11e80c6b0           node_3
SRC1   f32    NONE                1       10        1        1        1                4               40               40               40        0      0x12018c6f0         fc2_bias

DST    f32    SOFT_MAX            1       10        1        1        1                4               40               40               40        1      0x11e80c910            probs
SRC0   f32    ADD                 1       10        1        1        1                4               40               40               40        1      0x11e80c7e0           node_4


mnist_eval: exported compute graph to 'mnist.ggml'
main: predicted digit is 3

$ dot -Tpng mnist.dot -o mnist.dot.png && open mnist.dot.png

CPU (via ggml)

$ ./bin/mnist-cpu ./mnist.ggml ../examples/mnist/models/mnist/t10k-images.idx3-ubyte
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * * * * _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * * * * * _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ * * * * _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ * * * * _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ * * * _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ * * * _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ * * * _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ * * * * * _ _ _ _ _ 
_ _ _ _ _ * * * _ _ _ _ _ _ _ _ _ * * _ _ _ * * _ _ _ _ 
_ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ * _ _ _ _ _ * * _ _ _ 
_ _ _ _ _ _ * * _ _ _ _ _ _ _ _ * _ _ _ _ _ _ * _ _ _ _ 
_ _ _ _ _ _ * * * _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ 
_ _ _ _ _ _ _ * * * _ _ _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ 
_ _ _ _ _ _ _ _ _ * * * * * * * * _ _ _ * * * _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ * * * * * * * * * * _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ggml_graph_import: loaded leaf 0: '      fc2_weight',   2 dims,     20000 bytes
ggml_graph_import: loaded leaf 1: '      fc1_weight',   2 dims,   1568000 bytes
ggml_graph_import: loaded leaf 2: '           input',   1 dims,      3136 bytes
ggml_graph_import: loaded leaf 3: '        fc1_bias',   1 dims,      2000 bytes
ggml_graph_import: loaded leaf 4: '        fc2_bias',   1 dims,        40 bytes
ggml_graph_import: loaded node 0: '          node_0',   1 dims,      2000 bytes
ggml_graph_import: loaded node 1: '          node_1',   1 dims,      2000 bytes
ggml_graph_import: loaded node 2: '          node_2',   1 dims,      2000 bytes
ggml_graph_import: loaded node 3: '          node_3',   1 dims,        40 bytes
ggml_graph_import: loaded node 4: '          node_4',   1 dims,        40 bytes
ggml_graph_import: loaded node 5: '           probs',   1 dims,        40 bytes
main: predicted digit is 6

Metal

$ ./bin/mnist-mtl ./mnist.ggml ../examples/mnist/models/mnist/t10k-images.idx3-ubyte 

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ * * * * * * * _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ * * * _ _ _ _ _ _ * * _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ * _ _ _ _ _ _ * * * * _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ * * _ _ _ _ * * * _ * _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ * * * * * _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * * _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ * _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ggml_cgraph_import: loaded leaf 0: '      fc2_weight',   2 dims,     20000 bytes
ggml_cgraph_import: loaded leaf 1: '      fc1_weight',   2 dims,   1568000 bytes
ggml_cgraph_import: loaded leaf 2: '           input',   1 dims,      3136 bytes
ggml_cgraph_import: loaded leaf 3: '        fc1_bias',   1 dims,      2000 bytes
ggml_cgraph_import: loaded leaf 4: '        fc2_bias',   1 dims,        40 bytes
ggml_cgraph_import: loaded node 0: '          node_0',   1 dims,      2000 bytes
ggml_cgraph_import: loaded node 1: '          node_1',   1 dims,      2000 bytes
ggml_cgraph_import: loaded node 2: '          node_2',   1 dims,      2000 bytes
ggml_cgraph_import: loaded node 3: '          node_3',   1 dims,        40 bytes
ggml_cgraph_import: loaded node 4: '          node_4',   1 dims,        40 bytes
ggml_cgraph_import: loaded node 5: '           probs',   1 dims,        40 bytes
mnist_mtl_init: allocating
mnist_mtl_init: using MPS
mnist_mtl_init: allocated data buffer, size = 1594896
mnist_mtl_init: allocated eval buffer, size = 9120
mnist_mtl_init: allocated results buffer, size = 40
mnist_mtl_eval: evaluating
mnist_mtl_eval: encoding node   0, op =  MUL_MAT
mnist_mtl_get_buffer: data tensor '      fc1_weight', offs =    20512, size =  1568000
mnist_mtl_get_buffer: data tensor '           input', offs =  1588628, size =     3136
mnist_mtl_get_buffer: eval tensor '          node_0', offs =     1536, size =     2000
mnist_mtl_eval: encoding node   1, op =      ADD
mnist_mtl_get_buffer: eval tensor '          node_0', offs =     1536, size =     2000
mnist_mtl_get_buffer: data tensor '        fc1_bias', offs =  1591880, size =     2000
mnist_mtl_get_buffer: eval tensor '          node_1', offs =     3792, size =     2000
mnist_mtl_eval: encoding node   2, op =     RELU
mnist_mtl_get_buffer: eval tensor '          node_1', offs =     3792, size =     2000
mnist_mtl_get_buffer: eval tensor '          node_2', offs =     6048, size =     2000
mnist_mtl_eval: encoding node   3, op =  MUL_MAT
mnist_mtl_get_buffer: data tensor '      fc2_weight', offs =      396, size =    20000
mnist_mtl_get_buffer: eval tensor '          node_2', offs =     6048, size =     2000
mnist_mtl_get_buffer: eval tensor '          node_3', offs =     8304, size =       40
mnist_mtl_eval: encoding node   4, op =      ADD
mnist_mtl_get_buffer: eval tensor '          node_3', offs =     8304, size =       40
mnist_mtl_get_buffer: data tensor '        fc2_bias', offs =  1593996, size =       40
mnist_mtl_get_buffer: eval tensor '          node_4', offs =     8608, size =       40
mnist_mtl_eval: encoding node   5, op = SOFT_MAX
mnist_mtl_get_buffer: eval tensor '          node_4', offs =     8608, size =       40
mnist_mtl_get_buffer: eval tensor '           probs', offs =     8912, size =       40
mnist_mtl_get_buffer: eval tensor '           probs', offs =     8912, size =       40
mnist_mtl_eval: time elapsed = 0.001637
mnist_mtl_eval: probs[ 0] = 0.000000
mnist_mtl_eval: probs[ 1] = 0.000000
mnist_mtl_eval: probs[ 2] = 0.000000
mnist_mtl_eval: probs[ 3] = 0.000000
mnist_mtl_eval: probs[ 4] = 0.000000
mnist_mtl_eval: probs[ 5] = 0.000000
mnist_mtl_eval: probs[ 6] = 0.000000
mnist_mtl_eval: probs[ 7] = 0.000000
mnist_mtl_eval: probs[ 8] = 0.000000
mnist_mtl_eval: probs[ 9] = 1.000000
mnist_mtl_free: deallocating
main: predicted digit is 9

slimsag · 2023-05-10T19:00:33Z

@ggerganov I'm a bit curious/interested in this approach; I like that you are trying to separate ggml and the GPU implementation layer like this.

I'd be keen to make a quick attempt at executing the ggml graph output you have here using WebGPU from Zig; but I'm not sure exactly how to piece that output together (or even read it, necessarily) - so I wonder if you'd consider adding a C example or something that executes it on the CPU and validates the results it gets, so I could better understand how it works?

ggerganov · 2023-05-10T22:07:18Z

@slimsag Will try to prioritise this soon and finalize the export format + a CPU and/or Metal example

JohnnyOpcode · 2023-05-18T05:47:16Z

Netron supports many formats of exported graphs already. I think GGML could be easily added.

https://github.com/lutzroeder/netron

ggerganov · 2023-05-24T18:49:14Z

Bit of slow progress here, but I think it is starting to work out
Hopefully will have a working prototype over the weekend

Sslithercode · 2023-05-25T02:58:06Z

Ive been waiting for this for months, Nothing has been as easy to use as llama.cpp.

ggerganov · 2023-05-27T15:47:30Z

Ok, I'm finally at the interesting part. I have the ggml compute graph exported together without all tensor data and work buffers. Now I have to map this to the GPU and implement the operators. For MNIST we have just 4 operators:

F32 add
F32 mul mat
F32 RELU
F32 SOFT_MAX

Regarding the memory mapping, it looks like I need to use MTLHeap to map the ggml contexts and then create the MTLBuffers corresponding to the compute graph tensors as views of the heap(s) using newBufferWithLength:options:offset:

Everything should go into a single MTLCommandBuffer

philipturner · 2023-05-28T13:29:53Z

Everything should go into a single MTLCommandBuffer

Even though that command buffer takes multiple milliseconds, it won't cause a UI hitch. The Apple GPU can execute two separate command buffers concurrently from different MTLCommandQueues. The only stipulation is, a single kernel invocation within the cmdbuf doesn't take >16 ms. I recommend using Metal Frame Capture if possible (a bit buggy though).

ggerganov · 2023-05-28T14:31:20Z

This is now working as expected and can serve as a proof-of-concept for offloading a ggml compute graph to be evaluated on the GPU via Metal (or some other framework, like CUDA). There are still many things to be careful about and it's easy to mess things up, but I think with time I will be able to make it easier to work with.

Before merging this, I will move the new import / export functions to the core ggml library (currently, they are in common).

After merging, the next step will be to implement LLaMA inference with the same approach.
This will involve implementing the missing matrix-vector multiplication kernels, RoPE kernel, Norm kernel + solving the "dynamic shape" problem where some of the tensor shapes depend on the number of input / predicted tokens.

examples/mnist/main-mtl.m

ggerganov force-pushed the cgraph-export branch 4 times, most recently from 6264c52 to eed3eac Compare May 24, 2023 10:09

9876691 mentioned this pull request May 26, 2023

Build and execute our own computation graph rustformers/llm#137

Open

ggerganov force-pushed the cgraph-export branch from c52df7d to b4e3b5c Compare May 27, 2023 09:01

ggerganov changed the title ~~ggml : cgraph export brainstorming~~ ggml : cgraph export/import/eval example May 27, 2023

ggerganov marked this pull request as ready for review May 27, 2023 13:10

ggerganov added 9 commits May 27, 2023 17:07

ggml : cgraph export brainstorming

f477d4f

mnist : code style

2a03421

mnist : minor

85dcc0c

ggml : initial cgraph export

95c8507

ggml : initial graph import (wip)

3120189

ggml : import op args correctly

d2d1c22

ggml : add ggml_get_tensor_by_name()

4cfd92b

mnist : add compute graph evaluation on CPU example

b0450c2

ggml : add ggml_tensor_overhead()

ddea488

ggerganov force-pushed the cgraph-export branch from 28a5288 to ddea488 Compare May 27, 2023 14:08

ggerganov added 2 commits May 27, 2023 17:10

ggml : rename new functions to ggml_cgraph_...

f698dbf

mnist : add Metal inference skeleton (WIP)

bf93623

ggerganov mentioned this pull request May 28, 2023

Apple Silicon GPU Support Possible? ggerganov/llama.cpp#1545

Closed

ggerganov added 3 commits May 28, 2023 14:07

mnist : working on the Metal pipeline (WIP)

bb126f9

mnist : prepare the Metal encoder (WIP)

24ea9dd

mnist : first Metal kernel for F32 ADD

2ec1dff

mnist : looks like MTLHeap does not work

966f9e6

ggerganov added 4 commits May 28, 2023 16:34

mnist : initial full pass of MNIST on the GPU (not verified)

1bc9181

mnist : minor cleanup

4134bac

mnist : full GPU inference works

a556b57

mnist : use custom soft_max kernel since MPSMatrixSoftMax is bugged

8f8653b

ggerganov changed the title ~~ggml : cgraph export/import/eval example~~ ggml : cgraph export/import/eval example + GPU support May 28, 2023

ggerganov commented May 28, 2023

View reviewed changes

examples/mnist/main-mtl.m Outdated Show resolved Hide resolved

ggerganov added 5 commits May 28, 2023 21:02

mnist : use constant for soft_max instead of hardcoded 10

3b97377

mnist : check multiple predictions (Metal)

e350f13

mnist : minor

4fa01f0

ggml : move cgraph import / export to ggml

79dcbfd

mnist : remove common dependencies

25adade

hammer mentioned this pull request May 28, 2023

Local AI interactions refstudio/refstudio#59

Open

ggerganov added 3 commits May 29, 2023 19:14

mnist : fix soft_max threadgroup size

e6dc506

mnist : init no_alloc member

f9b04df

ggml : improve "get tensor" API

c8013c5

ggerganov merged commit 3b697a2 into master May 29, 2023

ggerganov deleted the cgraph-export branch May 29, 2023 16:28

ggerganov mentioned this pull request May 29, 2023

llama : Metal inference ggerganov/llama.cpp#1642

Merged

20 tasks

cztomsik mentioned this pull request May 31, 2023

Q4 quantization support huggingface/safetensors#197

Closed

python273 mentioned this pull request May 31, 2023

ggml support tinygrad/tinygrad#878

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ggml : cgraph export/import/eval example + GPU support #108

ggml : cgraph export/import/eval example + GPU support #108

ggerganov commented Apr 24, 2023 •

edited

Loading

slimsag commented May 10, 2023

ggerganov commented May 10, 2023

JohnnyOpcode commented May 18, 2023

ggerganov commented May 24, 2023

Sslithercode commented May 25, 2023

ggerganov commented May 27, 2023 •

edited

Loading

philipturner commented May 28, 2023

ggerganov commented May 28, 2023 •

edited

Loading

ggml : cgraph export/import/eval example + GPU support #108

ggml : cgraph export/import/eval example + GPU support #108

Conversation

ggerganov commented Apr 24, 2023 • edited Loading

CPU (via ggml)

Metal

slimsag commented May 10, 2023

ggerganov commented May 10, 2023

JohnnyOpcode commented May 18, 2023

ggerganov commented May 24, 2023

Sslithercode commented May 25, 2023

ggerganov commented May 27, 2023 • edited Loading

philipturner commented May 28, 2023

ggerganov commented May 28, 2023 • edited Loading

ggerganov commented Apr 24, 2023 •

edited

Loading

ggerganov commented May 27, 2023 •

edited

Loading

ggerganov commented May 28, 2023 •

edited

Loading