[need help] a simple python implementation of parallel.cpp #930
I need an HTTP API that supports continuous batching, so I decided to implement it myself.
I ran into some issues while trying to implement continuous batching with the low-level llama.cpp API exposed by this project, so I am posting my implementation here to ask for help.
I mainly referred to https://github.com/ggerganov/llama.cpp/blob/master/examples/parallel/parallel.cpp
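To frame the question, here is a rough sketch of the parallel.cpp loop I am trying to reproduce, written against the low-level llama-cpp-python bindings. It is not the full implementation: prompts are assumed to be already tokenized (the `llama_tokenize` signature differs between versions), sampling is plain greedy argmax, EOS handling is skipped, every sequence generates a fixed number of tokens, and finished slots are not reused for new requests the way real continuous batching would. Function names such as `llama_backend_init` have also changed across releases, so treat this as an assumption-laden outline rather than working reference code.

```python
import llama_cpp


def batch_add(batch, token, pos, seq_ids, want_logits):
    # mirror of llama_batch_add() in common.h: append one token to the batch
    i = batch.n_tokens
    batch.token[i] = token
    batch.pos[i] = pos
    batch.n_seq_id[i] = len(seq_ids)
    for j, seq in enumerate(seq_ids):
        batch.seq_id[i][j] = seq
    batch.logits[i] = int(want_logits)
    batch.n_tokens += 1


def generate_parallel(model_path, prompts, n_predict=32, n_ctx=4096, n_batch=512):
    # prompts: already-tokenized prompts (lists of token ids), one per "client"
    llama_cpp.llama_backend_init(False)  # note: newer bindings take no argument here
    model = llama_cpp.llama_load_model_from_file(
        model_path.encode("utf-8"), llama_cpp.llama_model_default_params()
    )
    cparams = llama_cpp.llama_context_default_params()
    cparams.n_ctx = n_ctx
    cparams.n_batch = n_batch
    ctx = llama_cpp.llama_new_context_with_model(model, cparams)
    n_vocab = llama_cpp.llama_n_vocab(model)

    n_seq = len(prompts)
    batch = llama_cpp.llama_batch_init(n_batch, 0, n_seq)

    # prompt phase: every client's prompt goes into one batch under its own seq_id;
    # in this simplified sketch all prompt tokens together must fit into n_batch
    pos = [0] * n_seq          # next KV position per sequence
    i_batch = [0] * n_seq      # batch index that holds each sequence's logits
    batch.n_tokens = 0
    for s, prompt in enumerate(prompts):
        for k, tok in enumerate(prompt):
            batch_add(batch, tok, pos[s], [s], k == len(prompt) - 1)
            pos[s] += 1
        i_batch[s] = batch.n_tokens - 1

    generated = [[] for _ in range(n_seq)]
    for _ in range(n_predict):
        if llama_cpp.llama_decode(ctx, batch) != 0:
            raise RuntimeError("llama_decode failed (batch too large for n_batch/n_ctx?)")
        # greedy argmax per sequence, just to keep the sketch short (no real sampler)
        next_tokens = []
        for s in range(n_seq):
            logits = llama_cpp.llama_get_logits_ith(ctx, i_batch[s])
            next_tokens.append(max(range(n_vocab), key=lambda t: logits[t]))
        # decode phase: refill the batch with exactly one new token per sequence
        batch.n_tokens = 0
        for s, tok in enumerate(next_tokens):
            generated[s].append(tok)
            i_batch[s] = batch.n_tokens
            batch_add(batch, tok, pos[s], [s], True)
            pos[s] += 1

    # drop each sequence's KV cells and free native resources explicitly
    for s in range(n_seq):
        llama_cpp.llama_kv_cache_seq_rm(ctx, s, -1, -1)
    llama_cpp.llama_batch_free(batch)
    llama_cpp.llama_free(ctx)
    llama_cpp.llama_free_model(model)
    llama_cpp.llama_backend_free()
    return generated
```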
I am experiencing a possible memory leak when performing continuous batch processing with large contexts and batches.
I have raised two separate issues, one in this repository (llama.cpp) and another in llama-cpp-python, to provide more information about the problem.
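While chasing the leak, the first thing I am double-checking is the explicit teardown, since with a large context and batch even a few KV cells or batches that are never released add up quickly. The cleanup I mean looks roughly like the sketch below; the two helper names are mine, not part of llama-cpp-python, and `-1, -1` asks `llama_kv_cache_seq_rm` to clear a sequence's whole position range.

```python
import llama_cpp


def finish_sequence(ctx, seq_id):
    # hypothetical helper: drop every KV cache cell owned by a finished client,
    # so long-running continuous batching does not keep dead sequences around
    llama_cpp.llama_kv_cache_seq_rm(ctx, seq_id, -1, -1)


def shutdown(ctx, model, batch):
    # hypothetical helper: release native memory explicitly instead of relying on GC
    llama_cpp.llama_batch_free(batch)
    llama_cpp.llama_free(ctx)
    llama_cpp.llama_free_model(model)
    llama_cpp.llama_backend_free()
```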
Feel free to point out any errors, and I will fix them as soon as possible.
Notice:
this demo does not support grammar, terminal args, or prompt files