
WIP: New python based entry point for containers #1686

Closed
wants to merge 1 commit

Conversation

jpodivin
Contributor

@jpodivin jpodivin commented Jun 3, 2023

No description provided.

@jpodivin
Contributor Author

jpodivin commented Jun 5, 2023

@SlyEcho This is the endpoint change I've mentioned.

@SlyEcho
Collaborator

SlyEcho commented Jun 5, 2023

Does it allow passing arguments with spaces?

@jpodivin
Contributor Author

jpodivin commented Jun 6, 2023

Does it allow passing arguments with spaces?

I'm not sure exactly what you mean. My implementation changes some things, but these, for example, work:

docker run -v /local/model/path/:/models llama.cpp:endpoint --all-in-one /models/7B
docker run -v /local/model/path/:/models llama.cpp:endpoint --run /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512

But I wouldn't really like to get this specific version merged. I would prefer subcommands over these args, as I wouldn't have to worry about overloading arguments between run/quantize/etc.

Not to say that it's impossible, but subcommands do offer flexibility you don't have with multiple arguments.

@SlyEcho
Collaborator

SlyEcho commented Jun 6, 2023

That's what I meant.

I would prefer subcommands over those args, as I wouldn't have to worry about overloading arguments between run/quantize/etc

Couldn't it be done by just checking argv[1]? I don't know how to do it in Python, but there is not really any "parsing" needed, I think, just switching on the string value, kind of like the old script did.
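
A minimal sketch of what switching on argv[1] could look like; the handler functions here are placeholders, not the real tool invocations, and the flag names are illustrative:

```python
import sys

# Placeholder handlers; the real script would exec the llama.cpp binaries.
def run(args): print("run", args)
def convert(args): print("convert", args)
def quantize(args): print("quantize", args)

COMMANDS = {"--run": run, "--convert": convert, "--quantize": quantize}

def main(argv):
    if len(argv) < 2 or argv[1] not in COMMANDS:
        print(f"usage: {argv[0]} {{{'|'.join(COMMANDS)}}} ...", file=sys.stderr)
        return 1
    # Remaining args (including ones with spaces) are passed through untouched.
    COMMANDS[argv[1]](argv[2:])
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

This covers the plain dispatch case, but validation, help text, and per-command defaults would all have to be written by hand.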

@jpodivin
Contributor Author

jpodivin commented Jun 7, 2023

The advantage of Python's argparse is that it takes care of those checks for you, including edge cases.
You can define a CLI of arbitrary complexity using a combination of commands, subcommands, and args. Help, error handling, type checks, and defaults come in the bargain.

You could do that with plain conditional checks on argv elements, but you would also have to implement extra logic to make it robust, and that puts an unnecessary burden on maintainers.

What I would propose is something like this:

usage: tools.py [-h] model {run,convert,quantize,all-in-one} ...

positional arguments:
  model                 Directory containing model file, or model file itself
                        (*.pth, *.pt, *.bin)
  {run,convert,quantize,all-in-one}
    run                 Run a model previously converted into ggml
    convert             Convert a llama model into ggml
    quantize            Quantize a converted ggml model
    all-in-one          Execute convert & quantize

options:
  -h, --help            show this help message and exit

Each of those would have its own args, so no chance of overloading. Actually, I'm a bit ashamed I haven't proposed it already.
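
A rough sketch of how the proposed CLI could be built with argparse subparsers; the per-command flags (-p, -n, -t) are assumptions for illustration, not the final interface:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="tools.py")
    parser.add_argument(
        "model",
        help="Directory containing model file, or model file itself")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="Run a model previously converted into ggml")
    run.add_argument("-p", "--prompt", default="")
    run.add_argument("-n", "--n-predict", type=int, default=128)

    sub.add_parser("convert", help="Convert a llama model into ggml")

    quantize = sub.add_parser("quantize", help="Quantize a converted ggml model")
    quantize.add_argument("-t", "--type", default="q4_0")

    sub.add_parser("all-in-one", help="Execute convert & quantize")
    return parser
```

Because each subparser owns its own flags, run's -p and a hypothetical quantize -p could coexist without conflict, and argparse generates the help text and error messages automatically.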

Edit: I'll make an asciinema demo later today, when I have more time.

@jpodivin
Contributor Author

jpodivin commented Jun 7, 2023

I have the promised demo.[0] It's an asciinema[1] recording, so you can play it in a terminal. I recommend using -i 1 when playing it, as otherwise you will have to endure long gaps in activity while my desktop struggles. Also, I was doing other things, and after the quantization command I forgot I was recording for a couple of minutes.

[0] https://gist.github.com/jpodivin/ef4d037c21bfc2ce0a9f91b1d3f29ea5
[1] https://asciinema.org/docs/usage

@jpodivin jpodivin force-pushed the python-endpoint branch 2 times, most recently from 5020781 to 89e7976 on June 8, 2023 18:30
@jpodivin
Contributor Author

jpodivin commented Jun 8, 2023

@SlyEcho wdyt?

@SlyEcho
Collaborator

SlyEcho commented Jun 8, 2023

I have not had time to test it yet.

@jpodivin jpodivin marked this pull request as ready for review June 15, 2023 06:37
@SlyEcho SlyEcho mentioned this pull request Aug 9, 2023
@jpodivin
Contributor Author

@SlyEcho So are we moving this out of WIP? Or should I just close it?

@SlyEcho
Collaborator

SlyEcho commented Aug 29, 2023

I think it's still worth it.

There are a couple of things that have changed:

  1. A lot more Dockerfiles.
  2. The model format changed, so now we have .gguf files.

@jpodivin
Contributor Author

jpodivin commented Aug 29, 2023 via email

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
@jpodivin
Contributor Author

@SlyEcho I've updated the script to work with the new model format and the server binary. I've also replaced the endpoint in the other container files. I've tested standard file quantization, conversion, running, and the server binary with open-llama.

@jpodivin jpodivin closed this Mar 18, 2024