2.2.9 Backend: text generation inference
Handle: tgi
URL: http://localhost:33851/
A Rust, Python and gRPC server for inference from HuggingFace.
TGI's API is not fully compatible with Open WebUI, so you also need to run litellm
to connect the two.
# [Optional] Pull the tgi images
# ahead of starting the service
harbor pull tgi
# Harbor's litellm already points to tgi
# by default
harbor up tgi litellm
# You should now see a new "tgi" model in the Open WebUI
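Besides checking Open WebUI, you can query the service directly to confirm it is reachable. The sketch below assumes the default URL listed above and uses TGI's /generate endpoint. The tgi_payload and tgi_generate helpers are hypothetical names for illustration, not Harbor commands, and the escaping covers only backslashes and double quotes.

```shell
# Base URL from the section above; override TGI_URL if you changed the port.
TGI_URL="${TGI_URL:-http://localhost:33851}"

# Hypothetical helper: build a JSON body for TGI's /generate endpoint.
# Escapes only backslashes and double quotes, so it is not a general JSON encoder.
tgi_payload() {
  prompt=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$prompt" "${2:-20}"
}

# Hypothetical helper: send a prompt to the running service.
tgi_generate() {
  curl -s "$TGI_URL/generate" \
    -H 'Content-Type: application/json' \
    -d "$(tgi_payload "$1" "${2:-20}")"
}
```

With the service up, tgi_generate 'What is deep learning?' 32 should return a JSON response containing the generated text.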
TGI downloads models on its own when the service starts. Harbor's tgi instance
uses your global Hugging Face cache. TGI supports multiple quantisation types, and one must be specified alongside the model name.
Once you have located the model you want to run, configure Harbor as follows:
# Specify the repository to run
harbor tgi model repo/model
# Also, specify the type of quant TGI should use
harbor tgi quant awq
# [Optional] Some repos store a specific version in a revision
# rather than keeping all versions together; you can specify it as well
harbor tgi revision 4.0bpw
# [Optional] configure any additional arguments
# that might be needed for a specific model
harbor tgi args '--rope-factor 4.0'
# Alternatively, you can provide a full
# set of args for the TGI CLI in one go
harbor config set tgi.model.specifier '--model-id repo/model --quantize awq --revision 3_5'
# To run a gated model, ensure that you've
# also set your Huggingface API Token
harbor hf token <your-token>
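The full specifier shown above is the individual TGI flags joined into one string. As a minimal sketch, it can be composed from its parts with a small shell function. The tgi_specifier helper is a hypothetical name used for illustration and is not part of Harbor.

```shell
# Hypothetical helper: print a TGI argument string from a model ID
# and an optional quantisation type and revision.
tgi_specifier() {
  spec="--model-id $1"
  if [ -n "${2:-}" ]; then
    spec="$spec --quantize $2"
  fi
  if [ -n "${3:-}" ]; then
    spec="$spec --revision $3"
  fi
  printf '%s\n' "$spec"
}
```

It could then be passed to the config command from this section, for example: harbor config set tgi.model.specifier "$(tgi_specifier repo/model awq 3_5)"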
TGI serves one model at a time and must be restarted to switch models.
In addition to specifying the model, you can provide extra environment variables in the .env
file or extra arguments to pass to the TGI CLI, as defined in the TGI CLI Options.
# .env
ROPE_SCALING=linear
# CLI
harbor tgi args '--rope-factor 4.0'
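The two forms above are often interchangeable: the TGI launcher options list shows an environment variable next to each flag, and for the flags used here the name is the flag upper-cased with dashes replaced by underscores. The sketch below assumes that pattern holds for the flag you need, so check the options list before relying on it. The tgi_flag_to_env helper is a hypothetical name, not part of Harbor or TGI.

```shell
# Hypothetical helper: map a TGI launcher flag to the environment
# variable name, assuming "strip leading --, upper-case, dashes to underscores".
tgi_flag_to_env() {
  printf '%s\n' "$1" | sed 's/^--//' | tr 'a-z-' 'A-Z_'
}
```

For example, tgi_flag_to_env --rope-scaling gives the ROPE_SCALING name used in the .env example above.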
You can run text-generation-inference CLIs with harbor run:
# Prints server help
harbor run tgi --help
# Launch a shell and explore on your own
harbor shell tgi
When the tgi service is running, you can access embedded tools via exec:
harbor exec tgi text-generation-server --help