
2.2.9 Backend: text generation inference


Handle: tgi
URL: http://localhost:33851/

See also: Making TGI deployment optimal · Swagger API documentation

A Rust, Python and gRPC server for text generation inference from HuggingFace.

Starting

TGI's API is not fully compatible with Open WebUI, so you also need litellm to connect the two.

# [Optional] Pull the tgi images
# ahead of starting the service
harbor pull tgi

# Harbor's litellm points to tgi
# by default as well
harbor up tgi litellm

# You should now see a new "tgi" model in the Open WebUI
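Once the service is up, you can sanity-check it directly against TGI's native /generate endpoint. A minimal sketch, assuming the default Harbor port from the header above and that a model has finished loading:

# Send a small generation request to the running TGI instance
curl http://localhost:33851/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 32}}'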

Models

TGI downloads models on its own when the service is started. Harbor's tgi instance uses your global HuggingFace cache. TGI supports multiple quantisation types, and one must be specified alongside the model name.
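Because the global HuggingFace cache is shared, any weights you have already downloaded (for example with huggingface-cli) are reused. A quick way to see what is already cached, assuming the default cache location:

# List models already present in the default HuggingFace cache
ls ~/.cache/huggingface/hub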

Once you've located the model you want to run, Harbor can be configured as follows:

# Specify the repository to run
harbor tgi model repo/model
# Also, specify the type of quant TGI should use
harbor tgi quant awq

# [Optional] Some repos store specific versions in separate revisions
# rather than all together; a revision can be specified as well
harbor tgi revision 4.0bpw

# [Optional] configure any additional arguments
# that might be needed for a specific model
harbor tgi args '--rope-factor 4.0'

# Alternatively, you can provide a full
# set of args for the TGI CLI in one go
harbor config set tgi.model.specifier '--model-id repo/model --quantize awq --revision 3_5'

# To run a gated model, ensure that you've
# also set your Huggingface API Token
harbor hf token <your-token>

TGI will serve one model at a time and must be restarted to switch models.
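So switching models is just a matter of updating the config and cycling the service. A minimal sketch, with a placeholder repository name:

# Point tgi at a different model, then restart the stack
harbor tgi model another-org/another-model
harbor tgi quant awq
harbor down
harbor up tgi litellm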

Configuration

In addition to specifying the model, you can provide extra environment variables in the .env file or extra arguments to pass to the TGI CLI, as defined in the TGI CLI options.

# .env
ROPE_SCALING=linear

# CLI
harbor tgi args '--rope-factor 4.0'
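To verify that the launcher actually picked these settings up, you can inspect the service logs after a restart. This assumes Harbor's logs helper (a thin wrapper over docker compose logs); the configured flags and environment values show up in TGI's startup output:

# Tail tgi logs and look for the configured arguments
harbor logs tgi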

CLI tools

You can run text-generation-inference CLIs with harbor run:

# Prints server help
harbor run tgi --help

# Launch a shell and explore on your own
harbor shell tgi

When the tgi service is running, you can access its embedded tools via exec:

harbor exec tgi text-generation-server --help
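For example, text-generation-server can pre-download weights so the actual launch doesn't block on the download. A sketch with a placeholder repository name:

# Pre-fetch model weights from inside the running tgi container
harbor exec tgi text-generation-server download-weights repo/model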