Replies: 1 comment
-
Hi @sandys, thanks for the questions. DJL Serving is a complete end-to-end solution that I think will fit your needs. You can use DJL Serving to host Hugging Face LLMs with our Python, DeepSpeed, or FasterTransformer engines. For mpt-7b, I would recommend either the DeepSpeed or Python engine. You will want to create a model directory, and in that directory include a serving.properties file that tells the server which engine and model to load.
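As a sketch, a minimal serving.properties for mpt-7b with the DeepSpeed engine might look like the following (the exact option names, in particular option.model_id and option.tensor_parallel_degree, are assumptions based on common DJL Serving configurations — check the djl-serving docs for your container version):

```properties
# select the engine used to run the model
engine=DeepSpeed
# Hugging Face model id to download and serve (assumed option name)
option.model_id=mosaicml/mpt-7b
# number of GPUs to shard the model across (assumed option name)
option.tensor_parallel_degree=1
```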
We have default Python handlers that handle the model loading and inference processing, but if you want to create your own you can also include a model.py file in the same directory. You can use our default handler as a guide: https://github.com/deepjavalibrary/djl-serving/blob/master/engines/python/setup/djl_python/deepspeed.py. You can then run the DJL Serving container pointing at that model directory to serve the model.
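A sketch of running the container, assuming the public deepjavalibrary/djl-serving image — the tag and the local path are placeholders, so substitute the DeepSpeed-enabled image tag you actually pull and your own model directory:

```shell
# mount the model directory into the container and expose the inference port
docker run -it --gpus all \
  -p 8080:8080 \
  -v /path/to/my-model:/opt/ml/model \
  deepjavalibrary/djl-serving:deepspeed-latest
```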
We have many examples of using our containers with SageMaker. You can find those here: https://github.com/aws/amazon-sagemaker-examples/tree/main/inference/generativeai
-
hi all
has anyone here successfully loaded and worked with any Huggingface LLM?
We tried to use https://huggingface.co/mosaicml/mpt-7b and our attempt is https://github.com/arakoodev/onnx-djl-example, but it doesn't seem to work (we convert the model to ONNX and try to load it in DJL).
Is there a better way to do it? It seems that this PR was merged (#2637), so I'm wondering whether it is possible now.