Replies: 1 comment
-
Hi @sandys, thanks for the questions. DJL Serving is a complete end-to-end solution that I think will fit your needs. You can use DJL Serving to host Hugging Face LLMs with our Python, DeepSpeed, or FasterTransformer engines. For mpt-7b, I would recommend either the DeepSpeed or Python engine. You will want to create a model directory, and in that directory include a serving.properties file that tells the server which engine and model to load.
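As a sketch, a minimal serving.properties for mpt-7b with the DeepSpeed engine might look like the following (the exact option names, in particular option.model_id and option.tensor_parallel_degree, are assumptions based on common DJL Serving configurations — check the djl-serving docs for your container version):

```properties
# select the engine used to run the model
engine=DeepSpeed
# Hugging Face model id to download and serve (assumed option name)
option.model_id=mosaicml/mpt-7b
# number of GPUs to shard the model across (assumed option name)
option.tensor_parallel_degree=1
```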
We have default Python handlers that handle the model loading and inference processing, but if you want to create your own you can also include a model.py file in the same directory. You can use our default handler as a guide: https://github.com/deepjavalibrary/djl-serving/blob/master/engines/python/setup/djl_python/deepspeed.py. You can then run the DJL Serving container pointing at that model directory to serve the model.
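A sketch of running the container, assuming the public deepjavalibrary/djl-serving image — the tag and the local path are placeholders, so substitute the DeepSpeed-enabled image tag you actually pull and your own model directory:

```shell
# mount the model directory into the container and expose the inference port
docker run -it --gpus all \
  -p 8080:8080 \
  -v /path/to/my-model:/opt/ml/model \
  deepjavalibrary/djl-serving:deepspeed-latest
```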
We have many examples of using our containers with SageMaker. You can find those here: https://github.com/aws/amazon-sagemaker-examples/tree/main/inference/generativeai
-
hi all
has anyone here successfully loaded and worked with any Huggingface LLM?
We tried to use https://huggingface.co/mosaicml/mpt-7b and our attempt is https://github.com/arakoodev/onnx-djl-example, but it doesn't seem to work (we convert the model to ONNX and try to load it in DJL).
Is there a better way to do it? It seems that this PR was merged (#2637), so I'm wondering whether it is possible now.