Reranker inference service intended for use with the Digital Assistant. It hosts a reranker model using Hugging Face Transformers and exposes a prediction endpoint.
To build the project, use
make build
To run the service from the project directory, use
make run
When running in production, use
docker volume create hf_cache # Only needed if the volume does not already exist
docker run -it -p 5000:5000 -v hf_cache:/app/hf_cache --gpus all -e API_KEY=<token> ghcr.io/aidotse/reranker-inference:latest
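The "only if it does not exist" step for the cache volume can be scripted. The following is a minimal sketch, not part of this project: the helper name `ensure_volume` is hypothetical, and it relies only on the standard `docker volume inspect` and `docker volume create` subcommands.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this project):
# create a named Docker volume only when it is missing.
# Usage: ensure_volume <volume-name>
ensure_volume() {
    name="$1"
    # `docker volume inspect` exits non-zero when the volume is unknown.
    if docker volume inspect "$name" >/dev/null 2>&1; then
        echo "volume $name already exists"
    else
        docker volume create "$name" >/dev/null
        echo "volume $name created"
    fi
}
```

Calling `ensure_volume hf_cache` before the `docker run` command above would make the setup safe to repeat in a deployment script.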
To push the built image to the container registry, use
make push