To run the server, start it either through Poetry:

```shell
poetry run start
```

or with uvicorn directly:

```shell
poetry run uvicorn app.app:app --host 0.0.0.0 --port=8000 --reload
```
The API is intentionally simple. It supports text generation, embedding, JSON extraction, and fine-tuning, and it can load different model architectures. Because the server is built with FastAPI, the OpenAPI documentation is available under `/docs`.
Parameters

```
{
    "prompt": "string",      # The prompt to run
    "token_count": 100,      # (Optional) Maximum number of tokens to generate
    "temperature": 0,        # (Optional) Sampling temperature
    "verbose": false,        # (Optional) Log additional debug information
    "stream": false          # (Optional) Stream tokens back as they are generated
}
```
Example

```shell
curl -XPOST http://localhost:8000/api/text_generation -H 'content-type: application/json' -d '{ "prompt": "123"}'
```
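The same request can be issued from Python using only the standard library. This is a minimal sketch that assumes the server is running on `localhost:8000` and returns a JSON body (the exact response schema is not documented here); `build_payload` is a hypothetical helper that keeps unset optional parameters out of the request body:

```python
import json
import urllib.request

# Assumed local server address, matching the curl example above.
API_URL = "http://localhost:8000/api/text_generation"

def build_payload(prompt, token_count=None, temperature=None,
                  verbose=None, stream=None):
    """Build the request body, including only the optional fields that were set."""
    payload = {"prompt": prompt}
    optional = {"token_count": token_count, "temperature": temperature,
                "verbose": verbose, "stream": stream}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload

def generate(prompt, **options):
    """POST the prompt to the text generation endpoint and return the parsed JSON."""
    body = json.dumps(build_payload(prompt, **options)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=body, headers={"content-type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

`generate("123")` mirrors the curl example; options such as `generate("123", token_count=100, temperature=0)` add the optional fields.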
Parameters

```
{
    "prompts": [
        "string"             # A list of text strings to embed
    ]
}
```
Example

```shell
curl -XPOST http://localhost:8000/api/embedding -H 'content-type: application/json' -d '{ "prompts": ["123"]}'
```
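Assuming the endpoint returns one vector per input prompt (an assumption — the response schema is not shown here), the returned embeddings can be compared with cosine similarity. A minimal sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Embedding several prompts in one request and comparing the vectors pairwise is a common way to use this endpoint for semantic search or deduplication.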
Parameters

```
{
    "examples": [            # Example texts that the model should be trained on
        "string"
    ],
    "steps": 100,            # Number of training steps
    "base_model": "string",  # LLM base model to fine-tune
    "name": "string"         # Name of the new fine-tuned model
}
```
Example

```shell
curl -XPOST http://localhost:8000/api/finetuning/sft -H 'content-type: application/json' -d '{ "examples": ["123"], "steps": 10, "base_model": "llama2", "name": "finetuned_llama2"}'
```
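Because a fine-tuning run is comparatively expensive, it can help to validate the request body client-side before POSTing it. A minimal sketch; the constraints below are assumptions derived from the parameter list above, not rules enforced by the API:

```python
def validate_sft_request(body):
    """Check an SFT request dict against the documented fields.

    Raises ValueError on the first problem found; returns the body unchanged
    if it looks well-formed.
    """
    if not body.get("examples"):
        raise ValueError("'examples' must be a non-empty list of training texts")
    if not all(isinstance(e, str) for e in body["examples"]):
        raise ValueError("every entry in 'examples' must be a string")
    steps = body.get("steps")
    if not isinstance(steps, int) or steps <= 0:
        raise ValueError("'steps' must be a positive integer")
    for field in ("base_model", "name"):
        if not body.get(field):
            raise ValueError(f"'{field}' must be a non-empty string")
    return body
```

Running this before the HTTP call catches malformed requests (empty example lists, missing model names) without spending server time on a doomed training job.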
- Supports loading models into GPU memory and offloading them when switching models
- Supports loading and caching models from S3
- Supports the most popular open-source models and runtimes, such as llama2, Mistral, vLLM + Llama, and Ollama
- Supports SFT through a simple API and stores the resulting adapter in S3
- Add generic Huggingface transformer interface
- Add more finetuning strategies
- Support Azure
- Support GCP
```shell
# Make sure you have Poetry and the required libraries installed
poetry install
pip3 install flash-attn==2.3.1.post1 --no-build-isolation
pip3 install "transformers[torch]"
```
Create an issue or discussion in this repository.
Or, reach out to our team! @jakob_frick, @__anjor, @maxnajork on X or team@radiantai.com.
Thank you for your interest in contributing to our project! Before you begin writing code, please read these contributing guidelines. Following them will make the contribution process easier and more efficient for everyone involved.