Update readme #357

nstogner · 2024-12-22T17:08:37Z

Remove references to Open Web UI (soon to be removed)
Update feature list formatting and wording
Simplify wording in sections
Get rid of star chart

samos123 · 2024-12-22T18:05:04Z

docs/README.md

+
+## Key Features
+
+🚀 **LLM Operator** - Manages vLLM and Ollama servers  


Maybe we should mention in this line that we support VLMs too.

samos123 · 2024-12-22T18:08:15Z

docs/README.md

+🖥 **Hardware Flexible** - Runs on CPU, GPU, or TPU  
+💾 **Efficient Caching** - Supports EFS, Filestore, and more  
+🎙️ **Speech Processing** - Transcribe audio via FasterWhisper  
+🔢 **Vector Operations** - Generate embeddings via Infinity  


Rename to "Embedding Operator - Manages Infinity servers".

"Vector Operations" isn't what I would think of when looking for a vector or embedding server.

samos123 · 2024-12-22T18:09:56Z

docs/README.md


 ## Local Quickstart

-
-<video controls src="https://github.com/user-attachments/assets/711d1279-6af9-4c6c-a052-e59e7730b757" width="800"></video>


We should have a video demo imo, so people can quickly see what they would get after installing.

nstogner · 2024-12-23

I am hoping to get rid of OpenWebUI, which is showcased in the video.

samos123 · 2024-12-28

We can keep the video showcasing an example chat UI (e.g. OpenWebUI) even though KubeAI doesn't include installation of it. I'd prefer to keep the current video until we have a new one.

samos123 · 2024-12-23T03:20:36Z

docs/README.md

@@ -119,50 +117,13 @@ Now open your browser to [localhost:8000](http://localhost:8000) and select the

 If you go back to the browser and start a chat with Qwen2, you will notice that it will take a while to respond at first. This is because we set `minReplicas: 0` for this model and KubeAI needs to spin up a new Pod (you can verify with `kubectl get models -oyaml qwen2-500m-cpu`).

-## Documentation


I think we should highlight our full documentation before the Local Quickstart guide.

nstogner added 2 commits December 22, 2024 12:07

Update readme

4451e66

Clarify on Kubernetes

405d4a1

nstogner requested a review from samos123 December 22, 2024 17:10

samos123 reviewed Dec 22, 2024


samos123 reviewed Dec 23, 2024


nstogner commented Dec 23, 2024

samos123 commented Dec 28, 2024
