EmbeddedLLM
Repositories
- infinity-executable (Public, forked from michaelfeil/infinity)
  Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
- skypilot (Public, forked from skypilot-org/skypilot)
  SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
- flash-attention-docker (Public)
  A repository whose CI/CD builds Docker images with FlashAttention already compiled in, to speed up development and deployment of other frameworks that depend on it.
- flash-attention-rocm (Public, forked from ROCm/flash-attention)
  ROCm fork of FlashAttention (fast and memory-efficient exact attention). This branch aims to produce a flash-attention PyPI package that can be installed and used directly.
- vllm-rocmfork (Public, forked from ROCm/vllm)
  A high-throughput and memory-efficient inference and serving engine for LLMs.
- unstructured-python-client (Public, forked from Unstructured-IO/unstructured-python-client)
  A Python client for the Unstructured hosted API.
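The idea behind flash-attention-docker, baking a slow-to-compile dependency into a reusable base image, can be sketched as a Dockerfile. This is an illustrative fragment only: the base image tag and the single `pip install` line are assumptions, not the repository's actual CI configuration.

```dockerfile
# Assumed CUDA/PyTorch base image; pick a tag matching your toolchain.
FROM nvcr.io/nvidia/pytorch:24.01-py3

# Compile and install FlashAttention once here, so downstream images
# that start FROM this image skip the long native build step.
RUN MAX_JOBS=4 pip install flash-attn --no-build-isolation
```

Downstream projects would then begin their own Dockerfiles with `FROM` this image and get a ready-to-import `flash_attn` package.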
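Since Infinity exposes its embedding models behind an OpenAI-compatible REST endpoint, a request to it can be sketched as below. This is a minimal illustration, not the repository's documented usage: the host, port, and model name are assumptions to adjust for your deployment, and only the request payload is constructed here (no network call is made).

```python
import json

# Assumed host/port for a locally running Infinity server (adjust as needed).
BASE_URL = "http://localhost:7997"


def build_embeddings_request(texts, model="BAAI/bge-small-en-v1.5"):
    """Build the URL and JSON body for an OpenAI-style /embeddings request.

    `model` must match a model the server was launched with; the default
    here is a hypothetical example, not a guaranteed deployment.
    """
    url = f"{BASE_URL}/embeddings"
    body = json.dumps({"model": model, "input": texts})
    return url, body


url, body = build_embeddings_request(["hello world", "vector embeddings"])
print(url)
print(json.loads(body)["input"])
```

Sending `body` as a POST with `Content-Type: application/json` (via `urllib.request` or any HTTP client) would return a JSON response containing one embedding per input string.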