EmbeddedLLM
Repositories
- infinity-executable (Public, forked from michaelfeil/infinity)
  Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
- skypilot (Public, forked from skypilot-org/skypilot)
  SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
- flash-attention-docker (Public)
  A repository whose CI/CD builds Docker images with FlashAttention already compiled in, to speed up development and deployment of other frameworks that depend on it.
- flash-attention-rocm (Public, forked from ROCm/flash-attention)
  ROCm fork of FlashAttention (fast and memory-efficient exact attention). This branch aims to produce a flash-attention PyPI package that can be installed and used directly.
- vllm-rocmfork (Public, forked from ROCm/vllm)
  A high-throughput and memory-efficient inference and serving engine for LLMs.
- unstructured-python-client (Public, forked from Unstructured-IO/unstructured-python-client)
  A Python client for the Unstructured hosted API.
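The idea behind flash-attention-docker, baking a slow-to-compile dependency into a reusable base image, can be sketched as a Dockerfile. This is an illustrative fragment only: the base image tag and the single `pip install` line are assumptions, not the repository's actual CI configuration.

```dockerfile
# Assumed CUDA/PyTorch base image; pick a tag matching your toolchain.
FROM nvcr.io/nvidia/pytorch:24.01-py3

# Compile and install FlashAttention once here, so downstream images
# that start FROM this image skip the long native build step.
RUN MAX_JOBS=4 pip install flash-attn --no-build-isolation
```

Downstream projects would then begin their own Dockerfiles with `FROM` this image and get a ready-to-import `flash_attn` package.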
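Since Infinity exposes its embedding models behind an OpenAI-compatible REST endpoint, a request to it can be sketched as below. This is a minimal illustration, not the repository's documented usage: the host, port, and model name are assumptions to adjust for your deployment, and only the request payload is constructed here (no network call is made).

```python
import json

# Assumed host/port for a locally running Infinity server (adjust as needed).
BASE_URL = "http://localhost:7997"


def build_embeddings_request(texts, model="BAAI/bge-small-en-v1.5"):
    """Build the URL and JSON body for an OpenAI-style /embeddings request.

    `model` must match a model the server was launched with; the default
    here is a hypothetical example, not a guaranteed deployment.
    """
    url = f"{BASE_URL}/embeddings"
    body = json.dumps({"model": model, "input": texts})
    return url, body


url, body = build_embeddings_request(["hello world", "vector embeddings"])
print(url)
print(json.loads(body)["input"])
```

Sending `body` as a POST with `Content-Type: application/json` (via `urllib.request` or any HTTP client) would return a JSON response containing one embedding per input string.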