Serverless infrastructure for multimodal indexing, retrieval, and generation. Integrate in one line of code. Completely open source.
Companies rarely keep their data in a single location. It spans various sources (S3, Snowflake, MongoDB, etc.) and also varies in modality (image, video, audio, text). NUX (New User Experience) is an open source developer framework that consolidates insights across the enterprise and abstracts the work down to 2 lines of code.
The guide below uses nux's Python client. For examples that interface with the nux API directly, see examples.
Import and initialize the client
from nuxai import NUX
from pydantic import BaseModel
# init NUX client
client = NUX("API-KEY")
# create your first collection
collection_id = client.create_collection(namespace="files.resume")
Configure and initiate the indexing worker
# Index file urls, raw string, or byte objects.
index_id = client.index(["https://s3.us-east-2.amazonaws.com/resume.pdf"])
# check the status
index_id.status()
# which returns
{'UPLOADED': 1, 'PROCESSING': 0, 'READY': 0, 'ERROR': 0}
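Indexing is asynchronous, so a caller usually has to poll the status until nothing is left in `UPLOADED` or `PROCESSING`. The sketch below shows one way to do that. `wait_until_ready`, its parameters, and the idea that the status callable can be polled repeatedly are assumptions for illustration, not part of the nux client.

```python
import time
from typing import Callable, Dict


def wait_until_ready(
    get_status: Callable[[], Dict[str, int]],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> Dict[str, int]:
    """Poll get_status() until no file is UPLOADED or PROCESSING.

    get_status is any zero-argument callable returning a dict shaped like
    {'UPLOADED': 1, 'PROCESSING': 0, 'READY': 0, 'ERROR': 0}.
    Hypothetical helper: not provided by the nux client.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        pending = status.get("UPLOADED", 0) + status.get("PROCESSING", 0)
        if pending == 0:
            # Everything is either READY or ERROR; let the caller inspect it.
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexing still pending after {timeout}s: {status}")
        time.sleep(poll_interval)
```

Assuming the object returned by the index call exposes `status()` as shown above, this could be called as `wait_until_ready(index_id.status)`.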
Retrieve results using KNN
# retrieve the results
query = "What was Ethan's first job?"
results = client.search(query=query)
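For readers unfamiliar with KNN retrieval, the idea is to embed the query and return the k stored vectors closest to it. The following is a minimal, brute-force sketch using cosine similarity on toy vectors. The function names and the choice of cosine similarity are assumptions for illustration; they do not describe how nux implements search internally.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, similarity) pairs most similar to query_vec."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (made up for the example).
toy_docs = [("a", [1.0, 0.0, 0.0]), ("b", [0.0, 1.0, 0.0]), ("c", [0.9, 0.1, 0.0])]
top = knn([1.0, 0.0, 0.0], toy_docs, k=2)
```

A production system would typically replace the brute-force scan with an approximate nearest-neighbor index, but the ranking principle is the same.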
Generate JSON output using context from KNN results
# specify json output
class UserModel(BaseModel):
    name: str
    age: int
# generate a response with context from results
generation = client.generate.openai.chat(
    engine="gpt-3.5-turbo",
    response_shape=UserModel,
    context=f"Content from resume: {results}",
    messages=[
        {"role": "user", "content": query},
    ],
)
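Constraining generation to a response shape generally means the model's raw JSON reply is parsed and checked against the declared fields before it is handed back. The sketch below illustrates that check with the standard library only, using a dataclass in place of the pydantic model. `UserShape` and `parse_shaped_response` are hypothetical names, and this is an assumption about the general technique, not a description of what `response_shape` does inside nux.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class UserShape:
    # Mirrors the fields of UserModel above (stdlib stand-in for pydantic).
    name: str
    age: int


def parse_shaped_response(raw: str, shape):
    """Parse a JSON string and check it against the fields of a dataclass.

    Handles only flat shapes with simple builtin field types (str, int, ...).
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(shape)}
    missing = expected.keys() - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise TypeError(f"field {name!r} should be {typ.__name__}")
    # Extra keys in the reply are ignored rather than rejected.
    return shape(**{name: data[name] for name in expected})


# Made-up reply, standing in for what an LLM might return.
user = parse_shaped_response('{"name": "Ethan", "age": 30}', UserShape)
```

With pydantic installed, the equivalent check is what model validation performs when constructing `UserModel` from the parsed reply.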
nux's architecture is divided into several components, each of which runs as a separate local web service for ease of use and deployment:
The parsing component handles different file types and makes them accessible for further processing.
FileType | Extensions |
---|---|
Image | jpg, png, etc. |
Document | pdf, docx, etc. |
Audio | mp3, wav, etc. |
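A parser front end has to decide which file type an input belongs to before it can pick a parser. One simple approach is to look at the extension, as sketched below. The mapping covers only the extensions listed in the table, and `detect_file_type` is a hypothetical helper, not a nux function.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Only the extensions named in the table above; the real list is longer ("etc.").
EXTENSION_TO_FILETYPE = {
    "jpg": "Image",
    "png": "Image",
    "pdf": "Document",
    "docx": "Document",
    "mp3": "Audio",
    "wav": "Audio",
}


def detect_file_type(url_or_path: str) -> str:
    """Map a URL or file path to a file type based on its extension."""
    # urlparse drops any query string or fragment from a URL.
    path = urlparse(url_or_path).path
    ext = PurePosixPath(path).suffix.lower().lstrip(".")
    try:
        return EXTENSION_TO_FILETYPE[ext]
    except KeyError:
        raise ValueError(f"unsupported extension: {ext!r}") from None
```

For example, `detect_file_type("https://s3.us-east-2.amazonaws.com/resume.pdf")` yields `"Document"`. Extension sniffing is only a heuristic; inspecting the file's content type is more robust when extensions are missing or wrong.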
Included parsers:
- Website Scraper: Web scraper with recursive depth specification.
- Image: Object detection, OCR or generating embeddings.
- Text: Extracting raw text or metadata from files.
- Audio: Transcribing audio or generating embeddings.
- Video: Scene detection, object recognition and transcribing.
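The "recursive depth" setting of a web scraper bounds how many link hops away from the start page the crawl may go. The sketch below shows the idea as a breadth-first traversal over an in-memory link graph so that it runs without network access. `crawl`, `get_links`, and `max_depth` are illustrative names, not the nux scraper's actual parameters.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int,
) -> List[str]:
    """Breadth-first crawl following links at most max_depth hops from start_url.

    get_links(url) returns the outgoing links of a page; in a real scraper it
    would fetch and parse the page.
    """
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand pages at the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                visited.append(link)
                queue.append((link, depth + 1))
    return visited


# Toy site: page -> outgoing links (made up for the example).
SITE = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}
pages = crawl("a", lambda url: SITE.get(url, []), max_depth=2)
```

The `seen` set prevents cycles (here `c` links back to `a`) from causing repeated visits.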
The API component provides a set of APIs for interacting with the framework, including:
- Index: Creating searchable indexes of the processed data.
- Chunk: Breaking down large texts or files into independent chunks.
- Retrieve: Querying your storage engine of choice.
- Generate: Generating output with your LLM of choice.
- Integrations: Third-party integrations for read and write support.
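To make the chunking step concrete, here is a minimal sketch of fixed-size character chunking with a small overlap between neighboring chunks. The overlap, the default sizes, and the name `chunk_text` are assumptions for illustration; nux's own chunker may split on different boundaries.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


parts = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

Setting `overlap=0` gives fully disjoint chunks; token-based or sentence-based splitting follows the same pattern with a different unit.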
The storage component securely stores processed data and embeddings. All storage engines support hybrid search (BM25 & KNN).
- MongoDB (Cloud Only): For storing indexed data and metadata.
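Hybrid search has to merge a keyword ranking (BM25) with a vector ranking (KNN) into a single result list. One common way to do that is reciprocal rank fusion (RRF), sketched below. The source does not say which fusion method nux or its storage engines use, so RRF and the constant `k=60` are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up rankings from the two retrievers.
bm25_ranking = ["a", "b", "c"]
knn_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([bm25_ranking, knn_ranking])
```

RRF uses only rank positions, so it avoids having to normalize BM25 scores and cosine similarities onto a common scale.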
The generation component handles the generation and fine-tuning of content based on the indexed data.
- Generate: For generating outputs that adhere to a JSON schema.
- Embed: For generating embeddings from the input.
Future enhancements planned for OSS nux:
- Change data capture (CDC) connections to databases and storage for real-time sync.
- Fine-tuning support for BERT encoders and LoRA adapters.
- Integration with hybrid databases (Weaviate, Qdrant, and Redis).
- Multimodal querying & generation
- Kubernetes deployment options
- Additional integrations (Google Drive, Box, Dropbox, etc.).
- Support for more models (both embedding and LLMs).
- Evaluation tools for index, query, and generate processes.
- Learn to Rank (LTR) and re-ranking features.
For those interested in a fully managed hosting solution:
- Full Dashboard: Provision collections, A/B test queries, revision history, rollbacks and more
- Serverless: For hosting the indexing and querying jobs
- Monitoring: Full visibility into indexing, retrieval and generation performance
- Compliance Checks: Ensuring that your data handling meets regulatory standards (HIPAA, SOC 2, etc.)
- Support: 24-hour SLA
- User Management: Managed OAuth 2.0, SSO, and more
- Audit and Access History: Full data lineage and usage history
- Security: Private endpoint, network access, and more
- First 10 GB free
- $1.00 per GB per month with discounts for upfront commitments.
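As a worked example of the pricing above, the sketch below estimates a monthly bill. It assumes the 10 GB free allowance applies every month and ignores commitment discounts, since their size is not stated. `monthly_storage_cost` is a hypothetical helper.

```python
def monthly_storage_cost(
    stored_gb: float,
    free_gb: float = 10.0,
    price_per_gb: float = 1.00,
) -> float:
    """Estimated monthly cost in USD: the first free_gb are free, the rest billed per GB.

    Assumes the free allowance applies each month; discounts are not modeled.
    """
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    billable_gb = max(0.0, stored_gb - free_gb)
    return round(billable_gb * price_per_gb, 2)
```

Under these assumptions, 8 GB costs nothing and 25 GB costs (25 - 10) × $1.00 = $15.00 per month.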