Skip to content

Latest commit

 

History

History
66 lines (47 loc) · 2.46 KB

README.md

File metadata and controls

66 lines (47 loc) · 2.46 KB

UForm for JavaScript

UForm multimodal AI SDK offers a simple way to integrate multimodal AI capabilities into your JavaScript applications. Built around ONNX, the SDK is supposed to work with most runtimes and almost any hardware.

Installation

There are several ways to install the UForm JavaScript SDK from NPM.

pnpm add uform 
npm add uform  
yarn add uform  

Quick Start

Embeddings

import { getModel, Modality, TextProcessor, TextEncoder, ImageEncoder, ImageProcessor } from '@unum-cloud/uform';

const { configPath, modalityPaths, tokenizerPath } = await getModel({
    modelId: 'unum-cloud/uform3-image-text-english-small',
    modalities: [Modality.TextEncoder, Modality.ImageEncoder],
    token: null, // Optional Hugging Face token for private models
    saveDir: null, // Optional directory to save the model to       
});

const textProcessor = new TextProcessor(configPath, tokenizerPath);
await textProcessor.init();
const processedTexts = await textProcessor.process("a small red panda in a zoo");

const textEncoder = new TextEncoder(modalityPaths.text_encoder, textProcessor);
await textEncoder.init();
const textOutput = await textEncoder.encode(processedTexts);
assert(textOutput.embeddings.dims.length === 2, "Output should be 2D");
await textEncoder.dispose();

const imageProcessor = new ImageProcessor(configPath);
await imageProcessor.init();
const processedImages = await imageProcessor.process("path/to/image.png");

const imageEncoder = new ImageEncoder(modalityPaths.image_encoder, imageProcessor);
await imageEncoder.init();
const imageOutput = await imageEncoder.encode(processedImages);
assert(imageOutput.embeddings.dims.length === 2, "Output should be 2D");

The textOutput and imageOutput would contain features and embeddings properties, which are the same as the features and embeddings properties in the Python SDK. The embeddings can later be compared using the cosine similarity or other distance metrics.

Generative Models

Coming soon ...

Technical Details

Faster Search

Depending on the application, the embeddings can be down-casted to smaller numeric representations without losing much recall. Independent of the quantization level, native JavaScript functionality may be too slow for large-scale search. In such cases, consider using USearch or SimSimD.