UForm multimodal AI SDK offers a simple way to integrate multimodal AI capabilities into your JavaScript applications. Built around ONNX, the SDK is supposed to work with most runtimes and almost any hardware.
There are several ways to install the UForm JavaScript SDK from NPM.
pnpm add uform
npm add uform
yarn add uform
import { getModel, Modality, TextProcessor, TextEncoder, ImageEncoder, ImageProcessor } from '@unum-cloud/uform';
const { configPath, modalityPaths, tokenizerPath } = await getModel({
modelId: 'unum-cloud/uform3-image-text-english-small',
modalities: [Modality.TextEncoder, Modality.ImageEncoder],
token: null, // Optional Hugging Face token for private models
saveDir: null, // Optional directory to save the model to
});
const textProcessor = new TextProcessor(configPath, tokenizerPath);
await textProcessor.init();
const processedTexts = await textProcessor.process("a small red panda in a zoo");
const textEncoder = new TextEncoder(modalityPaths.text_encoder, textProcessor);
await textEncoder.init();
const textOutput = await textEncoder.encode(processedTexts);
assert(textOutput.embeddings.dims.length === 2, "Output should be 2D");
await textEncoder.dispose();
const imageProcessor = new ImageProcessor(configPath);
await imageProcessor.init();
const processedImages = await imageProcessor.process("path/to/image.png");
const imageEncoder = new ImageEncoder(modalityPaths.image_encoder, imageProcessor);
await imageEncoder.init();
const imageOutput = await imageEncoder.encode(processedImages);
assert(imageOutput.embeddings.dims.length === 2, "Output should be 2D");
The textOutput
and imageOutput
would contain features
and embeddings
properties, which are the same as the features
and embeddings
properties in the Python SDK.
The embeddings can later be compared using the cosine similarity or other distance metrics.
Coming soon ...
Depending on the application, the embeddings can be down-casted to smaller numeric representations without losing much recall. Independent of the quantization level, native JavaScript functionality may be too slow for large-scale search. In such cases, consider using USearch or SimSimD.