The PGVecto.rs Python library supports Django, SQLAlchemy, and Psycopg 3.
Library | Vector | Sparse Vector | Half-Precision Vector | Binary Vector |
---|---|---|---|---|
SQLAlchemy | ✅ Insert | ✅ Insert | ✅ Insert | ✅ Insert |
Psycopg3 | ✅ Insert ✅ Copy | ✅ Insert ✅ Copy | ✅ Insert ✅ Copy | ✅ Insert ✅ Copy |
Django | ✅ Insert | ✅ Insert | ✅ Insert | ✅ Insert |
Install from PyPI:
pip install pgvecto_rs
Use it with your database library (SQLAlchemy, Psycopg 3, or Django) or as a standalone SDK.
To start a pgvecto.rs instance, run the official Docker image as described in the Quick start guide.
You can find the latest tags on the Release page. For example:
docker run \
--name pgvecto-rs-demo \
-e POSTGRES_PASSWORD=mysecretpassword \
-p 5432:5432 \
-d tensorchord/pgvecto-rs:pg16-v0.3.0
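The connection URLs used in the examples that follow are derived from the flags passed to `docker run`. As a minimal sketch, the helper below assembles such a URL with the standard library only. The function name `build_url` and its parameters are illustrative and not part of pgvecto_rs; the defaults assume the stock `postgres` user and database of the official PostgreSQL image.

```python
from urllib.parse import quote


def build_url(password, host="localhost", port=5432, user="postgres",
              database="postgres", driver=None):
    """Assemble a PostgreSQL connection URL (hypothetical helper).

    `driver` adds a SQLAlchemy-style "+driver" suffix to the scheme.
    """
    scheme = "postgresql" if driver is None else f"postgresql+{driver}"
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


print(build_url("mysecretpassword"))
print(build_url("mysecretpassword", driver="psycopg"))
```

The first form matches the URL used with SQLAlchemy and Psycopg 3; the second matches the `db_url` passed to the SDK client.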
Install dependencies for SQLAlchemy:
pip install "pgvecto_rs[sqlalchemy]"
Initialize a connection
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
engine = create_engine(URL)
with Session(engine) as session:
    pass
Enable the extension
from sqlalchemy import text
session.execute(text('CREATE EXTENSION IF NOT EXISTS vectors'))
Create a model
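from sqlalchemy.orm import mapped_column
from pgvecto_rs.sqlalchemy import Vector
# Base is your declarative base; a full model also needs __tablename__ and a primary key.
class Item(Base):
    embedding = mapped_column(Vector(3))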
All supported types are listed in this table:
Native types | Types for SQLAlchemy | pgvector-python equivalent |
---|---|---|
vector | VECTOR | VECTOR |
svector | SVECTOR | SPARSEVEC |
vecf16 | VECF16 | HALFVEC |
bvector | BVECTOR | BIT |
Insert a vector
from sqlalchemy import insert
stmt = insert(Item).values(embedding=[1, 2, 3])
session.execute(stmt)
session.commit()
Add an approximate index
from sqlalchemy import Index
from pgvecto_rs.types import IndexOption, Hnsw, Ivf
index = Index(
"emb_idx_1",
Item.embedding,
postgresql_using="vectors",
postgresql_with={
"options": f"$${IndexOption(index=Ivf(), threads=1).dumps()}$$"
},
postgresql_ops={"embedding": "vector_l2_ops"},
)
# or
index = Index(
"emb_idx_2",
Item.embedding,
postgresql_using="vectors",
postgresql_with={
"options": f"$${IndexOption(index=Hnsw()).dumps()}$$"
},
postgresql_ops={"embedding": "vector_l2_ops"},
)
# Apply changes
index.create(session.bind)
Get the nearest neighbors to a vector
from sqlalchemy import select
session.scalars(select(Item.embedding).order_by(Item.embedding.l2_distance(target.embedding)))
Also supports max_inner_product, cosine_distance, and jaccard_distance (for BVECTOR).
Get items within a certain distance
session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))
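Thresholds such as `< 5` are easier to choose once the formula behind each distance is explicit. The plain-Python sketch below spells them out. It rests on an assumption to verify against the pgvecto.rs documentation for your version: that `l2_distance` yields the squared Euclidean distance and `max_inner_product` the negated dot product. The function names are illustrative and not part of pgvecto_rs.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance (assumed semantics of l2_distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def negative_inner_product(a, b):
    """Negated dot product, so that smaller means more similar."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def jaccard_distance(a, b):
    """One minus intersection over union for two 0/1 vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - inter / union if union else 0.0


print(squared_l2([1, 2, 3], [3, 1, 2]))
print(negative_inner_product([1, 2, 3], [3, 1, 2]))
print(cosine_distance([1, 0], [0, 1]))
print(jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0]))
```

Under this assumption, a filter of `< 5` on the squared distance corresponds to a Euclidean radius of about 2.24.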
See examples/sqlalchemy_example.py and tests/test_sqlalchemy.py for more examples.
Install dependencies for Psycopg 3:
pip install "pgvecto_rs[psycopg3]"
Initialize a connection
import psycopg
URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
with psycopg.connect(URL) as conn:
    pass
Enable the extension and register vector types
from pgvecto_rs.psycopg import register_vector
conn.execute('CREATE EXTENSION IF NOT EXISTS vectors')
register_vector(conn)
# or asynchronously
# await register_vector_async(conn)
Create a table
conn.execute('CREATE TABLE items (embedding vector(3))')
Insert or copy vectors into table
conn.execute('INSERT INTO items (embedding) VALUES (%s)', ([1, 2, 3],))
# or faster, copy it in binary format
import numpy as np
with conn.cursor() as cursor, cursor.copy(
    "COPY items (embedding) FROM STDIN (FORMAT BINARY)"
) as copy:
    copy.write_row([np.array([1, 2, 3])])
Add an approximate index
from pgvecto_rs.types import IndexOption, Hnsw, Ivf
conn.execute(
"CREATE INDEX emb_idx_1 ON items USING \
vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
IndexOption(index=Hnsw(), threads=1).dumps()
),
)
# or
conn.execute(
"CREATE INDEX emb_idx_2 ON items USING \
vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
IndexOption(index=Ivf()).dumps()
),
)
# Apply all changes
conn.commit()
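The `$$ ... $$` around the options is PostgreSQL dollar quoting, which lets the serialized options contain newlines and quote characters without escaping. The sketch below renders the statement with a stand-in options string. The TOML text is an assumption about what `IndexOption(...).dumps()` produces and is hard-coded here so the example runs without a database; `create_index_sql` is a hypothetical helper, not part of pgvecto_rs.

```python
def create_index_sql(name, table, column, opclass, options_toml):
    """Render a CREATE INDEX statement with dollar-quoted options.

    Identifiers are interpolated as-is, so pass trusted names only.
    """
    if "$$" in options_toml:
        # The delimiter must not appear inside the quoted body.
        raise ValueError("options must not contain the $$ delimiter")
    return (
        f"CREATE INDEX {name} ON {table} USING vectors "
        f"({column} {opclass}) WITH (options=$${options_toml}$$);"
    )


# Assumed TOML layout, standing in for IndexOption(index=Hnsw()).dumps()
options = "[indexing.hnsw]\nm = 16\nef_construction = 100\n"
sql = create_index_sql("emb_idx_1", "items", "embedding", "vector_l2_ops", options)
print(sql)
```

Printing the statement before executing it is a quick way to check the options that will be sent to the server.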
Get the nearest neighbors to a vector
conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()
Get the distance
conn.execute('SELECT embedding <-> %s FROM items \
ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()
Get items within a certain distance
conn.execute('SELECT * FROM items WHERE embedding <-> %s < 1.0 \
ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()
See examples/psycopg_example.py and tests/test_psycopg.py for more examples.
Install dependencies for Django:
pip install "pgvecto_rs[django]"
Create a migration to enable the extension
from django.db import migrations
from pgvecto_rs.django import VectorExtension
class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]
Add a vector field to your model
from django.db import models
from pgvecto_rs.django import VectorField
class Item(models.Model):
    embedding = VectorField(dimensions=3)
All supported types are listed in this table:
Native types | Types for Django | pgvector-python equivalent |
---|---|---|
vector | VectorField | VectorField |
svector | SparseVectorField | SparseVectorField |
vecf16 | Float16VectorField | HalfVectorField |
bvector | BinaryVectorField | BitField |
Insert a vector
Item(embedding=[1, 2, 3]).save()
Add an approximate index
from django.db import models
from pgvecto_rs.django import HnswIndex, IvfIndex, VectorField
class Item(models.Model):
    embedding = VectorField(dimensions=3)
    class Meta:
        indexes = [
            HnswIndex(
                name="emb_idx_1",
                fields=["embedding"],
                opclasses=["vector_l2_ops"],
                m=16,
                ef_construction=100,
                threads=1,
            ),
            # or
            IvfIndex(
                name="emb_idx_2",
                fields=["embedding"],
                nlist=3,
                opclasses=["vector_l2_ops"],
            ),
        ]
Get the nearest neighbors to a vector
from pgvecto_rs.django import L2Distance
Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]
Also supports MaxInnerProduct, CosineDistance, and JaccardDistance (for BinaryVectorField).
Get the distance
Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))
Get items within a certain distance
Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)
See examples/django_example.py and tests/test_django.py for more examples.
The SDK is designed to work with pgvecto.rs out of the box. You can run similarity searches, optionally combined with filters, without writing any SQL.
Install dependencies:
pip install "pgvecto_rs[sdk]"
A minimal example:
from pgvecto_rs.sdk import PGVectoRs, Record
# Create a client
client = PGVectoRs(
db_url="postgresql+psycopg://postgres:mysecretpassword@localhost:5432/postgres",
collection_name="example",
dimension=3,
)
try:
    # Add some records
    client.insert(
        [
            Record.from_text("hello 1", [1, 2, 3]),
            Record.from_text("hello 2", [1, 2, 4]),
        ]
    )
    # Search with the default operator (sqrt_euclid).
    # The results are sorted by distance
    for rec, dis in client.search([1, 2, 5]):
        print(rec.text)
        print(dis)
finally:
    # Clean up (i.e. drop the table)
    client.drop()
Output:
hello 2
1.0
hello 1
4.0
See examples/sdk_example.py and tests/test_sdk.py for more examples.
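The distances in the output above (1.0 and 4.0) match the squared Euclidean distance between the query and each record. As a sanity check that needs no database, the brute-force sketch below reproduces the same ranking in plain Python. It is an illustration of the metric only, not of how the SDK computes results; `squared_euclidean` is a hypothetical helper.

```python
def squared_euclidean(a, b):
    """Sum of squared component differences, returned as a float."""
    return float(sum((x - y) ** 2 for x, y in zip(a, b)))


# The same records and query as in the SDK example
records = [("hello 1", [1, 2, 3]), ("hello 2", [1, 2, 4])]
query = [1, 2, 5]

# Rank every record by its distance to the query, nearest first
ranked = sorted(
    ((text, squared_euclidean(vec, query)) for text, vec in records),
    key=lambda pair: pair[1],
)

for text, dist in ranked:
    print(text)
    print(dist)
```

A plain Euclidean distance would give 1.0 and 2.0 instead, so the second value is what distinguishes the two metrics here.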
This package is managed by PDM.
Set up the environment:
pdm venv create
pdm use # select the venv inside the project path
pdm sync -d -G :all --no-isolation
# lock requirements
# need pdm >=2.17: https://pdm-project.org/latest/usage/lock-targets/#separate-lock-files-or-merge-into-one
pdm lock -d -G :all --python=">=3.9"
pdm lock -d -G :all --python="<3.9" --append
# install the package locally
# `--no-isolation` is required for scipy
pdm install -d --no-isolation
Run lint:
pdm run format
pdm run fix
pdm run check
Run tests in the current environment:
pdm run test
Tox is used to test the package locally.
Run tests in all environments:
tox run
We would like to express our gratitude to the creators and contributors of the pgvector-python repository for their valuable code and architecture, which greatly influenced the development of this repository.