PGVecto.rs support for Python

PGVecto.rs Python library, supports Django, SQLAlchemy, and Psycopg 3.

	Vector	Sparse Vector	Half-Precision Vector	Binary Vector
SQLAlchemy	✅Insert	✅Insert	✅Insert	✅Insert
Psycopg3	✅Insert ✅Copy	✅Insert ✅Copy	✅Insert ✅Copy	✅Insert ✅Copy
Django	✅Insert	✅Insert	✅Insert	✅Insert

Usage

Install from PyPI:

pip install pgvecto_rs

And use it with your database library:

SQLAlchemy
Psycopg3
Django

Or as a standalone SDK:

usage of SDK

Requirements

To initialize a pgvecto.rs instance, you can run our official image by Quick start:

You can get the latest tags from the Release page. For example, it might be:

docker run \
  --name pgvecto-rs-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d tensorchord/pgvecto-rs:pg16-v0.3.0

SQLAlchemy

Install dependencies:

pip install "pgvecto_rs[sqlalchemy]"

Initialize a connection

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
engine = create_engine(URL)
with Session(engine) as session:
    pass

Enable the extension

from sqlalchemy import text

session.execute(text('CREATE EXTENSION IF NOT EXISTS vectors'))

Create a model

from pgvecto_rs.sqlalchemy import Vector

class Item(Base):
    embedding = mapped_column(Vector(3))

All supported types are shown in this table

Native types	Types for SQLAlchemy	Correspond to pgvector-python
vector	VECTOR	VECTOR
svector	SVECTOR	SPARSEVEC
vecf16	VECF16	HALFVEC
bvector	BVECTOR	BIT

Insert a vector

from sqlalchemy import insert

stmt = insert(Item).values(embedding=[1, 2, 3])
session.execute(stmt)
session.commit()

Add an approximate index

from sqlalchemy import Index
from pgvecto_rs.types import IndexOption, Hnsw, Ivf

index = Index(
    "emb_idx_1",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Ivf(), threads=1).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# or
index = Index(
    "emb_idx_2",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Hnsw()).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# Apply changes
index.create(session.bind)

Get the nearest neighbors to a vector

from sqlalchemy import select

session.scalars(select(Item.embedding).order_by(Item.embedding.l2_distance(target.embedding)))

Also supports max_inner_product, cosine_distance and jaccard_distance(for BVECTOR)

Get items within a certain distance

session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))

See examples/sqlalchemy_example.py and tests/test_sqlalchemy.py for more examples

Psycopg3

Install dependencies:

pip install "pgvecto_rs[psycopg3]"

Initialize a connection

import psycopg

URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
with psycopg.connect(URL) as conn:
    pass

Enable the extension and register vector types

from pgvecto_rs.psycopg import register_vector

conn.execute('CREATE EXTENSION IF NOT EXISTS vectors')
register_vector(conn)
# or asynchronously
# await register_vector_async(conn)

Create a table

conn.execute('CREATE TABLE items (embedding vector(3))')

Insert or copy vectors into table

conn.execute('INSERT INTO items (embedding) VALUES (%s)', ([1, 2, 3],))
# or faster, copy it
with conn.cursor() as cursor, cursor.copy(
    "COPY items (embedding) FROM STDIN (FORMAT BINARY)"
) as copy:
    copy.write_row([np.array([1, 2, 3])])

Add an approximate index

from pgvecto_rs.types import IndexOption, Hnsw, Ivf

conn.execute(
    "CREATE INDEX emb_idx_1 ON items USING \
        vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Hnsw(), threads=1).dumps()
    ),
)
# or
conn.execute(
    "CREATE INDEX emb_idx_2 ON items USING \
        vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Ivf()).dumps()
    ),
)
# Apply all changes
conn.commit()

Get the nearest neighbors to a vector

conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()

Get the distance

conn.execute('SELECT embedding <-> %s FROM items \
    ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()

Get items within a certain distance

conn.execute('SELECT * FROM items WHERE embedding <-> %s < 1.0 \
    ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()

See examples/psycopg_example.py and tests/test_psycopg.py for more examples

Django

Install dependencies:

pip install "pgvecto_rs[django]"

Create a migration to enable the extension

from pgvecto_rs.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]

Add a vector field to your model

from pgvecto_rs.django import VectorField

class Document(models.Model):
    embedding = VectorField(dimensions=3)

All supported types are shown in this table

Native types	Types for Django	Correspond to pgvector-python
vector	VectorField	VectorField
svector	SparseVectorField	SparseVectorField
vecf16	Float16VectorField	HalfVectorField
bvector	BinaryVectorField	BitField

Insert a vector

Item(embedding=[1, 2, 3]).save()

Add an approximate index

from django.db import models
from pgvecto_rs.django import HnswIndex, IvfIndex
from pgvecto_rs.types import IndexOption, Hnsw


class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                name="emb_idx_1",
                fields=["embedding"],
                opclasses=["vector_l2_ops"],
                m=16,
                ef_construction=100,
                threads=1,
            )
            # or
            IvfIndex(
                name="emb_idx_2",
                fields=["embedding"],
                nlist=3,
                opclasses=["vector_l2_ops"],
            ),
        ]

Get the nearest neighbors to a vector

from pgvecto_rs.django import L2Distance

Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]

Also supports MaxInnerProduct, CosineDistance and JaccardDistance(for BinaryVectorField)

Get the distance

Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))

Get items within a certain distance

Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)

See examples/django_example.py and tests/test_django.py for more examples.

SDK

Our SDK is designed to use the pgvecto.rs out-of-box. You can exploit the power of pgvecto.rs to do similarity search or retrieve with filters, without writing any SQL code.

Install dependencies:

pip install "pgvecto_rs[sdk]"

A minimal example:

from pgvecto_rs.sdk import PGVectoRs, Record

# Create a client
client = PGVectoRs(
    db_url="postgresql+psycopg://postgres:mysecretpassword@localhost:5432/postgres",
    collection_name="example",
    dimension=3,
)

try:
    # Add some records
    client.insert(
        [
            Record.from_text("hello 1", [1, 2, 3]),
            Record.from_text("hello 2", [1, 2, 4]),
        ]
    )

    # Search with default operator (sqrt_euclid).
    # The results is sorted by distance
    for rec, dis in client.search([1, 2, 5]):
        print(rec.text)
        print(dis)
finally:
    # Clean up (i.e. drop the table)
    client.drop()

Output:

hello 2
1.0
hello 1
4.0

See examples/sdk_example.py and tests/test_sdk.py for more examples.

Development

This package is managed by PDM.

Set up things:

pdm venv create
pdm use # select the venv inside the project path
pdm sync -d -G :all --no-isolation

# lock requirement
# need pdm >=2.17: https://pdm-project.org/latest/usage/lock-targets/#separate-lock-files-or-merge-into-one
pdm lock -d -G :all --python=">=3.9"
pdm lock -d -G :all --python="<3.9" --append
# install package to local
# `--no-isolation` is required for scipy
pdm install -d --no-isolation

Run lint:

pdm run format
pdm run fix
pdm run check

Run test in current environment:

pdm run test

Test

Tox is used to test the package locally.

Run test in all environment:

tox run

Acknowledgement

We would like to express our gratitude to the creators and contributors of the pgvector-python repository for their valuable code and architecture, which greatly influenced the development of this repository.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.github/workflows		.github/workflows
examples		examples
src/pgvecto_rs		src/pgvecto_rs
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pdm.lock		pdm.lock
pyproject.toml		pyproject.toml
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PGVecto.rs support for Python

Usage

Requirements

SQLAlchemy

Psycopg3

Django

SDK

Development

Test

Acknowledgement

About

Releases 4

Packages

Contributors 3

Languages

License

tensorchord/pgvecto.rs-py

Folders and files

Latest commit

History

Repository files navigation

PGVecto.rs support for Python

Usage

Requirements

SQLAlchemy

Psycopg3

Django

SDK

Development

Test

Acknowledgement

About

Resources

License

Stars

Watchers

Forks

Releases 4

Packages 0

Contributors 3

Languages

Packages