enhance: update float16/bfloat16 examples (#2388)
In the Python ecosystem, users typically process float16/bfloat16 vectors with
libraries such as numpy, Pandas, TensorFlow, or PyTorch. However, users who
hold float32 vectors are often unclear about how to handle float16/bfloat16
vectors in pymilvus.

Currently, pymilvus accepts numpy arrays as embedding-vector input.
However, numpy itself does not support the bfloat16 type.

This PR demonstrates how to convert float arrays for the insert/search
APIs.

**insert (accepts numpy arrays as input)**:

- float32 vector (owned by the user) -> float16 vector (input param of the
insert API): numpy alone is enough, no extra dependency.
- float32 vector (owned by the user) -> bfloat16 vector (input param of the
insert API): depends on `tf.bfloat16`, because PyTorch cannot convert
`torch.bfloat16` to a numpy array. Both conversions are sketched below.

**search (the API returns float16/bfloat16 vectors as bytes)**:

- float16 vector (bytes): the user can convert it into a numpy array, a
PyTorch Tensor, or a TensorFlow Tensor.
- bfloat16 vector (bytes): the user can convert it into a PyTorch Tensor or a
TensorFlow Tensor; numpy cannot represent bfloat16. A decoding sketch
follows this list.
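
A minimal decoding sketch; `fp16_bytes` and `bf16_bytes` are placeholders for
the byte strings a search hit returns:

```python
import numpy as np
import tensorflow as tf
import torch

# Placeholder byte strings; in practice these come from the search results.
fp16_bytes = np.array([0.1] * 8, dtype=np.float16).tobytes()
bf16_bytes = tf.cast([0.1] * 8, dtype=tf.bfloat16).numpy().tobytes()

# float16 bytes -> numpy / TensorFlow / PyTorch.
fp16_np = np.frombuffer(fp16_bytes, dtype=np.float16)
fp16_tf = tf.io.decode_raw(fp16_bytes, tf.float16, little_endian=True)
# bytearray() gives torch a writable buffer, avoiding its read-only warning.
fp16_pt = torch.frombuffer(bytearray(fp16_bytes), dtype=torch.float16)

# bfloat16 bytes -> TensorFlow / PyTorch only; numpy cannot represent bfloat16.
bf16_tf = tf.io.decode_raw(bf16_bytes, tf.bfloat16, little_endian=True)
bf16_pt = torch.frombuffer(bytearray(bf16_bytes), dtype=torch.bfloat16)
```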

There are many deep learning frameworks available in Python, and we cannot
know which ecosystem a user prefers. Therefore, this PR does not add float
vector conversion methods to pymilvus itself.

References:

- numpy/numpy#19808
- pytorch/pytorch#90574

issue: milvus-io/milvus#37448

Signed-off-by: Yinzuo Jiang <yinzuo.jiang@zilliz.com>
Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
jiangyinzuo authored Dec 2, 2024
1 parent 5eca3e4 commit 9ebd67d
Showing 2 changed files with 17 additions and 5 deletions.
examples/datatypes/bfloat16_example.py (10 additions, 2 deletions)
@@ -2,6 +2,7 @@
 import random
 import numpy as np
 import tensorflow as tf
+import torch
 from pymilvus import (
     connections,
     utility,
@@ -20,6 +21,11 @@ def gen_bf16_vectors(num, dim):
     for _ in range(num):
         raw_vector = [random.random() for _ in range(dim)]
         raw_vectors.append(raw_vector)
+        # Numpy itself does not support bfloat16, use TensorFlow extension instead.
+        # PyTorch does not support converting bfloat16 vector to numpy array.
+        # See:
+        # - https://github.com/numpy/numpy/issues/19808
+        # - https://github.com/pytorch/pytorch/issues/90574
         bf16_vector = tf.cast(raw_vector, dtype=tf.bfloat16).numpy()
         bf16_vectors.append(bf16_vector)
     return raw_vectors, bf16_vectors
@@ -57,8 +63,10 @@ def bf16_vector_search():
                               index_params={"index_type": index_type, "params": index_params, "metric_type": "L2"})
     hello_milvus.load()
     print("index_type = ", index_type)
-    res = hello_milvus.search(vectors[0:10], vector_field_name, {"metric_type": "L2"}, limit=1)
-    print(res)
+    res = hello_milvus.search(vectors[0:10], vector_field_name, {"metric_type": "L2"}, limit=1, output_fields=["bfloat16_vector"])
+    print("raw bytes: ", res[0][0].get("bfloat16_vector"))
+    print("tensorflow Tensor: ", tf.io.decode_raw(res[0][0].get("bfloat16_vector"), tf.bfloat16, little_endian=True))
+    print("pytorch Tensor: ", torch.frombuffer(res[0][0].get("bfloat16_vector"), dtype=torch.bfloat16))
     hello_milvus.release()
     hello_milvus.drop_index()

examples/datatypes/float16_example.py (7 additions, 3 deletions)
@@ -13,13 +13,16 @@

 default_fp16_index_params = [{"nlist": 128}]

+# float16, little endian
+fp16_little = np.dtype('e').newbyteorder('<')
+
 def gen_fp16_vectors(num, dim):
     raw_vectors = []
     fp16_vectors = []
     for _ in range(num):
         raw_vector = [random.random() for _ in range(dim)]
         raw_vectors.append(raw_vector)
-        fp16_vector = np.array(raw_vector, dtype=np.float16)
+        fp16_vector = np.array(raw_vector, dtype=fp16_little)
         fp16_vectors.append(fp16_vector)
     return raw_vectors, fp16_vectors

@@ -57,8 +60,9 @@ def fp16_vector_search():
                               index_params={"index_type": index_type, "params": index_params, "metric_type": "L2"})
     hello_milvus.load()
     print("index_type = ", index_type)
-    res = hello_milvus.search(vectors[0:10], vector_field_name, {"metric_type": "L2"}, limit=1)
-    print(res)
+    res = hello_milvus.search(vectors[0:10], vector_field_name, {"metric_type": "L2"}, limit=1, output_fields=["float16_vector"])
+    print("raw bytes: ", res[0][0].get("float16_vector"))
+    print("numpy ndarray: ", np.frombuffer(res[0][0].get("float16_vector"), dtype=fp16_little))
     hello_milvus.release()
     hello_milvus.drop_index()

