opensearch-project · VachaShah · Jul 26, 2023 · Jul 24, 2023
@@ -18,6 +18,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 - Added support for latest OpenSearch versions 2.7.0, 2.8.0 ([#445](https://github.com/opensearch-project/opensearch-py/pull/445))
 - Added samples ([#447](https://github.com/opensearch-project/opensearch-py/pull/447))
 - Improved CI performance of integration with unreleased OpenSearch ([#318](https://github.com/opensearch-project/opensearch-py/pull/318))
+- Added k-NN guide and samples ([#449](https://github.com/opensearch-project/opensearch-py/pull/449))
 ### Changed
 - Moved security from `plugins` to `clients` ([#442](https://github.com/opensearch-project/opensearch-py/pull/442))
 ### Deprecated

@@ -26,11 +26,7 @@ Then import it like any other module:
 from opensearchpy import OpenSearch
 ```
 
-For better performance we recommend the async client. To add the async client to your project, install it using [pip](https://pip.pypa.io/):
-
-```bash
-pip install opensearch-py[async]
-```
+For better performance we recommend the async client. See [Asynchronous I/O](guides/async.md) for more information.
 
 In general, we recommend using a package manager, such as [poetry](https://python-poetry.org/docs/), for your projects. This is the package manager used for [samples](samples).
 
@@ -61,7 +57,7 @@ info = client.info()
 print(f"Welcome to {info['version']['distribution']} {info['version']['number']}!")
 ```
 
-See [hello.py](samples/hello/hello.py) for a working sample, and [guides/ssl](guides/ssl.md) for how to setup SSL certificates.
+See [hello.py](samples/hello/hello.py) for a working synchronous sample, and [guides/ssl](guides/ssl.md) for how to set up SSL certificates.
 
 ### Creating an Index
 
@@ -148,6 +144,7 @@ print(response)
 
 ## Advanced Features
 
+- [Asynchronous I/O](guides/async.md)
 - [Authentication (IAM, SigV4)](guides/auth.md)
 - [Configuring SSL](guides/ssl.md)
 - [Bulk Indexing](guides/bulk.md)
@@ -161,4 +158,5 @@ print(response)
 
 - [Security](guides/plugins/security.md) 
 - [Alerting](guides/plugins/alerting.md) 
-- [Index Management](guides/plugins/index_management.md) 
+- [Index Management](guides/plugins/index_management.md)
+- [k-NN](guides/plugins/knn.md)
@@ -0,0 +1,152 @@
+- [Asynchronous I/O](#asynchronous-io)
+  - [Setup](#setup)
+  - [Async Loop](#async-loop)
+  - [Connect to OpenSearch](#connect-to-opensearch)
+  - [Create an Index](#create-an-index)
+  - [Index Documents](#index-documents)
+  - [Refresh the Index](#refresh-the-index)
+  - [Search](#search)
+  - [Delete Documents](#delete-documents)
+  - [Delete the Index](#delete-the-index)
+
+# Asynchronous I/O
+
+This client supports asynchronous I/O, which can improve performance and increase throughput. See [hello-async.py](../samples/hello/hello-async.py) or [knn-async-basics.py](../samples/knn/knn-async-basics.py) for a working asynchronous sample.
+
+## Setup
+
+To add the async client to your project, install it using [pip](https://pip.pypa.io/):
+
+```bash
+pip install opensearch-py[async]
+```
+
+In general, we recommend using a package manager, such as [poetry](https://python-poetry.org/docs/), for your projects. This is the package manager used for [samples](../samples). The following example includes `opensearch-py[async]` in `pyproject.toml`.
+
+```toml
+[tool.poetry.dependencies]
+opensearch-py = { path = "../", extras=["async"] }
+```
+
+## Async Loop
+
+```python
+import asyncio
+
+from opensearchpy import AsyncOpenSearch
+
+async def main():
+    client = AsyncOpenSearch(...)
+    try:
+        # your code here
+        pass
+    finally:
+        # close() is a coroutine on the async client, so it must be awaited
+        await client.close()
+
+if __name__ == "__main__":
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+    loop.run_until_complete(main())
+    loop.close()
+```
+
+## Connect to OpenSearch
+
+```python
+host = 'localhost'
+port = 9200
+auth = ('admin', 'admin') # For testing only. Don't store credentials in code.
+
+client = AsyncOpenSearch(
+    hosts = [{'host': host, 'port': port}],
+    http_auth = auth,
+    use_ssl = True,
+    verify_certs = False,
+    ssl_show_warn = False
+)
+
+info = await client.info()
+print(f"Welcome to {info['version']['distribution']} {info['version']['number']}!")
+```
+
+## Create an Index
+
+```python
+index_name = 'test-index'
+
+index_body = {
+    'settings': {
+        'index': {
+            'number_of_shards': 4
+        }
+    }
+}
+
+if not await client.indices.exists(index=index_name):
+    await client.indices.create(
+        index=index_name,
+        body=index_body
+    )
+```
+
+## Index Documents
+
+```python
+await asyncio.gather(*[
+    client.index(
+        index = index_name,
+        body = {
+            'title': f"Moneyball {i}",
+            'director': 'Bennett Miller',
+            'year': '2011'
+        },
+        id = i
+    ) for i in range(10)
+])
+```
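The snippet above relies on `asyncio.gather` to run all ten index requests concurrently. The following minimal sketch shows that pattern in isolation so it can run without a cluster. `fake_index` is a hypothetical stand-in for `client.index(...)`, not part of opensearch-py.

```python
import asyncio


async def fake_index(doc_id: int) -> dict:
    """Hypothetical stand-in for client.index(...); sleeps instead of doing network I/O."""
    await asyncio.sleep(0.01)
    return {"_id": doc_id, "result": "created"}


async def index_all(count: int) -> list:
    # gather() schedules every coroutine concurrently and
    # returns the results in the same order as its arguments.
    return await asyncio.gather(*[fake_index(i) for i in range(count)])


responses = asyncio.run(index_all(10))
print([r["_id"] for r in responses])
```

Because the results come back in argument order, each response can be matched to its document `id` even though the requests overlap in time.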
+
+## Refresh the Index
+
+```python
+await client.indices.refresh(index=index_name)
+```
+
+## Search
+
+```python
+q = 'miller'
+
+query = {
+    'size': 5,
+    'query': {
+        'multi_match': {
+            'query': q,
+            'fields': ['title^2', 'director']
+        }
+    }
+}
+
+results = await client.search(
+    body = query,
+    index = index_name
+)
+
+for hit in results["hits"]["hits"]:
+    print(hit)
+```
+
+## Delete Documents
+
+```python
+await asyncio.gather(*[
+    client.delete(
+        index = index_name,
+        id = i
+    ) for i in range(10)
+])
+```
+
+## Delete the Index
+
+```python
+await client.indices.delete(
+    index = index_name
+)
+```
@@ -0,0 +1,117 @@
+- [k-NN Plugin](#k-nn-plugin)
+  - [Basic Approximate k-NN](#basic-approximate-k-nn)
+    - [Create an Index](#create-an-index)
+    - [Index Vectors](#index-vectors)
+    - [Search for Nearest Neighbors](#search-for-nearest-neighbors)
+  - [Approximate k-NN with a Boolean Filter](#approximate-k-nn-with-a-boolean-filter)
+  - [Approximate k-NN with a Lucene Filter](#approximate-k-nn-with-a-lucene-filter)
+
+# k-NN Plugin
+
+The k-NN plugin (short for k-nearest neighbors) lets you search for the k nearest neighbors of a query point across an index of vectors. See the [documentation](https://opensearch.org/docs/latest/search-plugins/knn/index/) for more information.
+
+## Basic Approximate k-NN
+
+In the following example we create a 5-dimensional k-NN index with random data. You can find a synchronous version of this working sample in [samples/knn/knn-basics.py](../../samples/knn/knn-basics.py) and an asynchronous one in [samples/knn/knn-async-basics.py](../../samples/knn/knn-async-basics.py).
+
+```bash
+$ poetry run knn/knn-basics.py
+
+Searching for [0.61, 0.05, 0.16, 0.75, 0.49] ...
+{'_index': 'my-index', '_id': '3', '_score': 0.9252405, '_source': {'values': [0.64, 0.3, 0.27, 0.68, 0.51]}}
+{'_index': 'my-index', '_id': '4', '_score': 0.802375, '_source': {'values': [0.49, 0.39, 0.21, 0.42, 0.42]}}
+{'_index': 'my-index', '_id': '8', '_score': 0.7826564, '_source': {'values': [0.33, 0.33, 0.42, 0.97, 0.56]}}
+```
+
+### Create an Index
+
+```python
+dimensions = 5
+client.indices.create(index_name, 
+    body={
+        "settings":{
+            "index.knn": True
+        },
+        "mappings":{
+            "properties": {
+                "values": {
+                    "type": "knn_vector", 
+                    "dimension": dimensions
+                },
+            }
+        }
+    }
+)
+```
+
+### Index Vectors
+
+Create 10 random vectors and insert them using the bulk API.
+
+```python
+import random
+
+from opensearchpy import helpers
+
+vectors = []
+for i in range(10):
+    vec = []
+    for j in range(dimensions): 
+        vec.append(round(random.uniform(0, 1), 2)) 
+
+    vectors.append({
+        "_index": index_name,
+        "_id": i,
+        "values": vec,
+    })
+
+helpers.bulk(client, vectors)
+
+client.indices.refresh(index=index_name)
+```
+
+### Search for Nearest Neighbors
+
+Create a random vector of the same size and search for its nearest neighbors.
+
+```python
+vec = []
+for j in range(dimensions): 
+    vec.append(round(random.uniform(0, 1), 2)) 
+
+search_query = {
+    "query": {
+        "knn": {
+            "values": {
+                "vector": vec, 
+                "k": 3
+            }
+        }
+    }
+}
+
+results = client.search(index=index_name, body=search_query)
+for hit in results["hits"]["hits"]:
+    print(hit)
+```
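The `_score` values in the sample output at the start of this section can be sanity-checked by hand. The sketch below assumes the index uses the default `l2` space type (the mapping above does not set one), where the score is `1 / (1 + d²)` and `d²` is the squared Euclidean distance. `l2_score` is a hypothetical helper, not a client API.

```python
def l2_score(query, candidate):
    """Score under the assumed l2 space: 1 / (1 + squared Euclidean distance)."""
    squared_distance = sum((q - c) ** 2 for q, c in zip(query, candidate))
    return 1.0 / (1.0 + squared_distance)


# Query vector and top hit copied from the sample output above.
query = [0.61, 0.05, 0.16, 0.75, 0.49]
top_hit = [0.64, 0.3, 0.27, 0.68, 0.51]

score = l2_score(query, top_hit)
print(round(score, 4))  # 0.9252, in line with the reported 0.9252405
```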
+
+## Approximate k-NN with a Boolean Filter
+
+In [the boolean-filter.py sample](../../samples/knn/knn-boolean-filter.py) we create a 5-dimensional k-NN index with random data and a `metadata` field that contains a book genre (e.g. `fiction`). The search query is a k-NN search filtered by genre. The filter clause is outside the k-NN query clause and is applied after the k-NN search.
+
+```bash
+$ poetry run knn/knn-boolean-filter.py 
+
+Searching for [0.08, 0.42, 0.04, 0.76, 0.41] with the 'romance' genre ...
+
+{'_index': 'my-index', '_id': '445', '_score': 0.95886475, '_source': {'values': [0.2, 0.54, 0.08, 0.87, 0.43], 'metadata': {'genre': 'romance'}}}
+{'_index': 'my-index', '_id': '2816', '_score': 0.95256233, '_source': {'values': [0.22, 0.36, 0.01, 0.75, 0.57], 'metadata': {'genre': 'romance'}}}
+```
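A query of this shape could be built as in the sketch below. It assumes the vector field is named `values` and the genre is stored under `metadata.genre`, as in the output above. The helper name `boolean_filter_query` and the default `k=5` are illustrative assumptions, and the sample script may structure its query differently.

```python
import json


def boolean_filter_query(vec, genre, k=5):
    """Build a k-NN query with a boolean filter that sits outside the knn clause."""
    return {
        "query": {
            "bool": {
                # Applied after the k-NN search, so it can only discard neighbors.
                "filter": {"bool": {"must": [{"term": {"metadata.genre": genre}}]}},
                # The approximate k-NN search itself.
                "must": {"knn": {"values": {"vector": vec, "k": k}}},
            }
        }
    }


query = boolean_filter_query([0.08, 0.42, 0.04, 0.76, 0.41], "romance")
print(json.dumps(query, indent=2))
```

Because the filter only runs on the k neighbors that were already found, the search can return fewer than `k` hits, which is consistent with the two results shown above.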
+
+## Approximate k-NN with a Lucene Filter
+
+In [the lucene-filter.py sample](../../samples/knn/knn-lucene-filter.py) we implement the example from [the k-NN documentation](https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/). It creates an index of hotel location and parking data whose mapping uses the Lucene engine with HNSW as the method, then searches for the top three hotels nearest the coordinates `[5, 4]` that are rated between 8 and 10, inclusive, and provide parking.
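With the Lucene engine, the filter goes inside the `knn` clause rather than beside it. The following sketch builds such a query under the assumption that the fields are named `location`, `rating`, and `parking`, and that `parking` is stored as the string `"true"`. `lucene_filter_query` is a hypothetical helper, and the sample script may differ in detail.

```python
import json


def lucene_filter_query(location, k=3):
    """Build a k-NN query whose filter is nested inside the knn clause (Lucene engine)."""
    return {
        "size": k,
        "query": {
            "knn": {
                "location": {
                    "vector": location,
                    "k": k,
                    # The filter is part of the knn clause itself.
                    "filter": {
                        "bool": {
                            "must": [
                                {"range": {"rating": {"gte": 8, "lte": 10}}},
                                {"term": {"parking": "true"}},
                            ]
                        }
                    },
                }
            }
        },
    }


query = lucene_filter_query([5, 4])
print(json.dumps(query, indent=2))
```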
+
+```bash
+$ poetry run knn/knn-lucene-filter.py
+
+{'_index': 'hotels-index', '_id': '3', '_score': 0.72992706, '_source': {'location': [4.9, 3.4], 'parking': 'true', 'rating': 9}}
+{'_index': 'hotels-index', '_id': '6', '_score': 0.3012048, '_source': {'location': [6.4, 3.4], 'parking': 'true', 'rating': 9}}
+{'_index': 'hotels-index', '_id': '5', '_score': 0.24154587, '_source': {'location': [3.3, 4.5], 'parking': 'true', 'rating': 8}}
+```