Fetch Surrounding Chunks Notebook (elastic#261)

* Fetch Surrounding Chunks commit of Fetch Surrounding Chunks python notebook
* added pip install pandas added !pip install pandas
* added pip install google.colab fixed issue during checks. installed google.colab
* updated notebook to use api key updated notebook to use api key instead of username and password similar to notebook here: https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/00-quick-start.ipynb#scrollTo=f38e0397
* Updated Notebook Updated notebook to handle downloading required models such as elser and sentence transformer minilm
* updated notebook var var chapter_number was not initialized. Fixed.
* updated notebook for chapter_number bug chapter_number = None. forgot = sign
* updated noted added es_model_id
* updated notebook remove es_model_id as it is not needed.
* var ini ini fetch-surrounding-chunks
* dense_embedding_model_id dense_embedding_model_id was missing from query. renamed.
* update for debug for debugging changed max_chapter_chunk_result
* updated notebook added error handling
* Ini Push removed fetch surrounding chunks from doc chunking folder and into supporting blog content
leemthompo · Jun 6, 2024 · befc009
1 parent 7808847

Showing 2 changed files with 1,327 additions and 0 deletions.
diff --git a/supporting-blog-content/fetch-surrounding-chunks/README.md b/supporting-blog-content/fetch-surrounding-chunks/README.md
@@ -0,0 +1,8 @@
+# Fetch Surrounding Chunks (N-1, N+1)
+
+This notebook is designed to handle the ingestion of book text (Harry Potter and the Sorcerer's Stone) into an Elasticsearch Cloud instance. It includes partitioning the book text into chapters and chunking the chapter text, which are then ingested into Elasticsearch. The setup utilizes a nested structure, and for each chunk, it stores dense and sparse (ELSER) vector representations along with the text representation.
+
+Searches are performed using dense vector comparisons, sparse vector comparisons, and text search in parallel to demonstrate the power of hybrid search strategies. Additionally, the notebook is configured to retrieve adjacent chunks (n-1 and n+1), allowing for a more contextual understanding of the search results.
+
+## Elasticsearch Version
+Elasticsearch versions `8.13` and `8.14` were tested with this notebook. The notebook will not work with earlier versions of Elasticsearch.
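
The adjacent-chunk retrieval (n-1 and n+1) described in the README can be illustrated with a minimal in-memory sketch. This is not the notebook's Elasticsearch query. The field names `chunk_number` and `text` and the helper `fetch_surrounding_chunks` are hypothetical, chosen only to show the idea of returning a matched chunk together with its neighbours from the same chapter.

```python
from typing import Dict, List


def fetch_surrounding_chunks(chapter_chunks: List[Dict], matched_number: int) -> List[Dict]:
    """Return the matched chunk plus its n-1 and n+1 neighbours, in order.

    Assumes (hypothetically) that every chunk dict carries an integer
    `chunk_number` that is unique within its chapter.
    """
    # Index the chapter's chunks by their chunk number for direct lookup.
    by_number = {chunk["chunk_number"]: chunk for chunk in chapter_chunks}
    wanted = (matched_number - 1, matched_number, matched_number + 1)
    # Neighbours that do not exist (start or end of the chapter) are skipped.
    return [by_number[n] for n in wanted if n in by_number]


# Hypothetical chapter with four chunks numbered 0..3.
chapter = [{"chunk_number": i, "text": f"chunk {i}"} for i in range(4)]

# A hit on chunk 2 is returned with chunks 1 and 3 as context.
context = fetch_surrounding_chunks(chapter, 2)
print([chunk["chunk_number"] for chunk in context])
```

At a chapter boundary the missing neighbour is dropped instead of raising an error, so a hit on the first chunk yields only that chunk and the one after it.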