Fetch Surrounding Chunks Notebook (elastic#261)
* Fetch Surrounding Chunks

commit of Fetch Surrounding Chunks python notebook

* added pip install pandas

added !pip install pandas

* added pip install google.colab

fixed issue during checks.  installed  google.colab

* updated notebook to use api key

updated notebook to use api key instead of username and password similar to notebook here: https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/00-quick-start.ipynb#scrollTo=f38e0397

* Updated Notebook

Updated notebook to handle downloading required models such as elser and  sentence transformer minilm

* updated notebook var

var chapter_number was not initialized. Fixed.

* updated notebook for chapter_number

bug chapter_number = None. forgot = sign

* updated noted

added es_model_id

* updated notebook

remove es_model_id as it is not needed.

* var ini

ini fetch-surrounding-chunks

* dense_embedding_model_id

dense_embedding_model_id was missing from query.  renamed.

* update for debug

for debugging changed max_chapter_chunk_result

* updated notebook

added error handling

* Ini Push

removed fetch surrounding chunks from doc chunking folder and into supporting blog content
sunilemanjee authored Jun 6, 2024
1 parent 7808847 commit befc009
Showing 2 changed files with 1,327 additions and 0 deletions.
8 changes: 8 additions & 0 deletions supporting-blog-content/fetch-surrounding-chunks/README.md
@@ -0,0 +1,8 @@
# Fetch Surrounding Chunks (N-1, N+1)

This notebook is designed to handle the ingestion of book text (Harry Potter and the Sorcerer's Stone) into an Elasticsearch Cloud instance. It includes partitioning the book text into chapters and chunking the chapter text, which are then ingested into Elasticsearch. The setup utilizes a nested structure, and for each chunk, it stores dense and sparse (ELSER) vector representations along with the text representation.
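
As a rough illustration of the partitioning and chunking described above, the following is a minimal sketch and not the notebook's actual code. The helper names (`chunk_words`, `build_chapter_docs`), the field names (`chapter_number`, `chunks`, `chunk_number`, `text`), and the fixed word-count chunking are all assumptions made for this example; the dense and sparse embeddings that the notebook stores per chunk are omitted.

```python
from typing import Dict, List


def chunk_words(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def build_chapter_docs(chapters: List[str], max_words: int = 50) -> List[Dict]:
    """Build one document per chapter, with its chunks nested inside it.

    Field names here are hypothetical; a real index mapping may differ.
    """
    docs = []
    for chapter_number, chapter_text in enumerate(chapters, start=1):
        chunks = [
            {"chunk_number": n, "text": chunk_text}
            for n, chunk_text in enumerate(chunk_words(chapter_text, max_words))
        ]
        docs.append({"chapter_number": chapter_number, "chunks": chunks})
    return docs
```

Numbering chunks sequentially within each chapter is what would later make it possible to look up the neighbors of a matching chunk.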

Searches are performed using dense vector comparisons, sparse vector comparisons, and text search in parallel to demonstrate the power of hybrid search strategies. Additionally, the notebook is configured to retrieve adjacent chunks (n-1 and n+1), allowing for a more contextual understanding of the search results.
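
The adjacent-chunk idea can be sketched in plain Python, independent of Elasticsearch. This is an illustrative assumption about the logic, not the notebook's query: the function name `fetch_surrounding_chunks` and the flat list of chunk dictionaries with `chapter_number` and `chunk_number` keys are hypothetical.

```python
from typing import Dict, List


def fetch_surrounding_chunks(
    chunks: List[Dict], chapter_number: int, chunk_number: int
) -> List[Dict]:
    """Return the matched chunk together with its n-1 and n+1 neighbors.

    Neighbors are restricted to the same chapter, so a match on the first
    or last chunk of a chapter returns fewer than three chunks.
    """
    wanted = {chunk_number - 1, chunk_number, chunk_number + 1}
    hits = [
        c for c in chunks
        if c["chapter_number"] == chapter_number and c["chunk_number"] in wanted
    ]
    # Return the chunks in reading order.
    return sorted(hits, key=lambda c: c["chunk_number"])
```

In this sketch, a search hit on chunk `n` of a chapter is expanded to chunks `n-1`, `n`, and `n+1` before the text is shown, which is how the surrounding context would be recovered.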

## Elasticsearch Version
This notebook was tested with Elasticsearch versions `8.13` and `8.14`. It will not work with earlier versions of Elasticsearch.