This is a ready-to-use demo repository for testing Elasticsearch semantic search with an embedded transformer model that runs in an ingest pipeline.
- Data
  - Wikimedia enwiki 20221201 dump
- Model
  - sentence-transformers/msmarco-MiniLM-L-12-v3 (Hugging Face)
This repository uses the software, tools, and frameworks below.
- docker
- docker-compose
- python (>3.10)
Run ./Es/setup.sh to launch Elasticsearch, upload the model, and configure the ingest pipeline.
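For orientation, the kind of ingest pipeline that such a setup step configures can be sketched as follows. This is only an illustration, not the contents of the setup script: the model ID is an assumption based on the naming that Eland typically gives imported Hugging Face models, and the field names `text` and `text_embedding` are hypothetical.

```python
import json

# Assumption: model ID in the form Eland usually assigns on upload.
# The source and target field names are hypothetical.
MODEL_ID = "sentence-transformers__msmarco-minilm-l-12-v3"


def build_embedding_pipeline(model_id: str, source_field: str, target_field: str) -> dict:
    """Return an ingest pipeline body with a single inference processor."""
    return {
        "description": "Embed text at ingest time (illustrative sketch)",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": target_field,
                    # Map the document field onto the input name the model
                    # expects ("text_field" is the usual default, assumed here).
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


pipeline_body = build_embedding_pipeline(MODEL_ID, "text", "text_embedding")
print(json.dumps(pipeline_body, indent=2))
```

A body like this would be sent to Elasticsearch with a PUT request to `_ingest/pipeline/<pipeline-name>`; the actual pipeline name and fields used by this repository may differ.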
Run ./indexer/setup.sh to download and index the data.
If you observe a 429 (Too Many Requests) error, reduce the batch size and retry.
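The "reduce the batch size and retry" advice can also be automated. The sketch below is a hypothetical helper, not the indexer shipped in this repository: `send_batch` stands in for whatever function posts a batch to Elasticsearch and returns the HTTP status code, and the batch is halved each time a 429 comes back.

```python
import time
from typing import Callable, Sequence


def index_with_backoff(
    docs: Sequence,
    send_batch: Callable[[Sequence], int],
    batch_size: int = 500,
    min_batch_size: int = 1,
    pause_seconds: float = 0.0,
) -> int:
    """Send docs in batches, halving the batch size whenever a 429 is returned.

    Returns the batch size that was in use when indexing finished.
    """
    position = 0
    while position < len(docs):
        batch = docs[position:position + batch_size]
        status = send_batch(batch)
        if status == 429:
            if batch_size <= min_batch_size:
                raise RuntimeError("still rejected at the minimum batch size")
            # Shrink the batch, wait briefly, and retry the same position.
            batch_size = max(min_batch_size, batch_size // 2)
            time.sleep(pause_seconds)
            continue
        if status >= 300:
            raise RuntimeError(f"unexpected status {status}")
        position += len(batch)
    return batch_size
```

Halving is a simple heuristic; a fixed smaller batch size set by hand works just as well for a one-off indexing run.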
There is a GUI comparison tool under the ./eval directory.
Go to the ./eval directory and run streamlit run main.py to launch the comparison tool.
This repository enables the Elasticsearch Trial License in order to use an ML node, which is needed to run the embedding transformer model in the ingest pipeline.
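For reference, a trial license can be started through the Elasticsearch license API. The snippet below is a minimal sketch that only builds the request; it assumes a local, unsecured development cluster at `http://localhost:9200`, which may not match how this repository's setup script connects or enables the license.

```python
import urllib.request

# Assumption: a local development cluster on the default port, without auth.
ES_URL = "http://localhost:9200"


def build_start_trial_request(base_url: str) -> urllib.request.Request:
    """Build the POST request that starts a trial license."""
    url = f"{base_url.rstrip('/')}/_license/start_trial?acknowledge=true"
    return urllib.request.Request(url, method="POST")


request = build_start_trial_request(ES_URL)
print(request.get_method(), request.full_url)

# To send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The current license state can be checked afterwards with a GET request to `_license`.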