[role="xpack"]
[[playground]]
= Playground

preview::[]

// Variable (attribute) definition
:x: Playground

Use {x} to combine your {es} data with the power of large language models (LLMs) for retrieval augmented generation (RAG).
The chat interface translates your natural language questions into {es} queries, retrieves the most relevant results from your {es} documents, and passes those documents to the LLM to generate tailored responses.

Once you start chatting, use the UI to view and modify the {es} queries that search your data.
You can also view the underlying Python code that powers the chat interface, and download this code to integrate into your own application.

Learn how to get started on this page.
Refer to the following for more advanced topics:

* <<playground-context>>
* <<playground-query>>
* <<playground-troubleshooting>>

[float]
[[playground-how-it-works]]
== How {x} works

Here's a simplified overview of how {x} works (a minimal code sketch follows the list):

* User *creates a connection* to an LLM provider
* User *selects a model* to use for generating responses
* User *defines the model's behavior and tone* with initial instructions
** *Example*: "_You are a friendly assistant for question-answering tasks. Keep responses as clear and concise as possible._"
* User *selects {es} indices* to search
* User *enters a question* in the chat interface
* {x} *autogenerates an {es} query* to retrieve relevant documents
** User can *view and modify the underlying {es} query* in the UI
* {x} *auto-selects relevant fields* from the retrieved documents to pass to the LLM
** User can *edit which fields are targeted*
* {x} passes the *filtered documents* to the LLM
** The LLM generates a response based on the original query, initial instructions, chat history, and {es} context
* User can *view the Python code* that powers the chat interface
** User can also *download the code* to integrate into an application
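
For orientation, here's a minimal sketch of that flow using the official {es} Python client and the OpenAI client. The index, fields, and model are illustrative stand-ins for your own data and provider; this is a sketch of the pattern, not the code {x} generates:

[source,python]
----
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")  # assumption: a local deployment
llm = OpenAI()  # assumption: OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = "You are a friendly assistant for question-answering tasks."

def answer(question: str) -> str:
    # 1. Retrieve relevant documents from Elasticsearch
    results = es.search(
        index="books",  # illustrative index (see the ingest example below)
        query={"multi_match": {"query": question, "fields": ["name", "author"]}},
        size=3,
    )

    # 2. Select fields from the retrieved documents to use as context
    context = "\n".join(hit["_source"]["name"] for hit in results["hits"]["hits"])

    # 3. Pass the question, instructions, and retrieved context to the LLM
    completion = llm.chat.completions.create(
        model="gpt-4-turbo",  # illustrative model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
----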

[float]
[[playground-availability-prerequisites]]
== Availability and prerequisites

For Elastic Cloud and self-managed deployments, {x} is available in the *Search* space in {kib}, under *Content* > *{x}*.

For Elastic Serverless, {x} is available in your {es} project UI.
// TODO: Confirm URL path for Serverless

To use {x}, you'll need the following:

1. An Elastic *v8.14.0+* deployment or {es} *Serverless* project. (Start a https://cloud.elastic.co/registration[free trial].)
2. At least one *{es} index* with documents to search.
** See <<playground-getting-started-ingest, ingest data>> if you'd like to ingest sample data.
3. An account with a *supported LLM provider*. {x} supports the following:
+
[cols="2a,2a,1a", options="header"]
|===
| Provider | Models | Notes

| *Amazon Bedrock*
a|
* Anthropic: Claude 3 Sonnet
* Anthropic: Claude 3 Haiku
a|
Does not currently support streaming.

| *OpenAI*
a|
* GPT-3.5 Turbo
* GPT-4 Turbo
a|

| *Azure OpenAI*
a|
* GPT-3.5 Turbo
* GPT-4 Turbo
a|

|===

[float]
[[playground-getting-started]]
== Getting started

[float]
[[playground-getting-started-connect]]
=== Connect to LLM provider

To get started with {x}, you need to create a <<action-types,connector>> for your LLM provider.
Follow these steps on the {x} landing page:

. Under *Connect to LLM*, click *Create connector*.
. Select your *LLM provider*.
. *Name* your connector.
. Select a *URL endpoint* (or use the default).
. Enter *access credentials* for your LLM provider.

[TIP]
====
If you need to update a connector, or add a new one, click the wrench button (🔧) under *Model settings*.
====
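
Connector creation is a UI workflow, but if you prefer to script it, Kibana also exposes a connector HTTP API. The following is a hedged sketch for an OpenAI connector; the endpoint and field names are to the best of our knowledge and should be verified against the <<action-types,connector>> documentation:

[source,sh]
----
# Sketch: create an OpenAI connector via the Kibana API (verify fields first)
curl -X POST "${KIBANA_URL}/api/actions/connector" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -u "${USER}:${PASSWORD}" \
  -d '{
    "name": "my-openai-connector",
    "connector_type_id": ".gen-ai",
    "config": {
      "apiProvider": "OpenAI",
      "apiUrl": "https://api.openai.com/v1/chat/completions"
    },
    "secrets": { "apiKey": "..." }
  }'
----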

[float]
[[playground-getting-started-ingest]]
=== Ingest data (optional)

_You can skip this step if you already have data in one or more {es} indices._

There are many options for ingesting data into {es}, including:

* The {enterprise-search-ref}/crawler.html[Elastic crawler] for web content (*NOTE*: Not yet available in _Serverless_)
* {enterprise-search-ref}/connectors.html[Elastic connectors] for data synced from third-party sources
* The {es} {ref}/docs-bulk.html[Bulk API] for JSON documents
+
.*Expand* for example
[%collapsible]
==============
To add a few documents to an index called `books`, run the following in Dev Tools Console:
[source,console]
----
POST /_bulk
{ "index" : { "_index" : "books" } }
{"name": "Snow Crash", "author": "Neal Stephenson", "release_date": "1992-06-01", "page_count": 470}
{ "index" : { "_index" : "books" } }
{"name": "Revelation Space", "author": "Alastair Reynolds", "release_date": "2000-03-15", "page_count": 585}
{ "index" : { "_index" : "books" } }
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328}
{ "index" : { "_index" : "books" } }
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227}
{ "index" : { "_index" : "books" } }
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268}
{ "index" : { "_index" : "books" } }
{"name": "The Handmaids Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311}
----
==============

We've also provided some Jupyter notebooks to easily ingest sample data into {es}.
Find these in the https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/ingestion-and-chunking[elasticsearch-labs] repository.
These notebooks use the official {es} Python client.
// TODO: [The above link will be broken until https://github.com/elastic/elasticsearch-labs/pull/232 is merged]

[float]
[[playground-getting-started-index]]
=== Select {es} indices

Once you've connected to your LLM provider, it's time to choose the data you want to search.
Follow the steps under *Select indices*:

. Select one or more {es} indices under *Add index*.
. Click *Start* to launch the chat interface.
+
[.screenshot]
image::select-indices.png[width=400]

Learn more about the underlying {es} queries used to search your data in <<playground-query>>.

[float]
[[playground-getting-started-setup-chat]]
=== Set up the chat interface

You can start chatting with your data immediately, but you might want to tweak some defaults first.

[.screenshot]
image::chat-interface.png[]

You can adjust the following under *Model settings*:

* *Model*. The model used for generating responses.
* *Instructions*. Also known as the _system prompt_, these initial instructions and guidelines define the behavior of the model throughout the conversation. Be *clear and specific* for best results.
* *Include citations*. A toggle to include citations from the relevant {es} documents in responses.

{x} also uses another LLM under the hood to encode all previous questions and responses, and make them available to the main model.
This ensures the model has "conversational memory".

Under *Indices*, you can edit which {es} indices will be searched.
This affects the underlying {es} query.

[TIP]
====
Click *✨ Regenerate* to resend the last query to the model for a fresh response.
Click *⟳ Clear chat* to clear chat history and start a new conversation.
====

[float]
[[playground-next-steps]]
=== Next steps

Once you've got {x} up and running, and you've tested out the chat interface, you might want to explore some more advanced topics:

* <<playground-context>>
* <<playground-query>>
* <<playground-troubleshooting>>

include::playground-context.asciidoc[]
include::playground-query.asciidoc[]
include::playground-troubleshooting.asciidoc[]

[role="xpack"]
[[playground-context]]
== Optimize model context

preview::[]

// Variable (attribute) definition
:x: Playground

Context is the information you provide to the LLM to optimize the relevance of your query results.
Without additional context, an LLM generates results based solely on its training data.
In {x}, this additional context is the information contained in your {es} indices.

There are a few ways to optimize this context for better results.
Some adjustments can be made directly in the {x} UI.
Others require refining your indexing strategy, and potentially reindexing your data.

[float]
[[playground-context-ui]]
== Edit context in UI

Use the *Edit context* button in the {x} UI to adjust the number of documents and fields sent to the LLM.

If you're hitting context length limits, try the following:

* Limit the number of documents retrieved
* Pick a field with fewer tokens, reducing the context length
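
To see which fields are the cheapest, you can compare their approximate token counts. Here's a small sketch assuming OpenAI's `tiktoken` tokenizer library is installed; counts vary by model, so treat the numbers as estimates:

[source,python]
----
import tiktoken  # assumption: OpenAI's tokenizer library

enc = tiktoken.get_encoding("cl100k_base")

def token_count(text: str) -> int:
    return len(enc.encode(text))

# Compare candidate context fields from one retrieved document
doc = {
    "title": "Snow Crash",
    "summary": "A short abstract of the book...",
    "full_text": "The complete text of the book...",
}
for field, value in doc.items():
    print(f"{field}: ~{token_count(value)} tokens")
----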

[float]
[[playground-context-index]]
== Other context optimizations

This section covers additional context optimizations that you won't be able to make directly in the UI.

[float]
[[playground-context-index-chunking]]
=== Chunking large documents

If you're working with large fields, you may need to adjust your indexing strategy.
Consider breaking your documents into smaller chunks, such as sentences or paragraphs.

If you don't yet have a chunking strategy, start by chunking your documents into passages.

Otherwise, consider updating your chunking strategy, for example, from sentence-based to paragraph-based chunking.

Refer to the following Python notebooks for examples of how to chunk your documents:

* https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/ingestion-and-chunking/json-chunking-ingest.ipynb[JSON documents]
* https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/ingestion-and-chunking/pdf-chunking-ingest.ipynb[PDF documents]
* https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/ingestion-and-chunking/website-chunking-ingest.ipynb[Website content]
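
If you want to experiment before adopting one of the notebooks, here's a minimal paragraph-chunking sketch using the official Python client; the index and field names are illustrative:

[source,python]
----
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumption: a local deployment

def chunk_by_paragraph(text: str) -> list[str]:
    # Split on blank lines and drop empty chunks
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def index_chunks(doc_id: str, text: str, index: str = "my-chunked-index"):
    # Index each paragraph as its own document, keeping a pointer to the parent
    actions = [
        {
            "_index": index,
            "_source": {"parent_id": doc_id, "chunk_nr": i, "body": chunk},
        }
        for i, chunk in enumerate(chunk_by_paragraph(text))
    ]
    helpers.bulk(es, actions)
----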

[float]
[[playground-context-balance]]
=== Balancing cost and latency

Here are some general recommendations for balancing cost and latency with different context sizes:

Optimize context length::
Determine the optimal context length through empirical testing.
Start with a baseline and adjust incrementally to find a balance that optimizes both response quality and system performance.
Implement token pruning for the ELSER model::
If you're using our ELSER model, consider implementing token pruning to reduce the number of tokens sent to the model (a query sketch follows this list).
Refer to these relevant blog posts:
+
* https://www.elastic.co/search-labs/blog/introducing-elser-v2-part-2[Optimizing retrieval with ELSER v2]
* https://www.elastic.co/search-labs/blog/text-expansion-pruning[Improving text expansion performance using token pruning]
Monitor and adjust::
Continuously monitor the effects of context size changes on performance and adjust as necessary.
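
Here's a minimal sketch of a `text_expansion` query with pruning enabled; the index, field, and threshold values are illustrative, so tune them against your own data:

[source,console]
----
GET /my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_2",
        "model_text": "How do I chunk documents?",
        "pruning_config": {
          "tokens_freq_ratio_threshold": 5,
          "tokens_weight_threshold": 0.4,
          "only_score_pruned_tokens": false
        }
      }
    }
  }
}
----
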
[role="xpack"]
[[playground-query]]
== View and modify queries

:x: Playground

preview::[]

Once you've set up your chat interface, you can start chatting with the model.
{x} will automatically generate {es} queries based on your questions, and retrieve the most relevant documents from your {es} indices.
The {x} UI enables you to view and modify these queries.

* Click *View query* to open the visual query editor.
* Modify the query by selecting fields to query per index.
* Click *Save changes*.

[TIP]
====
The `{query}` variable represents the user's question, rewritten as an {es} query.
====

The following screenshot shows the query editor in the {x} UI.
In this simple example, the `books` index has two fields: `author` and `name`.
Selecting a field adds it to the `fields` array in the query.

[.screenshot]
image::images/edit-query.png[View and modify queries]
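
For this example, the generated query might resemble the following sketch; the exact shape depends on the fields you select:

[source,console]
----
GET /books/_search
{
  "retriever": {
    "standard": {
      "query": {
        "multi_match": {
          "query": "{query}",
          "fields": ["name", "author"]
        }
      }
    }
  }
}
----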

[float]
[[playground-query-relevance]]
=== Improving relevance

The fields you select in the query editor determine the relevance of the retrieved documents.

Remember that the next step in the workflow is to send the retrieved documents to the LLM to answer the question.
Context length is an important factor in ensuring the model has enough information to generate a relevant answer.
Refer to <<playground-context, Optimize context>> for more information.

<<playground-troubleshooting, Troubleshooting>> provides tips on how to diagnose and fix relevance issues.

[NOTE]
====
{x} uses the {ref}/retriever.html[`retriever`] syntax for {es} queries.
Retrievers make it easier to compose and test different retrieval strategies in your search pipelines.
====
// TODO: uncomment and add to note once following page is live
//Refer to {ref}/retrievers-overview.html[documentation] for a high level overview of retrievers.

[role="xpack"]
[[playground-troubleshooting]]
== Troubleshooting

preview::[]

:x: Playground

Dense vectors are not searchable::
Embeddings must be generated using the {ref}/inference-processor.html[inference processor] with an ML node (an example pipeline follows this list).

Context length error::
You'll need to adjust the size of the context you're sending to the model.
Refer to <<playground-context>>.

LLM credentials not working::
Under *Model settings*, use the wrench button (🔧) to edit your GenAI connector settings.

Poor answer quality::
Check the retrieved documents to see if they are valid.
Adjust your {es} queries to improve the relevance of the documents retrieved. Refer to <<playground-query>>.
+
You can update the initial instructions to be more detailed. This is called _prompt engineering_. Refer to this https://platform.openai.com/docs/guides/prompt-engineering[OpenAI guide] for more information.
+
You might need to click *⟳ Clear chat* to clear chat history and start a new conversation.
If you mix topics, the model will find it harder to generate relevant responses.
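
For the dense-vector item above, here's a sketch of an ingest pipeline that generates embeddings at index time; the model ID and field names are illustrative, so substitute your own deployed model:

[source,console]
----
PUT _ingest/pipeline/my-embeddings-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": ".multilingual-e5-small",
        "input_output": {
          "input_field": "body",
          "output_field": "body_embedding"
        }
      }
    }
  ]
}
----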