Merge branch 'main' into ad_tour
kibanamachine authored May 8, 2024
2 parents 2e49ea4 + 0833045 commit 899736e
Showing 41 changed files with 1,305 additions and 701 deletions.
Binary file added docs/playground/images/chat-interface.png
Binary file added docs/playground/images/edit-query.png
Binary file added docs/playground/images/select-indices.png
203 changes: 203 additions & 0 deletions docs/playground/index.asciidoc
@@ -0,0 +1,203 @@
[role="xpack"]
[[playground]]
= Playground

preview::[]

// Variable (attribute) definition
:x: Playground

Use {x} to combine your {es} data with the power of large language models (LLMs) for retrieval augmented generation (RAG).
The chat interface translates your natural language questions into {es} queries, retrieves the most relevant results from your {es} documents, and passes those documents to the LLM to generate tailored responses.

Once you start chatting, use the UI to view and modify the {es} queries that search your data.
You can also view the underlying Python code that powers the chat interface, and download this code to integrate into your own application.

Learn how to get started on this page.
Refer to the following for more advanced topics:

* <<playground-context>>
* <<playground-query>>
* <<playground-troubleshooting>>

[float]
[[playground-how-it-works]]
== How {x} works

Here's a simplified overview of how {x} works:

* User *creates a connection* to LLM provider
* User *selects a model* to use for generating responses
* User *defines the model's behavior and tone* with initial instructions
** *Example*: "_You are a friendly assistant for question-answering tasks. Keep responses as clear and concise as possible._"
* User *selects {es} indices* to search
* User *enters a question* in the chat interface
* {x} *autogenerates an {es} query* to retrieve relevant documents
** User can *view and modify underlying {es} query* in the UI
* {x} *auto-selects relevant fields* from retrieved documents to pass to the LLM
** User can *edit which fields are targeted*
* {x} passes *filtered documents* to the LLM
** The LLM generates a response based on the original query, initial instructions, chat history, and {es} context
* User can *view the Python code* that powers the chat interface
** User can also *download the code* to integrate into their own application

[float]
[[playground-availability-prerequisites]]
== Availability and prerequisites

For Elastic Cloud and self-managed deployments, {x} is available in the *Search* space in {kib}, under *Content* > *{x}*.

For Elastic Serverless, {x} is available in your {es} project UI.
// TODO: Confirm URL path for Serverless

To use {x}, you'll need the following:

1. An Elastic *v8.14.0+* deployment or {es} *Serverless* project. (Start a https://cloud.elastic.co/registration[free trial]).
2. At least one *{es} index* with documents to search.
** See <<playground-getting-started-ingest, ingest data>> if you'd like to ingest sample data.
3. An account with a *supported LLM provider*. {x} supports the following:
+
[cols="2a,2a,1a", options="header"]
|===
| Provider | Models | Notes

| *Amazon Bedrock*
a|
* Anthropic: Claude 3 Sonnet
* Anthropic: Claude 3 Haiku
a|
Does not currently support streaming.

| *OpenAI*
a|
* GPT-3.5 Turbo
* GPT-4 Turbo
a|

| *Azure OpenAI*
a|
* GPT-3.5 Turbo
* GPT-4 Turbo
a|

|===

[float]
[[playground-getting-started]]
== Getting started

[float]
[[playground-getting-started-connect]]
=== Connect to LLM provider

To get started with {x}, you need to create a <<action-types,connector>> for your LLM provider.
Follow these steps on the {x} landing page:

. Under *Connect to LLM*, click *Create connector*.
. Select your *LLM provider*.
. *Name* your connector.
. Select a *URL endpoint* (or use the default).
. Enter *access credentials* for your LLM provider.

[TIP]
====
If you need to update a connector, or add a new one, click the wrench button (🔧) under *Model settings*.
====

[float]
[[playground-getting-started-ingest]]
=== Ingest data (optional)

_You can skip this step if you already have data in one or more {es} indices._

There are many options for ingesting data into {es}, including:

* The {enterprise-search-ref}/crawler.html[Elastic crawler] for web content (*NOTE*: Not yet available in _Serverless_)
* {enterprise-search-ref}/connectors.html[Elastic connectors] for data synced from third-party sources
* The {es} {ref}/docs-bulk.html[Bulk API] for JSON documents
+
.*Expand* for example
[%collapsible]
==============
To add a few documents to an index called `books`, run the following in the Dev Tools Console:
[source,console]
----
POST /_bulk
{ "index" : { "_index" : "books" } }
{"name": "Snow Crash", "author": "Neal Stephenson", "release_date": "1992-06-01", "page_count": 470}
{ "index" : { "_index" : "books" } }
{"name": "Revelation Space", "author": "Alastair Reynolds", "release_date": "2000-03-15", "page_count": 585}
{ "index" : { "_index" : "books" } }
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328}
{ "index" : { "_index" : "books" } }
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227}
{ "index" : { "_index" : "books" } }
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268}
{ "index" : { "_index" : "books" } }
{"name": "The Handmaids Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311}
----
==============

We've also provided some Jupyter notebooks to easily ingest sample data into {es}.
Find these in the https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/ingestion-and-chunking[elasticsearch-labs] repository.
These notebooks use the official {es} Python client.
// TODO: [The above link will be broken until https://github.com/elastic/elasticsearch-labs/pull/232 is merged]
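
For illustration, here's a minimal sketch of ingesting the same `books` documents with the Python client. The endpoint and API key are placeholders; swap in your own deployment details.

[source,python]
----
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder credentials

books = [
    {"name": "Snow Crash", "author": "Neal Stephenson", "release_date": "1992-06-01", "page_count": 470},
    {"name": "Revelation Space", "author": "Alastair Reynolds", "release_date": "2000-03-15", "page_count": 585},
]

# helpers.bulk wraps the same Bulk API used in the console example above
helpers.bulk(es, ({"_index": "books", "_source": doc} for doc in books))
----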

[float]
[[playground-getting-started-index]]
=== Select {es} indices

Once you've connected to your LLM provider, it's time to choose the data you want to search.
Follow the steps under *Select indices*:

. Select one or more {es} indices under *Add index*.
. Click *Start* to launch the chat interface.
+
[.screenshot]
image::select-indices.png[width=400]

Learn more about the underlying {es} queries used to search your data in <<playground-query>>.

[float]
[[playground-getting-started-setup-chat]]
=== Set up the chat interface

You can start chatting with your data immediately, but you might want to tweak some defaults first.

[.screenshot]
image::chat-interface.png[]

You can adjust the following under *Model settings*:

* *Model*. The model used for generating responses.
* *Instructions*. Also known as the _system prompt_, these initial instructions and guidelines define the behavior of the model throughout the conversation. Be *clear and specific* for best results.
* *Include citations*. A toggle to include citations from the relevant {es} documents in responses.

{x} also uses another LLM under the hood to encode all previous questions and responses, making them available to the main model.
This ensures the model has "conversational memory".
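
Conceptually, this resembles the common "condense question" pattern. Here's an illustrative prompt for that pattern; it is an assumption for explanation, not {x}'s actual internal prompt.

[source,python]
----
# Illustrative only: not Playground's actual internal prompt
CONDENSE_QUESTION_PROMPT = (
    "Given the conversation history and a follow-up question, rewrite the "
    "follow-up as a standalone question that can be understood on its own.\n\n"
    "History:\n{history}\n\n"
    "Follow-up question: {question}\n\n"
    "Standalone question:"
)
----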

Under *Indices*, you can edit which {es} indices will be searched.
This will affect the underlying {es} query.

[TIP]
====
Click *✨ Regenerate* to resend the last query to the model for a fresh response.
Click *⟳ Clear chat* to clear chat history and start a new conversation.
====

[float]
[[playground-next-steps]]
=== Next steps

Once you've got {x} up and running, and you've tested out the chat interface, you might want to explore some more advanced topics:

* <<playground-context>>
* <<playground-query>>
* <<playground-troubleshooting>>

include::playground-context.asciidoc[]
include::playground-query.asciidoc[]
include::playground-troubleshooting.asciidoc[]

68 changes: 68 additions & 0 deletions docs/playground/playground-context.asciidoc
@@ -0,0 +1,68 @@
[role="xpack"]
[[playground-context]]
== Optimize model context

preview::[]

// Variable (attribute) definition
:x: Playground

Context is the information you provide to the LLM to optimize the relevance of your query results.
Without additional context, an LLM will generate results solely based on its training data.
In {x}, this additional context is the information contained in your {es} indices.

There are a few ways to optimize this context for better results.
Some adjustments can be made directly in the {x} UI.
Others require refining your indexing strategy, and potentially reindexing your data.

[float]
[[playground-context-ui]]
== Edit context in UI

Use the *Edit context* button in the {x} UI to adjust the number of documents and fields sent to the LLM.

If you're hitting context length limits, try the following:

* Limit the number of documents retrieved
* Pick a field with fewer tokens to reduce the context length

[float]
[[playground-context-index]]
== Other context optimizations

This section covers additional context optimizations that you won't be able to make directly in the UI.

[float]
[[playground-context-index-chunking]]
=== Chunking large documents

If you're working with large fields, you may need to adjust your indexing strategy.
Consider breaking your documents into smaller chunks, such as sentences or paragraphs.

If you don't yet have a chunking strategy, start by chunking your documents into passages.

Otherwise, consider updating your chunking strategy: for example, switch from sentence-based to paragraph-based chunking.

Refer to the following Python notebooks for examples of how to chunk your documents:

* https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/ingestion-and-chunking/json-chunking-ingest.ipynb[JSON documents]
* https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/ingestion-and-chunking/pdf-chunking-ingest.ipynb[PDF documents]
* https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/ingestion-and-chunking/website-chunking-ingest.ipynb[Website content]
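
As a rough illustration (the notebooks above show fuller examples), a paragraph-based chunker can be as simple as:

[source,python]
----
def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily merge paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Flush the current chunk before it exceeds the size budget
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
----

Each chunk is then indexed as its own document (or as a nested passage), so retrieval returns focused passages instead of entire documents.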

[float]
[[playground-context-balance]]
=== Balancing cost and latency

Here are some general recommendations for balancing cost and latency with different context sizes:

Optimize context length::
Determine the optimal context length through empirical testing.
Start with a baseline and adjust incrementally to find a balance that optimizes both response quality and system performance.
Implement token pruning for ELSER model::
If you're using our ELSER model, consider implementing token pruning to reduce the number of tokens sent to the model; a hedged query sketch follows this list.
Refer to these blog posts:
+
* https://www.elastic.co/search-labs/blog/introducing-elser-v2-part-2[Optimizing retrieval with ELSER v2]
* https://www.elastic.co/search-labs/blog/text-expansion-pruning[Improving text expansion performance using token pruning]
Monitor and adjust::
Continuously monitor the effects of context size changes on performance and adjust as necessary.
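
For illustration, here's a hedged sketch of a `text_expansion` query with token pruning, using the official {es} Python client. The index name, field name, and threshold values are assumptions to adapt to your own data.

[source,python]
----
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder credentials

# Prune high-frequency, low-weight ELSER tokens before scoring to shrink
# the expansion; the thresholds below are illustrative, not tuned values
resp = es.search(
    index="books",  # assumed index
    query={
        "text_expansion": {
            "content_embedding": {  # assumed ELSER output field
                "model_id": ".elser_model_2",
                "model_text": "dystopian futures",
                "pruning_config": {
                    "tokens_freq_ratio_threshold": 5,
                    "tokens_weight_threshold": 0.4,
                    "only_score_pruned_tokens": False,
                },
            }
        }
    },
)
----
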
51 changes: 51 additions & 0 deletions docs/playground/playground-query.asciidoc
@@ -0,0 +1,51 @@
[role="xpack"]
[[playground-query]]
== View and modify queries

:x: Playground

preview::[]

Once you've set up your chat interface, you can start chatting with the model.
{x} will automatically generate {es} queries based on your questions, and retrieve the most relevant documents from your {es} indices.
The {x} UI enables you to view and modify these queries.

* Click *View query* to open the visual query editor.
* Modify the query by selecting fields to query per index.
* Click *Save changes*.

[TIP]
====
The `{query}` variable represents the user's question, rewritten as an {es} query.
====

The following screenshot shows the query editor in the {x} UI.
In this simple example, the `books` index has two fields: `author` and `name`.
Selecting a field adds it to the `fields` array in the query.

[.screenshot]
image::images/edit-query.png[View and modify queries]
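
For reference, a query over these two fields has roughly the following shape. This is a hedged sketch using the Python client (8.14+ is assumed for retriever support); the exact query {x} generates may differ.

[source,python]
----
# Roughly the retriever query Playground generates for the books example
resp = es.search(
    index="books",
    retriever={
        "standard": {
            "query": {
                "multi_match": {
                    "query": "{query}",  # replaced with the user's rewritten question
                    "fields": ["name", "author"],
                }
            }
        }
    },
)
----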

[float]
[[playground-query-relevance]]
=== Improving relevance

The fields you select in the query editor determine the relevance of the retrieved documents.

Remember that the next step in the workflow is to send the retrieved documents to the LLM to answer the question.
Context length is an important factor in ensuring the model has enough information to generate a relevant answer.
Refer to <<playground-context, Optimize context>> for more information.

<<playground-troubleshooting, Troubleshooting>> provides tips on how to diagnose and fix relevance issues.

[NOTE]
====
{x} uses the {ref}/retriever.html[`retriever`] syntax for {es} queries.
Retrievers make it easier to compose and test different retrieval strategies in your search pipelines.
====
// TODO: uncomment and add to note once following page is live
//Refer to {ref}/retrievers-overview.html[documentation] for a high level overview of retrievers.
26 changes: 26 additions & 0 deletions docs/playground/playground-troubleshooting.asciidoc
@@ -0,0 +1,26 @@
[role="xpack"]
[[playground-troubleshooting]]
== Troubleshooting

preview::[]

:x: Playground

Dense vectors are not searchable::
Embeddings must be generated using the {ref}/inference-processor.html[inference processor] with an ML node. A hedged pipeline sketch follows this list.

Context length error::
You'll need to adjust the size of the context you're sending to the model.
Refer to <<playground-context>>.

LLM credentials not working::
Under *Model settings*, use the wrench button (🔧) to edit your GenAI connector settings.

Poor answer quality::
Check the retrieved documents to see if they are valid.
Adjust your {es} queries to improve the relevance of the documents retrieved. Refer to <<playground-query>>.
+
You can update the initial instructions to be more detailed. This is called _prompt engineering_. Refer to this https://platform.openai.com/docs/guides/prompt-engineering[OpenAI guide] for more information.
+
You might need to click *⟳ Clear chat* to clear chat history and start a new conversation.
If you mix topics, the model will find it harder to generate relevant responses.
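
Here's a hedged sketch of creating such an inference pipeline with the Python client. The pipeline ID, model ID, and field names are assumptions; use the model you've actually deployed.

[source,python]
----
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder credentials

# Assumed pipeline, model, and field names; the model must be deployed on an ML node
es.ingest.put_pipeline(
    id="embeddings-pipeline",
    processors=[
        {
            "inference": {
                # e.g. an Eland-imported text embedding model such as
                # "sentence-transformers__all-minilm-l6-v2"
                "model_id": "sentence-transformers__all-minilm-l6-v2",
                "input_output": {
                    "input_field": "content",
                    "output_field": "content_embedding",
                },
            }
        }
    ],
)
----
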
7 changes: 1 addition & 6 deletions docs/redirects.asciidoc
@@ -432,9 +432,4 @@ This connector was renamed. Refer to <<openai-action-type>>.
== APIs

For the most up-to-date API details, refer to the
{kib-repo}/tree/{branch}/x-pack/plugins/alerting/docs/openapi[alerting], {kib-repo}/tree/{branch}/x-pack/plugins/cases/docs/openapi[cases], {kib-repo}/tree/{branch}/x-pack/plugins/actions/docs/openapi[connectors], and {kib-repo}/tree/{branch}/x-pack/plugins/ml/common/openapi[machine learning] open API specifications.

[role="exclude",id="playground"]
== Playground

Coming in 8.14.0.
{kib-repo}/tree/{branch}/x-pack/plugins/alerting/docs/openapi[alerting], {kib-repo}/tree/{branch}/x-pack/plugins/cases/docs/openapi[cases], {kib-repo}/tree/{branch}/x-pack/plugins/actions/docs/openapi[connectors], and {kib-repo}/tree/{branch}/x-pack/plugins/ml/common/openapi[machine learning] open API specifications.
2 changes: 2 additions & 0 deletions docs/user/index.asciidoc
@@ -28,6 +28,8 @@ include::alerting/index.asciidoc[]

include::{kibana-root}/docs/observability/index.asciidoc[]

include::{kibana-root}/docs/playground/index.asciidoc[]

include::{kibana-root}/docs/apm/index.asciidoc[]

include::{kibana-root}/docs/siem/index.asciidoc[]