[Obs AI Assistant] Add uuid to knowledge base entries to avoid overwriting accidentally #191043
I've not looked through the code, so maybe you took this into account, but we also have the documents that we pre-load into the knowledge base. Those should not have dynamically generated UUIDs, but predetermined IDs.
```
@@ -79,9 +79,10 @@ export type ConversationUpdateRequest = ConversationRequestBase & {

export interface KnowledgeBaseEntry {
  '@timestamp': string;
  id: string;
  id: string; // unique ID
  doc_id?: string; // human readable ID generated by the LLM and used by the LLM to lookup and update existing entries. TODO: rename `doc_id` to `lookup_id`
```
`id` is globally unique; `doc_id` is only unique per user. Multiple entries can be assigned the same `doc_id` if they are created for different users.
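To make the distinction concrete, here is a minimal sketch that models the index as a `Map` keyed by `_id`, mirroring the Elasticsearch behaviour where indexing a document with an existing `_id` replaces it. The names (`Entry`, `FakeIndex`, `indexOld`, `indexNew`) are hypothetical and this is not the plugin's code; it only shows why using the LLM-chosen `doc_id` as `_id` lets two users collide, while a generated UUID keeps both entries.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical, simplified entry: `id` is the globally unique document _id,
// `doc_id` is the human-readable lookup ID chosen by the LLM (unique per user only).
interface Entry {
  id: string;
  doc_id: string;
  user: string;
  text: string;
}

// Stand-in for an index: writing with an existing _id replaces the document.
type FakeIndex = Map<string, Entry>;

// Old behaviour (sketch): the LLM-chosen doc_id doubles as the _id.
function indexOld(index: FakeIndex, user: string, docId: string, text: string): void {
  index.set(docId, { id: docId, doc_id: docId, user, text });
}

// New behaviour (sketch): a fresh UUID is generated for every new entry.
function indexNew(index: FakeIndex, user: string, docId: string, text: string): void {
  const id = randomUUID();
  index.set(id, { id, doc_id: docId, user, text });
}

const oldIndex: FakeIndex = new Map();
indexOld(oldIndex, "alice", "preferred_units", "Alice prefers metric units");
indexOld(oldIndex, "bob", "preferred_units", "Bob prefers imperial units");
console.log(oldIndex.size); // 1: Bob's entry overwrote Alice's

const newIndex: FakeIndex = new Map();
indexNew(newIndex, "alice", "preferred_units", "Alice prefers metric units");
indexNew(newIndex, "bob", "preferred_units", "Bob prefers imperial units");
console.log(newIndex.size); // 2: both entries survive
```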
```
doc_id?: string;
id?: string;
```
`doc_id` can be used by the LLM to look up entries. I see no reason to extend that concept to instructions. Instructions can still have predetermined IDs; they do not have to be UUIDs. See the Lens docs for an example of this.
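One way to reconcile predetermined IDs for pre-loaded documents and instructions with generated UUIDs for everything else is to use a caller-supplied `id` when present and fall back to a UUID otherwise. This is a hypothetical helper (`resolveEntryId`, `NewEntryInput`, and the `lens_docs_example` ID are invented for illustration), not the code in this PR. Note the trade-off: any caller that can supply an `id` can also overwrite the entry stored under that `id`.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical input shape: system-provided entries (e.g. pre-loaded docs or
// instructions) may carry a predetermined `id`; LLM-created entries do not.
interface NewEntryInput {
  id?: string;
  text: string;
}

// Use the predetermined ID when one is given, otherwise generate a UUID.
function resolveEntryId(input: NewEntryInput): string {
  return input.id ?? randomUUID();
}

// A predetermined ID is stable across re-imports, so re-loading the same
// document replaces it instead of creating a duplicate.
const preloadedId = resolveEntryId({ id: "lens_docs_example", text: "..." });

// Without an ID, every call yields a new, unique identifier.
const llmIdA = resolveEntryId({ text: "User prefers metric units" });
const llmIdB = resolveEntryId({ text: "User prefers metric units" });

console.log(preloadedId); // lens_docs_example
console.log(llmIdA !== llmIdB); // true
```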
```
@@ -42,7 +42,7 @@ const chatCompleteBaseRt = t.type({
  ]),
  instructions: t.array(
    t.intersection([
      t.partial({ doc_id: t.string }),
      t.partial({ id: t.string }),
```
It's still possible to overwrite existing instructions by specifying the `id`.
```
keyword: {
  type: 'keyword',
  ignore_above: 256,
},
```
Adding a nested `keyword` field so that we can sort on it. A nested `keyword` field is recommended over `fielddata` because it is more performant (it should have been used for `doc_id` as well).
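For reference, a `keyword` sub-field of this kind is declared under `fields` in an Elasticsearch mapping and is then referenced in sort clauses by its dotted path. The sketch below uses plain TypeScript objects; the parent field name `title` and the helper `buildSortedQuery` are assumptions for illustration, since the hunk above does not show which field the sub-field belongs to.

```typescript
// Sketch of a mapping with a `keyword` multi-field, assuming the parent field is `title`.
// With `ignore_above: 256`, strings longer than 256 characters are not indexed in the
// keyword sub-field (they remain searchable through the parent text field).
const mappings = {
  properties: {
    title: {
      type: "text",
      fields: {
        keyword: {
          type: "keyword",
          ignore_above: 256,
        },
      },
    },
  },
} as const;

// Hypothetical helper: sort on the keyword sub-field's dotted path. Keyword fields
// sort via doc values, so `fielddata` does not need to be enabled on the text field.
function buildSortedQuery(order: "asc" | "desc") {
  return {
    query: { match_all: {} },
    sort: [{ "title.keyword": { order } }],
  };
}

const body = buildSortedQuery("asc");
console.log(JSON.stringify(body.sort)); // [{"title.keyword":{"order":"asc"}}]
```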
```
this.dependencies.logger.debug(
  `Adding ${operations.length} operations to queue. Queue size now: ${this._queue.length})`
);
this._queue.push(...operations);
```
AFAICT we had a bug here before: because `this._queue.push` was called conditionally, operations were not added to the queue when `isModelReady=true`. This meant that anything imported after the model had been set up was dropped 😱
In general I hope we can get rid of the queue, or separate the queuing logic from the knowledge base. Having the queue embedded makes it more complex to work with the KB than it needs to be.
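The bug described above can be reduced to the following sketch. The classes, the `Operation` type, and the `flush` method are simplified stand-ins invented for illustration, not the knowledge base service's actual code. In the buggy version operations are only enqueued while the model is not ready, so anything arriving afterwards silently disappears; the fixed version always enqueues and then processes the queue if the model is ready.

```typescript
// Hypothetical operation type for illustration.
type Operation = { id: string; text: string };

class BuggyQueue {
  private _queue: Operation[] = [];
  readonly indexed: Operation[] = [];
  private isModelReady: boolean;

  constructor(isModelReady: boolean) {
    this.isModelReady = isModelReady;
  }

  queueOperations(operations: Operation[]): void {
    // Bug: operations are only enqueued while the model is NOT ready,
    // and nothing else handles them when it is ready, so they are dropped.
    if (!this.isModelReady) {
      this._queue.push(...operations);
    }
  }

  flush(): void {
    if (!this.isModelReady) return;
    this.indexed.push(...this._queue.splice(0));
  }
}

class FixedQueue {
  private _queue: Operation[] = [];
  readonly indexed: Operation[] = [];
  private isModelReady: boolean;

  constructor(isModelReady: boolean) {
    this.isModelReady = isModelReady;
  }

  queueOperations(operations: Operation[]): void {
    // Fix: always enqueue; readiness only decides when the queue is processed.
    this._queue.push(...operations);
    this.flush();
  }

  flush(): void {
    if (!this.isModelReady) return;
    // splice(0) empties the queue and returns its previous contents.
    this.indexed.push(...this._queue.splice(0));
  }
}

const ops: Operation[] = [{ id: "a", text: "imported after model setup" }];

const buggy = new BuggyQueue(true);
buggy.queueOperations(ops);
buggy.flush();
console.log(buggy.indexed.length); // 0: the operation was dropped

const fixed = new FixedQueue(true);
fixed.queueOperations(ops);
console.log(fixed.indexed.length); // 1
```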
`...s/observability_solution/observability_ai_assistant/server/utils/recall/score_suggestions.ts` (resolved conversation)
@dgieselaar Perhaps see this comment: #191043 (comment)
```
@@ -151,7 +151,6 @@ export default function ({ getService }: FtrProviderContext) {
  'fleet:update_agent_tags:retry',
  'fleet:upgrade_action:retry',
  'logs-data-telemetry',
  'observabilityAIAssistant:indexQueuedDocumentsTaskType',
```
The task was removed; it is no longer needed.
Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/11719652062
…iting accidentally (elastic#191043)

Closes elastic#184069

Co-authored-by: Sandra G <neptunian@users.noreply.github.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 7c92a10)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.

Questions? Please refer to the Backport tool documentation.
…overwriting accidentally (#191043) (#199263)

# Backport

This will backport the following commits from `main` to `8.x`:
 - [[Obs AI Assistant] Add uuid to knowledge base entries to avoid overwriting accidentally (#191043)](#191043)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Søren Louv-Jansen <soren.louv@elastic.co>
Closes #184069

**The Problem**

The LLM decides the identifier (both `_id` and `doc_id`) for knowledge base entries. The `_id` must be globally unique in Elasticsearch, but the LLM can easily pick the same ID for different users, thereby overwriting one user's learning with another user's learning.

**Solution**

The LLM should not pick the `_id`. With this PR, a UUID is generated for new entries. This means the LLM will only be able to create new KB entries; it will not be able to update existing ones.

`doc_id` has been removed and replaced with a `title` property. `title` is simply a human-readable string; it is not used to identify KB entries.

To retain backwards compatibility, we display the `doc_id` if `title` is not available.
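A minimal sketch of the two rules described above, assuming a simplified entry shape: new entries always receive a generated UUID (so the LLM can only create entries, never update them), and the displayed label falls back to the legacy `doc_id` when `title` is missing. `EntrySketch`, `createEntry`, and `getDisplayTitle` are hypothetical names, not functions from the PR.

```typescript
import { randomUUID } from "node:crypto";

// Simplified, hypothetical entry shape: legacy entries may only have `doc_id`,
// newer entries have a `title`; `id` is always the unique identifier.
interface EntrySketch {
  id: string;
  title?: string;
  doc_id?: string; // legacy field, only used for display of older entries
  text: string;
}

// New entries get a generated UUID; the title is only a human-readable label.
function createEntry(title: string, text: string): EntrySketch {
  return { id: randomUUID(), title, text };
}

// Display the title, falling back to the legacy doc_id when no title exists.
function getDisplayTitle(entry: EntrySketch): string | undefined {
  return entry.title ?? entry.doc_id;
}

const legacy: EntrySketch = { id: "preferred_units", doc_id: "preferred_units", text: "..." };
const fresh = createEntry("Preferred units", "User prefers metric units");

console.log(getDisplayTitle(legacy)); // preferred_units
console.log(getDisplayTitle(fresh)); // Preferred units
```

Because the ID is never derived from the title, two entries with the same title are stored as two separate documents.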