Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Obs AI Assistant] Add uuid to knowledge base entries to avoid overwriting accidentally #191043

Merged
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
e627b85
[Obs AI Assistant] Add uuid to knowledge base entries to avoid overwr…
sorenlouv Aug 22, 2024
ede9593
Fix issues
sorenlouv Aug 22, 2024
4f5cb94
Fix user instruction test
sorenlouv Aug 26, 2024
bafa18b
Merge branch 'main' into add-uuid-to-kb-entries-to-avoid-overwriting
neptunian Aug 27, 2024
14854d2
Fix test
sorenlouv Aug 27, 2024
f6b8303
Cleanup
sorenlouv Aug 27, 2024
f88d826
Improve types
sorenlouv Aug 27, 2024
f42f1ec
i18n
sorenlouv Aug 27, 2024
b718ae5
Remove unused imports
sorenlouv Aug 28, 2024
59a9362
Re-add `ShortIdTable`
sorenlouv Aug 28, 2024
51cb688
Change `recall` to not return entries as nested object
sorenlouv Aug 28, 2024
51495d2
Rename `groupId` back to `docId` to reduce diff
sorenlouv Aug 28, 2024
5fdcc4d
Add test for shortIdTable
sorenlouv Aug 28, 2024
d470e39
Revert change to reduce diff
sorenlouv Aug 28, 2024
fb7370b
Keep original id
sorenlouv Aug 28, 2024
c63a8f9
Omit “User” suffix from API client methods
sorenlouv Aug 28, 2024
a6d55ed
Type fix
sorenlouv Aug 28, 2024
b3f7d3a
Fix categorization bug
sorenlouv Aug 28, 2024
b20bd1b
Revert changes to files
sorenlouv Aug 28, 2024
0f531e4
Add functional test
sorenlouv Aug 29, 2024
63b171d
Merge branch 'main' of github.com:elastic/kibana into add-uuid-to-kb-…
sorenlouv Aug 29, 2024
08aa578
i18n
sorenlouv Aug 29, 2024
ad425be
Revert tsconfig changes
sorenlouv Aug 29, 2024
6da17f3
Merge branch 'main' of github.com:elastic/kibana into add-uuid-to-kb-…
sorenlouv Aug 29, 2024
90b5484
Include KB entries when they contain contradicting info
sorenlouv Aug 29, 2024
923681c
[CI] Auto-commit changed files from 'node scripts/build_plugin_list_d…
kibanamachine Aug 29, 2024
03e8fe6
Fix tsc and jest
sorenlouv Aug 30, 2024
9b2cf9b
Improve functional test
sorenlouv Aug 30, 2024
5ea4ac2
Merge branch 'main' into add-uuid-to-kb-entries-to-avoid-overwriting
sorenlouv Aug 31, 2024
91c84d2
Merge branch 'main' into add-uuid-to-kb-entries-to-avoid-overwriting
sorenlouv Sep 2, 2024
b7d2f8e
Merge branch 'main' of github.com:elastic/kibana into add-uuid-to-kb-…
sorenlouv Oct 30, 2024
1d6beef
Merge branch 'main' of github.com:elastic/kibana into add-uuid-to-kb-…
sorenlouv Oct 31, 2024
fc0a970
Fix tsc
sorenlouv Oct 31, 2024
0aaf657
Fix serverless test
sorenlouv Oct 31, 2024
b0e2781
Remove `doc_id`
sorenlouv Nov 5, 2024
035a054
Remove queue logic
sorenlouv Nov 5, 2024
6091e81
Merge branch 'main' into add-uuid-to-kb-entries-to-avoid-overwriting
sorenlouv Nov 5, 2024
996e3d8
Fall back to doc_id
sorenlouv Nov 5, 2024
94996c8
Remove log
sorenlouv Nov 5, 2024
e1ae033
Remove unused knowledge_base/knowledge_base.spec.ts
sorenlouv Nov 5, 2024
993745e
Fix lint issues
sorenlouv Nov 5, 2024
4022dab
Fix serverless test
sorenlouv Nov 5, 2024
3671085
Fix broken api test
sorenlouv Nov 5, 2024
5708359
fix tests
sorenlouv Nov 5, 2024
6e448b8
Fix test
sorenlouv Nov 6, 2024
3114e7d
Improve type
sorenlouv Nov 6, 2024
094f94c
Fix functional test
sorenlouv Nov 6, 2024
744a279
Merge branch 'main' into add-uuid-to-kb-entries-to-avoid-overwriting
sorenlouv Nov 6, 2024
55e5800
Remove queue for task manager types
sorenlouv Nov 6, 2024
3f715be
Merge branch 'main' into add-uuid-to-kb-entries-to-avoid-overwriting
sorenlouv Nov 6, 2024
efa7bf2
editorUser -> editor
sorenlouv Nov 7, 2024
732f4ae
Fix summarisation test
sorenlouv Nov 7, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -79,9 +79,10 @@ export type ConversationUpdateRequest = ConversationRequestBase & {

export interface KnowledgeBaseEntry {
'@timestamp': string;
id: string;
id: string; // unique ID
doc_id?: string; // human readable ID generated by the LLM and used by the LLM to lookup and update existing entries. TODO: rename `doc_id` to `lookup_id`
Copy link
Member Author

@sorenlouv sorenlouv Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id is globally unique, doc_id is only unique per user. Multiple entries can be assigned the same doc_id if they are created for different users.

title?: string;
text: string;
doc_id: string;
confidence: 'low' | 'medium' | 'high';
is_correction: boolean;
type?: 'user_instruction' | 'contextual';
Expand All @@ -94,12 +95,12 @@ export interface KnowledgeBaseEntry {
}

export interface Instruction {
doc_id: string;
id: string;
text: string;
}

export interface AdHocInstruction {
doc_id?: string;
id?: string;
Copy link
Member Author

@sorenlouv sorenlouv Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc_id can be used by the LLM to lookup entries. I see no reason to expand that concept to instructions. instructions can still have pre-determined id's - they do not have to be UUIDs. See the lens docs for an example of this

text: string;
instruction_type: 'user_instruction' | 'application_instruction';
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,9 +52,9 @@ const schema: RootSchema<RecallRanking> = {
},
};

export const RecallRankingEventType = 'observability_ai_assistant_recall_ranking';
export const recallRankingEventType = 'observability_ai_assistant_recall_ranking';

export const recallRankingEvent: EventTypeOpts<RecallRanking> = {
eventType: RecallRankingEventType,
eventType: recallRankingEventType,
schema,
};
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
* 2.0.
*/

import { v4 } from 'uuid';
import { KnowledgeBaseType } from '../../common/types';
import type { FunctionRegistrationParameters } from '.';
import { KnowledgeBaseEntryRole } from '../../common';
Expand All @@ -31,7 +32,7 @@ export function registerSummarizationFunction({
id: {
type: 'string',
description:
'An id for the document. This should be a short human-readable keyword field with only alphabetic characters and underscores, that allow you to update it later.',
'A lookup id for the document. This should be a short human-readable keyword field with only alphabetic characters and underscores, that allow you to find and update it later.',
},
text: {
type: 'string',
Expand Down Expand Up @@ -62,16 +63,21 @@ export function registerSummarizationFunction({
],
},
},
(
{ arguments: { id, text, is_correction: isCorrection, confidence, public: isPublic } },
async (
{ arguments: { id: docId, text, is_correction: isCorrection, confidence, public: isPublic } },
signal
) => {
// The LLM should be able to update an existing entry by providing the same doc_id
// if no existing entry is found, we generate a uuid
const id = await client.getUuidFromDocId(docId);
Copy link
Member Author

@sorenlouv sorenlouv Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLM will (blindly) suggest a doc_id, without any information about existing entries. With the doc_id we can retrieve the _id. It does work but I don't like it very much because the LLM does not consistently produce the same doc_id's even when it should.

A better approach might be to get rid of doc_id entirely. We already provide the LLM with relevant entries via recall. By improving the recall to also include contradicting entries (which I've done in this PR) the LLM should be able to get the _id for the existing entry and use that in order to update it.


return client
.addKnowledgeBaseEntry({
entry: {
doc_id: id,
id: id ?? v4(),
title: docId, // use doc_id as title for now
doc_id: docId,
role: KnowledgeBaseEntryRole.AssistantSummarization,
id,
text,
is_correction: isCorrection,
type: KnowledgeBaseType.Contextual,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ const chatCompleteBaseRt = t.type({
]),
instructions: t.array(
t.intersection([
t.partial({ doc_id: t.string }),
t.partial({ id: t.string }),
Copy link
Member Author

@sorenlouv sorenlouv Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still possible to overwrite existing instructions by specifying the id

t.type({
text: t.string,
instruction_type: t.union([
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,9 @@
import { notImplemented } from '@hapi/boom';
import { nonEmptyStringRt, toBooleanRt } from '@kbn/io-ts-utils';
import * as t from 'io-ts';
import { v4 } from 'uuid';
import { FunctionDefinition } from '../../../common/functions/types';
import { KnowledgeBaseEntryRole } from '../../../common/types';
import { KnowledgeBaseEntryRole, KnowledgeBaseType } from '../../../common/types';
import type { RecalledEntry } from '../../service/knowledge_base_service';
import { getSystemMessageFromInstructions } from '../../service/util/get_system_message_from_instructions';
import { createObservabilityAIAssistantServerRoute } from '../create_observability_ai_assistant_server_route';
Expand Down Expand Up @@ -107,11 +108,10 @@ const functionSummariseRoute = createObservabilityAIAssistantServerRoute({
endpoint: 'POST /internal/observability_ai_assistant/functions/summarize',
params: t.type({
body: t.type({
id: t.string,
doc_id: t.string,
text: nonEmptyStringRt,
confidence: t.union([t.literal('low'), t.literal('medium'), t.literal('high')]),
is_correction: toBooleanRt,
type: t.union([t.literal('user_instruction'), t.literal('contextual')]),
public: toBooleanRt,
labels: t.record(t.string, t.string),
}),
Expand All @@ -127,22 +127,22 @@ const functionSummariseRoute = createObservabilityAIAssistantServerRoute({
}

const {
doc_id: docId,
confidence,
id,
is_correction: isCorrection,
type,
text,
public: isPublic,
labels,
} = resources.params.body;

const id = await client.getUuidFromDocId(docId);
return client.addKnowledgeBaseEntry({
entry: {
confidence,
id,
doc_id: id,
id: id ?? v4(),
doc_id: docId,
is_correction: isCorrection,
type,
type: KnowledgeBaseType.Contextual,
text,
public: isPublic,
labels,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -107,10 +107,9 @@ const saveKnowledgeBaseUserInstruction = createObservabilityAIAssistantServerRou
}

const { id, text, public: isPublic } = resources.params.body;
return client.addKnowledgeBaseEntry({
return client.addUserInstruction({
entry: {
id,
doc_id: id,
text,
public: isPublic,
confidence: 'high',
Expand Down Expand Up @@ -158,9 +157,11 @@ const saveKnowledgeBaseEntry = createObservabilityAIAssistantServerRoute({
body: t.intersection([
t.type({
id: t.string,
title: t.string,
text: nonEmptyStringRt,
}),
t.partial({
doc_id: t.string,
confidence: t.union([t.literal('low'), t.literal('medium'), t.literal('high')]),
is_correction: toBooleanRt,
public: toBooleanRt,
Expand All @@ -185,6 +186,8 @@ const saveKnowledgeBaseEntry = createObservabilityAIAssistantServerRoute({

const {
id,
doc_id: docId,
title,
text,
public: isPublic,
confidence,
Expand All @@ -195,9 +198,10 @@ const saveKnowledgeBaseEntry = createObservabilityAIAssistantServerRoute({

return client.addKnowledgeBaseEntry({
entry: {
id,
text,
doc_id: id,
id,
doc_id: docId,
title,
confidence: confidence ?? 'high',
is_correction: isCorrection ?? false,
type: 'contextual',
Expand Down Expand Up @@ -235,10 +239,15 @@ const importKnowledgeBaseEntries = createObservabilityAIAssistantServerRoute({
params: t.type({
body: t.type({
entries: t.array(
t.type({
id: t.string,
text: nonEmptyStringRt,
})
t.intersection([
t.type({
id: t.string,
text: nonEmptyStringRt,
}),
t.partial({
title: t.string,
}),
])
),
}),
}),
Expand All @@ -252,18 +261,48 @@ const importKnowledgeBaseEntries = createObservabilityAIAssistantServerRoute({
throw notImplemented();
}

const entries = resources.params.body.entries.map((entry) => ({
doc_id: entry.id,
const formattedEntries = resources.params.body.entries.map((entry) => ({
id: entry.id,
title: entry.title,
text: entry.text,
confidence: 'high' as KnowledgeBaseEntry['confidence'],
is_correction: false,
type: 'contextual' as const,
public: true,
labels: {},
role: KnowledgeBaseEntryRole.UserEntry,
...entry,
}));

return await client.importKnowledgeBaseEntries({ entries });
return await client.importKnowledgeBaseEntries({ entries: formattedEntries });
},
});

const importKnowledgeBaseCategoryEntries = createObservabilityAIAssistantServerRoute({
endpoint: 'POST /internal/observability_ai_assistant/kb/entries/category/import',
params: t.type({
body: t.type({
category: t.string,
entries: t.array(
t.type({
id: t.string,
texts: t.array(t.string),
})
),
}),
}),
options: {
tags: ['access:ai_assistant'],
},
handler: async (resources): Promise<void> => {
const client = await resources.service.getClient({ request: resources.request });

if (!client) {
throw notImplemented();
}

const { entries, category } = resources.params.body;

return resources.service.addCategoryToKnowledgeBase(category, entries);
},
});

Expand All @@ -274,6 +313,7 @@ export const knowledgeBaseRoutes = {
...saveKnowledgeBaseUserInstruction,
...getKnowledgeBaseUserInstructions,
...importKnowledgeBaseEntries,
...importKnowledgeBaseCategoryEntries,
...saveKnowledgeBaseEntry,
...deleteKnowledgeBaseEntry,
};
Original file line number Diff line number Diff line change
Expand Up @@ -731,6 +731,40 @@ export class ObservabilityAIAssistantClient {
return this.dependencies.knowledgeBaseService.setup();
};

getUuidFromDocId = async (docId: string) => {
return this.dependencies.knowledgeBaseService.getUuidFromDocId({
namespace: this.dependencies.namespace,
user: this.dependencies.user,
docId,
});
};

addUserInstruction = async ({
entry,
}: {
entry: Omit<KnowledgeBaseEntry, '@timestamp'>;
}): Promise<void> => {
// for now we want to limit the number of user instructions to 1 per user
// if a user instruction already exists for the user, we get the id and update it
this.dependencies.logger.debug('Adding user instruction entry');
const existingId = await this.dependencies.knowledgeBaseService.getPersonalUserInstructionId({
isPublic: entry.public,
namespace: this.dependencies.namespace,
user: this.dependencies.user,
});

if (existingId) {
entry.id = existingId;
this.dependencies.logger.debug(`Updating user instruction with id "${existingId}"`);
}

return this.dependencies.knowledgeBaseService.addEntry({
namespace: this.dependencies.namespace,
user: this.dependencies.user,
entry,
});
};

addKnowledgeBaseEntry = async ({
entry,
}: {
Expand All @@ -753,7 +787,7 @@ export class ObservabilityAIAssistantClient {
document: { ...entry, '@timestamp': new Date().toISOString() },
}));

await this.dependencies.knowledgeBaseService.addEntries({ operations });
await this.dependencies.knowledgeBaseService.importEntries({ operations });
};

getKnowledgeBaseEntries = async ({
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -326,35 +326,34 @@ export class ObservabilityAIAssistantService {
addToKnowledgeBaseQueue(entries: KnowledgeBaseEntryRequest[]): void {
this.init()
.then(() => {
this.kbService!.queue(
entries.flatMap((entry) => {
const entryWithSystemProperties = {
...entry,
'@timestamp': new Date().toISOString(),
doc_id: entry.id,
public: true,
confidence: 'high' as const,
type: 'contextual' as const,
is_correction: false,
labels: {
...entry.labels,
},
role: KnowledgeBaseEntryRole.Elastic,
};

const operations =
'texts' in entryWithSystemProperties
? splitKbText(entryWithSystemProperties)
: [
{
type: KnowledgeBaseEntryOperationType.Index,
document: entryWithSystemProperties,
},
];

return operations;
})
);
const operations = entries.flatMap((entry) => {
const entryWithSystemProperties = {
...entry,
doc_id: entry.id,
'@timestamp': new Date().toISOString(),
public: true,
confidence: 'high' as const,
type: 'contextual' as const,
is_correction: false,
labels: {
...entry.labels,
},
role: KnowledgeBaseEntryRole.Elastic,
};

return 'texts' in entryWithSystemProperties
? splitKbText(entryWithSystemProperties)
: [
{
type: KnowledgeBaseEntryOperationType.Index,
document: entryWithSystemProperties,
},
];
});

this.logger.debug(`Queuing ${operations.length} operations (${entries.length} entries)`);

this.kbService!.queue(operations);
})
.catch((error) => {
this.logger.error(
Expand Down
Loading