Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(participant): refactor chat participant state VSCODE-583 #810

Conversation

Anemy
Copy link
Member

@Anemy Anemy commented Sep 9, 2024

VSCODE-583

Previously we were storing a lot of data on the participant. This made it so that things like running generated code would reference the current state of the participant, irregardless of which chat and which chat message the runnable code is clicked in. While we could work around this in some cases, if we can avoid coupling app state with the chat messages it should let us be more flexible in our parsing and handling, while avoiding possible bugs with stale state.

This is a draft as I currently have all of the tests commented out, a lot of random todos, and there's still more work to do around chat history passing. We currently pass a lot of not useful messages from our chat history to the model's chat completion. We also want the user to be able to ask another query or change the database or collection requested when they're asking, this namespace parsing isn't very flexible yet.

src/participant/prompts/history.ts Outdated Show resolved Hide resolved
src/participant/prompts/history.ts Outdated Show resolved Hide resolved
src/participant/prompts/history.ts Outdated Show resolved Hide resolved
src/participant/prompts/history.ts Outdated Show resolved Hide resolved
src/participant/prompts/history.ts Outdated Show resolved Hide resolved
src/participant/prompts/history.ts Outdated Show resolved Hide resolved
@Anemy Anemy changed the title chore(participant): update namespace parsing to use history, remove state from chat participant VSCODE-583 WIP chore(participant): update namespace parsing to use history, remove state from chat participant VSCODE-583 Sep 11, 2024
Copy link
Contributor

@nirinchev nirinchev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall restructuring seems fine to me and I like the removal of the state. I seem to have stumbled onto a corner-y case that resulted in an endless loop of prompts. Other than that, I'm not sure I'm a huge fan of the fact we're asking the model to extract the namespace every time, but I understand why we do it and don't have a brilliant idea for an alternative. I'll think about it a bit more and see if I come up with something.

src/participant/participant.ts Outdated Show resolved Hide resolved
src/participant/participant.ts Outdated Show resolved Hide resolved
src/participant/participant.ts Outdated Show resolved Hide resolved
src/participant/prompts/namespace.ts Show resolved Hide resolved
src/participant/participant.ts Outdated Show resolved Hide resolved
src/participant/participant.ts Outdated Show resolved Hide resolved
@Anemy Anemy changed the title WIP chore(participant): update namespace parsing to use history, remove state from chat participant VSCODE-583 chore(participant): update namespace parsing to use history, remove state from chat participant VSCODE-583 Sep 12, 2024
@Anemy Anemy requested a review from alenakhineika September 12, 2024 02:00
@Anemy Anemy marked this pull request as ready for review September 12, 2024 02:00
@alenakhineika
Copy link
Contributor

From the pr description:

We also want the user to be able to ask another query or change the database or collection requested when they're asking

This was already possible before this refactoring. You could ask the participant to change the database or collection name or update the query.

@Anemy
Copy link
Member Author

Anemy commented Sep 12, 2024

Going to remove the regex for the namespace in the chat history from this pr. We were using it as a fallback for when the model couldn't find the namespace in the chat history. Let's see how the model performs without it, and if needed we can create a ticket to add something like it back in.

@Anemy Anemy changed the title chore(participant): update namespace parsing to use history, remove state from chat participant VSCODE-583 chore(participant): remove state from chat participant VSCODE-583 Sep 12, 2024
@Anemy Anemy changed the title chore(participant): remove state from chat participant VSCODE-583 WIP chore(participant): remove state from chat participant VSCODE-583 Sep 12, 2024
@Anemy Anemy changed the title WIP chore(participant): remove state from chat participant VSCODE-583 chore(participant): remove state from chat participant VSCODE-583 Sep 12, 2024
@alenakhineika
Copy link
Contributor

I think the flow is now broken somehow. It doesn't understand when I select a collection name from the list and asks for it again. After I click fo the second time on the collection name, it asks for a database name.

flow

@Anemy
Copy link
Member Author

Anemy commented Sep 13, 2024

@alenakhineika That's from removing the namespace from history regex we added. The model does not do a good job of looking into previous messages to find the namespace. It mostly uses the last user prompt. We could do some other workarounds here, in the POC for this a while back I tried adding all of the user's messages from the chat history into one message to give more context. There's a bit of a tradeoff there, but it might end up giving us a better result for the namespace request.
I think the first step here is storing some chat state as a way to track the namespace overtime. Similar to what this pr is removing, however it would be per-chat rather than a singleton. I created a ticket describing it a more: https://jira.mongodb.org/browse/VSCODE-607

If it's alright with you and @nirinchev I'd like to move that to a separate pr. I've pushed to a few todos in the code to mention that.

@alenakhineika
Copy link
Contributor

The problem happens when you select the collection name for the first time. You connect, click on a database name, then click on a collection name and it doesn't go further, it keeps asking for a namespace and doesn't generate a query. Model is actually doing pretty well by looking at old messages. In the merged code we relied on the namespace in the state, now we can do the same with metadata in response history. There is probably some small issue that needs to be found. I don't think we can merge this as is since the query generation is not working.

@Anemy
Copy link
Member Author

Anemy commented Sep 13, 2024

@alenakhineika Looks like the prompt ordering in the namespace caused those issues. Can you see if it's now fixed? I had moved it to the 2nd to last message, and just moved it back to where it was previously, the first message in the chat we send to the model. 🤦

@alenakhineika
Copy link
Contributor

We shouldn't merge this pr as a temporary solution, and we better tackle all this with https://jira.mongodb.org/browse/VSCODE-607. It breaks the flow on multiple occasions and limits already merged functionality. Currently, you can not iterate on prompts:
collname
4docs
Gaurab and the docs team use the feature branch already, we shouldn't merge something that doesn't work well so they can't use it. Yes, I agree that the merged code works well only with one chat conversation and this should be fixed, but let's do it the right way without merging code that also does not cover all corer cases and breaks what is already working.

@alenakhineika
Copy link
Contributor

alenakhineika commented Sep 13, 2024

Let's use the vscode context storage to move these values from ParticipantController and store them per chat conversation:

_queryGenerationState?: QUERY_GENERATION_STATE;
_databaseName?: string;
_collectionName?: string;
_schema?: string;
_sampleDocuments?: Document[];

I agree that keeping _queryGenerationState may not be ideal, but if this helps to keep track of what is happening in the conversation, maybe this is not so bad. We could split this work into chunks/PRs:

  1. The part that you did to open a playground according to the generated code block
  2. Move the state from ParticipantController to the vscode context storage to have different states per chat conversation
  3. Play around how to get rid of _queryGenerationState (which can be done even in October)

@alenakhineika
Copy link
Contributor

alenakhineika commented Sep 14, 2024

Here I put together what I meant by using the last metadata as the sourse of the chat conversation state: #816

Let me know if this makes sense to you.

@Anemy Anemy changed the title chore(participant): remove state from chat participant VSCODE-583 chore(participant): refactor chat participant state VSCODE-583 Sep 15, 2024
@alenakhineika
Copy link
Contributor

The last changes look good 👍 Only the empty state should be fixed as we discussed in DMs.

Copy link
Contributor

@nirinchev nirinchev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits from me. Unfortunately, wasn't able to finish it before dinner time, but will wrap up the review later today.

src/mdbExtensionController.ts Outdated Show resolved Hide resolved
src/mdbExtensionController.ts Outdated Show resolved Hide resolved
src/mdbExtensionController.ts Outdated Show resolved Hide resolved
src/mdbExtensionController.ts Outdated Show resolved Hide resolved
src/participant/constants.ts Outdated Show resolved Hide resolved
Comment on lines +342 to +343
let collectionName: string | undefined = _collectionName;
if (!collectionName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads awkwardly - what's the purpose of _collectionName here and why are we defining a new collectionName variable of the same type as the argument? It feels like I'm missing something, but this method would feel more natural if we had:

async selectCollectionWithParticipant({
  // ...
  collectionName
}): {
  // ...
}): Promise<boolean> {
  if (!collectionName) {
    // ...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made them two variables since when the collection name isn't present we set it with the quick pick.

collectionName = await this._selectCollectionWithQuickPick(databaseName);

We could avoid the extra declaration, and set to the variable that's passed in. I didn't do that as I find setting to variables that are passed in arguments to also look a bit awkward. It wouldn't happen here, but I think it starts to bring in a pattern of possibly mutating the argument which we don't want. It would only be an issue with non-primitives, so again, not here.

Would you prefer if it's written as:

  async selectCollectionWithParticipant({
    chatId,
    databaseName,
    collectionName,
  }: {
    chatId: string;
    databaseName: string;
    collectionName?: string;
  }): Promise<boolean> {
    if (!collectionName) {
      collectionName = await this._selectCollectionWithQuickPick(databaseName);
      if (!collectionName) {
        return false;
      }
    }

I'm cool with either.

src/participant/participant.ts Show resolved Hide resolved
src/participant/participant.ts Outdated Show resolved Hide resolved
src/participant/participant.ts Outdated Show resolved Hide resolved
Comment on lines 248 to 250
void vscode.commands.executeCommand('workbench.action.chat.open', {
query: '@MongoDB /query',
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a helper for this with more strongly defined types? I can see we're using it in a bunch of places and it's useful to extract the knowledge of the command in a single place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I would prefer to keep it as is because it is transparent, and it is a common pattern in vscode to call such commands this way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a helper function for this, with a comment as to why we do it. Doesn't do much on the typing side though. Were you thinking of having the helper do /query based on parameters? That's something we could do.

Copy link
Contributor

@alenakhineika alenakhineika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Anemy
Copy link
Member Author

Anemy commented Sep 16, 2024

@nirinchev going to merge this in to unblock folks, it's been in flight a while and would cause a good amount of conflicts if we keep it unmerged. Please feel free to leave more comments, I'll see to them in a follow up.
Thanks for all the help on this one y'all!

@Anemy Anemy merged commit 764b8d1 into VSCODE-528-mongodb-copilot Sep 16, 2024
3 checks passed
@Anemy Anemy deleted the VSCODE-583-update-chat-participant-to-not-keep-state branch September 16, 2024 22:25
@nirinchev
Copy link
Contributor

Yeah, no worries, I didn't see anything majorly broken, so we can always fix these nits as drive-by's.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants