
[WIP] Insert context using @ commands #174

Open

wants to merge 35 commits into base: main

Conversation

Odie
Contributor

@Odie Odie commented Jul 27, 2024

Hi there!

What this PR does

I’ve started implementing a feature from continue.dev that I think is quite useful. It allows users to use @ commands to automatically include additional context from their project. For example, @file inserts the entire contents of a file, and @code inserts a specific function by name.

Here’s an example user message:

@file:lua/gp/completion.lua

Please explain this file briefly.

In this case, the contents of lua/gp/completion.lua would be prepended to the user message before being sent out.
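As a rough sketch (not the PR's actual code; `expand_file_commands` and the injected `read_file` are illustrative names), the prepending step could look like this:

```lua
-- Sketch of @file expansion: pull @file:<path> commands out of the
-- message and prepend each file's contents, wrapped in a fence, to
-- the outgoing text. Names here are illustrative, not gp.nvim's API.
local function expand_file_commands(msg, read_file)
  local contexts = {}
  local rest = msg:gsub("@file:(%S+)%s*", function(path)
    local content = read_file(path)
    if content then
      table.insert(contexts, path .. "\n```\n" .. content .. "\n```")
    end
    return "" -- drop the command itself from the message body
  end)
  if #contexts == 0 then
    return msg
  end
  return table.concat(contexts, "\n\n") .. "\n\n" .. rest
end
```

Injecting `read_file` keeps the expansion logic testable without touching the filesystem.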

Command completion

The @ commands are assembled with assistance from a custom completion source for ease of use. To try this PR out, please add “hrsh7th/nvim-cmp” as a plugin dependency:

{
    "robitx/gp.nvim",
    dependencies = { "hrsh7th/nvim-cmp" },
    -- additional config
}

The completion source is automatically attached to the chat buffer, so no additional configuration is required.

TODOs

  • Completion source for @file
  • Context insertion for @file commands
  • Generate an index for all function names in project
  • Completion source for @code
  • Context insertion for @code commands

This feature is still a work in progress, but the @file command is now functional. Your feedback is welcome!

@qaptoR

qaptoR commented Jul 31, 2024

So I merged this into my fork on this branch:
https://github.com/qaptoR-nvim/gp.nvim/tree/insert_context

There were minor changes needed due to the recent restructure of the main repo. I also removed the (```) code fences around the file context so that it would be more generally usable for other types of context needed in the conversation, because I'm using this feature to selectively include specific rules files for my personal ttrpg.

But I do think that feature would be useful as an @codefile target, or something similar, that does insert the code fences.


@Robitx
Owner

Robitx commented Jul 31, 2024

Just to backup a thought (haven't looked over the implementation yet).

I'll start neovim from dir A, make a chat with relative references and everything works.
Then ( https://www.youtube.com/watch?v=CwBjoH_ZaNw ) I run new neovim instance from dir B, open the old chat and wish to continue the thread - will it work?

The same issue will arise for any future @contexts we'll add to chats, we'll have to store necessary data for reproducible runs from anywhere.

@Odie
Contributor Author

Odie commented Jul 31, 2024

@qaptoR

Hey, thanks for the feedback. I’m still actively working on this feature. I’ll rebase on main when it’s ready to be looked at, probably in the next few days. I’m not sure if that makes it easier or harder to merge into your personal branch though. :(

Also, I removed the (```) code braces around the file context so that it would be more generally usable for other types of contexts needed in the conversation

The @file command actually inserts both a relative file path and the file content. I thought the triple backtick fence would make it clear to the LLM where the file content starts and ends. Does the triple backtick fence actually confuse the LLM for your use case? Do you mind sharing a concrete example of the problem you ran into?

If it really is a problem, I suppose we can try something like @include, which would insert neither the file path nor the backticks.

@qaptoR

qaptoR commented Jul 31, 2024

@Odie the situation I was thinking ahead to regarding the backtick fence was if the included file is itself a markdown file containing triple-backtick fences of its own. If the template inserts another pair around it, it will invert all of them: text meant to be inside a fence ends up outside, and vice versa.

Instead, I think including the file path sets a clear demarcation for the LLM to understand that what follows is its content, especially if newlines are used strategically: one newline between the filepath and the content, and two, or probably better three, newlines between file contexts. LLMs are pretty good at picking up patterns like that. I think two or three newlines before the actual message is all that is necessary as well, because it follows the same pattern.

@Robitx
Owner

Robitx commented Jul 31, 2024

Checking for longest sequence of ` in the file and using N+1 as fence?

For example, wrapping a 3-backtick block in a 4-backtick fence:

````
```python
def main():
    print("3 backticks")
```
````
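The N+1 rule can be sketched in a few lines (the function name is illustrative, not part of the PR):

```lua
-- Sketch: choose a fence one backtick longer than the longest run of
-- backticks inside the file, so nested fences never clash.
local function pick_fence(content)
  local longest = 2 -- enforce the minimum fence of 3 backticks
  for run in content:gmatch("`+") do
    if #run > longest then longest = #run end
  end
  return string.rep("`", longest + 1)
end
```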

@Odie
Contributor Author

Odie commented Jul 31, 2024

I'll start neovim from dir A, make a chat with relative references and everything works.
Then ( https://www.youtube.com/watch?v=CwBjoH_ZaNw ) I run new neovim instance from dir B, open the old chat and wish to continue the thread - will it work?

Hmm! Are both dir A and dir B inside the same git repo? I had meant for the relative paths to work from the project git root, though the current implementation doesn’t quite do that yet. (It’s maybe a one or two line modification to make it so.) So this will work at some point soon, as I finish up the first pass of this feature and try to clean up some loose ends.

If what you’re describing is actually carrying the chat across to different repos… then, for sure, it doesn’t work that way as implemented. At the moment, the requested contexts are parsed out of the last chat message (presumably from the user). The message is augmented with the contexts right before it goes out on the wire. The current state of those requested contexts is never recorded anywhere. :(

If we want the chat log to be a definitive record of what exactly was exchanged between the user and the LLM, I guess we’ll have to try to insert the text into the chat buffer instead? The chat log size might explode if the user is repeatedly iterating on a single file though. It’ll also eat up the available context window rather quickly.

@qaptoR

qaptoR commented Jul 31, 2024

I think the files only need to be included where they are used. So if they are added early in the conversation, they should always be inserted early; it's all the same to the LLM, I think, and it keeps the entire conversation consistent as it progresses.

Also, it solves the issue of having the content included in the actual chat file. Although I would appreciate the option to not insert the text directly into the chat file: one reason I like the system as it is, is that it keeps the chat history visually clean, where each insertion, from the chat file's perspective, is like a reference to shared knowledge. Like a header file in C++, it helps keep things clean and organized.

I plan on including tens of files in certain conversations to use with gpt-mini or claude haiku, basically selectively including different sets of what amounts to 100s of pages of rules for my ttrpg so that I can query for inconsistencies or new ideas.

@Robitx
Owner

Robitx commented Jul 31, 2024

@Odie :GpChatFinder allows you to open old chats from anywhere, which means it should work when picked up again from anywhere. (I know it complicates things, sorry).

We have chat_dir where the chats are stored. One easy solution would be for a chat chat_dir/chatXY.md to have an artifact folder chat_dir/chatXY_artifacts/ and keep all the relevant data in there, without polluting the chat file itself.

If we wanted to complicate things further, we could remember which dir the artifact was made from and, when generating a new response, first try to recreate a fresh instance of the artifact, falling back to the old instance backed up in the artifacts if that failed.
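A minimal sketch of the proposed naming scheme (assuming chats are plain `.md` files under `chat_dir`; the function name is illustrative):

```lua
-- Sketch: derive the artifact folder for a chat file, e.g.
-- chat_dir/chatXY.md -> chat_dir/chatXY_artifacts
local function artifact_dir(chat_path)
  return (chat_path:gsub("%.md$", "")) .. "_artifacts"
end
```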

@qaptoR

qaptoR commented Aug 1, 2024

@Robitx I think an artifacts directory as a solution adds more complexity than is necessary, since we would have to ensure that when chat file x is deleted, artifact file x is deleted as well.

For the time being I think it should take advantage of the markdown YAML header section's key/value pairs, and every conversation should include a 'cwd' key with the path inserted when it is created. Then all of the relative paths should use this value instead of the vim cwd to remain consistent.

I think this approach accomplishes a few things:

  • simplicity: no added complexity of managing an artifacts directory
  • visibility: each conversation has its variables in use immediately available, without having to look elsewhere
  • alterability: as an aside to visibility, easy access means updating the path after file creation, if necessary, is quick

Ultimately, a header with even 20 or 40 key/value pairs at the start of the conversation is preferable to tracking down an artifact file, and it scales relatively well for the time being because conversations typically outweigh the header in length, so scrolling is already necessary anyway.

It may even be that for a given chat file x, some data is preferable to maintain in artifact file x, while other data benefits from being quickly visible. But I would at least argue that 'cwd' deserves to be visible.
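For illustration, such a header might look like this (the key names besides 'cwd' are hypothetical, not gp.nvim's actual header format):

```markdown
---
topic: rules consistency check
cwd: /home/user/projects/my-ttrpg
---
```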

@Robitx
Owner

Robitx commented Aug 1, 2024

@qaptoR You're right, that's why I put the "One" in "One easy solution would be.." 🙂

I don't know yet which would be better. For online LLM chats this is a non-issue, since they have to store uploaded artifacts anyway, but we do have a choice.

Just a thought on the cwd-header solution: I can easily imagine situations where a single cwd won't be sufficient (for example, a user working on something across multiple repositories, like a project using microservices, or simply a project plus a referenced library).

Instead of putting cwd in the header, we could use [some_syntax_for_optional_cwd_prefix]@context_macro and hide the prefix with vim.fn.matchadd("Conceal", .. so it doesn't visibly pollute the chat buffer.
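A sketch of that conceal idea (Neovim-only, so it runs inside an editor session; the bracketed-prefix syntax is just the placeholder from the comment above, not a settled format):

```lua
-- Hide an optional [cwd] prefix in front of an @context macro in the
-- current buffer. Requires 'conceallevel' to actually hide matches.
vim.opt_local.conceallevel = 2
-- Match "[...]" only when immediately followed by an @word: command.
vim.fn.matchadd("Conceal", [[\[[^]]*\]\ze@\w\+:]])
```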

@Odie
Contributor Author

Odie commented Aug 2, 2024

Hi all,

The new @code command is now working, though only for lua files.

Function name Indexing

When the user opens the chat buffer via GpChatNew or GpChatToggle inside a project for the first time, it'll index all Lua files and their top-level functions. At the moment this is done synchronously; indexing gp.nvim itself takes about 70ms on my machine. We also update the index on a per-file basis whenever a file is saved, though this only happens when gp has been loaded. We're still missing a periodic update of the index to catch cases where files are altered without gp's knowledge.
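The per-file update on save could be wired up roughly like this (Neovim-only sketch; `update_index_for_file` is a hypothetical stand-in for the PR's actual indexing function):

```lua
-- Re-index a single Lua file whenever it is written, so the function
-- index stays current while gp is loaded.
vim.api.nvim_create_autocmd("BufWritePost", {
  pattern = "*.lua",
  callback = function(ev)
    update_index_for_file(ev.file) -- hypothetical indexing entry point
  end,
})
```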

@code completion

When the user enters @code: in the chat buffer, a full list of known function names and their origin files is shown in the completion menu. Once the user chooses one, the @code command takes the form @code:rel_path/to/file:full_fn_name.

When the chat message is submitted, we grab the relevant lines from the source file using info in the index and insert them.

So, a message like this should now work:

@code:lua/gp/db.lua:Db.open

Explain this, please.
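A sketch of how the completed command could be split back into its parts (assuming the function name contains no ':'; `parse_code_command` is an illustrative name, not the PR's actual code):

```lua
-- Split "@code:rel_path/to/file:full_fn_name" into path and name.
-- The greedy (.+) keeps colons inside the path; the final ([^:]+)
-- takes everything after the last colon as the function name.
local function parse_code_command(cmd)
  return cmd:match("^@code:(.+):([^:]+)$")
end
```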

New dependency

The index is being kept in a sqlite database, so we now have one additional dependency:

{
    "robitx/gp.nvim",
    dependencies = { "hrsh7th/nvim-cmp", "kkharji/sqlite.lua" },
    -- additional config
}

TODOs

  • Automatic periodic force rebuild of fn index
  • Attach completion source on chat buffer only (instead of on *.md)
  • Expose index rebuild as a command
  • Look into building the index in the background or asynchronously
  • Add support for python files
  • Look into skipping files to index based on .gitignore (plenary seems to have some support for this?)
  • merge with main branch

@Odie
Contributor Author

Odie commented Aug 4, 2024

Hi all,

I think I've added all the features I set out to implement.

The latest commit now depends on plenary to deal with .gitignore.

The required dependencies now look like:

{
    "robitx/gp.nvim",
    dependencies = { "hrsh7th/nvim-cmp", "kkharji/sqlite.lua", "nvim-lua/plenary.nvim" },
    -- additional config
}

I'm actually only using one utility function from plenary, so if there are any objections to this, I can just keep a local copy of that function instead.

@Odie
Contributor Author

Odie commented Aug 4, 2024

Hi all!

I merged main into the branch. Hopefully this makes it easier to try out. Please do use it for a bit and let me know if you run into any issues; I'm sure there are lots of rough edges that need to be fixed.

Python support

Symbol indexing now grabs plain functions, class methods, and classes using treesitter. They should show up when using the @code command, correctly marked with their corresponding types.

Synchronous indexing

Async support is left undone for the moment. I looked ever so briefly into indexing asynchronously, but didn't pursue it further since indexing seems "fast enough" for the small projects I'm trying it with. I'd really like to start using the plugin for a bit and discover other, perhaps more urgent, problems before tackling async support.

@qaptoR I've added an @include command. It'll insert the file requested by the command as is, without any backtick fences.

the cursor

This simplifies sending the function under the cursor as the chat context.
@qaptoR

qaptoR commented Aug 8, 2024

@Odie there are two things I want to make clear: 1) I appreciate all the work you've done on this, and 2) all the work involved with the sqlite db is incredible, and I plan on diving into how it works because I want to write a plugin that mimics 'dataview' for Obsidian, where it searches through a project and indexes data for searching that other plugins can then tap into.

However, I do not foresee myself using the @code feature, because gp.nvim already has the ability to select code and insert it into the conversation with a file path annotation, which I think is faster and easier for me to target. I also don't want to incur the cost of indexing (however small) at this time, and your first initial implementation was so elegant and simple that I'm just adapting it for myself.

I'm also implementing an @import command, which targets a file that can itself contain @file or @include (or even more @import) commands. It's a recursive feature that allows writing a single file with all the commonly used includes. I'm still trying to solve the situation where there is infinite recursion, though I think it would be hard to get into if the user writes their command files carefully and avoids reference loops where a imports b and b imports a.
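One common way to break such loops is a visited set, sketched here under stated assumptions (the names and the exact `@import:` syntax are illustrative, and `read_file` is injected so the logic stays testable):

```lua
-- Sketch of cycle-safe recursive @import expansion: a `seen` table
-- records files already expanded, so "a imports b, b imports a"
-- terminates instead of recursing forever.
local function expand_imports(path, read_file, seen)
  seen = seen or {}
  if seen[path] then
    return "" -- already expanded on this branch; break the cycle
  end
  seen[path] = true
  local content = read_file(path) or ""
  -- Recursively replace every nested @import command in the content.
  return (content:gsub("@import:(%S+)", function(p)
    return expand_imports(p, read_file, seen)
  end))
end
```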

@qaptoR

qaptoR commented Aug 11, 2024

Just wanted to share this. It's the first time I've used the import feature on a large set of large files. As you can see, recursive imports are working.

On the left of the image are the 'import files'; top right is the 'import binder'.

Then, in the bottom right, is the final import command, which references the binder. Then I query the entire context.

Claude Haiku says it is about 36K tokens of context, ChatGPT mini says 31K, and I can't figure out how to see that info on Gemini. Gotta figure out Gemini Flash, which charged 4 cents; Claude was ~1 cent, and ChatGPT mini was <1 cent.

[screenshot: import files on the left, import binder top right, final import command bottom right]
