Prompt should support generating markup cells #285

Merged
merged 37 commits into from
Oct 23, 2024
a3181b0
Start generating markup cells
jlewi Oct 8, 2024
00bf0bf
Merge remote-tracking branch 'origin/main' into jlewi/ghostmarkup
jlewi Oct 12, 2024
6e30cdb
hacking on the prompt.
jlewi Oct 12, 2024
74a06ee
* Update the prompt to not respond with just one code cell
jlewi Oct 12, 2024
d868dc5
Bump the maximum number of characters in the response because otherwi…
jlewi Oct 14, 2024
ef4f68b
Measure the time it takes to run generate as part of an experiment.
jlewi Oct 14, 2024
6213f08
Fix: https://github.com/jlewi/foyle/issues/295
jlewi Oct 14, 2024
c4341d9
Evaluator shouldn't terminate if there is a timeout waiting for a blo…
jlewi Oct 14, 2024
ba5932e
Merge remote-tracking branch 'origin/main' into jlewi/ghostmarkup
jlewi Oct 14, 2024
1fa86ba
* Add a level 1 assertion to verify that request is non-empty
jlewi Oct 15, 2024
0547ca3
Compute an evaluation report.
jlewi Oct 15, 2024
4e22beb
Helper function to compute percentiles.
jlewi Oct 15, 2024
befe5d9
Add an experiment report compute assertions and build a report.
jlewi Oct 15, 2024
8404558
Merge remote-tracking branch 'origin/main' into jlewi/ghostmarkup
jlewi Oct 15, 2024
502d872
Tidy.
jlewi Oct 15, 2024
c7cc13c
Merge remote-tracking branch 'origin/main' into jlewi/ghostmarkup
jlewi Oct 16, 2024
01eb1e5
Update the protos.
jlewi Oct 16, 2024
bc2421e
Reset maxDocChars because it increases latency negatively.
jlewi Oct 16, 2024
0b58b18
Add retries to evaluator client
jlewi Oct 16, 2024
4ae254e
Fix empty doc generation.
jlewi Oct 17, 2024
dd1eb23
More files for the doc tailer PR.
jlewi Oct 17, 2024
2fd9e09
Merge remote-tracking branch 'origin/main' into jlewi/ghostmarkup
jlewi Oct 18, 2024
eceeca7
Merge remote-tracking branch 'origin/main' into jlewi/ghostmarkup
jlewi Oct 18, 2024
c6f2a06
Update the prompt to encourage it to reason about the output.
jlewi Oct 18, 2024
fced5fb
Define a trigger enum for specifying the event that triggered the cli…
jlewi Oct 18, 2024
d6f7277
Fix bug with logging the request; we need to use zapprot.
jlewi Oct 18, 2024
1cb849b
Start defining a K8s job and resources to automate releasing.
jlewi Oct 21, 2024
e0d65ed
* Update agent.go to limit the number of cells that get generated
jlewi Oct 22, 2024
4c38235
Create a cronjob to do the releases
jlewi Oct 23, 2024
d8eb908
Update the README
jlewi Oct 23, 2024
1be004a
Merge remote-tracking branch 'origin/main' into jlewi/ghostmarkup
jlewi Oct 23, 2024
dcdfd1d
Remove job.yaml; we will use a cronjob
jlewi Oct 23, 2024
bc89e36
postprocess should merge markup blocks; fix the test.
jlewi Oct 23, 2024
369fe30
Add a level1 assertion.
jlewi Oct 23, 2024
4effe4f
Fix tests.
jlewi Oct 23, 2024
6f7c485
Fix prompt test.
jlewi Oct 23, 2024
bdfe06e
Update prompt test.
jlewi Oct 23, 2024
68 changes: 48 additions & 20 deletions app/pkg/agent/agent.go
@@ -135,6 +135,8 @@ func (a *Agent) Generate(ctx context.Context, req *v1alpha1.GenerateRequest) (*v
}

log.Info("Agent.Generate returning response", zap.Object("response", resp))

assertRequestResponse(ctx, req, resp)
return resp, nil
}

@@ -425,7 +427,7 @@ func (a *Agent) StreamGenerate(ctx context.Context, stream *connect.BidiStream[v
continue
}

log.Info("Received request", zap.Object("request", req))
log.Info("Received request", logs.ZapProto("request", req))
// Serialize the doc and make it available for processing
func() {
mu.Lock()
@@ -604,19 +606,10 @@ func (a *Agent) LogEvents(ctx context.Context, req *connect.Request[v1alpha1.Log

// postProcessBlocks is a helper function to post process the blocks generated by the agent.
func postProcessBlocks(blocks []*v1alpha1.Block) ([]*v1alpha1.Block, error) {
// Only return a single code block and only the code block.
// We do this because
// 1. Due https://github.com/jlewi/foyle/issues/168 we can't render markdown as ghostcells
// so we only want to return the code block.
// 2. We don't want to return multiple code blocks because that can be confusing. We can potentially relax that
// in the future if everything is working
// Post process the blocks
results := make([]*v1alpha1.Block, 0, len(blocks))
for _, block := range blocks {
if block.GetKind() != v1alpha1.BlockKind_CODE {
continue
}
// The model sometimes returns just the "</output>" tag but inside a coude block.
// The model sometimes returns just the "</output>" tag but inside a code block.
// We want to ignore such blocks.
if isOutputTag(block.Contents) {
continue
@@ -626,8 +619,24 @@ func postProcessBlocks(blocks []*v1alpha1.Block) ([]*v1alpha1.Block, error) {
if strings.TrimSpace(block.Contents) == "" {
continue
}

if len(results) > 0 && block.Kind == v1alpha1.BlockKind_MARKUP && results[len(results)-1].Kind == v1alpha1.BlockKind_MARKUP {
// If the previous block is a markup block we want to merge this with the previous block.
lastBlock := results[len(results)-1]
// TODO(jeremy): Do we need to add a newline?
lastBlock.Contents += "\n" + block.Contents
continue
}

results = append(results, block)
return results, nil

// Once we reach a code block drop any other code blocks. This is because showing multiple code blocks
// can be confusing.
// TODO(jeremy): Should we make this a configurable option so it's easy to experiment?
if block.Kind == v1alpha1.BlockKind_CODE {
// TODO(jeremy): log a level 1 assertion here?
return results, nil
}
}
return results, nil
}
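
The merge-and-truncate behavior of `postProcessBlocks` can be sketched in isolation. This is a minimal, self-contained approximation: `Kind`, `Block`, and `mergeAndTruncate` are hypothetical stand-ins for the generated `v1alpha1` types and the real function, and the `isOutputTag` filter is omitted for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind and Block are simplified, hypothetical stand-ins for
// v1alpha1.BlockKind and v1alpha1.Block.
type Kind int

const (
	Markup Kind = iota
	Code
)

type Block struct {
	Kind     Kind
	Contents string
}

// mergeAndTruncate applies the two rules visible in the diff:
//  1. consecutive markup blocks are merged into one, joined by a newline;
//  2. everything after the first code block is dropped.
// Blocks whose contents are only whitespace are skipped.
func mergeAndTruncate(blocks []Block) []Block {
	results := make([]Block, 0, len(blocks))
	for _, b := range blocks {
		if strings.TrimSpace(b.Contents) == "" {
			continue
		}
		if n := len(results); n > 0 && b.Kind == Markup && results[n-1].Kind == Markup {
			results[n-1].Contents += "\n" + b.Contents
			continue
		}
		results = append(results, b)
		if b.Kind == Code {
			// Stop at the first code block.
			return results
		}
	}
	return results
}

func main() {
	out := mergeAndTruncate([]Block{
		{Markup, "first block"},
		{Markup, "second block"},
		{Code, "echo hello"},
		{Markup, "last block"},
	})
	for _, b := range out {
		fmt.Printf("%d %q\n", b.Kind, b.Contents)
	}
}
```

With this input the sketch keeps one merged markup block followed by the code block; the trailing markup block is discarded, which matches the `stop-at-code-block` test case in `agent_test.go`.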
@@ -660,14 +669,7 @@ func (s *streamState) getContextID() string {

// shouldTrigger returns true if the agent should trigger a completion for the current document.
func shouldTrigger(doc *v1alpha1.Doc, selectedIndex int32) bool {
// We should trigger if the last cell is a code cell
if len(doc.Blocks) == 0 {
return false
}
// N.B. This is a bit of a hack to reduce costs because we are using so many tokens.
// For now only trigger completion if the selected cell is a markup cell.
selectedCell := doc.Blocks[selectedIndex]
return selectedCell.GetKind() == v1alpha1.BlockKind_MARKUP
return len(doc.Blocks) != 0
}

// dropResponse returns true if the response should be dropped rather than being sent to the client.
@@ -690,3 +692,29 @@ func preprocessDoc(req *v1alpha1.GenerateRequest) []*v1alpha1.Block {
cells := req.Doc.Blocks[:req.SelectedIndex+1]
return cells
}

// assertRequestResponse runs some assertions that depend on the generateRequest and the response.
func assertRequestResponse(ctx context.Context, req *v1alpha1.GenerateRequest, resp *v1alpha1.GenerateResponse) {
log := logs.FromContext(ctx)
assertMarkupAfterCode := &v1alpha1.Assertion{
Name: v1alpha1.Assertion_MARKUP_AFTER_CODE,
Result: v1alpha1.AssertResult_SKIPPED,
Id: ulid.GenerateID(),
}

selected := req.Doc.Blocks[req.SelectedIndex]
// Assertion only applies if the selected index is a code cell
if selected.Kind == v1alpha1.BlockKind_CODE {
if len(resp.Blocks) > 0 && resp.Blocks[0].Kind == v1alpha1.BlockKind_MARKUP {
assertMarkupAfterCode.Result = v1alpha1.AssertResult_PASSED
} else {
assertMarkupAfterCode.Result = v1alpha1.AssertResult_FAILED
}
}

if len(resp.Blocks) == 0 {
assertMarkupAfterCode.Result = v1alpha1.AssertResult_FAILED
}

log.Info(logs.Level1Assertion, "assertion", assertMarkupAfterCode)
}
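
The decision table behind the `MARKUP_AFTER_CODE` assertion is easier to see with the protobuf types stripped away. The sketch below is an illustration only: `Kind`, `Result`, and `markupAfterCode` are hypothetical names, and the response is reduced to a slice of block kinds.

```go
package main

import "fmt"

// Kind is a hypothetical stand-in for v1alpha1.BlockKind.
type Kind int

const (
	Markup Kind = iota
	Code
)

// Result is a hypothetical stand-in for v1alpha1.AssertResult.
type Result string

const (
	Skipped Result = "SKIPPED"
	Passed  Result = "PASSED"
	Failed  Result = "FAILED"
)

// markupAfterCode follows the same branch order as assertRequestResponse:
// the assertion starts as skipped, is evaluated only when the selected cell
// is a code cell, and is forced to failed whenever the response is empty.
func markupAfterCode(selected Kind, response []Kind) Result {
	result := Skipped
	if selected == Code {
		if len(response) > 0 && response[0] == Markup {
			result = Passed
		} else {
			result = Failed
		}
	}
	if len(response) == 0 {
		result = Failed
	}
	return result
}

func main() {
	fmt.Println(markupAfterCode(Code, []Kind{Markup, Code})) // PASSED
	fmt.Println(markupAfterCode(Code, []Kind{Code}))         // FAILED
	fmt.Println(markupAfterCode(Markup, []Kind{Code}))       // SKIPPED
	fmt.Println(markupAfterCode(Markup, nil))                // FAILED
}
```

Note the last case: because the empty-response check runs after the kind check, an empty response is reported as failed even when the selected cell is markup and the assertion would otherwise be skipped.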
63 changes: 60 additions & 3 deletions app/pkg/agent/agent_test.go
@@ -65,6 +65,17 @@ func Test_Generate(t *testing.T) {
},
maxResults: 0,
},
{
name: "test-gcloud-iam",
doc: &v1alpha1.Doc{
Blocks: []*v1alpha1.Block{
{
Contents: "How do I debug why workload identity isn't working for a deployment in GKE?",
},
},
},
maxResults: 0,
},
{
name: "prdiff",
doc: &v1alpha1.Doc{
@@ -109,7 +120,7 @@ func Test_Generate(t *testing.T) {
}

cfg.Agent.ModelProvider = api.ModelProviderOpenAI
cfg.Agent.Model = openai.GPT3Dot5Turbo0125
cfg.Agent.Model = openai.GPT4oMini

completer, err := oai.NewCompleter(*cfg, client)
if err != nil {
@@ -331,7 +342,7 @@ func Test_ShouldTrigger(t *testing.T) {
},
},
selectedIndex: 0,
expected: false,
expected: true,
},
}

@@ -373,6 +384,52 @@ func Test_PostProcessBlocks(t *testing.T) {
},
expected: []*v1alpha1.Block{},
},
{
name: "merge-markup-blocks",
blocks: []*v1alpha1.Block{
{
Kind: v1alpha1.BlockKind_MARKUP,
Contents: "first block",
},
{
Kind: v1alpha1.BlockKind_MARKUP,
Contents: "second block",
},
},
expected: []*v1alpha1.Block{
{
Kind: v1alpha1.BlockKind_MARKUP,
Contents: "first block\nsecond block",
},
},
},
{
name: "stop-at-code-block",
blocks: []*v1alpha1.Block{
{
Kind: v1alpha1.BlockKind_MARKUP,
Contents: "first block",
},
{
Kind: v1alpha1.BlockKind_CODE,
Contents: "echo hello",
},
{
Kind: v1alpha1.BlockKind_MARKUP,
Contents: "last block",
},
},
expected: []*v1alpha1.Block{
{
Kind: v1alpha1.BlockKind_MARKUP,
Contents: "first block",
},
{
Kind: v1alpha1.BlockKind_CODE,
Contents: "echo hello",
},
},
},
}

for _, c := range cases {
@@ -381,7 +438,7 @@ func Test_PostProcessBlocks(t *testing.T) {
if err != nil {
t.Fatalf("Error post processing blocks; %v", err)
}
if d := cmp.Diff(c.expected, actual); d != "" {
if d := cmp.Diff(c.expected, actual, cmpopts.IgnoreUnexported(v1alpha1.Block{})); d != "" {
t.Errorf("Unexpected diff:\n%s", d)
}
})
81 changes: 79 additions & 2 deletions app/pkg/agent/prompt.tmpl
@@ -1,12 +1,89 @@
Continue writing the markdown document by adding a code block with the commands a user should execute.
Continue writing the markdown document by adding markdown and code blocks with the commands a user should execute.

Follow these rules

* If the user is asking a question such as "How do I debug workload identity?" or "Why isn't my pod running?"
consider outputting a succinct explanation for how to debug the issue or answer any question
* For any command that needs to be executed by the user, put it inside a code block
* Set the language inside the code block to bash
* Use the text at the end of the document to determine what commands to execute next
* Use the existing text and code blocks in the document to learn phrases that are predictive of specific commands
* Only respond with a single code block
* You can put multiple commands into a code block
* If the text at the end of the document doesn't clearly describe a command to execute simply respond with the </output> tag
* If a user executed a command, the output of that command will be included in a code block with the language set to output
* Use the output of previous commands to determine what to do next

Here's an example:

<example>
<input>
# Count users
* Run a SQL query to count the number of users?
</input>
<output>
1. Fetch the schema for the database

```bash
sqlite3 /path/to/your/database/db.sqlite ".schema"
```

1. Run the following sql query to count the number of users

```bash
sqlite3 /path/to/your/database/db.sqlite "SELECT COUNT(DISTINCT customerId) FROM table_name;"
```
</output>
<reasoning>
The response intermixes markup and code cells providing the steps to count the number of users in a database.
</reasoning>
</example>

* You should look at the document to decide if the user is already in the midst of executing a sequence of steps
* If the user is in the middle of executing a sequence of steps, you should continue the sequence of steps
* You should continue the sequence by using the output of the previous command(s) to determine what to do next

* If the document ends with a code block containing the output of a command, look at the markup preceding
the code block containing the commands to try to figure out what question/problem the command was trying to solve.
* In this case you should respond with markup answering the question based on the output of the commands,
or with a suggestion about what to do next.

Here's an example:
<example>
<input>
1. Check the Kubernetes Service Account Configuration
Ensure that the Kubernetes service account is annotated with the correct Google Cloud service account.

```bash
kubectl get serviceaccount default -n default -o yaml
```

```output
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
iam.gke.io/gcp-service-account: developer@foyle-dev.iam.gserviceaccount.com
creationTimestamp: "2024-05-30T02:11:21Z"
name: default
namespace: default
resourceVersion: "155079105"
uid: 8c8fe74f-b23d-477c-b8b7-7a8937733fa3
```
</input>
<output>
The annotation `iam.gke.io/gcp-service-account` is correctly set with the Google Cloud service account.
Since the annotation is correctly set, the next thing to check is the IAM permissions for the
Google Cloud service account developer@foyle-dev.iam.gserviceaccount.com.
</output>
<reasoning>
* The input ends with the output of the command `kubectl get serviceaccount default -n default -o yaml`
* The markup preceding the command indicates that we are running this command to check if it's annotated with
the correct service account
* So in this case you respond by analyzing the output to answer the question about the annotations
* Based on that analysis you suggest the next step to debug the issue
</reasoning>
</example>


{{if .Examples}}
Here are a bunch of examples of input documents along with the expected output.
81 changes: 79 additions & 2 deletions app/pkg/agent/test_data/examples.txt
@@ -1,12 +1,89 @@
Continue writing the markdown document by adding a code block with the commands a user should execute.
Continue writing the markdown document by adding markdown and code blocks with the commands a user should execute.

Follow these rules

* If the user is asking a question such as "How do I debug workload identity?" or "Why isn't my pod running?"
consider outputting a succinct explanation for how to debug the issue or answer any question
* For any command that needs to be executed by the user, put it inside a code block
* Set the language inside the code block to bash
* Use the text at the end of the document to determine what commands to execute next
* Use the existing text and code blocks in the document to learn phrases that are predictive of specific commands
* Only respond with a single code block
* You can put multiple commands into a code block
* If the text at the end of the document doesn't clearly describe a command to execute simply respond with the </output> tag
* If a user executed a command, the output of that command will be included in a code block with the language set to output
* Use the output of previous commands to determine what to do next

Here's an example:

<example>
<input>
# Count users
* Run a SQL query to count the number of users?
</input>
<output>
1. Fetch the schema for the database

```bash
sqlite3 /path/to/your/database/db.sqlite ".schema"
```

1. Run the following sql query to count the number of users

```bash
sqlite3 /path/to/your/database/db.sqlite "SELECT COUNT(DISTINCT customerId) FROM table_name;"
```
</output>
<reasoning>
The response intermixes markup and code cells providing the steps to count the number of users in a database.
</reasoning>
</example>

* You should look at the document to decide if the user is already in the midst of executing a sequence of steps
* If the user is in the middle of executing a sequence of steps, you should continue the sequence of steps
* You should continue the sequence by using the output of the previous command(s) to determine what to do next

* If the document ends with a code block containing the output of a command, look at the markup preceding
the code block containing the commands to try to figure out what question/problem the command was trying to solve.
* In this case you should respond with markup answering the question based on the output of the commands,
or with a suggestion about what to do next.

Here's an example:
<example>
<input>
1. Check the Kubernetes Service Account Configuration
Ensure that the Kubernetes service account is annotated with the correct Google Cloud service account.

```bash
kubectl get serviceaccount default -n default -o yaml
```

```output
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
iam.gke.io/gcp-service-account: developer@foyle-dev.iam.gserviceaccount.com
creationTimestamp: "2024-05-30T02:11:21Z"
name: default
namespace: default
resourceVersion: "155079105"
uid: 8c8fe74f-b23d-477c-b8b7-7a8937733fa3
```
</input>
<output>
The annotation `iam.gke.io/gcp-service-account` is correctly set with the Google Cloud service account.
Since the annotation is correctly set, the next thing to check is the IAM permissions for the
Google Cloud service account developer@foyle-dev.iam.gserviceaccount.com.
</output>
<reasoning>
* The input ends with the output of the command `kubectl get serviceaccount default -n default -o yaml`
* The markup preceding the command indicates that we are running this command to check if it's annotated with
the correct service account
* So in this case you respond by analyzing the output to answer the question about the annotations
* Based on that analysis you suggest the next step to debug the issue
</reasoning>
</example>



Here are a bunch of examples of input documents along with the expected output.