-
Notifications
You must be signed in to change notification settings - Fork 9
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Implement Evaluation Mode for Traces and Logs (#92)
echo "### PR Title: Implement Evaluation Mode for Traces and Logs ### Overview This pull request introduces the ability to distinguish between logs generated in evaluation mode and those created during normal operation. This differentiation is crucial for ensuring that logs meant for evaluation do not interfere with the learning process and data meant for production use. ### Changes 1. **API Adjustments**: New `EvalMode` methods and fields are introduced in various structures (`LogEntry`, `GenerateTrace`, `ExecuteTrace`, and `BlockLog`) across the API. These changes allow the identification and handling of logs and traces created in evaluation mode. 2. **Analyzer Logic Enhancement**: The analysis process now accounts for the evaluation mode flag when processing and combining log entries. This ensures that logs generated during evaluation carry the evaluation mode flag throughout the analysis pipeline, from individual log entries to compiled traces. 3. **Exclusion from Learning**: Blocks marked as created in evaluation mode are now excluded from the learning process. This is implemented in the learning logic to prevent evaluation data from being used as learning examples, thus maintaining the integrity of the training data. ### Testing - Extended unit tests to cover the changes in log analysis and learning processes, ensuring that evaluation mode logs are processed and handled as expected. - Included tests with scenarios that simulate operational and evaluation conditions to verify the robustness of the new logic.
- Loading branch information
Showing
9 changed files
with
202 additions
and
13 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2 changes: 2 additions & 0 deletions
2
app/pkg/analyze/test_data/execute_traces_lines_eval_mode.jsonl
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
{"severity":"info","time":1713303870.400788,"caller":"executor/executor.go:43", "evalMode":true,"function":"github.com/jlewi/foyle/app/pkg/executor.(*Executor).Execute","message":"Executor.Execute","traceId":"43281a4e73cd76570e9851589207a8bd","blockId":"","request":{"block":{"kind":"CODE","language":"","contents":"gcloud logging read --project=foyle-dev\n","outputs":[],"trace_ids":[],"id":""}}} | ||
{"severity":"info","time":1713303872.587626,"caller":"executor/executor.go:61","function":"github.com/jlewi/foyle/app/pkg/executor.(*Executor).Execute","message":"Executed instructions","traceId":"43281a4e73cd76570e9851589207a8bd","instructionsError":"json: unsupported type: chan string","response":{"outputs":[{"items":[{"mime":"text/plain","text_data":"exitCode: 0"}]}]}} |
2 changes: 2 additions & 0 deletions
2
app/pkg/analyze/test_data/execute_traces_lines_eval_mode_false.jsonl
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
{"severity":"info","time":1713303870.400788,"caller":"executor/executor.go:43", "evalMode":false,"function":"github.com/jlewi/foyle/app/pkg/executor.(*Executor).Execute","message":"Executor.Execute","traceId":"43281a4e73cd76570e9851589207a8bd","blockId":"","request":{"block":{"kind":"CODE","language":"","contents":"gcloud logging read --project=foyle-dev\n","outputs":[],"trace_ids":[],"id":""}}} | ||
{"severity":"info","time":1713303872.587626,"caller":"executor/executor.go:61","function":"github.com/jlewi/foyle/app/pkg/executor.(*Executor).Execute","message":"Executed instructions","traceId":"43281a4e73cd76570e9851589207a8bd","instructionsError":"json: unsupported type: chan string","response":{"outputs":[{"items":[{"mime":"text/plain","text_data":"exitCode: 0"}]}]}} |
5 changes: 5 additions & 0 deletions
5
app/pkg/analyze/test_data/generate_trace_lines_eval_mode.jsonl
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
{"severity":"info","time":1713303852.283372,"caller":"agent/agent.go:61", "evalMode": true,"function":"github.com/jlewi/foyle/app/pkg/agent.(*Agent).Generate","message":"Agent.Generate","traceId":"2c4efb9372de3bb51d66d3f7e9c3a76a","request":{"doc":{"blocks":[{"kind":"MARKUP","language":"markdown","contents":"Use gcloud to read the logs for the cluster dev in project foyle-dev; assume you aready logged in","outputs":[],"trace_ids":["","","",""],"id":""}]}}} | ||
{"severity":"info","time":1713303852.283869,"caller":"agent/agent.go:117","function":"github.com/jlewi/foyle/app/pkg/agent.(*Agent).completeWithRetries","message":"OpenAI:CreateChatCompletion","traceId":"2c4efb9372de3bb51d66d3f7e9c3a76a","request":{"model":"gpt-3.5-turbo-0125","messages":[{"role":"system","content":"You are a helpful AI assistant for software developers. You are helping software engineers write markdown documents to deploy\nand operate software. Your job is to help users reason about problems and tasks and come up with the appropriate\ncommands to accomplish them. You should never try to execute commands. You should always tell the user\nto execute the commands themselves. To help the user place the commands inside a code block with the language set to\nbash. Users can then execute the commands inside VSCode notebooks. The output will then be appended to the document.\nYou can then use that output to reason about the next steps.\n\nYou are only helping users with tasks related to building, deploying, and operating software. You should interpret\nany questions or commands in that context.\n"},{"role":"user","content":"Please continue writing this markdown document to deal with any tasks or issues listed\nin the document. The document is a markdown document. It will contain a description of the task\nor problem, I need your help with. It will then contain one or more code blocks containing commands\nto be executed to accomplish the task or obtain information needed to figure out the problem.\nIf a command has already been executed the output of the command will be provided in a code block\nwith the language `output`. Use the output to help you figure out the problem or complete the task.\nIf you need me to execute a command please provide the command in a code block and I will execute it\nand then add the output to the document.\n\nHere's the document:\n\nUse gcloud to read the logs for the cluster dev in project foyle-dev; assume you aready logged in\n"}],"max_tokens":2000,"temperature":0.9}} | ||
{"severity":"info","time":1713303855.224336,"caller":"agent/agent.go:132","function":"github.com/jlewi/foyle/app/pkg/agent.(*Agent).completeWithRetries","message":"OpenAI:CreateChatCompletion response","traceId":"2c4efb9372de3bb51d66d3f7e9c3a76a","resp":{"id":"chatcmpl-9Ekt2d8y2F8FKBEfPdVj6iFiV4ciL","object":"chat.completion","created":1713303852,"model":"gpt-3.5-turbo-0125","choices":[{"index":0,"message":{"role":"assistant","content":"To read the logs for the cluster `dev` in project `foyle-dev` using `gcloud`, you can use the following command:\n\n```bash\ngcloud container clusters get-credentials dev --project=foyle-dev\n```\n\nThis command will authenticate `kubectl` to the cluster `dev` in the `foyle-dev` project.\n\nAfter this, you can use `kubectl` to read the logs for the cluster. Here's the command to read the logs for a specific pod:\n\n```bash\nkubectl logs \u003cpod_name\u003e\n```\n\nReplace `\u003cpod_name\u003e` with the name of the pod for which you want to read the logs. If you are not sure about the pod name, you can list all pods in the cluster using the following command:\n\n```bash\nkubectl get pods\n```\n\nPlease execute the commands above and provide the output if you need further assistance."},"finish_reason":"stop"}],"usage":{"prompt_tokens":336,"completion_tokens":179,"total_tokens":515},"system_fingerprint":"fp_c2295e73ad"}} | ||
{"severity":"info","time":1713303855.225972,"caller":"agent/agent.go:76","function":"github.com/jlewi/foyle/app/pkg/agent.(*Agent).Generate","message":"Agent.Generate returning blocks","traceId":"2c4efb9372de3bb51d66d3f7e9c3a76a","blockIds":["10b11f2d-7c8d-4d58-bedc-7e2dd51a85dc","8594d742-39d7-473b-b4ad-901ff362fdb3","f1662328-e884-418c-a084-95dfb1a3f7fc","4feb6219-d050-4630-8ceb-d08ec149b60d","3893a0b6-8c84-49ca-a38c-fbf6d7adfcde","fd276a6f-f379-4f9c-9779-0ed07819d0f5","d507ce35-af59-4f92-8dec-6c37d7b26647"]} | ||
{"severity":"info","time":1713303855.226055,"caller":"agent/agent.go:83","function":"github.com/jlewi/foyle/app/pkg/agent.(*Agent).Generate","message":"Agent.Generate returning response","traceId":"2c4efb9372de3bb51d66d3f7e9c3a76a","response":{"blocks":[{"kind":"MARKUP","language":"","contents":"To read the logs for the cluster `dev` in project `foyle-dev` using `gcloud`, you can use the following command:","outputs":[],"trace_ids":[],"id":"10b11f2d-7c8d-4d58-bedc-7e2dd51a85dc"},{"kind":"CODE","language":"bash","contents":"gcloud container clusters get-credentials dev --project=foyle-dev\n","outputs":[],"trace_ids":[],"id":"8594d742-39d7-473b-b4ad-901ff362fdb3"},{"kind":"MARKUP","language":"","contents":"\n\nThis command will authenticate `kubectl` to the cluster `dev` in the `foyle-dev` project.\n\nAfter this, you can use `kubectl` to read the logs for the cluster. Here's the command to read the logs for a specific pod:","outputs":[],"trace_ids":[],"id":"f1662328-e884-418c-a084-95dfb1a3f7fc"},{"kind":"CODE","language":"bash","contents":"kubectl logs <pod_name>\n","outputs":[],"trace_ids":[],"id":"4feb6219-d050-4630-8ceb-d08ec149b60d"},{"kind":"MARKUP","language":"","contents":"\n\nReplace `<pod_name>` with the name of the pod for which you want to read the logs. If you are not sure about the pod name, you can list all pods in the cluster using the following command:","outputs":[],"trace_ids":[],"id":"3893a0b6-8c84-49ca-a38c-fbf6d7adfcde"},{"kind":"CODE","language":"bash","contents":"kubectl get pods\n","outputs":[],"trace_ids":[],"id":"fd276a6f-f379-4f9c-9779-0ed07819d0f5"},{"kind":"MARKUP","language":"","contents":"\n\nPlease execute the commands above and provide the output if you need further assistance.","outputs":[],"trace_ids":[],"id":"d507ce35-af59-4f92-8dec-6c37d7b26647"}]}} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"blocks": [ | ||
{ | ||
"kind": "MARKUP", | ||
"language": "markdown", | ||
"contents": "Create a PR description" | ||
}, | ||
{ | ||
"kind": "CODE", | ||
"language": "bash", | ||
"contents": "git diff origin/main | llm --model=gpt-4-0125-preview -s \"Create a PR description from the following diff\"" | ||
} | ||
] | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"blocks": [ | ||
{ | ||
"kind": "MARKUP", | ||
"language": "markdown", | ||
"contents": "Get the logs for building the image carabou" | ||
}, | ||
{ | ||
"kind": "CODE", | ||
"language": "bash", | ||
"contents": "gcloud logging read 'logName=\"projects/foyle-dev/logs/hydros\" jsonPayload.image=\"carabou\"' --freshness=1d --project=foyle-dev" | ||
} | ||
] | ||
} |