
Eval #87

Merged · 22 commits merged into main from jlewi/mistakes · May 3, 2024
Conversation

jlewi (Owner) commented Apr 29, 2024

See tn003_learning_eval.md for a fuller description of how we are approaching evaluation.

This is a first pass at evaluation.

  • Implement a distance metric based on edit distance.
  • Implement the infrastructure to compute it.

Add an apply command and use it to run different experiments.

  • This requires starting to move some of the Agent config into the API package, because we want to reuse it in the experiment type.

Start an initial evaluation dataset.
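
The distance metric above can be sketched as a token-level Levenshtein distance over whitespace-split commands; this is a minimal illustration, not necessarily the exact metric the PR implements.

```go
package main

import (
	"fmt"
	"strings"
)

// editDistance computes the Levenshtein distance between two token slices
// using a single rolling row of the DP table.
func editDistance(a, b []string) int {
	// dp[j] holds the distance between a[:i] and b[:j] as i advances.
	dp := make([]int, len(b)+1)
	for j := range dp {
		dp[j] = j
	}
	for i := 1; i <= len(a); i++ {
		prev := dp[0] // corresponds to dp[i-1][j-1]
		dp[0] = i
		for j := 1; j <= len(b); j++ {
			cur := dp[j]
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			dp[j] = min(dp[j]+1, min(dp[j-1]+1, prev+cost))
			prev = cur
		}
	}
	return dp[len(b)]
}

func min(x, y int) int {
	if x < y {
		return x
	}
	return y
}

func main() {
	actual := strings.Fields("kubectl get pods -n dev")
	expected := strings.Fields("kubectl get pods -n prod")
	fmt.Println(editDistance(actual, expected)) // one token differs, prints 1
}
```

Splitting commands into tokens (rather than comparing characters) keeps the score from being dominated by long but identical arguments.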

API Updates

  • Added EvalResult structure to represent the evaluation outcome, including the actual commands generated, the expected commands, and the evaluation distance.
  • Introduced EvalResultStatus to indicate the status of an evaluation, such as DONE or ERROR.
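
The actual definitions live in the PR's protobuf files; the following is a rough Go-shaped approximation, with field names inferred from the description above rather than taken from the real schema.

```go
package main

import "fmt"

// EvalResultStatus indicates how an evaluation ended, e.g. DONE or ERROR.
type EvalResultStatus int

const (
	EvalResultStatusUnknown EvalResultStatus = iota
	EvalResultStatusDone
	EvalResultStatusError
)

// EvalResult captures a single evaluation outcome: the commands the Agent
// actually generated, the expected commands, and the computed distance.
type EvalResult struct {
	ExampleID string
	Actual    []string
	Expected  []string
	Distance  int
	Status    EvalResultStatus
	Error     string
}

func main() {
	r := EvalResult{
		ExampleID: "example-001",
		Actual:    []string{"kubectl get pods"},
		Expected:  []string{"kubectl get pods"},
		Distance:  0,
		Status:    EvalResultStatusDone,
	}
	fmt.Printf("%s: distance=%d status=%d\n", r.ExampleID, r.Distance, r.Status)
}
```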

Agent Updates

  • Updated the Agent service to support evaluation mode, allowing it to operate without impacting the learning process.

Executor Updates

  • Enhanced the Executor service to handle execution in evaluation mode, ensuring that execution traces are marked accordingly.
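
Marking traces this way is what lets evaluation runs avoid polluting learning. A minimal sketch of the idea, with hypothetical names (the project's real trace schema is defined in its protobufs):

```go
package main

import "fmt"

// Trace records one executed command. EvalMode marks traces produced while
// running in evaluation mode (field name is hypothetical).
type Trace struct {
	Command  string
	EvalMode bool
}

// shouldLearnFrom filters out evaluation traces so the learning pipeline
// only trains on real usage.
func shouldLearnFrom(t Trace) bool {
	return !t.EvalMode
}

func main() {
	traces := []Trace{
		{Command: "kubectl get pods", EvalMode: false},
		{Command: "kubectl get pods -n eval", EvalMode: true},
	}
	for _, t := range traces {
		fmt.Printf("%q learn=%v\n", t.Command, shouldLearnFrom(t))
	}
}
```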

Evaluator Implementation

  • Implemented the Evaluator component responsible for orchestrating the evaluation process, loading evaluation examples, generating predictions with the Agent, calculating distances, and updating results.
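
The orchestration the Evaluator performs can be sketched as a loop over examples, with the Agent call and the distance metric abstracted behind function values; all names here are illustrative, not the project's actual API.

```go
package main

import "fmt"

// Example pairs an input with the commands we expect the Agent to produce.
type Example struct {
	ID       string
	Input    string
	Expected []string
}

// Result records the outcome for one example.
type Result struct {
	ID       string
	Actual   []string
	Distance int
	Err      error
}

// Evaluate feeds each example to predict (a stand-in for the Agent call)
// and scores the output with dist (a stand-in for the distance metric).
func Evaluate(examples []Example, predict func(string) ([]string, error), dist func(a, b []string) int) []Result {
	results := make([]Result, 0, len(examples))
	for _, ex := range examples {
		actual, err := predict(ex.Input)
		r := Result{ID: ex.ID, Actual: actual, Err: err}
		if err == nil {
			r.Distance = dist(actual, ex.Expected)
		}
		results = append(results, r)
	}
	return results
}

func main() {
	examples := []Example{
		{ID: "e1", Input: "list pods", Expected: []string{"kubectl", "get", "pods"}},
	}
	predict := func(string) ([]string, error) {
		return []string{"kubectl", "get", "pods"}, nil
	}
	dist := func(a, b []string) int {
		if len(a) != len(b) {
			return 1
		}
		for i := range a {
			if a[i] != b[i] {
				return 1
			}
		}
		return 0
	}
	for _, r := range Evaluate(examples, predict, dist) {
		fmt.Printf("%s: distance=%d\n", r.ID, r.Distance)
	}
}
```

Keeping the Agent and the metric behind plain function values makes the loop easy to test with fakes, independent of the real services.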

Google Sheets Integration

  • Added functionality to export evaluation results to Google Sheets, enabling easy review and analysis of Foyle's performance.
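
The export itself goes through the Google Sheets API; the self-contained part is flattening results into rows of cells, which might look like the following (field names are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// Result holds one evaluation outcome in the shape we want to export.
type Result struct {
	ID       string
	Actual   []string
	Expected []string
	Distance int
}

// toRows flattens results into a header row plus one row per result,
// the row-oriented shape spreadsheet value APIs typically expect.
func toRows(results []Result) [][]string {
	rows := [][]string{{"example", "actual", "expected", "distance"}}
	for _, r := range results {
		rows = append(rows, []string{
			r.ID,
			strings.Join(r.Actual, " "),
			strings.Join(r.Expected, " "),
			fmt.Sprintf("%d", r.Distance),
		})
	}
	return rows
}

func main() {
	rows := toRows([]Result{
		{ID: "e1", Actual: []string{"ls", "-la"}, Expected: []string{"ls", "-l"}, Distance: 1},
	})
	for _, row := range rows {
		fmt.Println(strings.Join(row, "\t"))
	}
}
```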

CLI Tool Enhancements

  • Extended the CLI tool with commands for running evaluations.

Miscellaneous

  • Added necessary protobuf definitions for new data structures related to evaluations.
  • Updated server setup to handle evaluation logic and integrate with the learning mechanism.
  • Provided sample evaluation datasets for initial testing and validation of the evaluation process.

Copy link

netlify bot commented Apr 29, 2024

Deploy Preview for foyle ready!

  • Latest commit: 8fd8116
  • Latest deploy log: https://app.netlify.com/sites/foyle/deploys/663576c35484c90008403dc2
  • Deploy Preview: https://deploy-preview-87--foyle.netlify.app

@jlewi jlewi changed the title Jlewi/mistakes Eval Apr 29, 2024
@jlewi jlewi marked this pull request as ready for review May 3, 2024 23:45
@jlewi jlewi merged commit 1be7110 into main May 3, 2024
5 checks passed
@jlewi jlewi deleted the jlewi/mistakes branch May 3, 2024 23:57