
Eval #87

Merged · 22 commits merged into main from jlewi/mistakes · May 3, 2024
Conversation

jlewi (Owner) commented Apr 29, 2024

See tn003_learning_eval.md for a fuller description of how we are approaching evaluation.

This is a first pass at evaluation.

  • Implement a distance metric based on edit distance.
  • Implement the infrastructure to compute it.

Add an apply command and use it to run different experiments.

  • This requires starting to move some of the Agent config into the API package, because we want to reuse it in the experiment type.

Start an initial evaluation dataset.
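
The distance metric above can be sketched as a token-level Levenshtein distance over whitespace-split commands; this is a minimal illustration, not necessarily the exact metric the PR implements.

```go
package main

import (
	"fmt"
	"strings"
)

// editDistance computes the Levenshtein distance between two token slices
// using a single rolling row of the DP table.
func editDistance(a, b []string) int {
	// dp[j] holds the distance between a[:i] and b[:j] as i advances.
	dp := make([]int, len(b)+1)
	for j := range dp {
		dp[j] = j
	}
	for i := 1; i <= len(a); i++ {
		prev := dp[0] // corresponds to dp[i-1][j-1]
		dp[0] = i
		for j := 1; j <= len(b); j++ {
			cur := dp[j]
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			dp[j] = min(dp[j]+1, min(dp[j-1]+1, prev+cost))
			prev = cur
		}
	}
	return dp[len(b)]
}

func min(x, y int) int {
	if x < y {
		return x
	}
	return y
}

func main() {
	actual := strings.Fields("kubectl get pods -n dev")
	expected := strings.Fields("kubectl get pods -n prod")
	fmt.Println(editDistance(actual, expected)) // one token differs, prints 1
}
```

Splitting commands into tokens (rather than comparing characters) keeps the score from being dominated by long but identical arguments.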

API Updates

  • Added EvalResult structure to represent the evaluation outcome, including the actual commands generated, the expected commands, and the evaluation distance.
  • Introduced EvalResultStatus to indicate the status of an evaluation, such as DONE or ERROR.
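
The actual definitions live in the PR's protobuf files; the following is a rough Go-shaped approximation, with field names inferred from the description above rather than taken from the real schema.

```go
package main

import "fmt"

// EvalResultStatus indicates how an evaluation ended, e.g. DONE or ERROR.
type EvalResultStatus int

const (
	EvalResultStatusUnknown EvalResultStatus = iota
	EvalResultStatusDone
	EvalResultStatusError
)

// EvalResult captures a single evaluation outcome: the commands the Agent
// actually generated, the expected commands, and the computed distance.
type EvalResult struct {
	ExampleID string
	Actual    []string
	Expected  []string
	Distance  int
	Status    EvalResultStatus
	Error     string
}

func main() {
	r := EvalResult{
		ExampleID: "example-001",
		Actual:    []string{"kubectl get pods"},
		Expected:  []string{"kubectl get pods"},
		Distance:  0,
		Status:    EvalResultStatusDone,
	}
	fmt.Printf("%s: distance=%d status=%d\n", r.ExampleID, r.Distance, r.Status)
}
```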

Agent Updates

  • Updated the Agent service to support evaluation mode, allowing it to operate without impacting the learning process.

Executor Updates

  • Enhanced the Executor service to handle execution in evaluation mode, ensuring that execution traces are marked accordingly.
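
Marking traces this way is what lets evaluation runs avoid polluting learning. A minimal sketch of the idea, with hypothetical names (the project's real trace schema is defined in its protobufs):

```go
package main

import "fmt"

// Trace records one executed command. EvalMode marks traces produced while
// running in evaluation mode (field name is hypothetical).
type Trace struct {
	Command  string
	EvalMode bool
}

// shouldLearnFrom filters out evaluation traces so the learning pipeline
// only trains on real usage.
func shouldLearnFrom(t Trace) bool {
	return !t.EvalMode
}

func main() {
	traces := []Trace{
		{Command: "kubectl get pods", EvalMode: false},
		{Command: "kubectl get pods -n eval", EvalMode: true},
	}
	for _, t := range traces {
		fmt.Printf("%q learn=%v\n", t.Command, shouldLearnFrom(t))
	}
}
```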

Evaluator Implementation

  • Implemented the Evaluator component responsible for orchestrating the evaluation process, loading evaluation examples, generating predictions with the Agent, calculating distances, and updating results.
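
The orchestration the Evaluator performs can be sketched as a loop over examples, with the Agent call and the distance metric abstracted behind function values; all names here are illustrative, not the project's actual API.

```go
package main

import "fmt"

// Example pairs an input with the commands we expect the Agent to produce.
type Example struct {
	ID       string
	Input    string
	Expected []string
}

// Result records the outcome for one example.
type Result struct {
	ID       string
	Actual   []string
	Distance int
	Err      error
}

// Evaluate feeds each example to predict (a stand-in for the Agent call)
// and scores the output with dist (a stand-in for the distance metric).
func Evaluate(examples []Example, predict func(string) ([]string, error), dist func(a, b []string) int) []Result {
	results := make([]Result, 0, len(examples))
	for _, ex := range examples {
		actual, err := predict(ex.Input)
		r := Result{ID: ex.ID, Actual: actual, Err: err}
		if err == nil {
			r.Distance = dist(actual, ex.Expected)
		}
		results = append(results, r)
	}
	return results
}

func main() {
	examples := []Example{
		{ID: "e1", Input: "list pods", Expected: []string{"kubectl", "get", "pods"}},
	}
	predict := func(string) ([]string, error) {
		return []string{"kubectl", "get", "pods"}, nil
	}
	dist := func(a, b []string) int {
		if len(a) != len(b) {
			return 1
		}
		for i := range a {
			if a[i] != b[i] {
				return 1
			}
		}
		return 0
	}
	for _, r := range Evaluate(examples, predict, dist) {
		fmt.Printf("%s: distance=%d\n", r.ID, r.Distance)
	}
}
```

Keeping the Agent and the metric behind plain function values makes the loop easy to test with fakes, independent of the real services.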

Google Sheets Integration

  • Added functionality to export evaluation results to Google Sheets, enabling easy review and analysis of Foyle's performance.
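
The export itself goes through the Google Sheets API; the self-contained part is flattening results into rows of cells, which might look like the following (field names are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// Result holds one evaluation outcome in the shape we want to export.
type Result struct {
	ID       string
	Actual   []string
	Expected []string
	Distance int
}

// toRows flattens results into a header row plus one row per result,
// the row-oriented shape spreadsheet value APIs typically expect.
func toRows(results []Result) [][]string {
	rows := [][]string{{"example", "actual", "expected", "distance"}}
	for _, r := range results {
		rows = append(rows, []string{
			r.ID,
			strings.Join(r.Actual, " "),
			strings.Join(r.Expected, " "),
			fmt.Sprintf("%d", r.Distance),
		})
	}
	return rows
}

func main() {
	rows := toRows([]Result{
		{ID: "e1", Actual: []string{"ls", "-la"}, Expected: []string{"ls", "-l"}, Distance: 1},
	})
	for _, row := range rows {
		fmt.Println(strings.Join(row, "\t"))
	}
}
```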

CLI Tool Enhancements

  • Extended the CLI tool with commands for running evaluations.

Miscellaneous

  • Added necessary protobuf definitions for new data structures related to evaluations.
  • Updated server setup to handle evaluation logic and integrate with the learning mechanism.
  • Provided sample evaluation datasets for initial testing and validation of the evaluation process.

Copy link

netlify bot commented Apr 29, 2024

Deploy Preview for foyle ready!

  • Latest commit: 8fd8116
  • Latest deploy log: https://app.netlify.com/sites/foyle/deploys/663576c35484c90008403dc2
  • Deploy Preview: https://deploy-preview-87--foyle.netlify.app

@jlewi jlewi changed the title Jlewi/mistakes Eval Apr 29, 2024
@jlewi jlewi marked this pull request as ready for review May 3, 2024 23:45
@jlewi jlewi merged commit 1be7110 into main May 3, 2024
5 checks passed
@jlewi jlewi deleted the jlewi/mistakes branch May 3, 2024 23:57