Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update retraining to work with RunMe #110

Closed
jlewi opened this issue May 16, 2024 · 7 comments · Fixed by #121
Closed

Update retraining to work with RunMe #110

jlewi opened this issue May 16, 2024 · 7 comments · Fixed by #121

Comments

@jlewi
Copy link
Owner

jlewi commented May 16, 2024

We need to update our retraining code to work with the data produced by RunMe

@jlewi
Copy link
Owner Author

jlewi commented May 26, 2024

Needed Changes

We need to update the Logs Analyzer to process the RunMe logs.

To Process the RunMe logs there's a couple things we need to do

  • We need to get the log entries associated with Execute Requests

  • We need to build a map from RunMe ULID to Foyle BlockIDs

  • buildTraces needs to support RunMe Execute Requests

    • Option 1: Reuse ExecuteTrace to represent a RunMe execute request
    • Option 2: Define a new RunmeExecuteTrace
    • I think Option 2 is more consistent with where we are headed. For a RunMe trace different protos are used and we should log those rather than doing some sort of Lossy conversion to Foyle protos. If we just add those protos as fields to ExecuteTrace then it becomes harder to tell what type of trace we have and therefore what fields should be set
    • Option 2 is more consistent with introducing a proto which will use a OneOf to combine types that are unique for each type of trace Use Pebble to store traces #91
    • For RunMe execute traces we could use the _id field. This is a ULID but that shouldn't matter.
  • We will need to change buildTraces because right now we iterate over all the traces (Generate and Execute) and add them to the associated block log

    • Now we need to do the following
    • Iterate over all the Generate Traces
      • Create a mapping from RunMe ULID to BlockID
    • Iterate over all the ExecuteTraces
      • Map ULID to BlockID
      • Lookup BlockID in BlockLog map.

Configuration

  • I think we should define a LearnerConfig and add it to our Config this should allow people to configure additional log locations.

@jlewi
Copy link
Owner Author

jlewi commented May 27, 2024

Alternative: Use ULIDs for BlocKIDs

  • An alternative is to change Foyle to use ULIDs for blockIDs.
  • I think having a single consistent ID will simplify things and avoid the headaches caused by constantly needing to map back and forth.
  • Its possible RunMe can work with UUIDs but we can just use ULIDs. IDs are string and we don't much care what they are as long as they are unique.

jlewi added a commit that referenced this issue May 27, 2024
* RunMe uses ULIDs for cell ids. If we switch to using ULIDs we can be consistent with what RunMe
  does and avoid having two different types of ids for blocks. This will avoid a lot of headaches.

Related to #110
jlewi added a commit that referenced this issue May 27, 2024
* RunMe uses ULIDs for cell ids. If we switch to using ULIDs we can be
consistent with what RunMe does and avoid having two different types of
ids for blocks. This will avoid a lot of headaches.

Related to #110
@jlewi
Copy link
Owner Author

jlewi commented May 27, 2024

Right now each type of trace is stored in a flat JSONL file

genEnc := json.NewEncoder(genFile)

We rely on the filename to know the type of trace.
If we don't fix #91 first; we will have to add another set of JSONL files.

jlewi added a commit that referenced this issue May 28, 2024
Switch to using Pebble as a Key Value store to store traces and block
logs

* To support that we create a Trace proto which uses a oneof field to
store the fields for the different types of traces
* This will make it easier to support using RunMe logs for training #110

Fix #91
@jlewi
Copy link
Owner Author

jlewi commented May 28, 2024

With #121 we should be code complete but will need to do some E2E testing to see if its all working.

jlewi added a commit that referenced this issue May 28, 2024
* Define a combineRunmeEntries function to generate the RunmeTrace
* Define a RunMe trace type
* Update Analyzer to parse the RunMe logs and construct traces
* Update Analayzer to update the BlockLog based on the RunMe trace
   * Set the executed command

Fix  #110
@jlewi jlewi reopened this May 28, 2024
@jlewi
Copy link
Owner Author

jlewi commented May 28, 2024

So how do we verify everything is working

  1. Use RunMe to generate a completion
  2. Correct the completion
  3. Train
  4. Verify example is added to the training set
  5. Generate a similar completion
  6. Verify the example is used in the prompt

Since the RunMe changes haven't been released yet. We need to use our locally built version of RunMe.

@jlewi
Copy link
Owner Author

jlewi commented May 28, 2024

No examples are produced based on the foyle logs.
When I looked at the blocklog for block 01HZ0N1228NJ7PSRYB6WEMH08M there was no exec id.

@jlewi
Copy link
Owner Author

jlewi commented May 28, 2024

I didn't update the learning code to include the directory of the RunMe logs

jlewi added a commit that referenced this issue May 29, 2024
* Fix training on RunMe logs #110 
* We need to allow Foyle to be configured with the directory of the
RunMe logs
   * Add LearnerConfig to Foyle config with the logs directory
* Fix a bug in analyze preventing BlockLogs from being updated with the
executed block when using RunMe
* The bug was that when looking for the last execution trace; we weren't
considering RunMe traces

* Update the RunMe developer guide

* Update the docs
@jlewi jlewi closed this as completed May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant