Improve Handling Of Long Outputs #328

Merged
merged 6 commits into from
Oct 25, 2024

jlewi (Owner) commented Oct 25, 2024

Problem

Cell outputs can be very long. For example, a query (gcloud, SQL, etc.) can produce very verbose output. That output can consume the entire context budget allocated to the input document, leaving too little meaningful context to prompt the model.

There was also a bug in our doc tailer. We applied character limits to the rendered markdown by tailing its lines, which could produce invalid markdown. For example, we might truncate the document in the middle of a code block, dropping the opening triple backticks for that block. We might also include a code block's output without the code that produced it.
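The failure mode can be reproduced with a small Python sketch. This is an illustration only, not the project's tailer: `tail_lines` and `has_balanced_fences` are hypothetical helpers, and the sample document is invented.

```python
FENCE = "`" * 3  # markdown code fence delimiter


def tail_lines(markdown: str, max_chars: int) -> str:
    """Naive tailing: keep as many trailing lines as fit in max_chars."""
    kept = []
    used = 0
    for line in reversed(markdown.splitlines()):
        cost = len(line) + 1  # +1 accounts for the newline
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    return "\n".join(reversed(kept))


def has_balanced_fences(markdown: str) -> bool:
    """True when the number of fence lines is even, i.e. every block is closed."""
    fences = [line for line in markdown.splitlines() if line.startswith(FENCE)]
    return len(fences) % 2 == 0


# Invented sample document: one sentence followed by one code cell.
doc = "\n".join([
    "List the clusters.",
    FENCE + "bash",
    "gcloud container clusters list",
    FENCE,
])

tailed = tail_lines(doc, max_chars=40)
```

With a 40-character limit, the tail keeps the command and the closing fence but drops the opening fence, so the result is no longer valid markdown.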

Solution

First, we impose character limits in a way that is aware of cell boundaries. Truncation moves into the Block to Markdown conversion, which now takes the maximum length of the output string. The conversion routine decides how much of that budget to allocate to the contents of the cell and how much to its outputs, so truncation can respect cell boundaries.
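A minimal Python sketch of a budget-aware conversion follows. It is not the code from this PR: the `Block` class, the `allocate` policy (each side is offered half the budget, and unused budget goes to the other side), and the fence labels are all assumptions made for illustration. The budget here covers only the cell contents and output, not the fence lines.

```python
from dataclasses import dataclass
from typing import Tuple

FENCE = "`" * 3  # markdown code fence delimiter


@dataclass
class Block:
    """Hypothetical stand-in for a notebook cell: its code and its output."""
    code: str
    output: str = ""


def allocate(code_len: int, output_len: int, max_len: int) -> Tuple[int, int]:
    """Split max_len between code and output (assumed policy, see lead-in)."""
    half = max_len // 2
    code_budget = min(code_len, half)
    # The output may use whatever the code did not need.
    output_budget = min(output_len, max_len - code_budget)
    # If the output was short, hand its unused share back to the code.
    code_budget = min(code_len, max_len - output_budget)
    return code_budget, output_budget


def block_to_markdown(block: Block, max_len: int) -> str:
    """Render one cell, truncating inside the fences so they stay paired."""
    code_budget, output_budget = allocate(
        len(block.code), len(block.output), max_len
    )
    parts = [FENCE + "bash", block.code[:code_budget], FENCE]
    if block.output:
        parts += [FENCE + "output", block.output[:output_budget], FENCE]
    return "\n".join(parts)
```

Because truncation happens before the fences are written, the rendered cell always has matching opening and closing fences, and the output is never emitted without its code.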

Second, if we truncate the code block or its output, we emit a string indicating that truncation occurred, so the model knows the output was truncated. We also update our prompt to tell the LLM to look for truncated output and, where appropriate, to deal with it by running commands that produce less verbose output.
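The marker idea can be sketched in Python as follows. The marker text, the prompt wording, and the choice to keep the tail of the output are all assumptions for illustration; the PR description does not specify them.

```python
# Hypothetical marker text; the actual string used by the project may differ.
TRUNCATION_MARKER = "<...output truncated...>"

# Hypothetical prompt addition telling the model how to react to the marker.
PROMPT_HINT = (
    "If a cell or its output contains " + TRUNCATION_MARKER + ", part of it was "
    "omitted. Consider suggesting a command that produces less verbose output."
)


def truncate_with_marker(text: str, max_len: int) -> str:
    """Return text unchanged if it fits, else the marker followed by its tail."""
    if len(text) <= max_len:
        return text
    # Reserve room for the marker and the newline that follows it.
    keep = max_len - len(TRUNCATION_MARKER) - 1
    if keep <= 0:
        # Budget too small for any content; the marker alone may exceed max_len.
        return TRUNCATION_MARKER
    return TRUNCATION_MARKER + "\n" + text[-keep:]


def was_truncated(text: str) -> bool:
    """True when the rendered text carries the truncation marker."""
    return TRUNCATION_MARKER in text
```

Keeping the tail assumes the most recent output lines are the most useful; keeping the head would be an equally plausible choice.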

Fix #299

Contributor

@standard-input standard-input bot left a comment

No issues flagged.
Standard Input can make mistakes. Check important info.

netlify bot commented Oct 25, 2024

Deploy Preview for foyle canceled.

Name Link
🔨 Latest commit b75a7ee
🔍 Latest deploy log https://app.netlify.com/sites/foyle/deploys/671c0b33e8efe30008b10ae2

@jlewi jlewi enabled auto-merge (squash) October 25, 2024 21:10
@jlewi jlewi merged commit 3f8fa1a into main Oct 25, 2024
5 checks passed
@jlewi jlewi deleted the jlewi/longoutputs branch October 25, 2024 21:23
Development

Successfully merging this pull request may close these issues.

Handle long cell outputs intelligently