v0.1.7: UI improvements to response inspector

@ianarawjo ianarawjo released this 21 Jun 14:25

We've made a number of improvements to the inspector UI and beyond.

Side-by-side comparison across LLM responses

Responses now appear side-by-side for up to five LLMs queried:

Screen Shot 2023-06-21 at 9 27 45 AM

Collapsible response groups

You can also collapse LLM responses grouped by their prompt template variable, for easier selective inspection. Just click on a response group header to show/hide:

collapsable-groups.mov

Accuracy plots by default

Boolean (true/false) evaluation metrics now use accuracy plots by default. For instance, for ChainForge's prompt injection example:

Screen Shot 2023-06-21 at 9 27 58 AM

This makes it easy to see differences across models for the specified evaluation. Stacked bar charts are still used when a prompt variable is selected. For instance, here is a plot of a meta-variable, 'Domain', across two LLMs, testing whether the code outputs contained an import statement (another new feature):

Screen Shot 2023-06-21 at 10 22 51 AM
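
As a rough illustration of what an accuracy plot summarizes, here is a minimal sketch. It assumes accuracy means the share of responses for which a boolean evaluator returned true, computed per LLM. The function name, the `(llm, passed)` pair format, and the model names are hypothetical and are not ChainForge's internal representation.

```python
from collections import defaultdict

def accuracy_by_llm(results):
    """Return {llm_name: fraction of responses whose boolean score is True}."""
    counts = defaultdict(lambda: [0, 0])  # llm -> [num_true, num_total]
    for llm, passed in results:
        counts[llm][0] += int(bool(passed))
        counts[llm][1] += 1
    return {llm: num_true / total for llm, (num_true, total) in counts.items()}

# Hypothetical boolean evaluation scores for two placeholder models.
scores = [
    ("model-a", True), ("model-a", True), ("model-a", False), ("model-a", True),
    ("model-b", True), ("model-b", False),
]
accuracy = accuracy_by_llm(scores)
print(accuracy)
```

Each model ends up with a single number, which is why a plain accuracy bar is easier to compare across models than a stacked true/false bar.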

Added 'Inspect results' footer to both Prompt and Eval nodes

The tiny response-preview footer in the Prompt Node has been replaced with an 'Inspect Responses' button that opens a fullscreen response inspector. Evaluation results can likewise be inspected by clicking 'Inspect results':

Screen Shot 2023-06-21 at 10 12 34 AM

Evaluation scores appear in bold at the top of each response block:

Screen Shot 2023-06-21 at 10 13 54 AM

In addition, both Prompt and Eval nodes now load cached results upon initialization. Simply load an example flow and click the respective Inspect button.

Added asMarkdownAST to response object in Evaluator node

Given how often developers wish to parse markdown, we've added an asMarkdownAST() method to the ResponseInfo class. It uses the mistune library to parse a response's markdown into an abstract syntax tree (AST).

For instance, here's code that detects whether an 'import' statement appeared anywhere in the code blocks of a chat response:

Screen Shot 2023-06-21 at 10 19 51 AM
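
In text form, a check of this kind might look like the minimal sketch below. It is not the code from the screenshot. It works on an already-parsed AST and assumes the shape mistune's AST output typically has: a list of dict nodes with a 'type' key, code blocks typed 'block_code', and the code stored under 'text' (mistune 2.x) or 'raw' (mistune 3.x). The helper names `iter_code_blocks` and `has_import` are hypothetical.

```python
import re

# Matches a Python-style import at the start of any line in a code block.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+\w+", re.MULTILINE)

def iter_code_blocks(nodes):
    """Yield the source text of every code block in a mistune-style AST."""
    for node in nodes:
        if node.get("type") == "block_code":
            # Assumption: mistune 2.x stores the code under 'text', 3.x under 'raw'.
            yield node.get("text") or node.get("raw") or ""
        # Recurse into nested nodes (e.g. code blocks inside list items).
        children = node.get("children")
        if isinstance(children, list):
            yield from iter_code_blocks(children)

def has_import(ast):
    """Return True if any code block in the AST contains an import line."""
    return any(IMPORT_RE.search(code) for code in iter_code_blocks(ast))

# Hand-built AST standing in for the value returned by asMarkdownAST().
sample_ast = [
    {"type": "paragraph", "children": [{"type": "text", "text": "Here you go:"}]},
    {"type": "block_code", "text": "import os\nprint(os.getcwd())\n", "info": "python"},
]
print(has_import(sample_ast))
```

Inside an Evaluator node, the same helper could presumably be applied to the parsed response, for example by returning `has_import(response.asMarkdownAST())` from the evaluation function. Because only code-block nodes are searched, the word "import" appearing in ordinary prose does not count.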