v0.1.7: UI improvements to response inspector
We've made a number of improvements to the inspector UI and beyond.
Side-by-side comparison across LLM responses
Responses now appear side-by-side when querying up to five LLMs:
Collapsible response groups
You can also collapse LLM responses grouped by their prompt template variable, for easier selective inspection. Just click a response group header to show or hide its responses:
collapsable-groups.mov
Accuracy plots by default
Boolean (true/false) evaluation metrics now use accuracy plots by default. For instance, for ChainForge's prompt injection example:
This makes it much easier to see differences across models for the specified evaluation. Stacked bar charts are still used when a prompt variable is selected. For instance, here is a plot of a meta-variable, 'Domain', across two LLMs, testing whether the code outputs had an import statement (another new feature):
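A boolean metric like this is just an evaluator function that returns True or False for each response. Below is a minimal, standalone sketch: `FakeResponse` is a stand-in class we define here for illustration (in ChainForge, the evaluator receives a response object with a `text` field), and the metric itself is a simple substring check.

```python
class FakeResponse:
    """Illustrative stand-in for the response object an Evaluator node receives."""
    def __init__(self, text: str):
        self.text = text

def evaluate(response) -> bool:
    # Boolean metric: does the LLM's output mention an import?
    # True/False results are plotted as accuracy bars by default.
    return 'import' in response.text
```

Because the return value is boolean, the visualization defaults to an accuracy plot rather than a histogram.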
Added 'Inspect results' footer to both Prompt and Eval nodes
The tiny response-preview footer in the Prompt node has been replaced with an 'Inspect Responses' button that brings up a fullscreen response inspector. In addition, evaluation results can now be inspected by clicking 'Inspect results':
Evaluation scores appear in bold at the top of each response block:
In addition, both Prompt and Eval nodes now load cached results upon initialization. Simply load an example flow and click the respective Inspect button.
Added asMarkdownAST to response object in Evaluator node
Given how often developers wish to parse markdown, we've added a function asMarkdownAST() to the ResponseInfo class that uses the mistune library to parse markdown into an abstract syntax tree (AST).
For instance, here's code that detects whether an 'import' statement appears anywhere in the code blocks of a chat response: