[RFC] Add `LLMEvaluator` to create LLM-as-a-judge evaluators #831

agola11 · 2024-07-01T05:32:15Z

It's currently quite annoying to use LLM-as-a-judge evaluators in code, and there is a bit of a disconnect between the SDK and the UI.

Our off-the-shelf evaluators don't even use tool calling.

In the UI, you can specify the prompt and output schema. With the LangSmith SDK, you have to use `.with_structured_output` within a custom function, which can be a lot of boilerplate for the user.

Additionally, a JSON schema or generic pydantic model is likely not the best interface for letting people specify the score format for their LLM evaluators. I opted for something more opinionated: `ContinuousScoreConfig` and `CategoricalScoreConfig`.
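For illustration, the two config shapes named above might look roughly like the following. This is a standalone sketch, not the code in this PR: the field names (`key`, `choices`, `min`, `max`, `description`) and the `validate` helper are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CategoricalScoreConfig:
    """Hypothetical sketch: the judge must pick one of a fixed set of labels."""

    key: str
    choices: List[str]
    description: str

    def validate(self, value: str) -> str:
        # Reject any label the config did not declare.
        if value not in self.choices:
            raise ValueError(f"{value!r} is not one of {self.choices}")
        return value


@dataclass
class ContinuousScoreConfig:
    """Hypothetical sketch: the judge returns a number within [min, max]."""

    key: str
    description: str
    min: float = 0.0
    max: float = 1.0

    def validate(self, value: float) -> float:
        # Reject scores outside the declared range.
        if not self.min <= value <= self.max:
            raise ValueError(f"{value} is outside [{self.min}, {self.max}]")
        return value


correctness = CategoricalScoreConfig(
    key="correctness",
    choices=["correct", "incorrect"],
    description="Whether the answer matches the reference.",
)
helpfulness = ContinuousScoreConfig(
    key="helpfulness",
    description="How helpful the answer is, from 0 to 1.",
)
```

Compared with a free-form JSON schema, a config like this only asks the user for the score's name, its allowed values, and a description.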

Important detail: I map each score to a tool, as opposed to each argument of a tool. This allows other attributes, like an explanation, to be extracted and mapped to the same feedback entry.
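A rough sketch of the score-per-tool idea, assuming an OpenAI-style function schema. The helper names `score_to_tool` and `tool_call_to_feedback` and the feedback field names (`key`, `score`, `value`, `comment`) are hypothetical and chosen for the example; they are not taken from this PR's code.

```python
from typing import Any, Dict, List, Optional


def score_to_tool(
    key: str,
    description: str,
    choices: Optional[List[str]] = None,
    include_explanation: bool = False,
) -> Dict[str, Any]:
    """Build one tool schema for one score (hypothetical helper)."""
    # Categorical scores become an enum; otherwise the score is a number.
    score_schema: Dict[str, Any] = (
        {"type": "string", "enum": choices} if choices else {"type": "number"}
    )
    properties: Dict[str, Any] = {"score": score_schema}
    required = ["score"]
    if include_explanation:
        # The explanation lives on the same tool, so it travels with its score.
        properties["explanation"] = {"type": "string"}
        required.append("explanation")
    return {
        "name": key,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def tool_call_to_feedback(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Map one tool call to one feedback entry (field names are assumptions)."""
    score = args["score"]
    entry: Dict[str, Any] = {"key": name, "comment": args.get("explanation")}
    # Categorical labels and numeric scores go in different fields here.
    if isinstance(score, str):
        entry["value"] = score
    else:
        entry["score"] = score
    return entry


tool = score_to_tool(
    "correctness", "Is the answer correct?", ["correct", "incorrect"], True
)
feedback = tool_call_to_feedback(
    "correctness", {"score": "correct", "explanation": "Matches the reference."}
)
```

Because the tool name doubles as the feedback key, each tool call yields exactly one feedback entry, with its explanation attached.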

Future work:

- Allow people to load these from a file
- Create off-the-shelf evaluators based on `LLMEvaluator`
- Async support

hinthornw

Really like the idea of having an opinionated, easy evaluator def using tool calling.

python/langsmith/evaluation/llm_evaluator.py

agola11 added 3 commits June 30, 2024 22:31

rfc -- LLMEvaluator

4108d8d

fix mypy

c287232

another fix

490c5a7

hinthornw reviewed Jul 1, 2024

python/langsmith/evaluation/llm_evaluator.py (outdated, resolved)

python/langsmith/evaluation/llm_evaluator.py (outdated, resolved)

python/langsmith/evaluation/llm_evaluator.py (outdated, resolved)

agola11 added 5 commits July 2, 2024 00:33

add from_model

ce7ebb9

update based on comments

b1ec895

fix workflow test

7f741d8

Merge branch 'main' into ankush/06-30/add-llm-evaluator

723347d

update to human/system message

ba7ac7b

hinthornw reviewed Jul 11, 2024

python/langsmith/evaluation/llm_evaluator.py (resolved)

hinthornw and others added 4 commits July 11, 2024 16:36

Fix up contextvar propagation

dc003c6

Fix up contextvar propagation (#865)

1bf6427

merge and fix lint

96cfdfb

add optional explanation_description

726387d

samnoyes approved these changes Jul 19, 2024


agola11 merged commit c594628 into main Jul 19, 2024
7 of 8 checks passed

agola11 deleted the ankush/06-30/add-llm-evaluator branch July 19, 2024 17:36
