
[RFC] Add LLMEvaluator to create LLM-as-a-judge evaluators #831

Merged · 12 commits · Jul 19, 2024

Conversation

agola11
Contributor

@agola11 agola11 commented Jul 1, 2024

It's currently quite annoying to use LLM-as-a-judge evaluators in code, and there is a bit of a disconnect between the SDK and the UI.

Our off-the-shelf evaluators don't even use tool calling.

In the UI, you can specify the prompt and output schema. In the SDK, you have to use .with_structured_output within a custom function, which can be a lot of boilerplate for the user.

Additionally, a raw JSON schema or a generic Pydantic model is likely not the best interface for letting people specify the score format for their LLM evaluators, so I opted for something more opinionated: ContinuousScoreConfig and CategoricalScoreConfig.

Important detail: I map each score to its own tool, as opposed to making each score an argument of a single shared tool. This allows other attributes, like an explanation, to be extracted alongside the score and mapped to the same feedback entry.
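To illustrate the idea, here is a minimal, self-contained sketch of the scheme described above: a categorical score config is mapped to one OpenAI-style tool whose arguments carry both the score and its explanation. The class and field names (`CategoricalScoreConfig`, `include_explanation`, `config_to_tool`) are illustrative assumptions and may differ from the actual classes in `langsmith.evaluation`.

```python
# Hypothetical sketch only -- the real config classes live in the langsmith
# SDK and their exact fields may differ from what is shown here.
from dataclasses import dataclass
from typing import List


@dataclass
class CategoricalScoreConfig:
    key: str                            # feedback key, e.g. "correctness"
    choices: List[str]                  # allowed categorical labels
    description: str                    # what the score measures
    include_explanation: bool = False   # also ask the model for reasoning


def config_to_tool(config: CategoricalScoreConfig) -> dict:
    """Map one score config to one tool (not one argument).

    Because the score and its explanation are arguments of the *same*
    tool call, both can be extracted together and attached to a single
    feedback entry.
    """
    properties = {
        "score": {
            "type": "string",
            "enum": config.choices,
            "description": config.description,
        },
    }
    if config.include_explanation:
        properties["explanation"] = {
            "type": "string",
            "description": "Reasoning behind the chosen score.",
        }
    return {
        "type": "function",
        "function": {
            "name": config.key,
            "description": config.description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
        },
    }


tool = config_to_tool(
    CategoricalScoreConfig(
        key="correctness",
        choices=["correct", "incorrect"],
        description="Is the answer correct?",
        include_explanation=True,
    )
)
print(tool["function"]["name"])
print(sorted(tool["function"]["parameters"]["properties"]))
```

With one tool per score, adding a second score (say, "conciseness") just means binding a second tool, and each tool call still resolves cleanly to its own feedback entry.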

Future work:

  • Allow people to load these from a file
  • Create off-the-shelf evaluators based off of LLMEvaluator
  • Add async support

Collaborator

@hinthornw hinthornw left a comment


Really like the idea of having an opinionated, easy evaluator definition using tool calling.

Inline review threads on python/langsmith/evaluation/llm_evaluator.py (outdated, resolved)
@agola11 agola11 merged commit c594628 into main Jul 19, 2024
7 of 8 checks passed
@agola11 agola11 deleted the ankush/06-30/add-llm-evaluator branch July 19, 2024 17:36