
feat: use chunk data in NIAH and QA evals #1176

Merged
merged 11 commits into main from 1067-chore-use-chunk-data-in-niah-and-qa-evals on Oct 7, 2024

Conversation

@jalling97 (Contributor) commented Oct 1, 2024

Description

This PR adds chunk data to the NIAH and QA evaluations to better evaluate the retrieval stage of RAG.

  • Changes the NIAH retrieval metric to be based on the actual chunk data rather than just the annotations
  • Adds a chunk_rank metric to the NIAH evals to measure how well the retrieved chunks are ranked (see the sketch after this list)
  • Adds two new LLM-as-judge metrics to the QA evals: Contextual Relevancy and Faithfulness (a usage sketch also follows below)
  • Updates package versions
  • Fixes a bug when not using padding data in NIAH evals
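
As a rough illustration of the chunk_rank idea, here is a minimal reciprocal-rank-style sketch. The function name and signature are hypothetical, not the PR's actual implementation: it rewards the needle-bearing chunk appearing early in the ranked retrieval results.

```python
def chunk_rank(retrieved_chunks: list[str], needle: str) -> float:
    """Hypothetical sketch: 1/rank of the first retrieved chunk containing
    the needle text, decaying toward 0.0 the later it appears in the
    ranking; 0.0 if the needle was not retrieved at all."""
    for rank, chunk in enumerate(retrieved_chunks, start=1):
        if needle in chunk:
            return 1.0 / rank
    return 0.0
```

Under this kind of scoring, a perfectly ranked retrieval scores 1.0, so averages close to 1.0 across mostly successful runs are what you would expect.

Contextual Relevancy and Faithfulness match deepeval's metric names (as do GEval and Answer Relevancy in the results below), so the invocation presumably looks roughly like the sketch here; the PR's actual wiring, judge model, and thresholds may differ:

```python
from deepeval.metrics import ContextualRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# One QA example: the question, the model's answer, and the retrieved
# chunks (the chunk data this PR threads into the evals).
test_case = LLMTestCase(
    input="What is hidden in the haystack?",
    actual_output="The hidden phrase is 'secret code 42'.",
    retrieval_context=["...chunk text...", "...more chunk text..."],
)

for metric in (ContextualRelevancyMetric(), FaithfulnessMetric()):
    metric.measure(test_case)  # LLM-as-judge scoring call
    print(type(metric).__name__, metric.score)
```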

BREAKING CHANGES

NIAH Retrieval is now measured differently, so prior NIAH Retrieval scores should not be compared against values produced after this change.

CHANGES

Replaces the current proxy annotation measure for NIAH retrieval with one based directly on the chunk data.
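
In practice this means the retrieval score can be computed from whether any retrieved chunk actually contains the needle text, rather than inferring success from annotations. A minimal sketch of that idea (hypothetical names, not the PR's code):

```python
def niah_retrieval_score(retrieved_chunks: list[str], needle: str) -> float:
    """1.0 if the needle text appears verbatim in any retrieved chunk,
    else 0.0; averaging over runs gives an overall retrieval score."""
    return 1.0 if any(needle in chunk for chunk in retrieved_chunks) else 0.0
```

A direct containment check like this measures something different than the annotation proxy did, which is why the two sets of scores are not comparable.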

Related Issue

Relates to #1067

Checklist before merging

@jalling97 jalling97 linked an issue Oct 1, 2024 that may be closed by this pull request

netlify bot commented Oct 1, 2024

Deploy Preview for leapfrogai-docs canceled.

Latest commit: 855cec5
Latest deploy log: https://app.netlify.com/sites/leapfrogai-docs/deploys/67000ac85a9ad60008a999e9

@jalling97 jalling97 self-assigned this Oct 2, 2024
@jalling97 (Contributor, Author) commented:

Evaluation results from most recent run using new metrics:

Final Results:
INFO:root:Average Needle in a Haystack (NIAH) Retrieval: 1.0
INFO:root:Average Needle in a Haystack (NIAH) Response: 1.0
INFO:root:Average Needle in a Haystack (NIAH) Chunk Rank: 0.9600000000000001
INFO:root:Average Correctness (GEval): 0.82
INFO:root:Average Answer Relevancy: 0.9583333333333335
INFO:root:Average Contextual Relevancy: 0.504
INFO:root:Average Faithfulness: 0.9278174603174603
INFO:root:Average Annotation Relevancy: 0.9359999999999999
INFO:root:MMLU: 0.696969696969697
INFO:root:HumanEval: 0.95
INFO:root:Eval Execution Runtime (seconds): 1655.2433378696442

@jalling97 jalling97 added the chore and enhancement (New feature or request) labels and removed the chore label on Oct 3, 2024
@jalling97 jalling97 changed the title from "chore: use chunk data in NIAH and QA evals" to "feat: use chunk data in NIAH and QA evals" on Oct 3, 2024
@jalling97 jalling97 marked this pull request as ready for review October 4, 2024 15:49
@jalling97 jalling97 requested a review from a team as a code owner October 4, 2024 15:49
@jalling97 jalling97 merged commit ad697cd into main Oct 7, 2024
37 of 39 checks passed
@jalling97 jalling97 deleted the 1067-chore-use-chunk-data-in-niah-and-qa-evals branch October 7, 2024 19:30