Right now the evaluation module uses ragas evaluate and always outputs the same four metrics:
answer_relevancy, faithfulness, context_recall, context_precision
Answer relevancy and faithfulness are interesting metrics; however, they are better suited to scenarios where you do not have a ground truth answer available.
I think we need a simple score that compares the ground truth answer with the generated answer as an alternative to these two. Since we do have the ground truth, it should be more informative, and it should also be quite a bit faster. That can easily be done with ragas; we just need to add the option.
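A minimal sketch of what that option could look like, assuming a ragas version that exposes answer_correctness in ragas.metrics. The use_ground_truth flag and build_metrics helper are hypothetical names for illustration, and the expected dataset column names (ground_truth vs. ground_truths) vary between ragas versions:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    answer_correctness,
)


def build_metrics(use_ground_truth: bool = False):
    """Choose answer-quality metrics based on whether a reference answer exists.

    `use_ground_truth` is a hypothetical flag: when a ground truth answer is
    available, swap answer_relevancy/faithfulness for answer_correctness.
    """
    retrieval_metrics = [context_precision, context_recall]
    if use_ground_truth:
        return [answer_correctness] + retrieval_metrics
    return [answer_relevancy, faithfulness] + retrieval_metrics


# Toy dataset in the column layout ragas expects (names may differ by version).
data = Dataset.from_dict({
    "question": ["What does the evaluation module output?"],
    "answer": ["It outputs four ragas metrics."],
    "contexts": [["The evaluation module reports four ragas metrics."]],
    "ground_truth": ["Four ragas metrics."],
})

result = evaluate(data, metrics=build_metrics(use_ground_truth=True))
print(result)
```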
Finally, right now you are forced to rerun answer_evals even when you only want to benchmark the search itself, which I think we should also change; a sketch of a retrieval-only run follows below.
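A hedged sketch of the retrieval-only path, assuming a ragas version whose context metrics only need the question, the retrieved contexts, and the ground truth, so no generated answer (and therefore no answer_evals rerun) is required:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall

# When only the search is being benchmarked, skip answer generation entirely
# and score the retrieved contexts against the ground truth answer.
search_only_data = Dataset.from_dict({
    "question": ["What does the evaluation module output?"],
    "contexts": [["The evaluation module reports four ragas metrics."]],
    "ground_truth": ["Four ragas metrics."],
})

retrieval_result = evaluate(
    search_only_data,
    metrics=[context_precision, context_recall],
)
print(retrieval_result)
```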