evaluation-metrics

Here are 457 public repositories matching this topic...

confident-ai / deepeval

The LLM Evaluation Framework

evaluation-metrics evaluation-framework llm-evaluation llm-evaluation-framework llm-evaluation-metrics

Updated Dec 12, 2024
Python

AgentOps-AI / agentops

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen

agent ai openai evaluation-metrics mistral cost-estimation autogen groq agentops llm langchain anthropic evals ollama crewai

Updated Dec 13, 2024
Python

xinshuoweng / AB3DMOT

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

tracking machine-learning real-time computer-vision robotics evaluation evaluation-metrics multi-object-tracking kitti 3d-tracking 3d-multi-object-tracking 2d-mot-evaluation 3d-mot 3d-multi kitti-3d

Updated Apr 3, 2024
Python

huggingface / evaluation-guidebook

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

machine-learning tutorial evaluation evaluation-metrics guidebook large-language-models llm

Updated Dec 4, 2024
Jupyter Notebook

huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

evaluation evaluation-metrics evaluation-framework huggingface

Updated Dec 12, 2024
Python

google-research / rliable

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

benchmarking machine-learning google reinforcement-learning rl evaluation-metrics

Updated Aug 12, 2024
Jupyter Notebook

MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)

nlp natural-language-processing hyperparameter-optimization topic-modeling nlp-library bayesian-optimization hyperparameter-tuning latent-dirichlet-allocation evaluation-metrics neural-topic-models latent-semantic-analysis topic-models hyperparameter-search non-negative-matrix-factorization nlproc

Updated Jul 25, 2024
Python

jitsi / jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

python3 automatic-speech-recognition speech-to-text evaluation-metrics wer word-error-rate

Updated Nov 1, 2024
Python
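
For readers unfamiliar with word error rate, the metric named in the description above can be sketched in a few lines. This is a minimal illustration in plain Python, not jiwer's implementation or API: the function name `word_error_rate` is hypothetical, tokenization is assumed to be a simple whitespace split, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.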

nekhtiari / image-similarity-measures

📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.

processing machine-learning image metrics evaluation-metrics p1

Updated Aug 31, 2024
Python
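
Two of the metrics listed above, RMSE and PSNR, are simple enough to sketch directly. The following is a minimal illustration, not the package's own code: the function names `rmse` and `psnr` are hypothetical, images are assumed to be flattened into equal-length sequences of pixel values, and the default peak value of 255 assumes 8-bit images.

```python
import math
from typing import Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean squared error between two equal-length pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def psnr(a: Sequence[float], b: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels.

    max_value is the largest possible pixel value (assumed 255 for 8-bit images).
    """
    error = rmse(a, b)
    if error == 0:
        return math.inf  # identical images: noise is zero
    return 20 * math.log10(max_value / error)
```

Lower RMSE and higher PSNR both indicate that the two images are closer to each other.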

Unbabel / COMET

A Neural Framework for MT Evaluation

nlp machine-learning natural-language-processing machine-translation artificial-intelligence evaluation-metrics

Updated Dec 5, 2024
Python

AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

python information-retrieval evaluation comparison numba recommender-systems evaluation-metrics metasearch data-fusion score-fusion ranking-metrics information-retrieval-evaluation information-retrieval-metrics rank-fusion

Updated Jul 1, 2024
Python

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. There are also more complex data types and algorithms. Mor…