**This project is no longer actively maintained.**
While the repository remains available for educational purposes, we recommend exploring more current alternatives for production use:
- RAGAS - A comprehensive framework for RAG evaluation
- Amazon Bedrock - A fully managed service for foundation models from Amazon
For a practical example of using these alternatives, check out our evaluation notebook using RAGAS and Bedrock.
Additionally, for latency benchmarking, check the code samples for latency benchmarking tools for Amazon Bedrock.
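As a rough illustration of what such an evaluation looks like (a sketch, not code from this repository; the model IDs, region, and langchain-aws wrappers are assumptions about your environment), a RAGAS run backed by Bedrock might be wired up like this:

```python
# Hypothetical sketch: scoring RAG outputs with RAGAS, using Amazon Bedrock
# models through the langchain-aws wrappers. Model IDs, region, and the exact
# RAGAS version/API may differ in your setup.
from datasets import Dataset
from langchain_aws import ChatBedrock, BedrockEmbeddings
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Toy evaluation data: question, retrieved contexts, and the generated answer.
data = Dataset.from_dict({
    "question": ["What is Amazon Bedrock?"],
    "contexts": [["Amazon Bedrock is a fully managed service for foundation models."]],
    "answer": ["Amazon Bedrock is a managed service that offers foundation models via API."],
})

# Use Bedrock both as the judge LLM and for embeddings.
judge_llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0", region_name="us-east-1")
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0", region_name="us-east-1")

result = evaluate(data, metrics=[faithfulness, answer_relevancy], llm=judge_llm, embeddings=embeddings)
print(result)
```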
Create your own private LLM leaderboard! 📊
There's no one-size-fits-all leaderboard. FM-Leaderboard-er will allow you to find the best LLM for your own business use case based on your own tasks, prompts, and data.
- Tasks - Example notebooks for common tasks like Summarization, Classification, and RAG (coming soon).
- Models - Amazon Bedrock, OpenAI, any API (with a code integration).
- Metrics - Built-in metrics per task + custom metrics (via a code integration).
- Latency - Latency metric per model (a rough timing sketch follows this list).
- Cost - Cost comparison per model.
- Prompt - Compare several prompts across a single model.
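To make the latency and cost rows concrete, here is a rough, hypothetical sketch (not part of this repository) that times the same prompt against two Bedrock models via the Converse API; the model IDs and region are assumptions:

```python
# Hypothetical latency comparison sketch using the Bedrock Converse API.
# Model IDs and region are placeholders; use models your account has access to.
import time
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
prompt = "Summarize: The quick brown fox jumps over the lazy dog."
model_ids = [
    "anthropic.claude-3-haiku-20240307-v1:0",
    "amazon.titan-text-express-v1",
]

for model_id in model_ids:
    start = time.perf_counter()
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 128},
    )
    elapsed = time.perf_counter() - start
    usage = response["usage"]  # token counts, a rough proxy for cost
    print(f"{model_id}: {elapsed:.2f}s, {usage['inputTokens']} in / {usage['outputTokens']} out tokens")
```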
- AWS account with Amazon Bedrock access to the selected models (a quick access check follows this list).
- Hugging Face access token.
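For example, you can quickly list the foundation models Bedrock offers in a region with a short boto3 check (a sketch, assuming configured AWS credentials and us-east-1):

```python
# Quick sanity check: list the foundation models Bedrock offers in the region.
# Assumes AWS credentials are configured; model access is still granted per model
# in the Bedrock console.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()["modelSummaries"]
for m in models[:10]:
    print(m["modelId"], "-", m.get("modelName", ""))
```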
The code will download a dataset from Hugging Face (https://huggingface.co/api/datasets/Salesforce/dialogstudio), which requires an access token. If you don't have one yet, follow these steps:
- Sign up for Hugging Face: https://huggingface.co
- Generate an access token (save it for later use): https://huggingface.co/settings/tokens
- Store the access token locally by installing the huggingface_hub Python library and running the following from a shell:
> pip install huggingface_hub
> python -c "from huggingface_hub.hf_api import HfFolder; HfFolder.save_token('YOUR_HUGGINGFACE_TOKEN')"
(Verify that you now have: ~/.cache/huggingface)
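Alternatively, the token can be saved and verified from within Python; this is a sketch using the huggingface_hub login() and whoami() helpers, with the token value as a placeholder:

```python
# Alternative to the shell commands above: save the token with the
# huggingface_hub login() helper, then verify the cached credentials.
from huggingface_hub import login, whoami

login(token="YOUR_HUGGINGFACE_TOKEN")  # writes the token under ~/.cache/huggingface
print(whoami()["name"])                # should print your Hugging Face username
```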
- Clone the repository:
git clone https://github.com/aws-samples/fm-leaderboarder.git
To get started, open the example-1 notebook and follow the instructions provided.
Coming soon.
This code can interact with the OpenAI service, which has terms published here and pricing described here. You should be familiar with the pricing and confirm that your use case complies with the terms before proceeding.
This repository makes use of the aws/fmeval Foundation Model Evaluations Library. Please review any license terms applicable to the dataset with your legal team and confirm that your use case complies with the terms before proceeding.
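For reference, a minimal fmeval evaluation against a Bedrock-hosted model looks roughly like the sketch below; the model ID, request template, and output JMESPath are placeholders rather than settings taken from this repository:

```python
# Rough sketch of an fmeval summarization-accuracy evaluation against a
# Bedrock-hosted model. Model ID and request/response templates are placeholders;
# fmeval's built-in datasets and defaults are used otherwise.
from fmeval.eval_algorithms.summarization_accuracy import SummarizationAccuracy
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner

model_runner = BedrockModelRunner(
    model_id="anthropic.claude-v2",
    content_template='{"prompt": $prompt, "max_tokens_to_sample": 300}',
    output="completion",  # JMESPath into the model's JSON response
)

eval_algo = SummarizationAccuracy()
results = eval_algo.evaluate(model=model_runner, save=True)
for result in results:
    print(result.dataset_name, [(s.name, s.value) for s in result.dataset_scores])
```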
See CONTRIBUTING for more information.
Contributions to FM-Leaderboarder are welcome! Please refer to the CONTRIBUTING.md file for guidelines on how to contribute.
This project is licensed under the Apache-2.0 License.