ai-evaluation

Here are 10 public repositories matching this topic:

METR / vivaria

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

ai elicitation ai-evaluation evals

Updated Dec 14, 2024
TypeScript

lechmazur / confabulations

Hallucinations (Confabulations) Document-Based Benchmark for RAG

benchmark leaderboard gemini llama language-model claude rag hallucinations ai-evaluation llm llm-benchmarking gpt-4o o1-mini o1-preview confabulations

Updated Nov 5, 2024
HTML

lechmazur / deception

Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.

nlp machine-learning gemini llama language-model model-evaluation ai-safety mistral claude disinformation ai-security ai-benchmarks ai-evaluation llm llm-benchmarking gpt4o

Updated Oct 22, 2024

dpc10ster / RJafrocRocBook

ROC methodology explained with R-examples

book roc ai-evaluation

Updated Apr 25, 2024
TeX

dpc10ster / RJafrocFrocBook

FROC methodology explained with R-examples

pdf r book ai-evaluation

Updated Dec 26, 2023
TeX

dpc10ster / RJafrocQuickStart

RJafroc quick start for those already familiar with Windows JAFROC

r rjafroc ai-evaluation

Updated Dec 28, 2023
TeX

bigdata-ustc / CAT4AI

Adaptive Testing Framework for AI Models (Psychometrics in AI Evaluation)

psychometrics adaptive-testing ai-evaluation

Updated Oct 1, 2024
Jupyter Notebook

dpc10ster / WindowsJAFROC

Installation files for Windows JAFROC software

windows ai-evaluation jafroc

Updated Feb 8, 2023

gabrielhamalwa / magpie

Repository for the LWDA'24 presentation on 'Psychometric Profiling of GPT Models for Bias Exploration', featuring conference materials including the poster, paper, slides, and references.

ai-safety personality-traits interpretability cognitive-bias explainability ai-evaluation gpt-models machine-psychology ai-bias psychometric-analysis lwda24

Updated Sep 23, 2024
TeX

dpc10ster / datasets

ROC/FROC datasets from my collaborations

datasets ai-evaluation

Updated Aug 14, 2023
