Insight-Bench

Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Insight-Bench is a benchmark dataset for evaluating end-to-end data analytics. It measures how well agents perform comprehensive data analysis across 31 diverse business use cases. It includes carefully curated insights, an evaluation mechanism based on LLaMA-3-Eval, and a data analytics agent, AgentPoirot.

Pre-requisites

Install the Python libraries

pip install .

Set your OpenAI API key in your environment

export OPENAI_API_KEY="your-api-key"
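
Before launching a run, it can help to confirm that the key is actually visible to Python. The snippet below is a minimal sketch and not part of this repository; the helper name openai_key_is_set is hypothetical.

```python
import os


def openai_key_is_set(env=None):
    """Return True if OPENAI_API_KEY is present and non-empty.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to test. This helper is hypothetical and is
    not defined anywhere in the repository.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if openai_key_is_set():
        print("OPENAI_API_KEY is set.")
    else:
        print("OPENAI_API_KEY is missing; export it before running the agent.")
```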

All ground-truth notebooks are in data/notebooks.

An example notebook can be found here: data/notebooks/flag-1.ipynb
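
To get a quick overview of a notebook without opening Jupyter, its JSON can be read directly. This is a minimal sketch that assumes the notebooks follow the standard .ipynb layout (a top-level "cells" list whose entries carry a "cell_type"); the function name count_cell_types is hypothetical and not part of the repository.

```python
import json
from collections import Counter


def count_cell_types(notebook_path):
    """Count cells by type (e.g. "code", "markdown") in an .ipynb file.

    Assumes the standard notebook JSON layout; hypothetical helper.
    """
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)
    cells = notebook.get("cells", [])
    return Counter(cell.get("cell_type", "unknown") for cell in cells)
```

For example, count_cell_types("data/notebooks/flag-1.ipynb") returns a Counter keyed by cell type.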

Quick Start

Run AgentPoirot on one of the notebook flags with the following command

python run_agent.py -e <exp_group> -sb <savedir_base>

The variables <...> can be substituted with the following values:

<exp_group> : quick
<savedir_base>: path to where results will be saved
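
The same invocation can be scripted from Python, for instance to launch runs from another tool. This is a sketch under stated assumptions: it relies only on the -e and -sb flags shown above, must be run from the repository root so that run_agent.py resolves, and uses a hypothetical output directory named "results". The helper names are not part of the repository.

```python
import subprocess
import sys


def build_run_command(exp_group, savedir_base):
    """Build the argument list for run_agent.py using the documented flags."""
    return [sys.executable, "run_agent.py", "-e", exp_group, "-sb", savedir_base]


def run_agent(exp_group, savedir_base):
    """Run the agent and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_run_command(exp_group, savedir_base), check=True)


if __name__ == "__main__":
    # "quick" is the experiment group named above; "results" is a
    # hypothetical save directory chosen for illustration.
    print(" ".join(build_run_command("quick", "results")))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when the save directory contains spaces.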

Experiment hyperparameters are defined in exp_groups.py

ServiceNow/insight-bench
