Braintrust eval action

This action lets you run Braintrust evals as part of your CI/CD workflow on GitHub, using GitHub Actions. To use it, include the following step in a workflow file:

- name: Run Evals
  uses: braintrustdata/eval-action@v1
  with:
    api_key: ${{ secrets.BRAINTRUST_API_KEY }}
    runtime: node

You can configure the following variables:

api_key: Your Braintrust API key.
root: The root directory containing your evals (defaults to '.'). The root directory must have either Node.js or Python configured.
paths: Specific paths, relative to the root, containing the evals you'd like to run.
runtime: Either node or python.
use_proxy: Either true or false. If true, OPENAI_BASE_URL is set to https://braintrustproxy.com/v1, which automatically caches repeated LLM calls so your evals run faster. Defaults to true.
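
As a sketch of how these inputs combine, the step below targets a Python project, limits the run to a single eval file, and disables the proxy. The directory name my_eval_dir and the file name eval_summaries.py are hypothetical placeholders, not names taken from this repository.

```yaml
- name: Run Evals
  uses: braintrustdata/eval-action@v1
  with:
    api_key: ${{ secrets.BRAINTRUST_API_KEY }}
    runtime: python
    # Hypothetical directory holding the Python project and its evals
    root: my_eval_dir
    # Hypothetical eval file, given relative to `root`
    paths: eval_summaries.py
    # Disable the caching proxy (the default is true)
    use_proxy: false
```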

Full example

name: Run pnpm evals

on:
  push:
    # Uncomment to run only when files in the 'evals' directory change
    # paths:
    #   - "evals/**"

permissions:
  pull-requests: write
  contents: read

jobs:
  eval:
    name: Run evals
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        id: checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Node.js
        id: setup-node
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - uses: pnpm/action-setup@v3
        with:
          version: 8

      - name: Install Dependencies
        id: install
        run: pnpm install

      - name: Run Evals
        uses: braintrustdata/eval-action@v1
        with:
          api_key: ${{ secrets.BRAINTRUST_API_KEY }}
          runtime: node
          root: my_eval_dir

[!IMPORTANT] You must specify permissions, as in the permissions block above, for the action to leave comments on your PR. Without these permissions, you'll see GitHub API errors.
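
The example above uses pnpm. For a Python project, a comparable workflow might look like the following sketch. It assumes that dependencies, including the braintrust package, are listed in a requirements.txt file at the repository root; that file and the Python version shown are assumptions for illustration.

```yaml
name: Run Python evals

on:
  push:

permissions:
  pull-requests: write
  contents: read

jobs:
  eval:
    name: Run evals
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      # Assumes requirements.txt exists and lists the braintrust package
      - name: Install Dependencies
        run: pip install -r requirements.txt

      - name: Run Evals
        uses: braintrustdata/eval-action@v1
        with:
          api_key: ${{ secrets.BRAINTRUST_API_KEY }}
          runtime: python
```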

For fully configured templates, see the examples directory.

How it works

The action runs braintrust eval and collects experiment results, which are posted as a comment in the PR alongside a link to Braintrust. For example:

Example braintrust eval report

Say Hi Bot (HEAD-1714341466)

Score	Average	Improvements	Regressions
Levenshtein	0.83 (+3pp)	8 🟢	4 🔴
Duration	1s (0s)	16 🟢	1 🔴
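
To run the evals without the PR comment, for instance while debugging a workflow, a plain step that calls the CLI directly is a rough approximation of what the action runs. This is a sketch, not the action's actual invocation: it assumes the braintrust package is already installed as a project dependency, and my_eval_dir is the same placeholder directory used in the full example.

```yaml
- name: Run evals directly (no PR comment)
  # Assumes `braintrust` is installed in the project so npx can find its CLI
  run: npx braintrust eval my_eval_dir
  env:
    # The Braintrust SDK reads its API key from this environment variable
    BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
```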
