Braintrust eval action

This project lets you run Braintrust evals as part of your CI/CD workflow on GitHub, using GitHub Actions. To use this action, include the following step in a workflow file:

- name: Run Evals
  uses: braintrustdata/eval-action@v1
  with:
    api_key: ${{ secrets.BRAINTRUST_API_KEY }}
    runtime: node

You can configure the following variables:

  • api_key: Your Braintrust API key.
  • root: The root directory containing your evals (defaults to '.'). The root directory must have either Node.js or Python configured.
  • paths: Specific paths, relative to the root, containing the evals you'd like to run.
  • runtime: Either node or python.
  • use_proxy: Either true or false. If true, OPENAI_BASE_URL is set to https://braintrustproxy.com/v1, which automatically caches repetitive LLM calls so your evals run faster. Defaults to true.
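
As a sketch of how these inputs combine, the step below runs Python evals from a subdirectory with the proxy turned off. The directory names `my_eval_dir` and `summarization` are hypothetical, and passing a single relative path to `paths` is an assumption; check the action's input definitions before listing several paths.

```yaml
- name: Run Python evals
  uses: braintrustdata/eval-action@v1
  with:
    api_key: ${{ secrets.BRAINTRUST_API_KEY }}
    runtime: python
    # Hypothetical root; Python and your dependencies must already be set up for it
    root: my_eval_dir
    # Hypothetical subdirectory of root; assumed to limit the run to evals under it
    paths: summarization
    # Opt out of the caching proxy, so the action does not set OPENAI_BASE_URL
    use_proxy: "false"
```

Leaving `use_proxy` out keeps the default of true, which is usually what you want for repeated CI runs.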

Full example

name: Run pnpm evals

on:
  push:
    # Uncomment to run only when files in the 'evals' directory change
    # paths:
    #   - "evals/**"

permissions:
  pull-requests: write
  contents: read

jobs:
  eval:
    name: Run evals
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        id: checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Node.js
        id: setup-node
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - uses: pnpm/action-setup@v3
        with:
          version: 8

      - name: Install Dependencies
        id: install
        run: pnpm install

      - name: Run Evals
        uses: braintrustdata/eval-action@v1
        with:
          api_key: ${{ secrets.BRAINTRUST_API_KEY }}
          runtime: node
          root: my_eval_dir

Important: You must specify the permissions shown above so that the action can leave comments on your PR. Without these permissions, you'll see GitHub API errors.
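
Because the results are posted as a PR comment, you may prefer to trigger the workflow on pull requests rather than on every push. The fragment below is one possible sketch of that, not part of the original example; the `evals` directory in the path filter is hypothetical.

```yaml
on:
  pull_request:
    # Run only when files under the hypothetical 'evals' directory change
    paths:
      - "evals/**"

permissions:
  pull-requests: write # lets the action comment on the PR
  contents: read # lets actions/checkout read the repository
```

The `permissions` block here matches the one in the full example; only the trigger differs.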

For fully configured templates, see the examples directory.

How it works

The action runs braintrust eval and collects experiment results, which are posted as a comment in the PR alongside a link to Braintrust. For example:

Example braintrust eval report

Say Hi Bot (HEAD-1714341466)

| Score       | Average     | Improvements | Regressions |
| ----------- | ----------- | ------------ | ----------- |
| Levenshtein | 0.83 (+3pp) | 8 🟢         | 4 🔴        |
| Duration    | 1s (0s)     | 16 🟢        | 1 🔴        |
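
To make the first half of that description concrete, here is a rough hand-written step that runs the evals directly. It is a sketch, not the action's actual implementation: it assumes the node runtime, that the `braintrust` npm package is installed as a dependency in the hypothetical `my_eval_dir`, and that the CLI reads the key from the `BRAINTRUST_API_KEY` environment variable. It does not post a PR comment.

```yaml
- name: Run evals without the action
  # Hypothetical directory; assumed to have braintrust in its node_modules
  working-directory: my_eval_dir
  # Runs the eval files found under the current directory
  run: npx braintrust eval .
  env:
    BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
```

The action adds the result collection and PR comment on top of a run like this, which is why it needs the `pull-requests: write` permission.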
