
test: record and replay streaming LLM calls in e2e tests #130

Closed
wants to merge 4 commits

Conversation

codeincontext (Collaborator) commented Sep 12, 2024

Description

  • Playwright intercepts API calls to /chat and adds x-fixture-name and x-fixture-mode headers
  • Chat API handler looks for those headers and passes a custom LLMService to Aila
    • The FixtureRecord LLMService proxies the OpenAI service and saves the chunks to a file
    • The FixtureReplay LLMService streams the chunks in a fixture file

We aren't yet replaying moderation, RAG, or categorisation calls; I will add those in a follow-up PR.
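The record/replay proxy described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the real `LLMService` streams via `ReadableStream` and persists chunks to fixture files, whereas this sketch uses plain chunk arrays.

```typescript
// Hypothetical sketch of the record/replay pattern described above.
// The interface shape and class names are assumptions for illustration.
interface LLMService {
  createChatCompletionStream(params: { messages: string[] }): Promise<string[]>;
}

// Wraps the real OpenAI-backed service and records every chunk it emits,
// so the same stream can be replayed in a later test run.
class FixtureRecordLLMService implements LLMService {
  recorded: string[] = [];
  constructor(private inner: LLMService) {}
  async createChatCompletionStream(params: { messages: string[] }): Promise<string[]> {
    const chunks = await this.inner.createChatCompletionStream(params);
    this.recorded.push(...chunks); // the PR writes these to a fixture file instead
    return chunks;
  }
}

// Streams previously recorded chunks instead of calling the model at all.
class FixtureReplayLLMService implements LLMService {
  constructor(private fixtureChunks: string[]) {}
  async createChatCompletionStream(_params: { messages: string[] }): Promise<string[]> {
    return this.fixtureChunks;
  }
}
```

Because both classes satisfy the same interface as the real service, the chat API handler can swap them in based on the `x-fixture-mode` header without the rest of Aila noticing.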

How to test

  1. Check out the branch locally
  2. Pull the Doppler config, or set AILA_FIXTURES_ENABLED=true in .env
  3. Run pnpm run test-e2e-ui
  4. Click through to aila-chat => full-romans.test.ts => Authenticated => "Full aila flow with Romans fixture"
  5. Run the test. If you make sure it's selected in the sidebar, you should see the output on the right. Note that the auth setup will run first
  6. You should see the same content as the fixtures. Compare against the .formatted.txt files to confirm

vercel bot commented Sep 12, 2024

oak-ai-lesson-assistant: ✅ Ready (preview updated Sep 19, 2024 10:14am UTC)

Comment on lines +67 to +57
// TODO: the demo status doesn't seem to have been loaded yet so a demo modal is shown
await page.waitForTimeout(500);
codeincontext (Collaborator, Author):

I think this will need a tweak in the demo code, but that's out of scope for right now


github-actions bot commented Sep 12, 2024

Playwright e2e tests

Job summary

Download report

To view traces locally, unzip the report and run:

npx playwright show-report ~/Downloads/playwright-report

Comment on lines +1 to +13
import { test, expect } from "@playwright/test";

import { TEST_BASE_URL } from "../../config/config";
import { bypassVercelProtection } from "../../helpers";

test.describe("Unauthenticated", () => {
  test("redirects to /sign-in", async ({ page }) => {
    await bypassVercelProtection(page);
    await page.goto(`${TEST_BASE_URL}/aila`);
    await expect(page.locator("h1")).toContainText("Sign in");
  });
});
codeincontext (Collaborator, Author):

This is unchanged, just extracted from apps/nextjs/tests-e2e/tests/aila-chat.test.ts

Comment on lines +17 to +23
async createChatCompletionStream(params: {
  model: string;
  messages: Message[];
  temperature: number;
}): Promise<ReadableStreamDefaultReader<string>> {
  return this._openAIService.createChatCompletionStream(params);
}
codeincontext (Collaborator, Author) commented Sep 12, 2024

I'm a bit unsure about the distinction between these two methods. It looks like only one is used?

mantagen (Collaborator) replied:

Sorry which two methods?

codeincontext (Collaborator, Author) replied:

createChatCompletionStream and createChatCompletionObjectStream

It now looks like, post-toggle, they represent whether we're using structured outputs
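A rough illustration of that distinction, with the caveat that the signatures and return shapes here are guesses rather than the actual interface: one method streams free-form text, the other streams chunks that concatenate into a JSON object constrained by a schema ("structured outputs").

```typescript
// Hypothetical sketch only: the real methods return stream readers, and the
// names below merely echo the two methods being discussed.
function createChatCompletionStream(_messages: string[]): string[] {
  // plain text completion chunks
  return ["Here is", " a lesson plan..."];
}

function createChatCompletionObjectStream(_messages: string[], _schema: object): string[] {
  // chunks that, joined together, form JSON conforming to the supplied schema
  return ['{"title":', ' "Romans"}'];
}
```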

@@ -0,0 +1,57 @@
import { clerkSetup, setupClerkTestingToken } from "@clerk/testing/playwright";
codeincontext (Collaborator, Author):

This is the original aila e2e test. I think the pattern for being able to test any prompt without fixtures is useful, but I don't directly have a use for it right now

@@ -0,0 +1,100 @@
import { clerkSetup, setupClerkTestingToken } from "@clerk/testing/playwright";
codeincontext (Collaborator, Author):

This file is the new approach, using fixtures

import { continueChat, isFinished, waitForGeneration } from "./helpers";

type FixtureMode = "record" | "replay";
const FIXTURE_MODE = "replay" as FixtureMode;
codeincontext (Collaborator, Author):

Toggle this to overwrite the current fixtures
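The toggle feeds into the request interception described in the PR: Playwright rewrites calls to /chat to carry the x-fixture-name and x-fixture-mode headers. A minimal sketch of the header merging, where the helper name and the route pattern shown in the comment are assumptions:

```typescript
type FixtureMode = "record" | "replay";

// Builds the headers the chat API handler looks for. Inside a Playwright
// route handler this would be used roughly like:
//   await page.route("**/chat", (route) =>
//     route.continue({
//       headers: fixtureHeaders(route.request().headers(), "full-romans", FIXTURE_MODE),
//     }));
function fixtureHeaders(
  existing: Record<string, string>,
  fixtureName: string,
  mode: FixtureMode,
): Record<string, string> {
  return {
    ...existing,
    "x-fixture-name": fixtureName,
    "x-fixture-mode": mode,
  };
}
```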

Comment on lines +82 to +97
await test.step("Go to downloads page", async () => {
  // Open 'download resources' menu
  const downloadResources = page.getByTestId("chat-download-resources");
  await downloadResources.click();
  await page.waitForURL(/\/aila\/.*\/download/);

  // Click to download lesson plan
  const downloadLessonPlan = page.getByTestId("chat-download-lesson-plan");
  await downloadLessonPlan.click();

  // Skip feedback form
  await page.getByLabel("Skip").click();
  await expect(
    page.getByRole("heading", { name: "Download resources" }),
  ).toBeVisible();
});
codeincontext (Collaborator, Author):

In the future we will want this in a separate, more targeted test. At the moment we can't seed a lesson plan before the test

Comment on lines +3 to +16
export async function waitForGeneration(page: Page, generationTimeout: number) {
  const loadingElement = page.getByTestId("chat-stop");
  await expect(loadingElement).toBeVisible();
  await expect(loadingElement).not.toBeVisible({ timeout: generationTimeout });
}

export async function continueChat(page: Page) {
  await page.getByTestId("chat-continue").click();
}

export async function isFinished(page: Page) {
  const progressText = await page.getByTestId("chat-progress").textContent();
  return progressText === "10 of 10 sections complete";
}
codeincontext (Collaborator, Author):

This is unchanged


sonarcloud bot commented Sep 19, 2024

mantagen (Collaborator) left a comment

This looks great! ui-auth.setup.ts is timing out for me locally when I'm running it. Will follow up with you in Slack

