
test: record and replay streaming LLM calls in e2e tests #130

Closed
wants to merge 4 commits

Conversation

codeincontext (Collaborator) commented Sep 12, 2024

Description

  • Playwright intercepts API calls to /chat and adds x-fixture-name and x-fixture-mode headers
  • Chat API handler looks for those headers and passes a custom LLMService to Aila
    • The FixtureRecord LLMService proxies the OpenAI service and saves the chunks to a file
    • The FixtureReplay LLMService streams the chunks in a fixture file

We aren't yet replaying moderation, RAG, or categorisation calls; I will add those in a follow-up PR.
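The record/replay proxy described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the real `LLMService` streams via `ReadableStream` and persists chunks to fixture files, whereas this sketch uses plain chunk arrays.

```typescript
// Hypothetical sketch of the record/replay pattern described above.
// The interface shape and class names are assumptions for illustration.
interface LLMService {
  createChatCompletionStream(params: { messages: string[] }): Promise<string[]>;
}

// Wraps the real OpenAI-backed service and records every chunk it emits,
// so the same stream can be replayed in a later test run.
class FixtureRecordLLMService implements LLMService {
  recorded: string[] = [];
  constructor(private inner: LLMService) {}
  async createChatCompletionStream(params: { messages: string[] }): Promise<string[]> {
    const chunks = await this.inner.createChatCompletionStream(params);
    this.recorded.push(...chunks); // the PR writes these to a fixture file instead
    return chunks;
  }
}

// Streams previously recorded chunks instead of calling the model at all.
class FixtureReplayLLMService implements LLMService {
  constructor(private fixtureChunks: string[]) {}
  async createChatCompletionStream(_params: { messages: string[] }): Promise<string[]> {
    return this.fixtureChunks;
  }
}
```

Because both classes satisfy the same interface as the real service, the chat API handler can swap them in based on the `x-fixture-mode` header without the rest of Aila noticing.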

How to test

  1. Check out the branch locally
  2. Pull the Doppler config, or set AILA_FIXTURES_ENABLED=true in .env
  3. Run pnpm run test-e2e-ui
  4. Click through to aila-chat => full-romans.test.ts => Authenticated => "Full aila flow with Romans fixture"
  5. Run the test. If you make sure it's selected in the sidebar, you should see the output on the right. Note that the auth setup will run first
  6. You should see the same content as the fixtures. Compare against the .formatted.txt files to confirm

vercel bot commented Sep 12, 2024

oak-ai-lesson-assistant: ✅ Ready (preview updated Sep 19, 2024 10:14am UTC)

Comment on lines +67 to +57
// TODO: the demo status doesn't seem to have been loaded yet so a demo modal is shown
await page.waitForTimeout(500);
codeincontext (Collaborator, Author):

I think this will need a tweak in the demo code, but that's out of scope for right now


github-actions bot commented Sep 12, 2024

Playwright e2e tests

Job summary

Download report

To view traces locally, unzip the report and run:

npx playwright show-report ~/Downloads/playwright-report

Comment on lines +1 to +13
import { test, expect } from "@playwright/test";

import { TEST_BASE_URL } from "../../config/config";
import { bypassVercelProtection } from "../../helpers";

test.describe("Unauthenticated", () => {
  test("redirects to /sign-in", async ({ page }) => {
    await bypassVercelProtection(page);
    await page.goto(`${TEST_BASE_URL}/aila`);
    await expect(page.locator("h1")).toContainText("Sign in");
  });
});
codeincontext (Collaborator, Author):

This is unchanged, just extracted from apps/nextjs/tests-e2e/tests/aila-chat.test.ts

Comment on lines +17 to +23
async createChatCompletionStream(params: {
  model: string;
  messages: Message[];
  temperature: number;
}): Promise<ReadableStreamDefaultReader<string>> {
  return this._openAIService.createChatCompletionStream(params);
}
codeincontext (Collaborator, Author) commented Sep 12, 2024

I'm a bit unsure about the distinction between these two methods. It looks like only one is used?

mantagen (Collaborator) replied:

Sorry which two methods?

codeincontext (Collaborator, Author) replied:

createChatCompletionStream and createChatCompletionObjectStream

It now looks like, post-toggle, they represent whether we're using structured outputs
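A rough illustration of that distinction, with the caveat that the signatures and return shapes here are guesses rather than the actual interface: one method streams free-form text, the other streams chunks that concatenate into a JSON object constrained by a schema ("structured outputs").

```typescript
// Hypothetical sketch only: the real methods return stream readers, and the
// names below merely echo the two methods being discussed.
function createChatCompletionStream(_messages: string[]): string[] {
  // plain text completion chunks
  return ["Here is", " a lesson plan..."];
}

function createChatCompletionObjectStream(_messages: string[], _schema: object): string[] {
  // chunks that, joined together, form JSON conforming to the supplied schema
  return ['{"title":', ' "Romans"}'];
}
```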

@@ -0,0 +1,57 @@
import { clerkSetup, setupClerkTestingToken } from "@clerk/testing/playwright";
codeincontext (Collaborator, Author):

This is the original aila e2e test. I think the pattern for being able to test any prompt without fixtures is useful, but I don't directly have a use for it right now

@@ -0,0 +1,100 @@
import { clerkSetup, setupClerkTestingToken } from "@clerk/testing/playwright";
codeincontext (Collaborator, Author):

This file is the new approach, using fixtures

import { continueChat, isFinished, waitForGeneration } from "./helpers";

type FixtureMode = "record" | "replay";
const FIXTURE_MODE = "replay" as FixtureMode;
codeincontext (Collaborator, Author):

Toggle this to overwrite the current fixtures
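The toggle feeds into the request interception described in the PR: Playwright rewrites calls to /chat to carry the x-fixture-name and x-fixture-mode headers. A minimal sketch of the header merging, where the helper name and the route pattern shown in the comment are assumptions:

```typescript
type FixtureMode = "record" | "replay";

// Builds the headers the chat API handler looks for. Inside a Playwright
// route handler this would be used roughly like:
//   await page.route("**/chat", (route) =>
//     route.continue({
//       headers: fixtureHeaders(route.request().headers(), "full-romans", FIXTURE_MODE),
//     }));
function fixtureHeaders(
  existing: Record<string, string>,
  fixtureName: string,
  mode: FixtureMode,
): Record<string, string> {
  return {
    ...existing,
    "x-fixture-name": fixtureName,
    "x-fixture-mode": mode,
  };
}
```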

Comment on lines +82 to +97
await test.step("Go to downloads page", async () => {
  // Open 'download resources' menu
  const downloadResources = page.getByTestId("chat-download-resources");
  await downloadResources.click();
  await page.waitForURL(/\/aila\/.*\/download/);

  // Click to download lesson plan
  const downloadLessonPlan = page.getByTestId("chat-download-lesson-plan");
  await downloadLessonPlan.click();

  // Skip feedback form
  await page.getByLabel("Skip").click();
  await expect(
    page.getByRole("heading", { name: "Download resources" }),
  ).toBeVisible();
});
codeincontext (Collaborator, Author):

In the future we will want this in a separate, more targeted test. At the moment we can't seed a lesson plan before the test

Comment on lines +3 to +16
export async function waitForGeneration(page: Page, generationTimeout: number) {
  const loadingElement = page.getByTestId("chat-stop");
  await expect(loadingElement).toBeVisible();
  await expect(loadingElement).not.toBeVisible({ timeout: generationTimeout });
}

export async function continueChat(page: Page) {
  await page.getByTestId("chat-continue").click();
}

export async function isFinished(page: Page) {
  const progressText = await page.getByTestId("chat-progress").textContent();
  return progressText === "10 of 10 sections complete";
}
codeincontext (Collaborator, Author):

This is unchanged


sonarcloud bot commented Sep 19, 2024

mantagen (Collaborator) left a comment

This looks great! ui-auth.setup.ts is timing out for me locally when I'm running it. Will follow up with you in Slack

