test: record and replay streaming LLM calls in e2e tests #130
556af5c to 80f4a62
// TODO: the demo status doesn't seem to have been loaded yet so a demo modal is shown
await page.waitForTimeout(500);
I think this will need a tweak in the demo code, but that's out of scope for right now
Playwright e2e tests
To view traces locally, unzip the report and run: npx playwright show-report ~/Downloads/playwright-report
import { test, expect } from "@playwright/test";

import { TEST_BASE_URL } from "../../config/config";
import { bypassVercelProtection } from "../../helpers";

test.describe("Unauthenticated", () => {
  test("redirects to /sign-in", async ({ page }) => {
    await bypassVercelProtection(page);
    await page.goto(`${TEST_BASE_URL}/aila`);
    await expect(page.locator("h1")).toContainText("Sign in");
  });
});
This is unchanged, just extracted from apps/nextjs/tests-e2e/tests/aila-chat.test.ts
80f4a62 to cbdf801
async createChatCompletionStream(params: {
  model: string;
  messages: Message[];
  temperature: number;
}): Promise<ReadableStreamDefaultReader<string>> {
  return this._openAIService.createChatCompletionStream(params);
}
I'm a bit unsure about the distinction between these two methods. It looks like only one is used?
Sorry, which two methods?
createChatCompletionStream and createChatCompletionObjectStream
It now looks like post-toggle, they represent whether we're using structured outputs
@@ -0,0 +1,57 @@
import { clerkSetup, setupClerkTestingToken } from "@clerk/testing/playwright";
This is the original aila e2e test. I think the pattern for being able to test any prompt without fixtures is useful, but I don't directly have a use for it right now
cbdf801 to 5b3186f
@@ -0,0 +1,100 @@
import { clerkSetup, setupClerkTestingToken } from "@clerk/testing/playwright";
This file is the new approach, using fixtures
import { continueChat, isFinished, waitForGeneration } from "./helpers";

type FixtureMode = "record" | "replay";
const FIXTURE_MODE = "replay" as FixtureMode;
Toggle this to overwrite the current fixtures
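For anyone unfamiliar with the pattern, here is a minimal sketch of what switching between "record" and "replay" could mean for a streamed LLM response. This is not the code in this PR: `withFixture`, the in-memory `fixtures` map, `fakeLlm`, and `collect` are hypothetical names used for illustration, and the PR appears to keep its fixtures as files rather than in memory.

```typescript
type FixtureMode = "record" | "replay";

// Hypothetical in-memory store standing in for fixture files on disk.
const fixtures = new Map<string, string[]>();

// Wraps a streaming call. In "record" mode it forwards the live chunks and
// saves them; in "replay" mode it yields the saved chunks and never calls out.
async function* withFixture(
  mode: FixtureMode,
  fixtureName: string,
  liveCall: () => AsyncIterable<string>,
): AsyncGenerator<string> {
  if (mode === "replay") {
    const saved = fixtures.get(fixtureName);
    if (!saved) {
      throw new Error(`No fixture recorded for "${fixtureName}"`);
    }
    for (const chunk of saved) {
      yield chunk;
    }
    return;
  }

  const chunks: string[] = [];
  for await (const chunk of liveCall()) {
    chunks.push(chunk);
    yield chunk;
  }
  // Overwrite the existing fixture only after the stream has completed.
  fixtures.set(fixtureName, chunks);
}

// Stand-in for a real streaming LLM call (assumption for the example).
async function* fakeLlm(): AsyncGenerator<string> {
  yield "Hello";
  yield ", world";
}

// Drains a stream into a single string.
async function collect(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk;
  }
  return text;
}
```

Recording once with `collect(withFixture("record", "greeting", fakeLlm))` and then replaying with the same fixture name should give the same text without invoking the live call, which is what makes the replayed e2e run deterministic.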
await test.step("Go to downloads page", async () => {
  // Open 'download resources' menu
  const downloadResources = page.getByTestId("chat-download-resources");
  await downloadResources.click();
  // Await the navigation so the step can't race ahead of the page load
  await page.waitForURL(/\/aila\/.*\/download/);

  // Click to download lesson plan
  const downloadLessonPlan = page.getByTestId(
    "chat-download-lesson-plan",
  );
  await downloadLessonPlan.click();

  // Skip feedback form
  await page.getByLabel("Skip").click();
  // Assert on the heading; a bare locator on its own checks nothing
  await expect(
    page.getByRole("heading", { name: "Download resources" }),
  ).toBeVisible();
});
In the future we will want this in a separate, more targeted test. At the moment we can't seed a lesson plan before the test
export async function waitForGeneration(page: Page, generationTimeout: number) {
  const loadingElement = page.getByTestId("chat-stop");
  await expect(loadingElement).toBeVisible();
  await expect(loadingElement).not.toBeVisible({ timeout: generationTimeout });
}

export async function continueChat(page: Page) {
  await page.getByTestId("chat-continue").click();
}

export async function isFinished(page: Page) {
  const progressText = await page.getByTestId("chat-progress").textContent();
  return progressText === "10 of 10 sections complete";
}
This is unchanged
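As a rough illustration of how these three helpers fit together, a test can loop "wait for generation, check progress, continue" until all sections are complete. The sketch below is an assumption about the calling pattern, not code from this PR: `ChatDriver` and `generateUntilFinished` are hypothetical names, and the helpers are injected so the loop can be exercised without a browser.

```typescript
// Hypothetical wrapper around the three helpers quoted above.
interface ChatDriver {
  waitForGeneration(): Promise<void>;
  isFinished(): Promise<boolean>;
  continueChat(): Promise<void>;
}

// Runs generation rounds until the plan is complete and returns how many
// rounds were needed. Throws if the cap is reached, so a stuck chat fails
// the test instead of looping forever.
async function generateUntilFinished(
  driver: ChatDriver,
  maxRounds: number,
): Promise<number> {
  for (let round = 1; round <= maxRounds; round++) {
    await driver.waitForGeneration();
    if (await driver.isFinished()) {
      return round;
    }
    await driver.continueChat();
  }
  throw new Error(`Lesson plan not finished after ${maxRounds} rounds`);
}
```

In a Playwright test the driver would simply delegate to the existing helpers, for example `waitForGeneration: () => waitForGeneration(page, generationTimeout)`, `isFinished: () => isFinished(page)`, and `continueChat: () => continueChat(page)`.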
45ac190 to 0c6721a
Quality Gate passed
This looks great! ui-auth.setup.ts is timing out for me locally when I'm running it. Will follow up with you in Slack
Description
… /chat and adds x-fixture-name and x-fixture-mode headers
We aren't yet replaying moderation, RAG, or categorisation calls. I will add those in a follow-up PR.
How to test
… AILA_FIXTURES_ENABLED=true in .env
… pnpm run test-e2e-ui
… .formatted.txt files to confirm
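To make the header mechanism in the description more concrete, here is a small sketch of both sides of the exchange: the test building the `x-fixture-name` and `x-fixture-mode` headers, and the server deciding whether to honour them. The function names, the `FixtureRequest` shape, and the idea that the server ignores the headers unless fixtures are enabled are assumptions for illustration; only the two header names and the `AILA_FIXTURES_ENABLED` flag come from this PR.

```typescript
type FixtureMode = "record" | "replay";

interface FixtureRequest {
  name: string;
  mode: FixtureMode;
}

// Test side: headers to attach to the request to /chat.
function fixtureHeaders(name: string, mode: FixtureMode): Record<string, string> {
  return {
    "x-fixture-name": name,
    "x-fixture-mode": mode,
  };
}

// Server side: returns null when fixtures are disabled (assumed to mirror
// AILA_FIXTURES_ENABLED) or when the headers are missing or invalid, so a
// normal request falls through to the live LLM call.
function readFixtureHeaders(
  headers: Record<string, string | undefined>,
  fixturesEnabled: boolean,
): FixtureRequest | null {
  if (!fixturesEnabled) {
    return null;
  }
  const name = headers["x-fixture-name"];
  const mode = headers["x-fixture-mode"];
  if (!name) {
    return null;
  }
  if (mode === "record" || mode === "replay") {
    return { name, mode };
  }
  return null;
}
```

In a Playwright test, headers like these could be attached by intercepting the request with `page.route` and calling `route.continue({ headers })`. Gating on the environment flag keeps the fixture path unreachable in environments where it is not explicitly switched on.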