
test: record and replay streaming LLM calls in e2e tests #149

Merged

14 commits merged from test/e2e-moderation-fixtures into main on Sep 25, 2024

Conversation

@codeincontext (Contributor) commented Sep 20, 2024

Description

This PR supersedes #130, as that PR didn't mock the moderation OpenAI calls.

  • Playwright intercepts API calls to /chat and adds x-fixture-name and x-fixture-mode headers
  • The chat API handler looks for those headers and passes custom LLMService and moderation OpenAI clients to Aila
    • The FixtureRecord LLMService proxies the OpenAI service and saves the chunks to a file. Moderation has an equivalent
    • The FixtureReplay LLMService streams the chunks from a fixture file. Moderation has an equivalent (see the sketch after this list)
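
A minimal sketch of the replay side, assuming a fixture file with one recorded chunk per line and a simplified LLMService interface (the real interface and fixture format live in the Aila package and may differ):

import fs from "node:fs/promises";

// Assumed, simplified shape; the real Aila LLMService has more to it
interface LLMService {
  createChatCompletionStream(messages: unknown[]): Promise<ReadableStream<string>>;
}

// Streams previously recorded chunks instead of calling OpenAI
class FixtureReplayLLMService implements LLMService {
  constructor(private readonly fixturePath: string) {}

  async createChatCompletionStream(_messages: unknown[]): Promise<ReadableStream<string>> {
    const chunks = (await fs.readFile(this.fixturePath, "utf8")).split("\n");
    return new ReadableStream<string>({
      start(controller) {
        for (const chunk of chunks) controller.enqueue(chunk);
        controller.close();
      },
    });
  }
}

The record equivalent would wrap the real OpenAI-backed service, tee each chunk to the fixture file as it streams, and pass the stream through unchanged.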

Known issues:

  • There's what I think is a race condition that requires some explicit waits; otherwise the number of completed sections regresses between messages
    • "These waits are an antipattern. If we don't allow enough time before sending the next message, the completed section count goes backwards. I think it's a race condition with the useEffects in the chat and fetching state from tRPC after streaming. It was happening before and after feat: allow the user to interact while moderation is in progress #147"
    • I think we should just move forward and address it later if it becomes important

How to test

  1. Check out the branch locally
  2. Pull the Doppler config, or set AILA_FIXTURES_ENABLED=true in .env
  3. Run pnpm run test-e2e-ui
  4. Click through to aila-chat => full-romans.test.ts => Authenticated => "Full aila flow with Romans fixture"
  5. Run the test. If you make sure it's selected in the sidebar, you should see the output on the right. Note that the auth setup will run first
  6. You should see the same content as the fixtures. Look at the .formatted.txt files to confirm
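
For reference, the header injection described above can be approximated in the test setup roughly like this (a hedged sketch: the route pattern, header values, and env var handling are assumptions, not the repo's actual helper):

import { test } from "@playwright/test";

test.beforeEach(async ({ page }) => {
  // Attach fixture headers to every /chat request so the API handler
  // swaps in the record/replay services
  await page.route("**/chat", async (route) => {
    await route.continue({
      headers: {
        ...route.request().headers(),
        "x-fixture-name": "full-romans", // assumed fixture name
        "x-fixture-mode": process.env.FIXTURE_MODE ?? "replay", // assumed env var
      },
    });
  });
});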


vercel bot commented Sep 20, 2024

The latest updates on your projects:

oak-ai-lesson-assistant — ✅ Ready — updated Sep 23, 2024 4:46pm (UTC)

@codeincontext marked this pull request as ready for review, September 20, 2024 14:38

github-actions bot commented Sep 20, 2024

Playwright e2e tests

Job summary · Download report

To view traces locally, unzip the report and run:

npx playwright show-report ~/Downloads/playwright-report

await page.waitForURL(/\/aila\/.+/);
await waitForGeneration(page, generationTimeout);
await expectSectionsComplete(page, 1);
// Explicit settle time before the next message; see the comment below
await page.waitForTimeout(500);
@codeincontext (Contributor, Author) commented:

These waits are an antipattern. If we don't allow enough time before sending the next message, the completed section count goes backwards. I think it's a race condition with the useEffects in the chat and fetching state from tRPC after streaming. It was happening before and after #147
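
One way to make the regression visible instead of papering over it would be to assert the count, let the post-stream tRPC refetch settle, and assert again (a hypothetical sketch; the test id is assumed):

import { expect, type Page } from "@playwright/test";

async function expectSectionsStable(page: Page, expected: number) {
  const sections = page.getByTestId("chat-section-complete"); // assumed test id
  await expect(sections).toHaveCount(expected);
  // Give the useEffect/tRPC refetch a window in which to (incorrectly) regress
  await page.waitForTimeout(500);
  await expect(sections).toHaveCount(expected);
}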

Comment on lines 103 to 116
await test.step("Go to downloads page", async () => {
  // Open 'download resources' menu
  const downloadResources = page.getByTestId("chat-download-resources");
  await downloadResources.click();
  await page.waitForURL(/\/aila\/.*\/download/);

  // Click to download lesson plan
  const downloadLessonPlan = page.getByTestId("chat-download-lesson-plan");
  await downloadLessonPlan.click();

  // Skip feedback form
  await page.getByLabel("Skip").click();
  await expect(
    page.getByRole("heading", { name: "Download resources" }),
  ).toBeVisible();
});
@codeincontext (Contributor, Author) commented:

Don't worry about this too much, as I'm putting together a new PR to split it out into a separate test


sonarcloud bot commented Sep 23, 2024

@mantagen (Collaborator) left a comment:

Brilliant!

@codeincontext merged commit ebf836f into main, Sep 25, 2024
14 checks passed
@codeincontext deleted the test/e2e-moderation-fixtures branch, September 25, 2024 08:50