
test: record and replay streaming LLM calls in e2e tests #149

Merged

14 commits merged from test/e2e-moderation-fixtures into main on Sep 25, 2024

Conversation

@codeincontext (Contributor) commented Sep 20, 2024

Description

This PR supersedes #130, as that PR didn't mock the moderation OpenAI calls.

  • Playwright intercepts API calls to /chat and adds x-fixture-name and x-fixture-mode headers
  • The chat API handler looks for those headers and passes custom LLMService and moderation OpenAI clients to Aila
    • The FixtureRecord LLMService proxies the OpenAI service and saves the chunks to a file. Moderation has an equivalent
    • The FixtureReplay LLMService streams the chunks from a fixture file. Moderation has an equivalent (see the sketch after this list)
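
A minimal sketch of the replay side, assuming a fixture file with one recorded chunk per line and a simplified LLMService interface (the real interface and fixture format live in the Aila package and may differ):

import fs from "node:fs/promises";

// Assumed, simplified shape; the real Aila LLMService has more to it
interface LLMService {
  createChatCompletionStream(messages: unknown[]): Promise<ReadableStream<string>>;
}

// Streams previously recorded chunks instead of calling OpenAI
class FixtureReplayLLMService implements LLMService {
  constructor(private readonly fixturePath: string) {}

  async createChatCompletionStream(_messages: unknown[]): Promise<ReadableStream<string>> {
    const chunks = (await fs.readFile(this.fixturePath, "utf8")).split("\n");
    return new ReadableStream<string>({
      start(controller) {
        for (const chunk of chunks) controller.enqueue(chunk);
        controller.close();
      },
    });
  }
}

The record equivalent would wrap the real OpenAI-backed service, tee each chunk to the fixture file as it streams, and pass the stream through unchanged.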

Known issues:

  • There's what I think is a race condition that requires some explicit waits; otherwise the number of completed sections regresses between messages
    • "These waits are an antipattern. If we don't allow enough time before sending the next message, the completed section count goes backwards. I think it's a race condition with the useEffects in the chat and fetching state from tRPC after streaming. It was happening before and after feat: allow the user to interact while moderation is in progress #147"
    • I think we should just move forward and address it later if it becomes important

How to test

  1. Check out the branch locally
  2. Pull the Doppler config, or set AILA_FIXTURES_ENABLED=true in .env
  3. Run pnpm run test-e2e-ui
  4. Click through to aila-chat => full-romans.test.ts => Authenticated => "Full aila flow with Romans fixture"
  5. Run the test. If you make sure it's selected in the sidebar, you should see the output on the right. Note that the auth setup will run first
  6. You should see the same content as the fixtures. Look at the .formatted.txt files to confirm
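
For reference, the header injection described above can be approximated in the test setup roughly like this (a hedged sketch: the route pattern, header values, and env var handling are assumptions, not the repo's actual helper):

import { test } from "@playwright/test";

test.beforeEach(async ({ page }) => {
  // Attach fixture headers to every /chat request so the API handler
  // swaps in the record/replay services
  await page.route("**/chat", async (route) => {
    await route.continue({
      headers: {
        ...route.request().headers(),
        "x-fixture-name": "full-romans", // assumed fixture name
        "x-fixture-mode": process.env.FIXTURE_MODE ?? "replay", // assumed env var
      },
    });
  });
});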


vercel bot commented Sep 20, 2024

The latest updates on your projects:

oak-ai-lesson-assistant — ✅ Ready — updated Sep 23, 2024 4:46pm (UTC)

@codeincontext marked this pull request as ready for review, September 20, 2024 14:38

github-actions bot commented Sep 20, 2024

Playwright e2e tests

Job summary · Download report

To view traces locally, unzip the report and run:

npx playwright show-report ~/Downloads/playwright-report

await page.waitForURL(/\/aila\/.+/);
await waitForGeneration(page, generationTimeout);
await expectSectionsComplete(page, 1);
// Explicit settle time before the next message; see the comment below
await page.waitForTimeout(500);
@codeincontext (Contributor, Author) commented:

These waits are an antipattern. If we don't allow enough time before sending the next message, the completed section count goes backwards. I think it's a race condition with the useEffects in the chat and fetching state from tRPC after streaming. It was happening before and after #147
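
One way to make the regression visible instead of papering over it would be to assert the count, let the post-stream tRPC refetch settle, and assert again (a hypothetical sketch; the test id is assumed):

import { expect, type Page } from "@playwright/test";

async function expectSectionsStable(page: Page, expected: number) {
  const sections = page.getByTestId("chat-section-complete"); // assumed test id
  await expect(sections).toHaveCount(expected);
  // Give the useEffect/tRPC refetch a window in which to (incorrectly) regress
  await page.waitForTimeout(500);
  await expect(sections).toHaveCount(expected);
}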

Comment on lines 103 to 116
await test.step("Go to downloads page", async () => {
  // Open 'download resources' menu
  const downloadResources = page.getByTestId("chat-download-resources");
  await downloadResources.click();
  await page.waitForURL(/\/aila\/.*\/download/);

  // Click to download lesson plan
  const downloadLessonPlan = page.getByTestId("chat-download-lesson-plan");
  await downloadLessonPlan.click();

  // Skip feedback form
  await page.getByLabel("Skip").click();
  await expect(
    page.getByRole("heading", { name: "Download resources" }),
  ).toBeVisible();
});
@codeincontext (Contributor, Author) commented:

Don't worry about this too much, as I'm putting together a new PR to split it out into a separate test


sonarcloud bot commented Sep 23, 2024

@mantagen (Collaborator) left a comment:

Brilliant!

@codeincontext merged commit ebf836f into main, Sep 25, 2024
14 checks passed
@codeincontext deleted the test/e2e-moderation-fixtures branch, September 25, 2024 08:50