feat: Add YouTube transcript extraction component and frontend integration #4502

Merged
Cristhianzl merged 16 commits into main from cz/add-youtube-transcript-component on Nov 12, 2024

@Cristhianzl Cristhianzl (Member) commented Nov 11, 2024

This pull request adds a new component for extracting transcripts from YouTube videos and integrates it into the frontend. The most important changes include the addition of the YouTubeTranscriptsComponent, updates to the frontend to support the new component, and a new test to verify its functionality.

Backend Changes:

  • Added YouTubeTranscriptsComponent to import list and component registry in __init__.py (src/backend/base/langflow/components/tools/__init__.py) [1] [2].
  • Implemented YouTubeTranscriptsComponent class to handle YouTube transcript extraction, including methods for building transcripts and creating a structured tool (src/backend/base/langflow/components/tools/youtube_transcripts.py).

Frontend Changes:

  • Added YouTubeSvgIcon component to render the YouTube icon (src/frontend/src/icons/Youtube/index.tsx).
  • Imported YouTubeIcon and added it to the nodeIconsLucide export in styleUtils.ts (src/frontend/src/utils/styleUtils.ts) [1] [2].

Testing:

  • Added a new Playwright test to verify that users can use the YouTube transcripts component and extract transcripts from a video (src/frontend/tests/extended/integrations/youtube-transcripts.spec.ts).

This pull request includes the addition of a new component to extract YouTube transcripts and various test cleanups. The most important changes include adding the YouTubeTranscriptsComponent, updating the __init__.py file to include this component, and removing redundant imports in several test files.

New Feature:

  • Added YouTubeTranscriptsComponent to extract spoken content from YouTube videos as transcripts. (src/backend/base/langflow/components/tools/youtube_transcripts.py)

Initialization Updates:

  • Updated __init__.py to include the new YouTubeTranscriptsComponent. (src/backend/base/langflow/components/tools/__init__.py) [1] [2]

@Cristhianzl Cristhianzl self-assigned this Nov 11, 2024
@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and enhancement (New feature or request) labels Nov 11, 2024
@Cristhianzl Cristhianzl changed the title from "feat: Add YouTubeTranscriptsComponent for YouTube Transcript Extraction" to "feat: Add YouTube transcripts extraction component" Nov 11, 2024

codspeed-hq bot commented Nov 11, 2024

CodSpeed Performance Report

Merging #4502 will degrade performances by 16.86%

Comparing cz/add-youtube-transcript-component (fe0cfb0) with main (7dfce1d)

Summary

⚡ 2 improvements
❌ 2 regressions
✅ 11 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | cz/add-youtube-transcript-component | Change |
|---|---|---|---|
| test_build_flow_from_request_data | 5,824.4 ms | 785.2 ms | ×7.4 |
| test_successful_run_with_input_type_any | 456.5 ms | 531.3 ms | -14.07% |
| test_successful_run_with_input_type_text | 532.4 ms | 455.4 ms | +16.91% |
| test_successful_run_with_output_type_debug | 379.2 ms | 456.1 ms | -16.86% |
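
The Change column in the benchmark breakdown appears consistent with the ratio of the main timing to the branch timing, minus one. That formula is inferred from the numbers reported here, not taken from CodSpeed's documentation, so treat this as a sanity-check sketch only. The function name `relative_change` is made up for illustration.

```python
# Timings in milliseconds, copied from the benchmark breakdown: (main, branch).
TIMINGS_MS = {
    "test_build_flow_from_request_data": (5824.4, 785.2),
    "test_successful_run_with_input_type_any": (456.5, 531.3),
    "test_successful_run_with_input_type_text": (532.4, 455.4),
    "test_successful_run_with_output_type_debug": (379.2, 456.1),
}


def relative_change(main_ms: float, branch_ms: float) -> float:
    """Assumed formula: (main / branch - 1) as a percentage.

    Positive means the branch is faster than main, negative means slower.
    """
    return (main_ms / branch_ms - 1.0) * 100.0


if __name__ == "__main__":
    for name, (main_ms, branch_ms) in TIMINGS_MS.items():
        change = relative_change(main_ms, branch_ms)
        speedup = main_ms / branch_ms
        print(f"{name}: {change:+.2f}% (speed ratio x{speedup:.1f})")
```

Because the published timings are rounded to one decimal place, a recomputed percentage can differ from the reported one in the last digit. Under this reading, the 16.86% headline figure corresponds to the worst regression, `test_successful_run_with_output_type_debug`.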

…nscripts component in the frontend to ensure user can interact with it successfully
@Cristhianzl Cristhianzl changed the title from "feat: Add YouTube transcripts extraction component" to "feat: Add YouTube transcript extraction component and frontend integration" Nov 11, 2024
@github-actions github-actions bot added the enhancement (New feature or request) label and removed the enhancement (New feature or request) label Nov 11, 2024
@dosubot dosubot bot added the lgtm (This PR has been approved by a maintainer) label Nov 11, 2024
@Cristhianzl Cristhianzl enabled auto-merge (squash) November 11, 2024 20:00
ogabrielluiz and others added 10 commits November 11, 2024 17:16
- Introduced `YoutubeApiSchema` for structured input validation.
- Updated `youtube_transcripts` method to use `TranscriptFormat` enum.
- Improved error handling by raising `ToolException` for transcript retrieval failures.
- Removed redundant `YoutubeApiSchema` class definition within `YouTubeTranscriptsComponent`.
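
The pattern described in the commit message above (a module-level input schema, a format enum, and tool-specific exceptions on retrieval failure) could look roughly like the sketch below. This is not the code from the PR: the field names, the enum members, the injected `fetch` callable, and the local `ToolException` class are all stand-ins so that the example runs with the standard library alone and makes no network calls.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Union


class TranscriptFormat(Enum):
    """Stand-in for the enum named in the commit; members are assumptions."""
    TEXT = "text"
    CHUNKS = "chunks"


class ToolException(Exception):
    """Local stand-in for the tool framework's exception type."""


@dataclass(frozen=True)
class YoutubeApiSchema:
    """Hypothetical module-level input schema (field names are assumptions)."""
    url: str
    transcript_format: TranscriptFormat = TranscriptFormat.TEXT

    def __post_init__(self) -> None:
        # Reject obviously malformed input before any retrieval is attempted.
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"Not an HTTP(S) URL: {self.url!r}")


def youtube_transcripts(
    args: YoutubeApiSchema,
    fetch: Callable[[str], List[str]],
) -> Union[str, List[str]]:
    """Fetch transcript lines and shape them according to the requested format.

    `fetch` is a hypothetical injected loader; any failure it raises is
    re-raised as ToolException so the caller sees one error type.
    """
    try:
        lines = fetch(args.url)
    except Exception as exc:
        raise ToolException(f"Failed to get YouTube transcripts: {exc}") from exc
    if args.transcript_format is TranscriptFormat.TEXT:
        return " ".join(lines)
    return lines


def fake_fetch(url: str) -> List[str]:
    """Test double standing in for a real transcript loader."""
    return ["hello", "world"]


def failing_fetch(url: str) -> List[str]:
    """Test double that simulates a video without captions."""
    raise RuntimeError("no captions available")
```

Passing the loader in as a parameter is a choice made for this sketch so the error path can be exercised offline. Calling `youtube_transcripts(YoutubeApiSchema("https://example.com/watch?v=x"), failing_fetch)` raises the stand-in `ToolException` with the original error chained as its cause.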
…roper execution of drag and drop action in the test

🔧 (youtube-transcripts.spec.ts): introduce a loop to handle multiple instances of outdated components before proceeding with the test execution
@Cristhianzl Cristhianzl merged commit 84dd031 into main Nov 12, 2024
29 checks passed
@Cristhianzl Cristhianzl deleted the cz/add-youtube-transcript-component branch November 12, 2024 13:13
diogocabral pushed a commit to headlinevc/langflow that referenced this pull request Nov 26, 2024
…ation (langflow-ai#4502)

* add new youtube transcripts component

* [autofix.ci] apply automated fixes

* ✨ (youtube-transcripts.spec.ts): add integration test for youtube transcripts component in the frontend to ensure user can interact with it successfully

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Labels
enhancement (New feature or request), lgtm (This PR has been approved by a maintainer), size:L (This PR changes 100-499 lines, ignoring generated files.)