-
Notifications
You must be signed in to change notification settings - Fork 15.7k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
refactor(document_loaders): abstract page evaluation logic in Playwri…
…ghtURLLoader (#9995) This PR brings structural updates to `PlaywrightURLLoader`, aiming at making the code more readable and extensible through the abstraction of page evaluation logic. These changes also align this implementation with a similar structure used in LangChain.js. The key enhancements include: 1. Introduction of 'PlaywrightEvaluator', an abstract base class for all evaluators. 2. Creation of 'UnstructuredHtmlEvaluator', a concrete class implementing 'PlaywrightEvaluator', which uses `unstructured` library for processing page's HTML content. 3. Extension of 'PlaywrightURLLoader' constructor to optionally accept an evaluator of the type 'PlaywrightEvaluator'. It defaults to 'UnstructuredHtmlEvaluator' if no evaluator is provided. 4. Refactoring of 'load' and 'aload' methods to use the 'evaluate' and 'evaluate_async' methods of the provided 'PageEvaluator' for page content handling. This update brings flexibility to 'PlaywrightURLLoader' as it can now utilize different evaluators for page processing depending on the requirement. The abstraction also improves code maintainability and readability. Twitter: @ywkim
- Loading branch information
Showing
2 changed files
with
153 additions
and
35 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters