This commit focuses on optimizing the utility modules in the codebase… #775

Merged (16 commits), Oct 30, 2024

@U-C4N U-C4N commented Oct 28, 2024

… for better performance and maintainability. Key improvements include:

  • More efficient HTML processing with combined regex operations and optimized tag handling
  • Enhanced deep copy functionality with better type handling and optimized recursion
  • Refactored web search with improved error handling and modular helper functions

The changes maintain all existing functionality while improving code quality, performance, and maintainability. Documentation and type hints have been enhanced throughout.

Optimize utils modules for better performance and maintainability

  • Improve HTML cleanup and minification:
    • Combine regex operations for better performance
    • Add better error handling for HTML processing
    • Optimize tag removal and attribute filtering
  • Enhance deep copy functionality:
    • Add special case handling for primitive types
    • Improve type checking and error handling
    • Optimize recursive copying for collections
  • Refactor web search functionality:
    • Add input validation and error handling
    • Split search logic into separate helper functions
    • Improve proxy handling and configuration
    • Add better timeout and error management
    • Optimize URL filtering and processing
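
To illustrate what "combine regex operations" in the HTML cleanup item could look like, here is a minimal sketch. The function name `minify_html` and the exact patterns are assumptions for illustration, not the code from this PR: comments and `<script>`/`<style>` blocks are removed by one precompiled alternation instead of separate passes.

```python
import re

# Hypothetical helper for illustration; not the project's actual implementation.
# A single alternation strips HTML comments and whole <script>/<style> blocks
# in one pass over the input instead of one pass per pattern.
_STRIP_RE = re.compile(
    r"<!--.*?-->|<(script|style)\b[^>]*>.*?</\1\s*>",
    re.DOTALL | re.IGNORECASE,
)
_WHITESPACE_RE = re.compile(r"\s+")
_BETWEEN_TAGS_RE = re.compile(r">\s+<")


def minify_html(html: str) -> str:
    """Strip comments, scripts and styles, then collapse whitespace."""
    if not isinstance(html, str):
        raise TypeError("html must be a string")
    stripped = _STRIP_RE.sub("", html)
    # Collapse every whitespace run to a single space.
    collapsed = _WHITESPACE_RE.sub(" ", stripped)
    # Drop the whitespace that remains between adjacent tags.
    return _BETWEEN_TAGS_RE.sub("><", collapsed).strip()


if __name__ == "__main__":
    sample = "<div>\n  <!-- note -->\n  <p>Hello   world</p>\n</div>"
    print(minify_html(sample))
```

Regex-based cleanup like this is only suitable for coarse minification; it does not parse HTML and will not handle every edge case a real parser would.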

Technical improvements:

  • Better type hints and documentation
  • More efficient data structures
  • Improved error handling and validation
  • Reduced code duplication
  • Better separation of concerns
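
The deep copy changes listed earlier (a fast path for primitive types, explicit type checks, recursive copying of collections) can be sketched roughly as follows. The names `copy_value` and `CopyError` are hypothetical and chosen for this example only; the sketch also assumes the input has no cyclic references.

```python
import copy
from typing import Any

# Immutable built-ins can be returned as-is; copying them is wasted work.
_IMMUTABLE = (int, float, complex, bool, str, bytes, type(None))


class CopyError(Exception):
    """Hypothetical error raised when a value cannot be copied."""


def copy_value(obj: Any) -> Any:
    """Deep-copy obj, with fast paths for primitives and built-in containers.

    Assumption: obj contains no cyclic references (this sketch keeps no memo).
    """
    if isinstance(obj, _IMMUTABLE):
        return obj
    # Exact type checks so that subclasses fall through to copy.deepcopy.
    kind = type(obj)
    if kind is dict:
        return {copy_value(k): copy_value(v) for k, v in obj.items()}
    if kind is list:
        return [copy_value(item) for item in obj]
    if kind is tuple:
        return tuple(copy_value(item) for item in obj)
    if kind is set:
        return {copy_value(item) for item in obj}
    try:
        return copy.deepcopy(obj)
    except Exception as exc:  # e.g. TypeError for objects that cannot be pickled
        raise CopyError(f"cannot copy object of type {kind.__name__}") from exc
```

Wrapping the fallback in a dedicated exception gives callers one error type to handle, which is one plausible reading of "improve type checking and error handling".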

No breaking changes - all existing functionality maintained
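
For the web search refactor described in this PR (input validation, separate helper functions, URL filtering), a minimal sketch of that structure might look like the following. Everything here is an assumption for illustration: the helper names, the `.pdf` filter, and the `backend` callable that stands in for whatever search provider is used. No real search library API is shown.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlparse


def validate_search_args(query: str, max_results: int) -> str:
    """Hypothetical validator: reject empty queries and non-positive limits."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not isinstance(max_results, int) or max_results < 1:
        raise ValueError("max_results must be a positive integer")
    return query.strip()


def filter_urls(
    urls: Iterable[str], blocked_suffixes: Tuple[str, ...] = (".pdf",)
) -> List[str]:
    """Keep unique http(s) URLs whose path does not end in a blocked suffix."""
    seen = set()
    kept = []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if parsed.path.lower().endswith(blocked_suffixes):
            continue
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


def search(
    query: str, backend: Callable[[str], Iterable[str]], max_results: int = 10
) -> List[str]:
    """Validate input, call the injected backend, then filter and truncate."""
    cleaned = validate_search_args(query, max_results)
    try:
        raw = backend(cleaned)
    except Exception as exc:
        raise RuntimeError(f"search backend failed: {exc}") from exc
    return filter_urls(raw)[:max_results]
```

Passing the backend in as a callable keeps validation and filtering testable without network access; proxy and timeout settings would live inside the backend in this arrangement.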

VinciGit00 and others added 15 commits October 24, 2024 15:28
## [1.27.0](ScrapeGraphAI/Scrapegraph-ai@v1.26.7...v1.27.0) (2024-10-26)

### Features

* add conditional node structure to the smart_scraper_graph and implemented a structured way to check condition ([cacd9cd](ScrapeGraphAI@cacd9cd))
* add integration with scrape.do ([ae275ec](ScrapeGraphAI@ae275ec))
* add model integration gpt4 ([51c55eb](ScrapeGraphAI@51c55eb))
* implement ScrapeGraph class for only web scraping automation ([612c644](ScrapeGraphAI@612c644))
* Implement SmartScraperMultiParseMergeFirstGraph class that scrapes a list of URLs, merges the content first, and then generates answers to a given prompt. ([3e3e1b2](ScrapeGraphAI@3e3e1b2))
* refactoring of export functions ([0ea00c0](ScrapeGraphAI@0ea00c0))
* refactoring of get_probable_tags node ([f658092](ScrapeGraphAI@f658092))
* refactoring of ScrapeGraph to SmartScraperLiteGraph ([52b6bf5](ScrapeGraphAI@52b6bf5))

### Bug Fixes

* fix export function ([c8a000f](ScrapeGraphAI@c8a000f))
* fix the example variable name ([69ff649](ScrapeGraphAI@69ff649))
* remove variable "max_result" not being used in the code ([e76a68a](ScrapeGraphAI@e76a68a))

### chore

* fix example ([9cd9a87](ScrapeGraphAI@9cd9a87))

### Test

* Add scrape_graph test ([cdb3c11](ScrapeGraphAI@cdb3c11))
* Add smart_scraper_multi_parse_merge_first_graph test ([464b8b0](ScrapeGraphAI@464b8b0))

### CI

* **release:** 1.26.6-beta.1 [skip ci] ([e0fc457](ScrapeGraphAI@e0fc457))
* **release:** 1.27.0-beta.1 [skip ci] ([9266a36](ScrapeGraphAI@9266a36))
* **release:** 1.27.0-beta.10 [skip ci] ([eee131e](ScrapeGraphAI@eee131e))
* **release:** 1.27.0-beta.2 [skip ci] ([d84d295](ScrapeGraphAI@d84d295))
* **release:** 1.27.0-beta.3 [skip ci] ([f576afa](ScrapeGraphAI@f576afa))
* **release:** 1.27.0-beta.4 [skip ci] ([3d6bbcd](ScrapeGraphAI@3d6bbcd))
* **release:** 1.27.0-beta.5 [skip ci] ([5002c71](ScrapeGraphAI@5002c71))
* **release:** 1.27.0-beta.6 [skip ci] ([94b9836](ScrapeGraphAI@94b9836))
* **release:** 1.27.0-beta.7 [skip ci] ([407f1ce](ScrapeGraphAI@407f1ce))
* **release:** 1.27.0-beta.8 [skip ci] ([4f1ed93](ScrapeGraphAI@4f1ed93))
* **release:** 1.27.0-beta.9 [skip ci] ([fd57cc7](ScrapeGraphAI@fd57cc7))
@VinciGit00 VinciGit00 changed the base branch from main to pre/beta October 30, 2024 07:32
@VinciGit00 VinciGit00 merged commit bb2373d into ScrapeGraphAI:pre/beta Oct 30, 2024
1 check passed
🎉 This PR is included in version 1.28.0-beta.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

github-actions bot commented Nov 1, 2024

🎉 This PR is included in version 1.28.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
