This is a Python web scraper that extracts and counts the occurrences of common words from a given URL. It can route its requests through proxies. It uses the requests library to fetch pages and BeautifulSoup to parse their HTML content.
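The counting step can be pictured with a minimal sketch. It is not the script's actual code. It uses the standard library's html.parser in place of BeautifulSoup so it runs without extra packages, and it takes an HTML string instead of fetching a URL. The names `TagTextExtractor` and `count_words` are hypothetical. The sketch collects the text inside the chosen HTML elements and counts the most frequent words with `collections.Counter`.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text that appears inside any of the chosen tags (hypothetical helper)."""

    def __init__(self, tags):
        super().__init__()
        self.tags = {t.lower() for t in tags}
        self.depth = 0  # how many chosen tags we are currently nested inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while inside at least one chosen tag
        if self.depth > 0:
            self.chunks.append(data)


def count_words(html, tags=("p",), top_n=10):
    """Return the top_n most frequent words found inside the given tags.

    Assumes reasonably well-formed markup (unclosed tags are not repaired).
    """
    parser = TagTextExtractor(tags)
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # A word is a run of letters, optionally with one apostrophe part ("don't")
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return Counter(words).most_common(top_n)
```

With requests installed, the HTML string could come from `requests.get(url, timeout=10).text`; with BeautifulSoup, the extractor class would be replaced by selecting the chosen elements from the parsed document.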
You can download this project by either cloning the repository or downloading it as a ZIP file.
git clone https://github.com/gamemaster123356/CommonScraper.git
After cloning, navigate to the project directory:
cd CommonScraper
Alternatively, download the ZIP file by clicking the green "Code" button in the GitHub repository and selecting "Download ZIP".
After downloading and extracting the archive, navigate to the project directory:
cd CommonScraper-main
To use the Common Words Scraper, run the script commonscraper.py and follow the instructions on the command line. The script lets you specify a URL and choose the HTML elements to scrape.
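The README does not show the prompts themselves, so the following is only a sketch of how such an interactive step might validate its input. The helpers `is_valid_url`, `parse_elements` and `prompt_user`, the comma-separated element format, and the default of `p` are all assumptions, not the script's actual behavior.

```python
from urllib.parse import urlparse


def is_valid_url(url):
    """Accept only absolute http(s) URLs that include a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parse_elements(text, default=("p",)):
    """Turn input such as 'p, h1 ,LI' into ['p', 'h1', 'li'] (assumed format)."""
    tags = [t.strip().lower() for t in text.split(",") if t.strip()]
    return tags or list(default)  # fall back to the default when nothing was typed


def prompt_user():
    """Ask for a URL until it looks valid, then ask which elements to scrape."""
    url = input("URL to scrape: ").strip()
    while not is_valid_url(url):
        url = input("Please enter a full http(s) URL: ").strip()
    elements = parse_elements(input("HTML elements to scrape (comma-separated, default p): "))
    return url, elements
```

Calling `prompt_user()` returns a pair such as `("https://example.com", ["p", "h1"])`, which could then be handed to the fetching and counting code.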
python commonscraper.py
You can specify proxies by providing the --proxies argument, followed by one or more proxies separated by commas:
python commonscraper.py --proxies=http://myproxy.com,https://myproxy.com
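The README does not say how the script uses several proxies. A minimal sketch, assuming it rotates through them one request at a time, could parse the flag with `argparse` and produce the dictionary that requests accepts through its `proxies` parameter (keys are the scheme of the target URL, values are the proxy URL). The helpers `parse_proxies`, `build_parser` and `proxy_rotation` are hypothetical, and routing both http and https traffic through the same proxy is an assumption.

```python
import argparse
from itertools import cycle


def parse_proxies(value):
    """Split 'http://a,https://b' into ['http://a', 'https://b']."""
    return [p.strip() for p in value.split(",") if p.strip()]


def build_parser():
    parser = argparse.ArgumentParser(description="Common words scraper (sketch)")
    parser.add_argument(
        "--proxies",
        type=parse_proxies,  # argparse applies this to the raw string value
        default=[],
        help="comma-separated list of proxy URLs",
    )
    return parser


def proxy_rotation(proxies):
    """Yield one requests-style proxies dict per request, cycling through the list."""
    if not proxies:
        while True:
            yield {}  # no proxies given: requests connects directly
    for proxy in cycle(proxies):
        # Assumption: route both http and https target URLs through the same proxy
        yield {"http": proxy, "https": proxy}
```

Each yielded dictionary could then be passed as `requests.get(url, proxies=next(rotation), timeout=10)`. Round-robin rotation is only one option; the script might instead pick proxies at random or use a single one.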
The script requires the following Python packages:
requests
beautifulsoup4
You can install the required packages using the following command:
pip install requests beautifulsoup4
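To confirm the installation worked, a small check like the following sketch may help. Note that the pip name and the import name differ for BeautifulSoup: the package is installed as `beautifulsoup4` but imported as `bs4`. The helper `missing_packages` is hypothetical and is not part of the project.

```python
import importlib.util

# pip distribution name -> module name used in "import ..."
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages whose modules cannot be found."""
    return [
        dist
        for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

Running `print(missing_packages())` prints an empty list when both packages are installed; otherwise it lists the names to pass to `pip install`.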
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
The script is for educational and informational purposes only. Make sure to comply with ethical web scraping practices and respect the terms of service of the websites you are scraping.