GitHub - shine-jayakumar/Web-Scraping-With-Python: Script to extract customer reviews from a webpage while bypassing bot challenge

Web-Scraping with Python

In this project, we programmatically extract customer reviews from Walmart’s webpage using Python. The script automatically navigates to the next page of reviews by simulating user interaction. We also see how to bypass the webpage’s bot challenge.

Table of Contents

Packages
Bypass Bot Challenge
Script Link

Packages

Selenium
Pandas

Bypass Bot Challenge

The page implements its bot challenge in two ways:

The bot-challenge page loads on top of the main page and prevents the user from interacting with it until the user presses and holds the mouse button for 4-5 seconds. Once that action is complete, the challenge script loads the main page and allows the user to interact with it.

The bot-challenge script blocks the main URL and doesn’t load the main page at all.

Solution:

Scenario 1: We use JavaScript to locate and remove the DIV element that contains the bot challenge. This exposes the main page, but the page still doesn’t allow interaction at this point. To fix that, we also remove the class attribute from the BODY element. Since the bot challenge shows up on every page, it is sensible to wrap this operation in a function.
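The Scenario 1 steps can be sketched roughly as follows. This is a minimal illustration rather than the code in scraper.py: the selector `div#bot-challenge` and the function name `remove_bot_challenge` are hypothetical, and the real challenge container has to be identified by inspecting the page. The function only assumes a Selenium-style driver that exposes `execute_script`.

```python
# Hypothetical selector: inspect the live page to find the real
# container of the bot challenge before using this.
CHALLENGE_SELECTOR = "div#bot-challenge"

# JavaScript run inside the page. `arguments[0]` is the selector
# passed in from Python through execute_script.
REMOVE_CHALLENGE_JS = """
const overlay = document.querySelector(arguments[0]);
if (overlay) { overlay.remove(); }
document.body.removeAttribute('class');
return overlay !== null;
"""


def remove_bot_challenge(driver, selector=CHALLENGE_SELECTOR):
    """Remove the challenge DIV and clear the class attribute on BODY.

    `driver` is any object with a Selenium-style
    execute_script(script, *args) method.
    Returns True if a challenge element was found and removed.
    """
    return bool(driver.execute_script(REMOVE_CHALLENGE_JS, selector))
```

Because the challenge appears on every page, a helper like this would be called after each navigation, before trying to read or click anything.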

Scenario 2: To tackle the second scenario, we simply refresh the webpage until the URL is unblocked.
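A refresh loop for Scenario 2 might look like the sketch below. It is an illustration under stated assumptions, not the code in scraper.py: the name `refresh_until_unblocked`, the `is_blocked` callback, and the `max_attempts` and `delay_seconds` limits are all hypothetical additions. The cap on attempts is there so the loop cannot run forever if the URL never unblocks. The driver only needs a Selenium-style `refresh()` method.

```python
import time


def refresh_until_unblocked(driver, is_blocked, max_attempts=10, delay_seconds=2.0):
    """Refresh the page until is_blocked(driver) reports False.

    `driver` is any object with a Selenium-style refresh() method.
    `is_blocked` is a caller-supplied check (hypothetical), for example
    one that inspects driver.current_url for a block marker.
    Returns True once the page is unblocked, or False if it is still
    blocked after max_attempts refreshes.
    """
    for _ in range(max_attempts):
        if not is_blocked(driver):
            return True
        time.sleep(delay_seconds)  # brief pause before trying again
        driver.refresh()
    return not is_blocked(driver)
```

What counts as "blocked" depends on the site, so the check is passed in instead of being hard-coded; a caller could supply something like `lambda d: "blocked" in d.current_url` once the actual block URL pattern is known.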

Script Link

Link: scraper.py

Disclaimer: This script and the information provided in this project are for educational purposes only.
