JustWatch, a widely-used platform, enables users to discover movies and TV shows across multiple streaming services like Netflix, Amazon Prime, and Hulu. Scraping data from JustWatch via Selenium, Python, and BeautifulSoup is required for this task. Utilize Pandas for filtering and analysis, and save results as a CSV file.

Wolverine-Shiva/Web-Scraping-www.JustWatch.com

Web-Scraping-www.JustWatch.com

Image Courtesy : https://m.media-amazon.com/images/I/A12qIaW9s1L.png

Website:

JustWatch - https://www.justwatch.com/in/movies?release_year_from=2000

Description:

JustWatch is a popular platform that allows users to search for movies and TV shows across multiple streaming services like Netflix, Amazon Prime, Hulu, etc. For this assignment, you will be required to scrape movie and TV show data from JustWatch using Selenium, Python, and BeautifulSoup. Extract data from HTML, not by directly calling their APIs. Then, perform data filtering and analysis using Pandas, and finally, save the results to a CSV file.

Tasks:

1. Web Scraping:

Use BeautifulSoup to scrape the following data from JustWatch:

a. Movie Information:

  - Movie title
  - Release year
  - Genre
  - IMDb rating
  - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
  - URL to the movie page on JustWatch

b. TV Show Information:

  - TV show title
  - Release year
  - Genre
  - IMDb rating
  - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
  - URL to the TV show page on JustWatch

c. Scope:

  - Scrape data for at least 50 movies and 50 TV shows.
  - You can choose the entry point (e.g., starting with popular movies,
    or a specific genre, etc.) to ensure a diverse dataset.
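
As a rough illustration of the scraping task, the sketch below parses title cards from HTML with BeautifulSoup. The markup in `SAMPLE_HTML`, the CSS selectors, and the helper names (`fetch_html`, `text_or_none`, `parse_cards`) are all assumptions made for the example; the real JustWatch page structure is different and must be inspected in the browser before the selectors will match anything.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real JustWatch class names differ and change over time.
SAMPLE_HTML = """
<div class="title-card">
  <a class="title-link" href="/in/movie/example-movie">Example Movie</a>
  <span class="year">2023</span>
  <span class="genres">Drama, Thriller</span>
  <span class="imdb">7.8</span>
  <img class="provider" alt="Netflix">
  <img class="provider" alt="Amazon Prime Video">
</div>
"""

BASE_URL = "https://www.justwatch.com"


def fetch_html(url):
    """Render a page with Selenium and return its HTML (not called in this sketch)."""
    from selenium import webdriver  # imported lazily so parsing works without Selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_cards(html, kind):
    """Extract one record per title card; `kind` is 'movie' or 'tv_show'."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.select("div.title-card"):
        link = card.select_one("a.title-link")
        providers = [img["alt"] for img in card.select("img.provider[alt]")]
        records.append({
            "type": kind,
            "title": link.get_text(strip=True) if link else None,
            "release_year": text_or_none(card, "span.year"),
            "genre": text_or_none(card, "span.genres"),
            "imdb_rating": text_or_none(card, "span.imdb"),
            "streaming_services": ", ".join(providers),
            "url": BASE_URL + link["href"] if link and link.has_attr("href") else None,
        })
    return records


records = parse_cards(SAMPLE_HTML, "movie")
print(records[0])
```

Returning `None` for a missing field, rather than raising, keeps one incomplete card from aborting a run of 50 or more titles.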

2. Data Filtering & Analysis:

After scraping the data, use Pandas to perform the following tasks:

a. Filter movies and TV shows based on specific criteria:

   - Only include movies and TV shows released in the last 2 years (from the current date).
   - Only include movies and TV shows with an IMDb rating of 7 or higher.
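
The two filters above might look like the following sketch. It assumes the scraped values arrive as strings in columns named `release_year` and `imdb_rating` (names chosen for this example), and it interprets "last 2 years" at year granularity, since only the release year is scraped.

```python
from datetime import date

import pandas as pd

# Toy data standing in for the scraped records; column names are assumptions.
this_year = date.today().year
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": ["2010", str(this_year), str(this_year - 1), None],
    "imdb_rating": ["8.1", "6.5", "7.0", "9.0"],
})

# Scraped text becomes numbers; unparseable or missing values become NaN.
df["release_year"] = pd.to_numeric(df["release_year"], errors="coerce")
df["imdb_rating"] = pd.to_numeric(df["imdb_rating"], errors="coerce")

# Assumption: "last 2 years" means release_year >= current year - 2.
cutoff_year = this_year - 2
recent = df["release_year"] >= cutoff_year
well_rated = df["imdb_rating"] >= 7.0

# Comparisons against NaN are False, so rows with missing data drop out.
filtered = df[recent & well_rated].reset_index(drop=True)
print(filtered)
```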

b. Data Analysis:

   - Calculate the average IMDb rating for the scraped movies and TV shows.
   - Identify the top 5 genres that have the highest number of available movies and TV shows.
   - Determine the streaming service with the largest number of offerings.
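
One possible way to compute these three figures is sketched below. It assumes that `genre` and `streaming_services` are stored as comma-separated strings, which is a choice made for this example rather than something the assignment prescribes; `count_values` is a hypothetical helper.

```python
import pandas as pd

# Toy data; the column names and comma-separated format are assumptions.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genre": ["Drama, Thriller", "Drama", "Comedy, Drama"],
    "imdb_rating": [8.0, 7.0, 7.5],
    "streaming_services": ["Netflix, Hulu", "Netflix", "Amazon Prime Video"],
})


def count_values(column):
    """Split comma-separated cells into one value per row and count them."""
    values = column.dropna().str.split(",").explode().str.strip()
    return values[values != ""].value_counts()


average_rating = df["imdb_rating"].mean()            # mean() skips NaN by default
top_genres = count_values(df["genre"]).head(5)       # five most frequent genres
service_counts = count_values(df["streaming_services"])
top_service = service_counts.idxmax()                # service listed on most titles

print(f"Average IMDb rating: {average_rating:.2f}")
print(top_genres)
print(f"Top streaming service: {top_service}")
```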
   

3. Data Export:

   - Dump the filtered and analysed data into a CSV file for further processing and reporting.

   - Keep the CSV file in your Drive folder and share the Drive link in the Colab notebook, with view access granted to anyone.
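
A minimal export sketch follows. The file name and the use of the system temp directory are assumptions for the example; in Colab the path would typically point into a mounted Drive folder instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the filtered DataFrame.
filtered = pd.DataFrame({
    "title": ["Example Movie"],
    "release_year": [2023],
    "imdb_rating": [7.8],
})

# Hypothetical output location; replace with a Drive path when running in Colab.
output_path = Path(tempfile.gettempdir()) / "justwatch_filtered.csv"

# index=False keeps the pandas row index out of the file.
filtered.to_csv(output_path, index=False, encoding="utf-8")

# Read the file back as a quick sanity check.
check = pd.read_csv(output_path)
print(check)
```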

Submission:

- Submit a link to your Colab made for the assignment.

- The Colab should contain your Python script (.py format only) with clear
  comments explaining the scraping, filtering, and analysis process.

- Your code shouldn't have any errors and should be executable in one go.

- Place your dataset Drive link in the notebook, before the conclusion.

Note:

  1. Properly handle errors and exceptions during web scraping to ensure a robust script.

  2. Make sure your code is well-structured, easy to understand, and follows Python best practices.

  3. The assignment will be evaluated based on the correctness of the scraped data, accuracy of data filtering and analysis, and the overall quality of the Python code.
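
For the first note, one common pattern is to wrap each page fetch in a small retry helper so that a transient failure does not end the whole run. The sketch below is only an illustration: `with_retries` and `flaky_fetch` are hypothetical names, and the simulated failure stands in for a real network or browser error.

```python
import time


def with_retries(action, attempts=3, delay_seconds=1.0, exceptions=(Exception,)):
    """Call `action` until it succeeds or `attempts` is exhausted."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions as error:
            last_error = error
            print(f"attempt {attempt} failed: {error}")
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# Simulated fetch that fails twice before succeeding.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "<html></html>"


html = with_retries(flaky_fetch, delay_seconds=0)
print(html)
```

In a real scraper it is usually better to pass a narrow `exceptions` tuple than to catch `Exception`, so that programming errors still surface immediately.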

Data Set:

https://drive.google.com/drive/folders/1LS9-du5o6tZRGaipWBaIJBTrgIWAhXp_?usp=drive_link
