Web-Scraping-www.JustWatch.com

Image Courtesy : https://m.media-amazon.com/images/I/A12qIaW9s1L.png

Website:

JustWatch - https://www.justwatch.com/in/movies?release_year_from=2000

Description:

JustWatch is a popular platform that lets users search for movies and TV shows across multiple streaming services such as Netflix, Amazon Prime, and Hulu. For this assignment, scrape movie and TV show data from JustWatch using Selenium, Python, and BeautifulSoup. Extract the data from the page HTML rather than by calling JustWatch's APIs directly. Then filter and analyse the data with Pandas, and save the results to a CSV file.

Tasks:

1. Web Scraping:

Use BeautifulSoup to scrape the following data from JustWatch:

a. Movie Information:

  - Movie title
  - Release year
  - Genre
  - IMDb rating
  - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
  - URL to the movie page on JustWatch

b. TV Show Information:

  - TV show title
  - Release year
  - Genre
  - IMDb rating
  - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
  - URL to the TV show page on JustWatch

c. Scope:

  - Scrape data for at least 50 movies and 50 TV shows.
  - You can choose the entry point (e.g., starting with popular movies,
    or a specific genre, etc.) to ensure a diverse dataset.
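
As a rough illustration of the extraction step, the sketch below parses a small hand-written HTML snippet with BeautifulSoup and collects the six required fields per title. The markup, the class names (`title-card`, `title-link`, `provider`, and so on) and the helper names are assumptions made for this example; JustWatch's real markup is different and has to be inspected in the browser first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: JustWatch's real class names differ and change over time.
SAMPLE_HTML = """
<div class="title-card">
  <a class="title-link" href="/in/movie/example-one">
    <span class="title">Example One</span>
    <span class="year">(2023)</span>
  </a>
  <span class="genres">Drama, Thriller</span>
  <span class="imdb">7.8</span>
  <img class="provider" alt="Netflix">
  <img class="provider" alt="Amazon Prime Video">
</div>
<div class="title-card">
  <a class="title-link" href="/in/movie/example-two">
    <span class="title">Example Two</span>
    <span class="year">(2019)</span>
  </a>
  <span class="genres">Comedy</span>
</div>
"""

BASE_URL = "https://www.justwatch.com"


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if it is absent."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_cards(html):
    """Turn each (hypothetical) title card into a dict with the required fields."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.select("div.title-card"):
        link = card.select_one("a.title-link")
        year = text_or_none(card, "span.year")
        rating = text_or_none(card, "span.imdb")
        records.append({
            "title": text_or_none(card, "span.title"),
            "release_year": int(year.strip("()")) if year else None,
            "genre": text_or_none(card, "span.genres"),
            "imdb_rating": float(rating) if rating else None,
            "streaming_services": [img["alt"] for img in card.select("img.provider")],
            "url": BASE_URL + link["href"] if link else None,
        })
    return records


records = parse_cards(SAMPLE_HTML)
```

With Selenium, the string passed to `parse_cards` would typically be `driver.page_source` after the page has loaded. Missing fields come back as `None` (or an empty list) instead of raising, which keeps one incomplete card from stopping the whole run.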

2. Data Filtering & Analysis:

After scraping the data, use Pandas to perform the following tasks:

a. Filter movies and TV shows based on specific criteria:

   - Only include movies and TV shows released in the last 2 years (from the current date).
   - Only include movies and TV shows with an IMDb rating of 7 or higher.
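
The two filters above could be sketched in Pandas as follows. The rows and the column names (`release_year`, `imdb_rating`) are made up for illustration, and because only the release year is scraped, "the last 2 years" is approximated at year granularity.

```python
from datetime import date

import pandas as pd

# Made-up rows standing in for the scraped dataset.
current_year = date.today().year
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": [current_year, current_year - 1, current_year - 5, current_year],
    "imdb_rating": ["8.1", "6.4", "9.0", None],
})

# Scraped values usually arrive as text; coerce them so bad entries become NaN.
df["imdb_rating"] = pd.to_numeric(df["imdb_rating"], errors="coerce")
df["release_year"] = pd.to_numeric(df["release_year"], errors="coerce")

# Boolean masks for the two criteria; NaN compares as False and is dropped.
recent = df["release_year"] >= current_year - 2
well_rated = df["imdb_rating"] >= 7

filtered = df[recent & well_rated].reset_index(drop=True)
```

Here only title `A` survives: `B` is rated below 7, `C` is too old, and `D` has no rating.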

b. Data Analysis:

   - Calculate the average IMDb rating for the scraped movies and TV shows.
   - Identify the top 5 genres that have the highest number of available movies and TV shows.
   - Determine the streaming service with the largest number of offerings.
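
One possible way to compute these three figures is shown below. The sample rows are invented, and the sketch assumes genres are stored as a comma-separated string and streaming services as a Python list per title; adjust it to however your scraper stores them.

```python
import pandas as pd

# Made-up rows standing in for the scraped dataset.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "imdb_rating": [8.0, 7.0, 9.0],
    "genre": ["Drama, Thriller", "Drama", "Comedy, Drama"],
    "streaming_services": [["Netflix", "Hulu"], ["Netflix"], ["Amazon Prime Video"]],
})

# Average IMDb rating across all titles (NaN values are skipped by default).
average_rating = df["imdb_rating"].mean()

# One row per (title, genre) pair, then count titles per genre.
genre_counts = df["genre"].str.split(",").explode().str.strip().value_counts()
top_genres = genre_counts.head(5)

# One row per (title, service) pair, then pick the most frequent service.
service_counts = df["streaming_services"].explode().value_counts()
top_service = service_counts.idxmax()
```

`explode` is used because a title can belong to several genres and be available on several services, so each one should be counted separately.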

3. Data Export:

   - Dump the filtered and analysed data into a CSV file for further processing and reporting.

   - Keep the CSV file in your Drive folder and share the Drive link in the Colab notebook, with view access set to anyone with the link.
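
A minimal sketch of the export step, assuming the filtered data is already in a DataFrame. The file name `filtered_titles.csv` and the temporary output folder are placeholders chosen for this example; in Colab the folder could instead be a mounted Drive directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up rows standing in for the filtered dataset.
filtered = pd.DataFrame({
    "title": ["A", "B"],
    "release_year": [2024, 2023],
    "imdb_rating": [8.1, 7.4],
    "streaming_services": [["Netflix", "Hulu"], ["Amazon Prime Video"]],
})

# Lists do not round-trip cleanly through CSV, so join them into one string.
export = filtered.copy()
export["streaming_services"] = export["streaming_services"].apply(", ".join)

# Temporary folder for illustration only.
output_dir = Path(tempfile.mkdtemp())
output_path = output_dir / "filtered_titles.csv"  # hypothetical file name
export.to_csv(output_path, index=False)

# Read the file back to confirm what was written.
reloaded = pd.read_csv(output_path)
```

Passing `index=False` keeps the DataFrame's row index out of the file, so the CSV contains only the data columns.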

Submission:

- Submit a link to the Colab notebook you created for the assignment.

- The Colab should contain your Python script (.py format only) with clear
  comments explaining the scraping, filtering, and analysis process.

- Your code should be free of errors and run end to end in one go.

- Place your dataset Drive link in the notebook, just before the Conclusion section.

Note:

- Properly handle errors and exceptions during web scraping to ensure a robust script.
- Make sure your code is well-structured, easy to understand, and follows Python best practices.
- The assignment will be evaluated based on the correctness of the scraped data, accuracy of data filtering and analysis, and the overall quality of the Python code.
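
One common way to make the scraping step more robust is to retry an operation that fails intermittently, such as a page load. The sketch below is only an illustration: `with_retries` and `flaky_fetch` are hypothetical helpers written for this example, not part of Selenium or BeautifulSoup.

```python
import logging
import time

logger = logging.getLogger("scraper")


def with_retries(action, attempts=3, delay_seconds=0.0):
    """Call action() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # narrow this to the exceptions you expect
            last_error = error
            logger.warning("Attempt %d/%d failed: %s", attempt, attempts, error)
            time.sleep(delay_seconds)
    raise RuntimeError(f"Gave up after {attempts} attempts") from last_error


# Stand-in for a flaky page load: fails twice, then returns HTML.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("page did not load")
    return "<html></html>"


html = with_retries(flaky_fetch)
```

In a real script, `action` could wrap a Selenium call, for example `lambda: driver.get(url)`, with a non-zero `delay_seconds` to avoid hammering the site.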

Data Set:

https://drive.google.com/drive/folders/1LS9-du5o6tZRGaipWBaIJBTrgIWAhXp_?usp=drive_link
