Skip to content

In the first section, I pulled up the tmdb datasets. Next, then I cleaned the data, fixed the data type, split the rows which carry "|", remove null values, and remove duplicate rows. Before cleaning the datasets, I had 10,866 rows and 21 columns. After cleaning the datasets, In the exploratory section, I looked for the findings. For instance, a…

Notifications You must be signed in to change notification settings

ShakhawatHassan/TMDB-Movies_Data_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 

Repository files navigation

Data Analysis on TMDB-Movies

by Shakhawat Hassan

Dataset

This project is a data analysis on TMDB-movies datasets. By analyzing the datasets, I will have an idea of which genres are performing well or not. In order for me to do this analysis, I have to do the followings,

  • Introduction
  • Preliminary Wrangling
    • Data imported from themoviedb.org
  • Data Wrangling
    • Data Cleaning, Fixing the null values
  • Data Visualization
    • Statistical Models/resuls, Exploratory Data Analysis
  • Summary
    • Conclusion, Limitation

I will try to answer these questions based on the datasets. Which movies earned the most? Which genres are more popular than other genres? Which properties generated the most revenues? Which genres are performing well over the years? Which are the 10 highest-grossing movies of all time?

Summary of Findings

In the exploratory section, I looked for the findings. For instance, action, adventure, drama, comedy, and thriller genres are most popular from year to year (1960-2015). Action and adventure take the highest place for genres. Avatar is the highest-grossing movie of all time (rev 2.78 Billion). Furthermore, The mean revenue was about 107 Million, and the highest budget for a film was about 425 million. As the year goes on, the number of movies is increasing.

On the same hand, genres, casts, directors are the main properties that heavily depend on a film's revenue. For example, Avatar is an action-packed film which is directed by James Cameron, and it's the highest-grossing film of all time (as of 2015). The top 3 directors are Steven Spielberg, James Cameron, and Peter Jackson. Steven Spielberg is the number one director among all of them. Moreover, Harrison Ford, Tom Cruise, and Tom Hanks are the ones who have gotten the highest revenues. Lastly, there hasn't been any correlation found between budget and revenue, and revenue and runtime.

Tools Used for this Project

  • Packages: Anaconda, Conda
  • Libraries: Pandas, NumPy, Matplotlb, Seaborn
  • Web Application: Jupyter Notebook
  • Programming Language: Python 3

About

In the first section, I pulled up the tmdb datasets. Next, then I cleaned the data, fixed the data type, split the rows which carry "|", remove null values, and remove duplicate rows. Before cleaning the datasets, I had 10,866 rows and 21 columns. After cleaning the datasets, In the exploratory section, I looked for the findings. For instance, a…

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published