This project is a data analysis of the TMDB-movies dataset. By analyzing the data, I want to find out which genres perform well and which do not. The analysis follows these steps:
- Introduction
- Preliminary Wrangling
- Data imported from themoviedb.org
- Data Wrangling
- Data Cleaning, Fixing the null values
- Data Visualization
- Statistical Models/results, Exploratory Data Analysis
- Summary
- Conclusion, Limitation
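
The cleaning step listed above could look roughly like the sketch below. It is only an illustration: the column names (`original_title`, `genres`, `budget`, `revenue`) and the idea that a value of 0 stands for a missing budget or revenue are assumptions, and the sample rows are made up rather than taken from the real data.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; the real notebook would load the TMDB data instead.
raw = pd.DataFrame({
    "original_title": ["A", "A", "B", "C"],
    "genres": ["Action", "Action", None, "Drama"],
    "budget": [100, 100, 0, 50],
    "revenue": [300, 300, 20, 0],
})


def clean(df):
    """Drop duplicate rows and rows without genres, and mark 0 as missing."""
    out = df.drop_duplicates()
    out = out.dropna(subset=["genres"])
    # Assumption: a budget or revenue of 0 means the value is unknown.
    out = out.assign(
        budget=out["budget"].replace(0, np.nan),
        revenue=out["revenue"].replace(0, np.nan),
    )
    return out.reset_index(drop=True)


cleaned = clean(raw)
null_counts = cleaned.isnull().sum()  # remaining nulls per column
print(null_counts)
```

Marking zeros as missing instead of dropping those rows keeps the films available for questions that do not need budget or revenue, such as genre counts.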
I will try to answer the following questions based on the dataset. Which movies earned the most? Which genres are more popular than others? Which properties generated the most revenue? Which genres have performed well over the years? Which are the 10 highest-grossing movies of all time?
The exploratory section produced the following findings. Action, adventure, drama, comedy, and thriller are the most popular genres from year to year (1960-2015), with action and adventure ranking highest. Avatar is the highest-grossing movie of all time (about 2.78 billion in revenue). The mean revenue was about 107 million, and the highest budget for a film was about 425 million. The number of movies released per year increases over time.
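
As a minimal sketch of how findings like these might be computed with Pandas: the example below counts genres per year and picks the highest-grossing titles. It assumes columns named `original_title`, `release_year`, `genres`, and `revenue`, and that `genres` holds pipe-separated values such as `Action|Adventure`. The sample rows are invented for illustration.

```python
import pandas as pd

# Invented sample rows; titles and numbers are placeholders.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "release_year": [2014, 2014, 2015, 2015],
    "genres": ["Action|Adventure", "Drama", "Action|Thriller", "Comedy|Drama"],
    "revenue": [300, 50, 200, 80],
})


def genre_counts_by_year(df):
    """Return a year-by-genre table of movie counts."""
    # Split the pipe-separated string so each (movie, genre) pair gets a row.
    exploded = df.assign(genre=df["genres"].str.split("|")).explode("genre")
    return (
        exploded.groupby(["release_year", "genre"])
        .size()
        .unstack(fill_value=0)
    )


counts = genre_counts_by_year(movies)

# Highest-grossing titles; the real analysis would use n=10.
top_grossing = movies.nlargest(2, "revenue")[["original_title", "revenue"]]

print(counts)
print(top_grossing)
```

Because a film can belong to several genres, it is counted once per genre here, so the row totals can exceed the number of films released that year.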
In addition, a film's revenue depends heavily on its genres, cast, and director. For example, Avatar is an action-packed film directed by James Cameron, and it is the highest-grossing film of all time (as of 2015). The top 3 directors are Steven Spielberg, James Cameron, and Peter Jackson, with Steven Spielberg ranking first. Moreover, Harrison Ford, Tom Cruise, and Tom Hanks are the actors whose films earned the highest revenues. Lastly, I found no correlation between budget and revenue, or between revenue and runtime.
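
A correlation check of this kind can be sketched as follows. This is not necessarily how the notebook computes it: the Pearson method, the column names (`budget`, `revenue`, `runtime`), and the filtering of zero values are all assumptions, and the numbers are invented.

```python
import pandas as pd

# Invented numbers for illustration only.
films = pd.DataFrame({
    "budget": [10, 20, 0, 40, 50],
    "revenue": [30, 10, 5, 0, 90],
    "runtime": [90, 120, 100, 95, 150],
})


def correlation_table(df):
    """Pairwise Pearson correlations between budget, revenue, and runtime."""
    # Assumption: zeros mean "unknown", so those rows are left out.
    valid = df[(df["budget"] > 0) & (df["revenue"] > 0)]
    return valid[["budget", "revenue", "runtime"]].corr(method="pearson")


corr = correlation_table(films)
print(corr)
```

Values near 0 in the resulting table suggest no linear relationship, while values near 1 or -1 suggest a strong one. Whether zero-valued rows are kept or dropped can change the result noticeably.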
- Packages: Anaconda, Conda
- Libraries: Pandas, NumPy, Matplotlib, Seaborn
- Web Application: Jupyter Notebook
- Programming Language: Python 3