Analyzing a dataset containing information about 10,000 movies from 1960 to 2015, collected from The Movie Database (TMDb) and determining factors affecting movie profitability and commercial success

Fuenj/Exploratory-Data-Analysis-EDA-with-Python-The-Movie-Database-

Investigate-TMDb-Movie-Database

Every year, thousands of movies are released, but only a fraction of them become successful. The aim of this work is to analyze the factors that determine movie profitability and commercial success.

Project overview

The main objective of this project is to analyze a dataset containing information about 10,000 movies released from 1960 to 2015, collected from The Movie Database (TMDb). For each film, it includes details such as the production cost, the revenue generated, rating information, and the cast and directors. This work also tries to answer the questions below:

Research questions

1- What kinds of properties are associated with the most and least successful movies?

  • Which movie had the highest or lowest profit?
  • In which year did the movie industry make the highest profit?
  • In which month did the movie industry make the highest profit?
  • Do popular movies make higher profits?
  • What were the most and least expensive movies?
  • What is the statistical relationship between budget and profit?
  • Do movies with the highest budgets get the highest ratings?
  • Which movie had the highest or lowest revenue?
  • Is there a statistical relationship between revenue and profit, or between revenue and budget?
  • What movie length does the audience like most?
  • Which movies were rated highest and lowest?
  • Do highly rated movies make higher profits?
2- What are the top 10 movies according to different features, in particular:
  • Profit
  • Budget
  • Revenue
  • Popularity
3- Which genres are the most popular and profitable, overall and over time?
  • Which genres are the most profitable overall?
  • Which genres are the most profitable from year to year?
  • Which genres are the most popular overall?
  • How does the popularity of each genre evolve from year to year?
4- Who are the top 10 cast members, directors and production companies?
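For question 1, here is a minimal pandas sketch of how several of the sub-questions could be answered. It assumes that profit is defined as `revenue - budget` and that the data uses TMDb-style column names (`original_title`, `budget`, `revenue`, `release_date`, `release_year`, `popularity`, `vote_average`). The four rows are invented toy data, not real TMDb figures; in practice `movies` would be read from the TMDb CSV with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical toy rows; column names assume the TMDb CSV layout.
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "budget": [100, 50, 200, 10],
    "revenue": [300, 40, 250, 90],
    "release_date": ["6/15/10", "12/1/11", "6/3/11", "3/20/10"],
    "release_year": [2010, 2011, 2011, 2010],
    "popularity": [5.2, 1.1, 3.4, 0.9],
    "vote_average": [7.1, 5.0, 6.2, 6.8],
})

# Assumed definition of profit: revenue minus production budget.
movies["profit"] = movies["revenue"] - movies["budget"]

# Titles of the most and least profitable movies.
best = movies.loc[movies["profit"].idxmax(), "original_title"]
worst = movies.loc[movies["profit"].idxmin(), "original_title"]

# Total profit per release year (taken from release_year) ...
profit_by_year = movies.groupby("release_year")["profit"].sum()

# ... and per release month (only the month is read from release_date).
release_month = pd.to_datetime(movies["release_date"], format="%m/%d/%y").dt.month
profit_by_month = movies.groupby(release_month)["profit"].sum()

# Pearson correlations for the "is there a relationship" questions.
corr_budget_profit = movies["budget"].corr(movies["profit"])
corr_popularity_profit = movies["popularity"].corr(movies["profit"])
corr_budget_rating = movies["budget"].corr(movies["vote_average"])

print(f"Highest profit: {best}, lowest profit: {worst}")
print(f"Best year: {profit_by_year.idxmax()}, best month: {profit_by_month.idxmax()}")
```

The year is taken from `release_year` rather than from `release_date` because `%y` maps two-digit years 00 to 68 onto 2000 to 2068, so a date such as `6/1/66` would be read as 2066.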
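For question 2, one way to build the top-10 tables is `DataFrame.nlargest`. This sketch assumes the same hypothetical column names and the same `revenue - budget` definition of profit. The twelve generated rows are placeholders, not TMDb data.

```python
import pandas as pd

# Twelve generated placeholder rows (not real TMDb data), so a top 10 leaves two out.
movies = pd.DataFrame({
    "original_title": [f"Movie {i}" for i in range(12)],
    "budget": [10 * i for i in range(12)],
    "revenue": [15 * i for i in range(12)],
    "popularity": [0.5 * i for i in range(12)],
})
movies["profit"] = movies["revenue"] - movies["budget"]

# For each feature, the 10 rows with the largest values, in descending order.
features = ["profit", "budget", "revenue", "popularity"]
top10 = {f: movies.nlargest(10, f)[["original_title", f]] for f in features}

print(top10["profit"].to_string(index=False))
```

`nsmallest` works the same way for the bottom of each ranking, for example the least expensive movies.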
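For question 3, this sketch assumes that each movie's genres are stored as one pipe-separated string (for example `Action|Adventure`). It splits that string and uses `explode` so every (movie, genre) pair gets its own row, then aggregates. A movie with two genres therefore counts toward both. The toy rows are invented.

```python
import pandas as pd

# Hypothetical toy rows; genres assumed to be pipe-separated as in the TMDb file.
movies = pd.DataFrame({
    "genres": ["Action|Adventure", "Adventure", "Action|Drama", "Drama"],
    "release_year": [2010, 2010, 2011, 2011],
    "popularity": [4.0, 1.0, 3.0, 2.0],
    "profit": [200, 100, 50, 80],
})

# One row per (movie, genre) pair.
by_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")

# Mean profit and popularity per genre over the whole period, best first.
profit_per_genre = by_genre.groupby("genre")["profit"].mean().sort_values(ascending=False)
popularity_per_genre = by_genre.groupby("genre")["popularity"].mean().sort_values(ascending=False)

# Year-by-genre tables: rows are years, columns are genres (NaN = no movie that year).
yearly_profit = by_genre.groupby(["release_year", "genre"])["profit"].mean().unstack("genre")
popularity_by_year = by_genre.groupby(["release_year", "genre"])["popularity"].mean().unstack("genre")

# Most profitable genre in each release year (NaN cells are skipped).
top_genre_per_year = yearly_profit.idxmax(axis=1)

print(top_genre_per_year)
```

The `popularity_by_year` table has one column per genre, so plotting it as lines gives the year-to-year evolution of each genre.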
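For question 4, here is a sketch that ranks names by how many movies they appear in. It assumes pipe-separated `cast`, `director` and `production_companies` columns. The helper `top_n` and the toy rows are hypothetical, and ranking by total profit instead would be an equally valid reading of "top".

```python
import pandas as pd

# Hypothetical toy rows; multi-valued fields assumed to be pipe-separated.
movies = pd.DataFrame({
    "cast": ["Actor X|Actor Y", "Actor X|Actor Z", "Actor Y|Actor X"],
    "director": ["Director P", "Director Q", "Director P"],
    "production_companies": ["Studio 1|Studio 2", "Studio 1", None],
})


def top_n(values: pd.Series, n: int = 10) -> pd.Series:
    """Count how often each pipe-separated name appears and keep the n most frequent."""
    return values.dropna().str.split("|").explode().value_counts().head(n)


top_cast = top_n(movies["cast"])
top_directors = top_n(movies["director"])
top_companies = top_n(movies["production_companies"])

print(top_cast)
```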

Project Objectives

This is a project I worked on for the Udacity Data Analyst Nanodegree. In this project, I go through the data analysis process and see how everything fits together. I use the Python libraries NumPy, pandas, and Matplotlib to make my analysis easier.

Installing project requirements

You will need an installation of Python, plus the following libraries:

  • pandas
  • NumPy
  • Matplotlib
  • csv (part of the Python standard library, so it needs no separate installation)

I recommend installing Anaconda, which comes with all of the necessary packages as well as the IPython (Jupyter) Notebook.

Conclusion

What I learned

  • The steps involved in a typical data analysis process;
  • How to pose questions that can be answered with a given dataset, and then answer them;
  • How to investigate problems in a dataset and wrangle the data;
  • How to communicate the results of my analysis;
  • How to use vectorized operations in NumPy and pandas to speed up data analysis code;
  • How to work more comfortably with pandas' Series and DataFrame objects;
  • How to use Matplotlib to produce plots showing my findings.
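The wrangling and vectorization points above can be illustrated with a short sketch. Treating a zero budget or revenue as missing is an assumption, not necessarily what the project does, and the rows are toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows showing typical issues: an exact duplicate row and
# zero budgets/revenues, assumed here to mean "value unknown".
raw = pd.DataFrame({
    "original_title": ["Movie A", "Movie A", "Movie B", "Movie C"],
    "budget": [100, 100, 0, 200],
    "revenue": [300, 300, 40, 0],
})

# Wrangling: drop exact duplicates, treat 0 as missing, drop incomplete rows.
clean = raw.drop_duplicates().copy()
money = ["budget", "revenue"]
clean[money] = clean[money].replace(0, np.nan)
clean = clean.dropna(subset=money)

# Vectorized: one column-wise subtraction, no Python-level loop.
clean = clean.assign(profit=clean["revenue"] - clean["budget"])

# Row-by-row equivalent, for comparison only; typically much slower on large frames.
loop_profit = [row.revenue - row.budget for row in clean.itertuples()]

print(clean)
```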
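For plotting, here is a minimal Matplotlib sketch that charts total profit per release year from toy data. The `Agg` backend and the in-memory PNG are only there so that the example runs without a display. In a notebook, `plt.show()` would normally be used instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows (not real TMDb figures).
movies = pd.DataFrame({
    "release_year": [2010, 2010, 2011, 2011],
    "profit": [200, 80, -10, 50],
})
profit_by_year = movies.groupby("release_year")["profit"].sum()

# One bar per release year.
fig, ax = plt.subplots(figsize=(6, 4))
profit_by_year.plot(kind="bar", ax=ax)
ax.set_xlabel("Release year")
ax.set_ylabel("Total profit")
ax.set_title("Total profit per release year")
fig.tight_layout()

# Save to an in-memory PNG instead of showing a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```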

Evaluation

My project was reviewed by a Udacity reviewer. All criteria in the rubric had to meet specifications for me to pass.