Welcome to the Netflix Data Analysis & Database Normalization project! This project explores the process of cleaning, normalizing, and analyzing Netflix data using SQL and database best practices. It also dives into database design principles to ensure optimized data storage and retrieval.
The original dataset contained two key tables:
- `titles`: Information on unique shows and movies.
- `credits`: Details on the cast and crew involved in each show or movie.
We discovered discrepancies between the two tables:
- The `credits` table contained more unique show IDs than the `titles` table, leading to inconsistencies.
We created a unified view by selecting only the records common to both tables, ensuring data consistency throughout the analysis (a sketch of this view appears after the list below). Afterward, we applied database normalization techniques to split the data into smaller, well-organized tables, which brought several benefits:
- Less Data Duplication: Improved storage efficiency by reducing redundancy.
- Increased Data Integrity: Accurate and consistent data across all tables.
- Improved Query Performance: Faster and more efficient queries through proper indexing and structure.
- Enhanced Security: More controlled access to sensitive information.
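For reference, the unified view described above could be built by keeping only the show IDs present in both source tables. The sketch below is illustrative only: the view names (`common_titles`, `common_credits`) and the key column `id` are assumptions, not necessarily the identifiers used in the project.

```sql
-- Minimal sketch, assuming the shared key column is named "id".
-- Keep only titles that also appear in credits ...
CREATE VIEW common_titles AS
SELECT t.*
FROM titles AS t
WHERE EXISTS (
    SELECT 1 FROM credits AS c WHERE c.id = t.id
);

-- ... and only credits whose show exists in titles.
CREATE VIEW common_credits AS
SELECT c.*
FROM credits AS c
WHERE EXISTS (
    SELECT 1 FROM titles AS t WHERE t.id = c.id
);
```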
Database design is the organization of data according to a database model. The designer determines what data must be stored and how the data elements interrelate.
After cleaning the Netflix data in Part 1, we obtained two tables: `titles`, containing information about unique shows/movies, and `credits`, containing information about the cast and crew of each show/movie. The data is now distributed across these two tables.
When we counted the unique shows in each table (both have an `id` column that identifies a unique show), we found that the `credits` table contains more unique show IDs than the `titles` table.
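A check along these lines can be written as a pair of DISTINCT counts, followed by an anti-join to list the offending IDs. The sketch below assumes the shared key column is called `id`:

```sql
-- How many distinct shows does each table reference? (key column assumed: id)
SELECT
    (SELECT COUNT(DISTINCT id) FROM titles)  AS unique_shows_in_titles,
    (SELECT COUNT(DISTINCT id) FROM credits) AS unique_shows_in_credits;

-- Which show IDs appear in credits but have no matching row in titles?
SELECT DISTINCT c.id
FROM credits AS c
LEFT JOIN titles AS t ON t.id = c.id
WHERE t.id IS NULL;
```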
- Conceptual Data Model: High-level view of key entities and relationships.
- Logical Data Model: Detailed relationships and entity specifications.
- Physical Data Model: Actual implementation of the tables, ensuring optimal performance.
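As an illustration of the physical layer, the normalized tables might be implemented roughly as follows (PostgreSQL-style syntax; every table and column name here is an assumption for illustration, not the project's exact schema):

```sql
-- Hypothetical physical model; table and column names are illustrative only.
CREATE TABLE titles (
    id                TEXT PRIMARY KEY,
    title             TEXT NOT NULL,
    type              TEXT,        -- e.g. 'MOVIE' or 'SHOW'
    release_year      INT,
    age_certification TEXT
);

CREATE TABLE persons (
    person_id INT PRIMARY KEY,
    name      TEXT NOT NULL        -- each actor/director is stored once
);

CREATE TABLE credits (
    id             TEXT REFERENCES titles (id),
    person_id      INT  REFERENCES persons (person_id),
    role           TEXT,           -- e.g. 'ACTOR' or 'DIRECTOR'
    character_name TEXT,
    PRIMARY KEY (id, person_id, role)
);

-- Index to speed up lookups of a given person's credits.
CREATE INDEX idx_credits_person ON credits (person_id);
```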
- We performed Exploratory Data Analysis (EDA) to uncover trends in popular genres, actor appearances, and the distribution of shows across different ratings.
- The normalized tables made it easy to run complex queries on specific data points, providing deeper insights into Netflix's vast content library.
Some of the SQL queries we explored:
- Most Frequent Actors: Identify which actors appear most often in Netflix shows (an example query is sketched after this list).
- Genre Popularity: Analyze which genres dominate Netflix's catalog.
- Rating Distributions: Understand how shows and movies are rated across various regions.
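For example, the most-frequent-actors query could look like the sketch below, which relies on the hypothetical `persons`/`credits` schema shown earlier rather than the project's actual tables:

```sql
-- Sketch: which actors appear most often (assumes the hypothetical schema above).
SELECT p.name,
       COUNT(*) AS appearances
FROM credits AS c
JOIN persons AS p ON p.person_id = c.person_id
WHERE c.role = 'ACTOR'
GROUP BY p.name
ORDER BY appearances DESC
LIMIT 10;
```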
We used tools like Tableau and Power BI to visualize the findings and to illustrate how the data looks post-normalization.
Normalization is crucial for:
- Ensuring data consistency across related tables.
- Eliminating redundancy, so each piece of data is stored only once (illustrated below).
- Making your database scalable, easier to manage, and more flexible for future changes.
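To make the redundancy point concrete: once names live in a single `persons` table (as in the hypothetical schema above), correcting a misspelled actor name is a one-row update instead of a change to every credit row:

```sql
-- With names stored once in persons, the fix touches a single row;
-- in a flat, denormalized table it would touch every credit for that actor.
UPDATE persons
SET name = 'Jane Doe'      -- corrected spelling (hypothetical value)
WHERE person_id = 42;      -- hypothetical person_id
```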
- SQL (PostgreSQL, MySQL)
- Python (for additional data analysis)
- Tableau/Power BI (for visualizations)
- Clone the repository to your local environment:
```bash
git clone https://github.com/mayankyadav23/Netflix-Data-Analysis.git
```