The goal of this notebook is to share data visualization tools and techniques, along with practices and methods for efficient and engaging data storytelling. Join me on this fun journey!
The Netflix Movies and TV Shows dataset (Bansal, 2021) is a tabular dataset listing all the movies and TV shows available on Netflix, along with details such as cast, directors, ratings, release year, and duration.
Netflix is one of the most popular media and video streaming platforms. As of mid-2021, it had over 8000 movies and TV shows available on its platform and over 200M subscribers globally. [1] Each listing has the following attributes:
- Type (movie or TV show)
- Title
- Director
- Cast
- Country
- Date added
- Release year
- Rating (TV-MA, TV-14, TV-PG, etc.)
- Duration (in minutes for a movie, in seasons for a TV show)
- Listed in (category)
- Description
As we can observe, the Netflix catalog is approximately 70% movies and 30% TV shows. The catalog contains 6131 movies and 2676 TV shows, for a total of 8807 titles!
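The split above can be reproduced with a simple value count. The following is a minimal sketch, not the notebook's original code: it assumes the data is loaded in a pandas DataFrame with a `type` column holding `"Movie"` or `"TV Show"`, and it runs on a few made-up rows rather than the real catalog.

```python
import pandas as pd


def type_breakdown(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage share of each value in the `type` column."""
    counts = df["type"].value_counts()
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "share_pct": share})


# Made-up rows for illustration only (not the real dataset).
toy = pd.DataFrame({"type": ["Movie", "Movie", "Movie", "TV Show"]})
breakdown = type_breakdown(toy)
print(breakdown)
```

With the figures quoted above, 6131 / 8807 is about 69.6% and 2676 / 8807 is about 30.4%, which matches the rounded 70% and 30%.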
Next, we are interested in how content is distributed across countries, so we chose 8 arbitrary countries. Notice how every country keeps roughly the same ratio between movies and TV shows, except for India and Japan: India has almost 92% movies, while Japan has almost 63% TV shows (perhaps because of its affinity for anime).
The attribute Director is the one with the most missing data, but as an exercise we wanted to show the 15 directors who appear most often.
The attribute Cast has the third most missing data, but it is also the attribute with the largest number of distinct elements: it contains almost 40000 actors. As an exercise, we wanted to show the 40 actors who appear most often.
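Counting the most frequent directors or actors requires splitting each cell into individual names first, since one title can list several people. Here is a hedged sketch of one way to do it, assuming the `director` and `cast` columns hold comma-separated names with missing values stored as NaN. The helper `top_names` and the toy names are hypothetical, introduced only for illustration.

```python
import pandas as pd


def top_names(df: pd.DataFrame, column: str, n: int) -> pd.Series:
    """Count individual names in a comma-separated column and return the n most frequent."""
    names = (
        df[column]
        .dropna()          # ignore titles with no value in this column
        .str.split(",")    # "A, B" -> ["A", " B"]
        .explode()         # one row per name
        .str.strip()       # remove the space left after each comma
    )
    names = names[names != ""]
    return names.value_counts().head(n)


# Made-up rows for illustration only.
toy = pd.DataFrame({"cast": ["Actor A, Actor B", "Actor B, Actor C", None, "Actor B"]})
top_cast = top_names(toy, "cast", 2)
print(top_cast)
```

The same call with `column="director"` and `n=15` would give the director ranking, and `column="cast"` with `n=40` the actor ranking.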
In the following chart, we can see the number of titles each country has. As expected, the US has the largest number of titles, followed by India and the UK; the remaining countries have similar numbers to one another.
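The per-country totals, and the movie share discussed earlier, can be computed from one cross-tabulation. This is a sketch under stated assumptions, not the notebook's code: it assumes a `country` column that may hold several comma-separated countries per title and a `type` column with the values `"Movie"` and `"TV Show"`. A title with several countries is counted once for each of them.

```python
import pandas as pd


def country_type_table(df: pd.DataFrame) -> pd.DataFrame:
    """Count movies and TV shows per country and compute the movie share."""
    exploded = (
        df.dropna(subset=["country"])
        .assign(country=lambda d: d["country"].str.split(","))
        .explode("country")
        .reset_index(drop=True)
    )
    exploded["country"] = exploded["country"].str.strip()
    exploded = exploded[exploded["country"] != ""]

    table = pd.crosstab(exploded["country"], exploded["type"])
    # Make sure both columns exist even if one type is absent from the data.
    table = table.reindex(columns=["Movie", "TV Show"], fill_value=0)
    table["total"] = table["Movie"] + table["TV Show"]
    table["movie_pct"] = (table["Movie"] / table["total"] * 100).round(1)
    return table.sort_values("total", ascending=False)


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "type": ["Movie", "Movie", "TV Show", "Movie"],
        "country": ["United States", "United States, India", "Japan", None],
    }
)
by_country = country_type_table(toy)
print(by_country)
```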
In the following plot, we can observe the number of titles added to the Netflix catalog over time. Movies are shown in gray and TV shows in red, and the black vertical bars separate one year from the next. From this plot we can notice several things:
- the number of movies and of TV shows added has increased over the years
- titles tend to be added at the beginning and end of the year
- the proportion of movies to TV shows remains stable over the years
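The monthly counts behind a plot like this can be built by parsing the date and grouping by month and type. The sketch below assumes `date_added` is stored as text such as `"September 25, 2021"`, possibly with stray whitespace; that format is an assumption, and unparseable or missing dates are simply dropped.

```python
import pandas as pd


def monthly_additions(df: pd.DataFrame) -> pd.DataFrame:
    """Count titles added per calendar month, with one column per type."""
    added = pd.to_datetime(
        df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    valid = df.assign(added_month=added.dt.to_period("M")).dropna(
        subset=["added_month"]
    )
    return valid.groupby(["added_month", "type"]).size().unstack(fill_value=0)


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie", "Movie"],
        "date_added": ["January 1, 2020", " January 15, 2020", "February 3, 2020", None],
    }
)
monthly = monthly_additions(toy)
print(monthly)
```

Dividing the `Movie` column by the row total gives the movie share per month, which is one way to check whether the proportion really stays stable.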
From this plot we can infer several things:
- in the 2000s the number of movies and TV shows increased considerably
- after 2018 the number decreases, perhaps because the data is not up to date or because of the pandemic
- the curve resembles exponential growth
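One way to check whether the growth is roughly exponential is to fit a straight line to the logarithm of the yearly counts: if the counts grow exponentially, the log counts grow linearly. This check is not part of the original notebook. The sketch assumes the plot counts titles by a `release_year` column, and the helper `yearly_growth_rate` is hypothetical.

```python
import numpy as np
import pandas as pd


def yearly_growth_rate(release_years: pd.Series) -> float:
    """Fit log(count) = a + r * year and return r, the estimated growth rate per year."""
    counts = release_years.value_counts().sort_index()
    years = counts.index.to_numpy(dtype=float)
    x = years - years.min()  # shift years to start at 0 for a well-conditioned fit
    slope, _intercept = np.polyfit(x, np.log(counts.to_numpy(dtype=float)), 1)
    return float(slope)


# Made-up data for illustration only: the count doubles every year.
toy_years = pd.Series(np.repeat([2000, 2001, 2002, 2003, 2004], [1, 2, 4, 8, 16]))
rate = yearly_growth_rate(toy_years)
print(f"estimated growth factor per year: {np.exp(rate):.2f}")
```

Years with no titles do not appear in the counts, so they are skipped rather than logged as zero. On the real data, the most recent years would likely need to be excluded from the fit, since the drop after 2018 may only reflect incomplete data.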
In this pie chart, we can observe the rating distribution of the Netflix catalog. Almost 40% of titles are rated TV-MA, followed by TV-14 and TV-PG, which is what we expected, because the goal of Netflix is to capture the attention of a global audience. Of course, it also has content restricted to adults and, at the other end, content dedicated to kids, both in smaller proportions.
In this chart of TV show seasons, we can see that most TV shows have 1 season, followed by 2, 3, and so on.
In the case of movies, the average duration is around 100 minutes, or 1 hour 40 minutes.
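Because one column mixes minutes and seasons, the number has to be extracted and then separated by type before either chart can be drawn. A minimal sketch, assuming `duration` holds strings such as `"90 min"` or `"2 Seasons"` and `type` distinguishes `"Movie"` from `"TV Show"`:

```python
import pandas as pd


def split_duration(df: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """Return (movie lengths in minutes, TV show lengths in seasons)."""
    # Take the leading number from strings like "90 min" or "2 Seasons".
    value = pd.to_numeric(
        df["duration"].str.extract(r"(\d+)", expand=False), errors="coerce"
    )
    is_movie = df["type"] == "Movie"
    return value[is_movie], value[~is_movie]


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "type": ["Movie", "Movie", "TV Show", "TV Show", "TV Show"],
        "duration": ["90 min", "110 min", "1 Season", "2 Seasons", "1 Season"],
    }
)
minutes, seasons = split_duration(toy)
print(minutes.mean())
print(seasons.value_counts())
```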
For the category, we can observe the top categories in a word cloud: the bigger the word, the more often it appears in the catalog.
In addition, we wanted to know the top 4 categories in 8 arbitrary countries. As we might expect, each country has different tastes across categories.
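The top categories per country can be sketched as follows. It assumes both `country` and `listed_in` hold comma-separated values, so a title with two countries and three categories contributes six (country, category) pairs. The helper `top_categories` is hypothetical, and the toy rows are made up.

```python
import pandas as pd


def top_categories(df: pd.DataFrame, countries: list[str], n: int = 4) -> pd.DataFrame:
    """Return the n most frequent categories for each of the given countries."""
    rows = df.dropna(subset=["country", "listed_in"]).copy()
    for col in ("country", "listed_in"):
        rows[col] = rows[col].str.split(",")
        rows = rows.explode(col).reset_index(drop=True)
        rows[col] = rows[col].str.strip()
    rows = rows[rows["country"].isin(countries)]

    counts = (
        rows.groupby(["country", "listed_in"]).size().rename("titles").reset_index()
    )
    counts = counts.sort_values(["country", "titles"], ascending=[True, False])
    return counts.groupby("country").head(n).reset_index(drop=True)


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "country": ["India", "India", "Japan, India", "Japan"],
        "listed_in": ["Dramas, Comedies", "Dramas", "Anime Series", "Anime Series, Dramas"],
    }
)
top = top_categories(toy, ["India", "Japan"], n=1)
print(top)
```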
Finally, we made an interesting word cloud: we took the synopses of all 8807 titles and extracted the 150 most repeated words. Those words were placed over the Netflix logo, and the result was the following word cloud!
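The word-extraction step can be sketched with the standard library alone. This is an illustrative version, not the notebook's code: the stopword list is a tiny placeholder, the helper `top_words` is hypothetical, and the descriptions are made up. The notebook's own tokenization and stopword choices are not known.

```python
import re
from collections import Counter

# Tiny placeholder stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "is", "his", "her", "with", "for", "on"}


def top_words(descriptions, n=150):
    """Return the n most frequent non-stopword words as (word, count) pairs."""
    counter = Counter()
    for text in descriptions:
        if not isinstance(text, str):
            continue  # skip missing descriptions
        words = re.findall(r"[a-z']+", text.lower())
        counter.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counter.most_common(n)


# Made-up descriptions for illustration only.
toy_descriptions = [
    "A young woman finds love.",
    "A woman and her family move to the city.",
    None,
]
most_common = top_words(toy_descriptions, n=1)
print(most_common)
```

The resulting word counts are the input a word cloud library needs. Drawing them over a logo additionally requires an image mask, which is outside this sketch.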
[1] Bansal, S. (2021). Netflix Movies and TV Shows. Kaggle. https://www.kaggle.com/datasets/shivamb/netflix-shows