Exercise 1 - Data Stewardship (2020S)

The purpose of this repository is to extract useful information from IMDb's (https://www.imdb.com) Top 250 list. The list comprises the 250 highest-rated movies on the platform. We use a 2019 snapshot of it, which was created by Nigel Cox and can be found here.

The particular tasks were to create plots that illustrate the distribution of the movies per decade, the most popular actors in the list, and the most popular genres.
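As an illustration of the per-decade task, the following is a minimal sketch of how release years could be grouped into decades. It is not taken from plots.py; the function name `movies_per_decade` and the sample years are hypothetical, and the actual snapshot may store the release year differently.

```python
from collections import Counter


def movies_per_decade(years):
    """Count movies per release decade, e.g. 1994 falls into 1990."""
    # Integer division by 10 drops the last digit of the year.
    decades = ((year // 10) * 10 for year in years)
    # Sort by decade so a bar plot would show the bars in chronological order.
    return dict(sorted(Counter(decades).items()))


# Hypothetical release years, not taken from the actual snapshot.
sample_years = [1972, 1974, 1994, 1994, 1999, 2008]
decade_counts = movies_per_decade(sample_years)
print(decade_counts)  # {1970: 2, 1990: 3, 2000: 1}
```

The resulting mapping of decade to count could then be passed to any plotting library to draw a bar plot.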

The data transformation and visualization were done in Python, specifically version 3.7.4. To replicate the results, please make sure you are on this version, e.g. by using a virtual environment or a compatible Docker image.
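One way to confirm that the interpreter matches the version stated above is a small check like the sketch below. It is not part of the repository; the helper name `matches_expected_version` is hypothetical.

```python
import sys

# Version stated in this README.
EXPECTED_VERSION = (3, 7, 4)


def matches_expected_version(version_info, expected=EXPECTED_VERSION):
    """Return True if (major, minor, micro) equals the expected version."""
    return tuple(version_info[:3]) == expected


if not matches_expected_version(sys.version_info):
    found = ".".join(str(part) for part in sys.version_info[:3])
    print("Warning: running Python " + found + ", but 3.7.4 is expected.")
```

The check only warns instead of aborting, since other Python 3 versions may still produce the same plots.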

If you are in the correct folder (src/) and have installed all required dependencies using

$ pip3 install -r requirements.txt

(or in some cases pip may be used instead of pip3), you can generate the plots using the following command:

$ python3 plots.py

Generated plots

Figure 1: Bar plot of the release decades of the movies in IMDb's Top 250

Figure 2: Lollipop plot of the 10 most popular actresses and actors in IMDb's Top 250

Figure 3: Bar plot of the distribution of genres in IMDb's Top 250
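The counts behind Figures 2 and 3 could be computed along the following lines. This is a sketch under the assumption that each movie lists its actors or genres as one comma-separated string; the snapshot's real format may differ, and the function name `top_values` and the sample data are hypothetical.

```python
from collections import Counter


def top_values(rows, n=10, sep=","):
    """Return the n most frequent values across multi-valued string fields."""
    counts = Counter()
    for row in rows:
        # Split each field and ignore surrounding whitespace and empty parts.
        counts.update(part.strip() for part in row.split(sep) if part.strip())
    return counts.most_common(n)


# Hypothetical genre fields, not taken from the actual snapshot.
sample_genres = ["Crime, Drama", "Drama", "Action, Crime, Drama"]
top_genres = top_values(sample_genres, n=2)
print(top_genres)  # [('Drama', 3), ('Crime', 2)]
```

The same function would work for the actor column with n=10, matching the ten entries shown in Figure 2.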

Repository: matthiasweiss/dsue1-2020s

Folders and files

Latest commit

History

Repository files navigation

Exercise 1 - Data Stewardship (2020S)

Generated plots

About

Resources

License

Stars

Watchers

Forks

Releases 4

Packages 0

Contributors 2

Languages

Packages