Movie Recommendation System

This is a Python project to build a movie recommendation system using data extracted from a movie database API.
The project follows the guided blueprint provided by Ploomber, focusing on writing professional, modular, and well-documented code with thorough docstrings and exception handling within an OOP framework.
Additionally, I have added a simple frontend using Streamlit. The entire application is containerized with Docker for easy setup and deployment.

Description

The project involves the following components:

🎬 Extracting movie data by calling TheMovieDB API
💾 Storing the data in a DuckDB database
📊 Performing exploratory data analysis with SQL in Jupyter Notebooks
🤖 Developing a movie recommendation system that uses TF-IDF and cosine similarity to generate recommendations
🎞️ Taking a movie title as input and returning similar movie recommendations
⚙️ Packaging the notebooks and Python scripts into an end-to-end workflow using Ploomber
⚡ Building a FastAPI web application to serve the recommendation results via API
🐳 Dockerizing the application for easy deployment

Requirements

Python 3.10+ 🐍
Poetry 📦
DuckDB 🦆
Jupyter 💻
Pandas 🐼
Scikit-Learn 🔬
FastAPI ⚡️
Docker 🐳

See the pyproject.toml file for the full list of dependencies.

How to Run it 🛫

Clone the repository

git clone https://github.com/MagnusS0/movie-rec-system.git

Navigate to the directory where you downloaded the repository

cd movie-rec-system

Run - with Docker 🐳

Remember to add your own API key to .env

docker-compose up --build

Run - locally 💻

Remember to add your own API key to .env

Make sure you have Poetry installed in your environment

pip install poetry

Install dependencies

poetry lock
poetry install

Build the pipeline with Ploomber

poetry run ploomber build

Run the app

uvicorn app.app:app

Run the frontend (optional)

Make sure you are in the frontend directory

streamlit run frontend_app.py

Data 📊

The data is extracted from TheMovieDB API and stored in a DuckDB database, movies_data.duckdb. It contains information about each movie, such as title, overview, genres, and ratings.

The main tables are:

movies - contains movie info
genres - contains genre definitions
movie_genre_data - joins movies and genres into a single table
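
As a rough illustration of how the joined table relates to the other two, the sketch below builds toy versions of the tables and joins them. It uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies. The column names (id, title, genre_id, name) and the sample rows are assumptions for illustration, not the project's actual schema.

```python
import sqlite3


def build_movie_genre_table(conn):
    """Create toy movies/genres tables and join them into movie_genre_data."""
    cur = conn.cursor()
    # Hypothetical schemas: the real tables likely hold many more columns.
    cur.execute("CREATE TABLE movies (id INTEGER, title TEXT, genre_id INTEGER)")
    cur.execute("CREATE TABLE genres (id INTEGER, name TEXT)")
    cur.executemany(
        "INSERT INTO movies VALUES (?, ?, ?)",
        [(1, "oppenheimer", 18), (2, "midway", 10752)],
    )
    cur.executemany(
        "INSERT INTO genres VALUES (?, ?)",
        [(18, "Drama"), (10752, "War")],
    )
    # Join movie info with genre definitions into a single table.
    cur.execute(
        """
        CREATE TABLE movie_genre_data AS
        SELECT m.id AS movie_id, m.title, g.name AS genre
        FROM movies AS m
        JOIN genres AS g ON m.genre_id = g.id
        """
    )
    conn.commit()
    query = "SELECT title, genre FROM movie_genre_data ORDER BY movie_id"
    return cur.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
rows = build_movie_genre_table(conn)
print(rows)
```

The same SQL should carry over to DuckDB with little or no change, but that has not been verified against this project's database file.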

Recommendation Methodology 🤖

The movie recommendation system is built using TF-IDF (Term Frequency-Inverse Document Frequency) and cosine similarity, which essentially makes it a content-based filtering recommendation system.
TF-IDF is used to convert each movie's text (overview + (genres * 2)) into a numerical vector that represents the significance of specific terms in that text. Cosine similarity is then computed between these vectors to measure how similar two movies are. Based on this similarity score, the system recommends the movies that are most similar to the given input movie title.
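
The idea can be sketched in plain Python as follows. This is a from-scratch illustration, not the project's code (which presumably relies on Scikit-Learn, listed in the requirements). It assumes that "genres * 2" means the genre terms are repeated twice in the text, uses whitespace tokenization and a smoothed IDF, and the function names and the toy overviews in MOVIES are made up for the example.

```python
import math
from collections import Counter


def build_document(overview, genres):
    """Tokenize the overview plus the genre list repeated twice (assumed weighting)."""
    text = overview + " " + " ".join(genres * 2)
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse {term: weight} vector per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in documents]


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title, movies, top_n=5):
    """Rank every other movie by similarity to the given title."""
    titles = list(movies)
    vectors = tfidf_vectors([build_document(*movies[t]) for t in titles])
    query = vectors[titles.index(title)]
    scores = [
        (cosine_similarity(query, vec), other)
        for other, vec in zip(titles, vectors)
        if other != title
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scores[:top_n]]


# Toy data: overviews and genres are invented for illustration.
MOVIES = {
    "oppenheimer": ("a physicist leads the atomic bomb project during the war", ["drama", "history"]),
    "midway": ("navy pilots fight a decisive battle during the war", ["war", "history"]),
    "toy story": ("a cowboy doll feels threatened by a new spaceman toy", ["animation", "comedy"]),
}

print(recommend("oppenheimer", MOVIES, top_n=2))
```

Repeating the genre terms raises their term frequency, so two movies that share a genre end up closer together than two movies that only share a few common words in their overviews.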

Modules

frontend/frontend_app.py - contains the Streamlit application code
app/app.py - contains the FastAPI application code
app/recommender.py - generates movie recommendations
app/recommenderhelper.py - contains helper functions for the recommender
etl/extract.py - extracts data from API
etl/eda.ipynb - notebook for exploratory data analysis
products/ - contains notebooks packaged by Ploomber
tests/ - contains tests for the application

Results

Running the application provides movie recommendations in JSON format for a given movie title. It also returns metrics on the popularity, ratings, and vote count of the recommendations.

Sample Output:

{
  "movie": "oppenheimer",
  "recommendations": [
    "schindler's list",
    "resistance",
    "to end all war: oppenheimer & the atomic bomb",
    "midway",
    "1917",
    "emancipation",
    "13 hours: the secret soldiers of benghazi",
    "defiance",
    "the imitation game",
    "hacksaw ridge"
  ],
  "metrics": {
    "popularity": 373.829,
    "vote_avg": 0.834,
    "vote_count": 6699.44
  }
}
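
A client could read such a response along the following lines. This is only a sketch: the summarize helper is hypothetical, and SAMPLE_RESPONSE is a trimmed copy of the output above rather than a live call to the API.

```python
import json

# Trimmed copy of the sample output above (only the first three recommendations).
SAMPLE_RESPONSE = """
{
  "movie": "oppenheimer",
  "recommendations": [
    "schindler's list",
    "resistance",
    "to end all war: oppenheimer & the atomic bomb"
  ],
  "metrics": {
    "popularity": 373.829,
    "vote_avg": 0.834,
    "vote_count": 6699.44
  }
}
"""


def summarize(response_text, top_n=2):
    """Turn the JSON response into a one-line summary (hypothetical helper)."""
    payload = json.loads(response_text)
    top = ", ".join(payload["recommendations"][:top_n])
    vote_avg = payload["metrics"]["vote_avg"]
    return f"{payload['movie']}: {top} (vote_avg={vote_avg})"


summary = summarize(SAMPLE_RESPONSE)
print(summary)
```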

Credits 👏

This project was created by @MagnusS0

Guided by: Ploomber's Movie Recommendation Project

Powered by:

TheMovieDB API
Ploomber
FastAPI
DuckDB
Poetry
Docker

License 📄

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

I have modified the original code/structure from Ploomber's blueprint, while keeping some parts the same. Thank you to Ploomber for making their blueprint openly available!
