The project extracts movie data using TheMovieDB API, processes it using TF-IDF and cosine similarity for generating recommendations, and stores the data in a DuckDB database. The system is encapsulated within a FastAPI web application and can be deployed using Docker. It provides movie recommendations in JSON format.


MagnusS0/movie-rec-system

Movie Recommendation System

This is a Python project to build a movie recommendation system using data extracted from a movie database API.
The project follows the guided blueprint provided by Ploomber, focusing on writing professional, modular, and well-documented code with thorough docstrings and exception handling within an OOP framework.
Additionally, I have added a simple frontend using Streamlit. The entire application is containerized using Docker for easy setup and deployment.

[Image: movie recommender example]

Description

The project involves the following components:

  • 🎬 Extracting movie data by calling TheMovieDB API
  • 💾 Storing the data in a DuckDB database
  • 📊 Performing exploratory data analysis with SQL in Jupyter Notebooks
  • 🤖 Developing a movie recommendation system that uses TF-IDF and cosine similarity to generate recommendations
  • 🎞️ Taking a movie title as input and returning similar movie recommendations
  • ⚙️ Packaging the notebooks and Python scripts into an end-to-end workflow using Ploomber
  • ⚡ Building a FastAPI web application to serve the recommendation results via API
  • 🐳 Dockerizing the application for easy deployment

Requirements

  • Python 3.10+ 🐍
  • Poetry 📦
  • DuckDB 🦆
  • Jupyter 💻
  • Pandas 🐼
  • Scikit-Learn 🔬
  • FastAPI ⚡️
  • Docker 🐳

See the pyproject.toml file for the full list of dependencies.

How to Run it 🛫

Clone the repository

git clone https://github.com/MagnusS0/movie-rec-system.git

Navigate to the directory where you downloaded the repository

cd movie-rec-system

Run - with Docker 🐳

Remember to add your own API key to .env

docker-compose up --build

Run - locally 💻

Remember to add your own API key to .env

  1. Make sure you have Poetry installed in your environment
pip install poetry
  2. Install dependencies
poetry lock
poetry install
  3. Build the pipeline with Ploomber
poetry run ploomber build
  4. Run the app
uvicorn app.app:app
  5. Run the frontend (optional)

Make sure you are in the frontend directory first

streamlit run frontend_app.py

Data 📊

The data is extracted from TheMovieDB API and stored in a DuckDB database file, movies_data.duckdb. It contains movie information such as title, overview, genres, and ratings.

The main tables are:

  • movies - contains movie info
  • genres - contains genre definitions
  • movie_genre_data - joins movies and genres into a single table

Recommendation Methodology 🤖

The movie recommendation system is built using TF-IDF (Term Frequency-Inverse Document Frequency) and cosine similarity, which makes it essentially a content-based filtering recommendation system.
TF-IDF converts each movie's text (overview + (genres*2), so the genres count twice) into a numerical vector that represents the significance of specific terms for that movie. Cosine similarity is then computed between these vectors to measure how similar two movies are. Given an input movie title, the system recommends the movies with the highest similarity scores.
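
To make the idea concrete, here is a minimal, dependency-free sketch of TF-IDF plus cosine similarity. It is not the code in app/recommender.py (the project itself relies on Scikit-Learn). The function names, the toy MOVIES data, and the reading of "genres*2" as repeating the genre names twice are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_document(overview, genres, genre_weight=2):
    """Join the overview with the genre names repeated `genre_weight` times.

    Assumption: this is one plausible reading of "overview + (genres*2)".
    """
    return " ".join([overview] + genres * genre_weight).lower()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title, movies, top_n=2):
    """Rank every other movie by similarity to `title`, best match first."""
    titles = list(movies)
    vectors = tfidf_vectors([build_document(*movies[t]) for t in titles])
    query = vectors[titles.index(title)]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in zip(titles, vectors)
        if other != title
    ]
    return [other for _, other in sorted(scored, reverse=True)[:top_n]]


# Hypothetical toy data: title -> (overview, genres).
MOVIES = {
    "war film a": ("soldiers fight a battle in the war", ["war", "drama"]),
    "war film b": ("a soldier survives the war", ["war", "history"]),
    "space comedy": ("astronauts joke around on a space station", ["comedy"]),
}

if __name__ == "__main__":
    print(recommend("war film a", MOVIES, top_n=1))
```

Repeating the genre names raises their term frequency, so two movies that share a genre end up closer together than two movies that only share a few common words in their overviews.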

Modules

  • frontend/frontend_app.py - contains the Streamlit application code
  • app/app.py - contains the FastAPI application code
  • app/recommender.py - generates movie recommendations
  • app/recommenderhelper.py - contains helper functions for the recommender
  • etl/extract.py - extracts data from API
  • etl/eda.ipynb - notebook for exploratory data analysis
  • products/ - contains notebooks packaged by Ploomber
  • tests/ - contains tests for the application

Results

Running the application provides movie recommendations in JSON format for a given movie title. It also returns metrics on the popularity, ratings, and vote count of the recommendations.

Sample Output:

{
  "movie": "oppenheimer",
  "recommendations": [
    "schindler's list",
    "resistance",
    "to end all war: oppenheimer & the atomic bomb",
    "midway",
    "1917",
    "emancipation",
    "13 hours: the secret soldiers of benghazi",
    "defiance",
    "the imitation game",
    "hacksaw ridge"
  ],
  "metrics": {
    "popularity": 373.829,
    "vote_avg": 0.834,
    "vote_count": 6699.44
  }
}
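
A client can consume this response with nothing more than the standard library. The sketch below parses an abbreviated copy of the sample output; the `summarize` helper and its return shape are hypothetical and only illustrate how the fields shown above might be read.

```python
import json

# Abbreviated copy of the sample output (first three recommendations only).
SAMPLE_RESPONSE = """
{
  "movie": "oppenheimer",
  "recommendations": ["schindler's list", "resistance", "midway"],
  "metrics": {"popularity": 373.829, "vote_avg": 0.834, "vote_count": 6699.44}
}
"""


def summarize(response_text, top_n=3):
    """Pick the queried title, the first `top_n` recommendations, and the metrics."""
    payload = json.loads(response_text)
    return {
        "movie": payload["movie"],
        "top": payload["recommendations"][:top_n],
        "metrics": payload["metrics"],
    }


if __name__ == "__main__":
    summary = summarize(SAMPLE_RESPONSE, top_n=2)
    print(summary["movie"], "->", ", ".join(summary["top"]))
```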

Credits 👏

This project was created by @MagnusS0

Guided by: Ploomber's Movie Recommendation Project

Powered by:

TheMovieDB API
Ploomber
FastAPI
DuckDB
Poetry
Docker

License 📄

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

I have modified the original code/structure from Ploomber's blueprint, while keeping some parts the same. Thank you to Ploomber for making their blueprint openly available!
