This Jupyter Notebook demonstrates how to generate product recommendations for Amazon apparel using content-based recommendation techniques. It explores several text-based approaches that recommend products based on the text associated with each product. The following approaches are covered:
- Bag of Words (BoW)
- Term Frequency-Inverse Document Frequency (TF-IDF)
- Word2Vec
- Weighted Word2Vec
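To give a feel for the first two approaches listed above, here is a minimal sketch using scikit-learn. The product titles are made up for illustration, the `recommend` helper is hypothetical, and cosine distance is an assumption; the notebook itself may use a different distance measure.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import pairwise_distances

# Hypothetical product titles standing in for the real dataset.
titles = [
    "women floral print cotton summer dress",
    "women floral print maxi dress",
    "men slim fit denim jeans",
    "men classic fit blue denim jeans",
]


def recommend(vectorizer, titles, query_index, top_n=2):
    """Return indices of the top_n titles closest to titles[query_index]."""
    # Turn each title into a sparse vector (word counts or TF-IDF weights).
    features = vectorizer.fit_transform(titles)
    # Cosine distance between every title and the query title (an assumption).
    distances = pairwise_distances(
        features, features[query_index], metric="cosine"
    ).ravel()
    order = np.argsort(distances)
    # Skip the query itself and keep the nearest remaining titles.
    return [int(i) for i in order if i != query_index][:top_n]


bow_result = recommend(CountVectorizer(), titles, query_index=0)
tfidf_result = recommend(TfidfVectorizer(), titles, query_index=0)
print("BoW neighbours:", bow_result)
print("TF-IDF neighbours:", tfidf_result)
```

Both vectorizers rank the other dress title first for this toy input, because it shares the most words with the query. On real data the two rankings can differ, since TF-IDF down-weights words that appear in many titles.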
The dataset used in this project consists of Amazon apparel product information, including features such as ASIN, brand, color, image URL, product type, title, and formatted price. The dataset is available in the form of a JSON file.
To run this Jupyter Notebook, you need to have the following Python libraries installed:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- nltk
- gensim
- Pillow (imported as PIL)
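As a quick check before running the notebook, a sketch like the following reports which of these libraries cannot be imported. The `missing_modules` helper is hypothetical; note that scikit-learn is imported as `sklearn` and the PIL package is provided by Pillow.

```python
import importlib.util

# Import names for the libraries listed above; scikit-learn is imported
# as "sklearn" and Pillow as "PIL".
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "seaborn",
    "sklearn", "nltk", "gensim", "PIL",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```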
The notebook is divided into the following sections:
- Data Loading: Load the dataset using pandas and explore the features.
- Data Preprocessing: Clean and preprocess the data by removing duplicates, handling missing values, and reducing the dataset size.
- Feature Engineering: Extract useful features from the data and create a clean and structured dataset.
- Recommendation Techniques: Implement and compare different recommendation techniques based on textual data, such as BoW, TF-IDF, Word2Vec, and Weighted Word2Vec.
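The difference between plain and weighted Word2Vec in the last step can be sketched as follows. This is an illustration under stated assumptions, not the notebook's implementation: the three-dimensional word vectors are invented (a real run would take them from a trained Word2Vec model, for example via gensim), the helper names are hypothetical, and IDF is assumed as the weighting scheme.

```python
import math
import numpy as np

# Hypothetical 3-dimensional word vectors; a real run would use vectors
# from a trained Word2Vec model instead.
word_vectors = {
    "women": np.array([0.9, 0.1, 0.0]),
    "men": np.array([0.1, 0.9, 0.0]),
    "dress": np.array([0.8, 0.0, 0.3]),
    "jeans": np.array([0.0, 0.7, 0.4]),
    "floral": np.array([0.7, 0.1, 0.2]),
    "denim": np.array([0.1, 0.6, 0.5]),
}

# Hypothetical product titles.
titles = ["women floral dress", "women dress", "men denim jeans"]


def idf_weights(titles):
    """Compute a simple IDF weight per word (assumed weighting scheme)."""
    doc_freq = {}
    for title in titles:
        for word in set(title.split()):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w: math.log(len(titles) / df) + 1.0 for w, df in doc_freq.items()}


def title_vector(title, weights=None):
    """Average the word vectors of a title, optionally weighted per word."""
    dim = len(next(iter(word_vectors.values())))
    total, weight_sum = np.zeros(dim), 0.0
    for word in title.split():
        if word not in word_vectors:
            continue  # skip words without a vector
        w = 1.0 if weights is None else weights.get(word, 1.0)
        total += w * word_vectors[word]
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else total


def closest(query_index, weights=None):
    """Return the index of the title nearest to the query (Euclidean)."""
    vectors = np.array([title_vector(t, weights) for t in titles])
    distances = np.linalg.norm(vectors - vectors[query_index], axis=1)
    distances[query_index] = np.inf  # exclude the query itself
    return int(np.argmin(distances))


weights = idf_weights(titles)
plain_match = closest(0)
weighted_match = closest(0, weights)
print("Plain Word2Vec match:", titles[plain_match])
print("Weighted Word2Vec match:", titles[weighted_match])
```

With weighting, rare words such as "floral" contribute more to a title's vector than words that appear in many titles, which is the motivation for the weighted variant.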
To use this notebook, download the dataset and update the file path in the notebook accordingly. Follow the steps in each section and execute the code cells to generate recommendations for Amazon apparel products using the various techniques.
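The loading and cleaning steps might look roughly like the sketch below. The records are invented and read from an in-memory buffer so the example runs on its own. The column names (`asin`, `brand`, `color`, `title`, `formatted_price`) are assumptions based on the features described above, and the `load_and_clean` helper is hypothetical.

```python
import io
import pandas as pd

# Hypothetical records; the column names are assumptions and may not match
# the real JSON file exactly.
raw_json = io.StringIO("""[
  {"asin": "A1", "brand": "BrandX", "color": "Red",
   "title": "women floral dress", "formatted_price": "$19.99"},
  {"asin": "A2", "brand": "BrandX", "color": "Red",
   "title": "women floral dress", "formatted_price": "$19.99"},
  {"asin": "A3", "brand": "BrandY", "color": "Blue",
   "title": "men denim jeans", "formatted_price": "$29.99"},
  {"asin": "A4", "brand": "BrandZ", "color": "Blue",
   "title": "men slim fit shirt", "formatted_price": null}
]""")


def load_and_clean(source):
    """Load product records, drop incomplete rows and duplicate titles."""
    data = pd.read_json(source)
    # Remove rows with a missing price or color.
    data = data.dropna(subset=["formatted_price", "color"])
    # Keep only the first product for each distinct title.
    data = data.drop_duplicates(subset="title")
    return data.reset_index(drop=True)


products = load_and_clean(raw_json)
print(products[["asin", "title", "formatted_price"]])
```

To run this against the downloaded dataset, pass the path to the JSON file to `load_and_clean` in place of the in-memory buffer.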
This project is inspired by the "Amazon Fine Food Reviews" dataset from Kaggle and the "Applied AI Course" by Applied Roots.