In this project we analyse and preprocess the Book-Crossing dataset collected by Cai-Nicolas Ziegler and apply machine learning to recommend books based on a book you have previously read. All of the code is written in Python. The open-source library SciPy is used for preprocessing and Scikit-Learn is used for creating the model.
-
The overall approach to the project can be seen on Kaggle:
- Exploratory Data Analysis
- Different ways of building a recommendation system
- Model and Flask API
- Packages: Pandas, NumPy, Matplotlib, Seaborn, WordCloud, Scikit-Learn, etc.
- Dataset : https://www.kaggle.com/mohitnirgulkar/book-recommendation-data
-
Visualising Explicit Rating Counts (for 1-10 rating value)
-
Visualising top 30 most read books
-
Visualising top 30 most read books with their average ratings
-
Visualising top 30 years with the most books published
-
Visualising top 30 authors with most books
-
Visualising the age distribution of the users
-
Extra Analysis
- Some plots and word clouds that aren't shown here can be found in the notebook
-
Popularity-based
Popularity-based systems simply recommend the most popular items to every user. They are the simplest of all and have minimal computational requirements. However, because they do not personalise recommendations based on a specific user's likes and behaviour, they tend to be less accurate than content-based or collaborative filtering systems. This type of recommendation is performed in the notebook, where the output is the 10 most popular books.
-
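As a rough illustration of the popularity-based idea, the sketch below ranks books by how many ratings they received, using the mean rating to break ties. The toy data, the column names (`user_id`, `title`, `rating`) and the helper `most_popular` are assumptions made for this example; the notebook may define popularity differently.

```python
import pandas as pd

# Toy explicit ratings; column names are assumed, not the dataset's exact schema.
ratings = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 1, 2, 3, 4],
    "title": ["Book A", "Book A", "Book A", "Book A",
              "Book B", "Book B", "Book C", "Book C"],
    "rating": [8, 9, 7, 8, 10, 6, 5, 9],
})


def most_popular(ratings: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Rank books by number of ratings, breaking ties by mean rating."""
    stats = (
        ratings.groupby("title")["rating"]
        .agg(num_ratings="count", mean_rating="mean")
        .sort_values(["num_ratings", "mean_rating"], ascending=False)
    )
    return stats.head(n)


top_books = most_popular(ratings, n=2)
print(top_books)
```

Every user gets the same list, which is why this approach is cheap but not personalised.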
Content-based
Content-based systems depend on external information for creating user and item profiles, and this information might not be easily available. They also do not take users' behavioural information into account, and they ignore the fact that a user's interests and preferences may change over time.
-
Collaborative Filtering
-
Memory-based/ Neighborhood-based
Memory-based recommendation systems can be further divided into two categories, user-based and item-based. Both are easy to implement: a similarity measure such as cosine similarity or Pearson correlation is used to find the most similar users or items in the data.
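A minimal item-based sketch, assuming a small dense user-by-book rating matrix where 0 means "not rated". The data and the helper names (`cosine_similarity_matrix`, `most_similar`) are made up for illustration and are not taken from the notebook.

```python
import numpy as np

# Rows are users, columns are books; 0 means "not rated". Toy data only.
titles = ["Book A", "Book B", "Book C"]
ratings = np.array([
    [8.0, 7.0, 0.0],
    [9.0, 8.0, 0.0],
    [0.0, 0.0, 10.0],
    [7.0, 0.0, 9.0],
])


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = vectors / norms
    return unit @ unit.T


# Item-based: compare books, i.e. the columns of the rating matrix.
item_sim = cosine_similarity_matrix(ratings.T)


def most_similar(title: str, k: int = 1) -> list:
    """Return the k books most similar to `title`, excluding itself."""
    idx = titles.index(title)
    order = np.argsort(-item_sim[idx])
    return [titles[j] for j in order if j != idx][:k]


print(most_similar("Book A"))
```

Passing `ratings` instead of `ratings.T` would give the user-based variant. Mean-centring each vector before the same computation yields Pearson correlation.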
-
Model-based/Matrix Factorization
The model-based collaborative filtering approach employs dimensionality-reduction techniques such as matrix factorization (Singular Value Decomposition (SVD), Principal Component Analysis (PCA), and latent factor models) to discover hidden concepts and their relationships with users and items.
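To make the "hidden concepts" idea concrete, here is a small sketch that factorizes a toy rating matrix with a truncated SVD in NumPy. The data and the choice of `k = 2` latent factors are assumptions for illustration; this project's own model is the nearest-neighbours one described further down, not this factorization.

```python
import numpy as np

# Toy user-by-book rating matrix (0 = not rated); not taken from the dataset.
ratings = np.array([
    [8.0, 7.0, 0.0, 0.0],
    [9.0, 8.0, 0.0, 1.0],
    [0.0, 0.0, 10.0, 8.0],
    [7.0, 0.0, 9.0, 9.0],
    [0.0, 1.0, 8.0, 7.0],
])

# Full SVD, then keep only the k strongest latent factors.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
k = 2
user_factors = U[:, :k] * s[:k]   # each user described by k latent factors
item_factors = Vt[:k, :]          # each book described by the same k factors

# Low-rank reconstruction: scores for every (user, book) pair,
# including pairs the user never rated.
approx = user_factors @ item_factors
print(np.round(approx, 1))
```

The entries of `approx` at positions that were 0 in `ratings` can be read as rough predicted scores for unrated books.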
-
Hybrid Approach
Memory-based and model-based collaborative filtering approaches can be combined in practice to exploit the benefits each approach provides. Content-based and collaborative filtering approaches can also be combined in various ways so that each compensates for the other's weaknesses.
-
-
Model :-
Scikit-Learn's NearestNeighbors model is built following the collaborative filtering approach. We use SciPy to create a compressed sparse row matrix (csr_matrix) from the pivot table, and fit the model on it using the brute-force algorithm with cosine as the metric.
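The pipeline described above can be sketched roughly as follows. The toy ratings, the column names and the `recommend` helper are assumptions made for this example; the notebook's actual preprocessing and variable names may differ.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy explicit ratings; column names are assumed for illustration.
ratings = pd.DataFrame({
    "user_id": [1, 2, 4, 1, 2, 3, 4, 3],
    "title": ["Book A", "Book A", "Book A", "Book B", "Book B",
              "Book C", "Book C", "Book D"],
    "rating": [8, 9, 7, 7, 8, 10, 9, 8],
})

# One row per book, one column per user, 0 where a user has not rated a book.
pivot = ratings.pivot_table(index="title", columns="user_id", values="rating").fillna(0)
matrix = csr_matrix(pivot.values)

# Brute-force search with cosine distance, as described in the text.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(matrix)


def recommend(title: str, n: int = 2) -> list:
    """Return the n books whose rating vectors are closest to `title`."""
    row = pivot.index.get_loc(title)
    # Ask for n + 1 neighbours because the query book is its own nearest neighbour.
    _, indices = model.kneighbors(matrix[row], n_neighbors=n + 1)
    return [pivot.index[i] for i in indices.flatten() if i != row][:n]


print(recommend("Book A", n=2))
```

Storing the pivot table as a sparse matrix matters on the real dataset, where most user-book pairs have no rating.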
-
Flask API :-
- Clone the project, download Book_names_with_urlM.csv from the output section, and put it in the directory containing the model
git clone https://github.com/raklugrin01/Book-Recommendation-with-EDA
- Install Flask
pip install flask
- Run the python file
python api.py
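For readers unfamiliar with Flask, the sketch below shows the general shape such an endpoint might take. It is not the repository's `api.py`: the `/recommend` route, the `title` query parameter and the `RECOMMENDATIONS` lookup table (standing in for the fitted model) are all hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the fitted model: a fixed lookup table.
RECOMMENDATIONS = {"Book A": ["Book B", "Book C"]}


@app.route("/recommend")
def recommend():
    """Return recommendations for the book title given in the query string."""
    title = request.args.get("title", "")
    if title not in RECOMMENDATIONS:
        return jsonify(error=f"unknown title: {title!r}"), 404
    return jsonify(title=title, recommendations=RECOMMENDATIONS[title])


# Exercise the endpoint with Flask's test client, without starting a server.
client = app.test_client()
response = client.get("/recommend", query_string={"title": "Book A"})
print(response.get_json())
```

In a real script the lookup would be replaced by a call into the loaded model, and `app.run()` would be called to start the server.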
-
Testing result :-
-
We can see that, given a book title as input, the API returns 10 books as recommendations.
Please ⭐ the repository if it helped you in any way.