The dataset is the IMDB top 250 English movies. It can be downloaded from:
In this dataset there are 250 movies (rows) and 38 attributes (columns).
I used the Rapid Automatic Keyword Extraction (RAKE) library. RAKE is a domain-independent keyword extraction algorithm that identifies key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
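To make the scoring idea concrete, here is a minimal standard-library sketch of RAKE-style scoring. It is not the RAKE library itself: the tiny stopword list, the function names `candidate_phrases` and `rake_scores`, and the sample plot sentence are all assumptions made for illustration. Each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real RAKE setups use a full list).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "by", "for", "on", "with"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword ends the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs sorted from highest to lowest score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word appears
    degree = defaultdict(int)  # total length of the phrases each word appears in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical plot sentence, not taken from the dataset.
plot = "A young wizard fights a dark lord and saves the wizard school"
ranked = rake_scores(plot)
for phrase, score in ranked:
    print(f"{score:.1f}  {phrase}")
```

Longer phrases made of words that tend to appear alongside other words score highest, which is why multi-word phrases rank above isolated words.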
This project is a content-based recommender built with Natural Language Processing (NLP).
Count Vectorizer + Cosine Similarity
- Count Vectorizer: converts sentences into vectors of word counts.
- Cosine Similarity: measures similarity as the cosine of the angle between two vectors.
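The two steps above can be sketched with the standard library alone. This is an illustration of the technique under stated assumptions, not the project's actual code: the movie titles, keyword strings, and the helper names `count_vectors`, `cosine_similarity`, and `recommend` are hypothetical. In practice a library such as scikit-learn, which provides `CountVectorizer` and `cosine_similarity`, would likely be used instead.

```python
import math
from collections import Counter


def count_vectors(documents):
    """Build a shared vocabulary and one word-count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an empty vector as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical keyword strings per movie (not taken from the dataset).
bags = {
    "Movie A": "wizard school magic dark lord",
    "Movie B": "wizard magic dragon quest",
    "Movie C": "detective murder city night",
}
titles = list(bags)
vocabulary, vectors = count_vectors(bags.values())


def recommend(title, top_n=2):
    """Rank the other movies by cosine similarity to the given title."""
    idx = titles.index(title)
    scored = [
        (other, cosine_similarity(vectors[idx], vectors[j]))
        for j, other in enumerate(titles)
        if j != idx
    ]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]


print(recommend("Movie A"))
```

Because cosine similarity depends only on the angle between vectors, a movie with a long keyword list is not favored over one with a short list simply for having more words.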