The following data mining algorithms are implemented using Python Spark (PySpark). The data is very large, with millions of rows in the files.
HW1: How Apache Spark works; Apache Spark transformations and actions on RDDs.
HW2: SON Algorithm: The SON algorithm lends itself well to a parallel-computing environment, because each chunk of baskets can be mined for candidate itemsets independently.
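
To illustrate why SON parallelizes well, here is a minimal sketch in plain Python, not the homework code. Each chunk stands in for a Spark partition (in PySpark the first pass would typically run inside `mapPartitions`). The function names `local_frequent` and `son` are hypothetical, and the sketch only counts itemsets of size 1 and 2 to stay short.

```python
from collections import Counter
from itertools import combinations


def local_frequent(chunk, threshold):
    """Return itemsets (sizes 1 and 2 only) that reach `threshold` within one chunk."""
    counts = Counter()
    for basket in chunk:
        items = sorted(set(basket))
        counts.update((item,) for item in items)
        counts.update(combinations(items, 2))
    return {itemset for itemset, n in counts.items() if n >= threshold}


def son(baskets, support, num_chunks=2):
    """Two-pass SON sketch: local candidates first, then a global count."""
    # Assumption: round-robin chunks stand in for Spark partitions.
    chunks = [baskets[i::num_chunks] for i in range(num_chunks)]

    # Pass 1: each chunk is mined independently with a scaled-down threshold.
    # Rounding the threshold down can only add candidates, never lose a true one.
    candidates = set()
    for chunk in chunks:
        local_threshold = max(1, support * len(chunk) // len(baskets))
        candidates |= local_frequent(chunk, local_threshold)

    # Pass 2: count every candidate over the full data to drop false positives.
    totals = Counter()
    for chunk in chunks:
        for basket in chunk:
            items = set(basket)
            for candidate in candidates:
                if items.issuperset(candidate):
                    totals[candidate] += 1
    return sorted(c for c, n in totals.items() if n >= support)


baskets = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
print(son(baskets, support=3))
```

Both passes only need per-chunk work followed by a merge, which is what makes the algorithm a natural fit for map and reduce steps.
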
HW3:
Part-1
Implement Min-Hash and Locality Sensitive Hashing (LSH) to find similar businesses efficiently.
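
As a rough sketch of the idea, assuming each business is represented by a non-empty set of integer user indices (an assumption, the actual input format is not described here), Min-Hash signatures and LSH banding could look like this in plain Python. The helper names, the prime modulus, and the band and row counts are all illustrative choices, not the homework's settings.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # assumed modulus for the hash family (a large prime)


def make_hash_funcs(n, seed=0):
    """Build n random hash functions of the form (a*x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def signature(item_ids, hash_funcs):
    """Min-Hash signature: the minimum hash value of the set under each function."""
    return [min((a * x + b) % PRIME for x in item_ids) for a, b in hash_funcs]


def lsh_candidates(signatures, bands, rows):
    """Pairs whose signatures agree on every row of at least one band."""
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for key, sig in signatures.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(key)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidate pairs."""
    return len(a & b) / len(a | b)


# Hypothetical data: business id -> set of user indices.
businesses = {"b1": {1, 2, 3, 4}, "b2": {1, 2, 3, 4}, "b3": {100, 200, 300}}
funcs = make_hash_funcs(6, seed=1)
sigs = {name: signature(users, funcs) for name, users in businesses.items()}
for x, y in sorted(lsh_candidates(sigs, bands=3, rows=2)):
    print(x, y, jaccard(businesses[x], businesses[y]))
```

LSH only proposes candidate pairs; the exact Jaccard check at the end is what filters out pairs that collided by chance.
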
Part-2
Recommendation System (Context-Based as well as Collaborative Filtering recommendation systems)
HW4: K-Means and BFR clustering algorithms.
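
For orientation, a minimal single-machine sketch of the K-Means loop (Lloyd's algorithm) in plain Python follows. It is not the PySpark implementation from the homework, and it does not cover BFR. The function name `kmeans`, the fixed iteration count, and the sample points are assumptions made for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid is the coordinate-wise mean; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```

The assignment step is independent per point, so in a Spark setting it maps naturally onto a per-record transformation followed by a per-cluster aggregation.
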