This repo uses the GroupLens data to solve simple problems with the different applications in the Hadoop ecosystem.
The big data tests are based on the open ml-100k dataset. Only 100K records are used because the tests run on an individual machine, and a dataset of this size keeps computation fast and results easy to validate.
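To show what validating a result on a single machine can look like, here is a minimal Python sketch. It is not code from this repo. It assumes the ratings file in ml-100k (u.data) has tab-separated lines of user id, item id, rating and timestamp, and the function name rating_stats and the sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rating_stats(lines: Iterable[str]) -> Dict[int, Tuple[int, float]]:
    """Return {item_id: (rating_count, mean_rating)} from u.data-style lines.

    Assumed line layout: user_id <TAB> item_id <TAB> rating <TAB> timestamp
    """
    totals = defaultdict(lambda: [0, 0])  # item_id -> [count, rating_sum]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _user_id, item_id, rating, _timestamp = line.split("\t")
        entry = totals[int(item_id)]
        entry[0] += 1
        entry[1] += int(rating)
    return {item: (count, total / count) for item, (count, total) in totals.items()}


# Hand-made sample rows in the assumed layout (not real ml-100k data).
sample = [
    "1\t10\t4\t880000000",
    "2\t10\t2\t880000001",
    "3\t20\t5\t880000002",
]
stats = rating_stats(sample)
```

To run it against the real file, pass an open file handle instead of the sample list, for example rating_stats(open("ml-100k/u.data")), where the path is an assumption about where the dataset was unpacked. The counts and means can then be compared with the output of the same query run in Hadoop.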
The big data environment is the Hortonworks Sandbox, set up on an Oracle VM.