Prajwal Rao | prajwal.rao@outlook.com.au
The code has been written as a Python notebook (.ipynb) in Google Colab for better explainability. It is highly recommended to upload the notebook to a Google Colab workspace (https://colab.research.google.com/) and run it there.
If an environment with the specifications below is available, the Python source file can be used instead.
- Java 8: openjdk-8-jdk-headless (https://packages.debian.org/stretch/openjdk-8-jdk-headless)
- Apache Spark 2.4.5: spark-2.4.5-bin-hadoop2.7.tgz (https://spark.apache.org/downloads.html)
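Before running the Python source file outside Colab, it may help to confirm that the local environment matches the two specs above. The sketch below is not part of the repository. It is a hypothetical helper (`parse_java_major`, `parse_spark_version` and `check_environment` are illustrative names) that assumes `java` is on the PATH and that `SPARK_HOME` points at the extracted `spark-2.4.5-bin-hadoop2.7` directory. It relies only on the fact that Java 8 reports its version as `1.8.x` while Java 9 and later report the major version directly.

```python
import os
import re
import shutil
import subprocess

# Versions taken from the environment specs in this README.
REQUIRED_JAVA_MAJOR = 8
REQUIRED_SPARK = "2.4.5"


def parse_java_major(version_output):
    """Extract the Java major version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    first, second = int(match.group(1)), match.group(2)
    # Java 8 and earlier use the legacy "1.x" scheme; Java 9+ report the major directly.
    if first == 1 and second is not None:
        return int(second)
    return first


def parse_spark_version(spark_home):
    """Extract the Spark version from a directory such as spark-2.4.5-bin-hadoop2.7.

    Assumption: SPARK_HOME is the directory extracted from the release tarball,
    so its name still carries the version number.
    """
    name = os.path.basename(os.path.normpath(spark_home))
    match = re.search(r"spark-(\d+\.\d+\.\d+)", name)
    return match.group(1) if match else None


def check_environment():
    """Return a list of problems; an empty list means the specs appear to match."""
    problems = []

    java = shutil.which("java")
    if java is None:
        problems.append("java not found on PATH")
    else:
        # `java -version` writes its banner to stderr, not stdout.
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
        major = parse_java_major(result.stderr)
        if major != REQUIRED_JAVA_MAJOR:
            problems.append(f"expected Java {REQUIRED_JAVA_MAJOR}, found {major}")

    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    else:
        found = parse_spark_version(spark_home)
        if found != REQUIRED_SPARK:
            problems.append(f"expected Spark {REQUIRED_SPARK}, found {found}")

    return problems
```

Calling `print(check_environment())` should print `[]` on a matching setup; otherwise each entry names the mismatched component. Parsing the directory name keeps the check free of a PySpark dependency, at the cost of trusting that the tarball was extracted without renaming.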
The given dataset has been uploaded to the repository as well.