Repository for the MTA final M.Sc. project: Distributed RNTN.
The purpose of this project is to implement the Recursive Neural Tensor Network (RNTN) for sentiment analysis, as described in the paper by R. Socher, in a distributed manner using Apache Spark.
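For orientation, here is a minimal sketch of the RNTN composition step, in which two child vectors are merged into a parent vector through a tensor term plus a standard linear term, followed by `tanh`. It is written in plain Python for readability and is not the code used by semantic-rntn; the function name `rntn_compose` and the nested-list layout of `V` and `W` are assumptions made for this illustration.

```python
import math


def rntn_compose(a, b, V, W):
    """Compose child vectors a and b (each of length d) into a parent vector.

    V is a list of d slices, each a 2d x 2d matrix (the tensor).
    W is a d x 2d matrix (the standard recursive-network weights).
    Both layouts are assumptions for this sketch.
    """
    x = list(a) + list(b)  # concatenated children, length 2d
    n = len(x)
    parent = []
    for k in range(len(a)):
        # Tensor term for output unit k: x^T V[k] x
        tensor_term = sum(x[i] * V[k][i][j] * x[j]
                          for i in range(n) for j in range(n))
        # Linear term for output unit k: (W x)[k]
        linear_term = sum(W[k][j] * x[j] for j in range(n))
        parent.append(math.tanh(tensor_term + linear_term))
    return parent


if __name__ == "__main__":
    # Tiny example with d = 1, so x = [1.0, 2.0].
    V = [[[1.0, 0.0], [0.0, 0.0]]]
    W = [[0.0, 0.5]]
    print(rntn_compose([1.0], [2.0], V, W))
```

The same function is applied bottom-up over the parse tree of a sentence, so every node ends up with a vector that a classifier can label with a sentiment.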
We follow the Downpour SGD paradigm described by Jeffrey Dean of Google and implemented in Dirk Neumann's DeepDist project.
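The idea behind Downpour can be sketched in a few lines: workers hold possibly stale copies of the parameters, compute gradients on their own data shard, and push those gradients to a central parameter server without waiting for each other. The sketch below is a sequential, single-process simulation on a toy one-parameter regression; it does not use Spark or the DeepDist API, and the names `ParameterServer`, `shard_gradient`, `train` and `fetch_every` are all hypothetical.

```python
class ParameterServer:
    """Holds the shared weights and applies gradients as they arrive."""

    def __init__(self, weights, learning_rate):
        self.weights = list(weights)
        self.learning_rate = learning_rate

    def fetch(self):
        # Workers receive a copy, which may become stale later on.
        return list(self.weights)

    def push(self, gradient):
        self.weights = [w - self.learning_rate * g
                        for w, g in zip(self.weights, gradient)]


def shard_gradient(weights, shard):
    """Gradient of the mean squared error of y = w * x over one shard."""
    w = weights[0]
    return [sum(2 * (w * x - y) * x for x, y in shard) / len(shard)]


def train(shards, rounds=200, learning_rate=0.01, fetch_every=2):
    server = ParameterServer([0.0], learning_rate)
    replicas = [server.fetch() for _ in shards]
    for step in range(rounds):
        for i, shard in enumerate(shards):
            # Workers refresh their replica only every few steps, so
            # gradients are often computed against stale weights.
            if step % fetch_every == 0:
                replicas[i] = server.fetch()
            server.push(shard_gradient(replicas[i], shard))
    return server.weights


if __name__ == "__main__":
    # Two shards sampled from y = 3 * x.
    shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
    print(train(shards))
```

In the real setting each shard would be a partition of the training data on a cluster node and the pushes would arrive concurrently, but the fetch, compute, push cycle is the same.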
Please bear in mind: this is a work in progress! It is by no means a download-and-run project.
-
RNTN
-
Download/clone the forked semantic-rntn project to every node on your cluster. The fork is based on the original semantic-rntn project; the only difference is that I have turned the existing project into a module, so that it can be installed and managed on all nodes of the cluster.
-
Install by running:
python setup.py install
-
DeepDist
-
At the moment, some updates are needed in order to run RNTN using DeepDist. Those updates are available from my forked DeepDist project. Until my pull requests are approved, download/clone the forked DeepDist project to every node on your cluster.
-
Install by running:
python setup.py install
-
Spark
-
Follow the instructions on downloading and installing Spark in the Spark documentation. Make sure you know the paths to pyspark and py4j.
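As a rough guide to where those paths usually are: in a standard Spark distribution, pyspark sits under `python/` in the Spark home directory and py4j ships as a source zip under `python/lib/`. The helper below is a sketch under that assumption; `add_spark_to_path` is a hypothetical name and `/opt/spark` is only a placeholder, so check the layout of your own installation.

```python
import glob
import os
import sys


def add_spark_to_path(spark_home):
    """Put Spark's Python package and its bundled py4j archive on sys.path.

    Assumes the usual layout: <spark_home>/python and
    <spark_home>/python/lib/py4j-*-src.zip.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_archives = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    added = [python_dir] + sorted(py4j_archives)
    for path in added:
        if path not in sys.path:
            sys.path.insert(0, path)
    return added


if __name__ == "__main__":
    # SPARK_HOME is the conventional variable; the fallback is a placeholder.
    for path in add_spark_to_path(os.environ.get("SPARK_HOME", "/opt/spark")):
        print(path)
```

The printed paths are the ones to note down for the configuration step.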
-
rntn-spark
-
Download/clone the rntn-spark project (this repository).
-
In the configuration file, set the paths to Spark's python directory and to py4j, and set the app name.
-
Update the sparkrunner.sh script with your master address and port.
-
Run:
```sh sparkrunner.sh```
Please use GitHub issues to report problems.