Code used to run Hadoop job prediction experiments using OpenStack Sahara.
The master and slave nodes of each cluster used a flavor with the following configuration:
- 2 VCPUs
- 45 GB HD
- 4 GB RAM
- 4 GB swap
To run this experiment, you must first:
- Get the input
- Compile the classes in the source folder and build the jar file, or get the jar file available in this repository.
- Put what is needed in Sahara
- Make a JSON configuration file with the same structure as "configuration_default.json"
- Leave this awesome experiment running and go have a good time; it will email you when it's done ;)
- Generate graphs for a more visual result
The input file used in the experiment can be accessed at this link.
This file was generated by TeraSortGen from the Hadoop 1.2.1 examples, using the following command:
bin/hadoop jar hadoop-examples-1.2.1.jar terasortgen 50000000
If you want to generate the file yourself, you must:
- Install Hadoop 1.2.1
- Unpack the downloaded Hadoop. Edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
- Then run:
bin/hadoop jar hadoop-examples-1.2.1.jar terasortgen 50000000
- If in doubt, more information about TeraSortGen is available here
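As a quick sanity check on the size of the generated input, the arithmetic can be sketched as below. This assumes the standard TeraGen record size of 100 bytes per row; the constant and function names are illustrative and not part of this repository.

```python
# Assumption: TeraGen writes fixed-size records of 100 bytes each.
BYTES_PER_ROW = 100


def teragen_size_bytes(num_rows: int) -> int:
    """Return the approximate size in bytes of a TeraGen output with num_rows rows."""
    return num_rows * BYTES_PER_ROW


if __name__ == "__main__":
    size = teragen_size_bytes(50_000_000)
    # Report the size in decimal gigabytes (10**9 bytes).
    print(f"{size / 10**9:.1f} GB")
```

Under that assumption, 50000000 rows come to about 5 GB, so make sure the machine generating the file has that much free disk space.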
To compile the classes you should:
- Have Hadoop 2.6.0 installed; more information on how to do this is available here
- Download the source folder and put it in the directory where Hadoop is installed. (You can put the source folder somewhere else, but it is easier if everything is in the same place, and you can delete it when you're done if you don't want it in the Hadoop folder.)
- Once you have the source folder (in the same folder as Hadoop), compile the classes with these commands:
```
$ export JAVA_HOME=/usr/java/default
$ export PATH=${JAVA_HOME}/bin:${PATH}
$ export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
$ bin/hadoop com.sun.tools.javac.Main source/*.java
```
- Now create the jar by running:
```
$ jar cf experiment.jar source/*.class
```
If you have any doubts about steps 2 and 3, more information is available here.
- Create a key pair; if you already have one, you can use it. (The local paths of your public and private keys will be needed.)
- Put the jar in Sahara as a job binary and create a job template of type JavaAction for each job (PiEstimator, TeraSort and WordCount). A similar process is described here.
- Create a master and a worker node group template, and a cluster template for each cluster size (3, 4, 5, ..., 10 nodes). A similar process can be seen here.
- Create a volume and put the 5GB file in it. You can contact me if you need help with this process. I plan to write a post about it, and when I do I'll link it here!
All of the values for the configuration file can be obtained through Horizon, except public_keypair_path, private_keypair_path and private_keypair_name, which only you have access to.
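Since those three values cannot be copied from Horizon, it may help to check that they are filled in before starting a long run. The sketch below is a hypothetical helper, not part of this repository. It assumes the configuration file is a JSON object with these three keys at the top level; the real layout is whatever "configuration_default.json" defines.

```python
import json

# Keys named in this README that only you can provide.
# Assumption: they sit at the top level of the JSON object.
LOCAL_ONLY_KEYS = (
    "public_keypair_path",
    "private_keypair_path",
    "private_keypair_name",
)


def load_config(path):
    """Read a JSON configuration file into a dict."""
    with open(path) as handle:
        return json.load(handle)


def missing_local_keys(config):
    """Return the local-only keys that are absent or empty in config."""
    return [key for key in LOCAL_ONLY_KEYS if not config.get(key)]


if __name__ == "__main__":
    # Hypothetical, partially filled configuration used only for illustration.
    example = {"public_keypair_path": "/home/user/.ssh/id_rsa.pub"}
    print("Still missing:", missing_local_keys(example))
```

For a real file, `missing_local_keys(load_config("your_configuration.json"))` should return an empty list before you launch the experiment.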
Now everything should be ready to run ❤️! You can run this experiment in 2 different ways:
- Running
$ python runExperiment.py <number of executions> <configuration path> <output file name>
with number of executions = 8
- Running
$ python runExperimentIndividually.py <number of executions> <number of cluster nodes> <configuration path> <output file name>
with number of executions = 8 and number of cluster nodes in [3, 10]
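The second way means one invocation per cluster size, which can be scripted. The following is a minimal sketch, assuming one run for each size from 3 to 10 nodes; the configuration file name and the output naming scheme are hypothetical, and the script name is a parameter so you can match the file in your checkout.

```python
import subprocess


def individual_commands(executions, cluster_sizes, config_path, output_prefix,
                        script="runExperimentIndividually.py"):
    """Build one command per cluster size, in the argument order shown above."""
    return [
        ["python", script, str(executions), str(nodes), config_path,
         f"{output_prefix}_{nodes}_nodes"]  # hypothetical output naming
        for nodes in cluster_sizes
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it sequentially when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop if one run fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    commands = individual_commands(8, range(3, 11), "configuration.json", "output")
    run_all(commands)  # dry run: only prints the commands
```

Keeping `dry_run=True` as the default lets you inspect the eight command lines before committing cluster time to them.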
Now that you have the output files, the final step is to generate the graphs.
If you used runExperimentIndividually.py, you must concatenate all the output files into one. You can do that with:
$ cat <output_1_node> <output_2_nodes> <output_3_nodes> ... > output_exp
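If `cat` is not available (for example on Windows), the same concatenation can be sketched in Python. The function name and the demo file names are illustrative only.

```python
import os
import shutil
import tempfile


def concatenate(input_paths, output_path):
    """Append each input file to output_path in the given order, like `cat`."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    # Self-contained demo with throwaway files standing in for real outputs.
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for nodes in (3, 4):
            part = os.path.join(tmp, f"output_{nodes}_nodes")
            with open(part, "w") as handle:
                handle.write(f"result for {nodes} nodes\n")
            parts.append(part)
        merged = os.path.join(tmp, "output_exp")
        concatenate(parts, merged)
        with open(merged) as handle:
            print(handle.read(), end="")
```

Files are copied byte for byte in the order given, so pass them sorted by cluster size if the analysis scripts expect that order.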
ATTENTION: before running the scripts, change the input_file and output_file names.
Also check that the files are in the same folder, or change the path at the beginning of the script with the command:
setwd("your_path")
Then, go to the analysis folder and do the following:
- Execute filtrate.R. It will generate a new file named output_name.
- Run KNN.R on the previously generated file; it will generate a new file.
- Run graphs_cost.R and graphs_prediction.R using the output of KNN.R as input. They will generate graphs in PDF format.
And now you have some awesome graphs 😎 !!!