cd Process/Demeter
HDFS configurations are saved in new-gc.conf. The jars include the machine learning libraries and the pipeline itself (by Prem).
Run:
sh set_java_path.sh
Copy the data to HDFS:
hdfs dfs -put [DATA.csv] [HDFS_PATH]
To run Spark on HDFS for a data set with a specified number of cores:
mprof run --include-children python sparkSubmitYarn.py [core] [data_name]
This generates an mprof data file that records memory usage sampled over time. Elapsed time and CPU time can be obtained from the mprof files. Elapsed time is also measured with the Python time package and written to the TimeCal folder.
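As a rough illustration of how peak memory and elapsed time could be read back from an mprof data file, here is a minimal sketch. It assumes the usual memory_profiler layout, where each sample is a line of the form `MEM <memory in MiB> <unix timestamp>`; the function name `summarize_mprof` and the sample values are hypothetical and not part of this repository.

```python
def summarize_mprof(lines):
    """Return (peak_mib, elapsed_seconds) from the lines of an mprof data file."""
    samples = []
    for line in lines:
        parts = line.split()
        # Assumed sample format: MEM <memory in MiB> <unix timestamp>
        if len(parts) == 3 and parts[0] == "MEM":
            samples.append((float(parts[1]), float(parts[2])))
    if not samples:
        raise ValueError("no MEM samples found")
    peak_mib = max(mem for mem, _ in samples)
    # Elapsed time is the span between the first and last sample.
    elapsed_seconds = samples[-1][1] - samples[0][1]
    return peak_mib, elapsed_seconds


# Hypothetical file content, for illustration only.
example_lines = [
    "CMDLINE python sparkSubmitYarn.py 4 example_data",
    "MEM 10.5 100.0",
    "MEM 52.25 100.1",
    "MEM 30.0 100.5",
]
peak_mib, elapsed_seconds = summarize_mprof(example_lines)
print(peak_mib, elapsed_seconds)
```

For a real run, pass the lines of the generated file instead, for example `summarize_mprof(open(path))`, where `path` points at the mprof output.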
To get disk usage:
sh get_disk_usage.sh
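The contents of get_disk_usage.sh are not reproduced here. As a hedged sketch of how a total could be computed from a recursive HDFS listing, the snippet below sums the size column of text in the standard `hdfs dfs -ls -R` layout (permissions, replication, owner, group, size, date, time, path). The function name `total_hdfs_bytes` and the sample listing are hypothetical.

```python
def total_hdfs_bytes(ls_output):
    """Sum file sizes (in bytes) from the text output of 'hdfs dfs -ls -R'."""
    total = 0
    for line in ls_output.splitlines():
        # Split into at most 8 fields so paths containing spaces stay intact.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # e.g. "Found N items" headers or blank lines
        if fields[0].startswith("d"):
            continue  # directories carry no file data of their own
        if fields[4].isdigit():
            total += int(fields[4])
    return total


# Hypothetical listing, for illustration only.
sample_listing = """\
drwxr-xr-x   - user group          0 2019-01-01 12:00 /data/out
-rw-r--r--   3 user group       1024 2019-01-01 12:00 /data/out/part-00000
-rw-r--r--   3 user group       2048 2019-01-01 12:01 /data/out/part-00001
"""
hdfs_total = total_hdfs_bytes(sample_listing)
print(hdfs_total)
```

The size column is the logical file size, so the total does not include the extra space taken by HDFS replication.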
To check whether a job finished for a given data set, core count, and date:
python checkResults.py
and answer the questions as they pop up in the terminal.
cd Process/Minerva/
Install LargeGOPred (https://github.com/linhuawang/LargeGOPred).
Run LargeGOPred for every combination of data set, core count, and experiment round.
To get disk usage:
python minerva_disk.py [data_path] [data_name]
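minerva_disk.py itself is not shown here. As an illustration of the kind of measurement involved, the sketch below totals the sizes of all regular files under a directory. The function name `directory_size_bytes` and the temporary files are hypothetical, and the result is the apparent size of the files, which can differ from the block-based figure that `du` reports.

```python
import os
import tempfile


def directory_size_bytes(root):
    """Return the summed size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so a target is not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


# Self-contained demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.arff"), "wb") as handle:
        handle.write(b"x" * 100)
    os.makedirs(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "sub", "b.arff"), "wb") as handle:
        handle.write(b"y" * 50)
    demo_total = directory_size_bytes(tmp)
print(demo_total)
```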
To get memory usage and computational time:
python minerva_memory_time.py [stdout_path] [arff_path] [data_name]
cd Analysis/
- Computational time
i. Use the notebook "Computational time.ipynb" to analyze computational time.
ii. Minerva time usage is saved in: Minerva_results/minerva-all-usage.csv.
iii. PNGs for all time usage are saved in Minerva_results/individual_data_time.
iv. PDFs of the figures in the paper are saved at paper_figures/Figure_2a/b.pdf.
- Disk usage
i. Minerva disk usage is calculated manually using the Linux 'du -hs' command.
ii. Demeter disk usage is calculated using the 'hdfs dfs -ls -R' command.
iii. Jupyter notebook "Disk usage.ipynb" is used to generate the barplot.
iv. The plot is saved in paper_figures/Figure3_disk_usage.pdf.
- Memory usage
i. For Minerva, raw data is saved in Minerva_results/minerva-all-usage.csv, unit is MB. Plot is saved as paper_results/Figure_4a_Minerva_memory.pdf.
ii. For Demeter, raw data is saved in Demeter_results/demeter_spark_comprehensive_stats_all_data.csv, unit is MB. Plot is saved as paper_results/Figure_4b_Demeter_memory.pdf.
- Classifiers
i. For Minerva, classifiers and parameters are saved in classifiers.txt.
ii. The Spark classifiers are the corresponding classifiers from MLlib.