
How to run a program

winddd edited this page Feb 2, 2018 · 4 revisions

Running a program in husky-45123 is similar to running one in husky. You first start a master (here, ClusterManagerMainWithContext) and then one or more workers (the applications). The exec.sh script launches the workers on multiple machines using pssh. A template can be found here.

Here is an example of how to run a program on my cluster.

This assumes you have followed the readme to build an application.

Now go to your project home directory (e.g., husky-45123/).

Add a machine file named machine.cfg:

proj5
proj6
proj7
proj8
proj9

Save and exit.

Create an exec.sh file:

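#!/usr/bin/env bash
# Machine file listing one worker hostname per line.
MACHINE_CFG=machine.cfg
# Run the given program on every worker listed in the machine file.
# -t 0 disables the per-host timeout, -P prints output as it arrives,
# and -x "-t -t" passes extra options to ssh to force a terminal.
time pssh -t 0 -P -h "${MACHINE_CFG}" -x "-t -t" "export LIBHDFS3_CONF=/data/opt/course/hadoop/etc/hadoop/hdfs-site.xml  \
    && cd /data/opt/tmp/yuzhen/tmp/husky-45123  \
    && ls ./ debug/ conf/ > /dev/null \
    && ./$@"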

Note that these are my settings; change them for your environment (e.g., the project home path). The ls command refreshes the folder: I am using NFS, where a worker may not see the latest files without a refresh.
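To see what each worker will be asked to execute, the remote command string can be previewed locally before pssh is involved. This is only a sketch: build_remote_cmd and PROJECT_HOME are hypothetical names that are not part of the project, the LIBHDFS3_CONF export is left out for brevity, and the arguments are joined with $* so that the result is a single string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a command string like the one exec.sh hands to pssh.
# PROJECT_HOME is an assumed variable; set it to your own project path.
PROJECT_HOME=${PROJECT_HOME:-/data/opt/tmp/yuzhen/tmp/husky-45123}

build_remote_cmd() {
    # "$*" joins all arguments with spaces into one string.
    printf '%s' "cd ${PROJECT_HOME} && ls ./ debug/ conf/ > /dev/null && ./$*"
}

# Preview only; nothing is run on the workers.
build_remote_cmd debug/ALS -C examples/mf_als/als.conf
echo
```

Because the script is invoked as ./exec.sh, it also needs the execute bit, which chmod +x exec.sh sets.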

Note that mf_als exists only in the dev branch of this project, so use git checkout to switch to that branch (or simply build the whole project from the dev branch from the start).

Make sure you have password-free SSH access to all workers. To achieve this, you first need an account on each worker. Then:

  1. Use ssh-keygen to generate an SSH key pair if you do not already have id_rsa.pub in your ~/.ssh directory;
  2. Use ssh-copy-id to copy your public key to each worker. For example, ssh-copy-id jzhang@proj5.
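Once the keys are copied, it can help to confirm that every host in the machine file really accepts key-based login before starting the workers. The sketch below is one possible check, not part of the project: check_passwordless_ssh and the SSH_CMD override are hypothetical names, and it assumes the machine file lists one hostname per line.

```shell
#!/usr/bin/env bash
# Sketch: report which hosts in a machine file accept key-based SSH login.
# SSH_CMD is a hypothetical override hook (defaults to the real ssh client).
SSH_CMD=${SSH_CMD:-ssh}

check_passwordless_ssh() {
    local machine_file=$1 host failed=0
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue   # skip blank lines
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # -n keeps ssh from consuming the rest of the machine file on stdin.
        if $SSH_CMD -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$machine_file"
    return "$failed"
}

# Run the check only when a machine file is present in the current directory.
if [ -f machine.cfg ]; then
    check_passwordless_ssh machine.cfg || echo "Run ssh-copy-id for the hosts marked FAIL."
fi
```

A host marked FAIL may also simply be unreachable, so the message is a hint rather than a diagnosis.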

Then, in one console, run:

./debug/ClusterManagerMainWithContext -C examples/mf_als/als.conf

In another console, run:

./exec.sh debug/ALS -C examples/mf_als/als.conf

Make sure you have built ALS and ClusterManagerMainWithContext in the debug/ folder. You can also edit the configuration file examples/mf_als/als.conf to suit your setup.
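A quick pre-flight check can catch a missing build before the master and workers are started. The following is a minimal sketch run from the project home directory; check_binaries is a hypothetical helper, not something the project provides.

```shell
#!/usr/bin/env bash
# Sketch: verify that the expected executables exist before launching.
# check_binaries is a hypothetical helper, not part of the project.
check_binaries() {
    local dir=$1 name missing=0
    shift
    for name in "$@"; do
        if [ ! -x "$dir/$name" ]; then
            echo "missing or not executable: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call for the ALS example; prints a hint if something is missing.
check_binaries debug ClusterManagerMainWithContext ALS \
    || echo "Build the missing targets first (see the readme)."
```

The function returns a non-zero status when any binary is absent, so it can also gate a launch script.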
