
Launch Your Mapreducer


The mincepie.launcher module provides multiple ways to start a mapreduce process. To run a script, import or implement everything you need (mapper, reducer, reader, writer) in your Python script, and then call launcher.launch(), preferably at the end of the file:

import mincepie.launcher

if __name__ == "__main__":
    mincepie.launcher.launch()

The various options of mincepie are given as command-line arguments, and are explained below.

Mapper, Reducer, Reader and Writer

These four components are specified by the --mapper, --reducer, --reader, and --writer flags. For example, use

--mapper=MyAwesomeMapper

to use your own awesome mapper (you'll have to register it in your code; see the mapper and reducer documentation).

Alternatively, you can use REGISTER_DEFAULT_MAPPER (and likewise for the reducer, reader, and writer) to specify the default mapper to use, so you do not need to pass it explicitly on the command line.
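For example, a minimal script modeled on the wordcount demo might register its own mapper and reducer as defaults and then launch. This is only a sketch: the base classes and method signatures shown here (BasicMapper, BasicReducer, and their map/reduce methods) are assumptions, so consult the mapper and reducer documentation for the exact interface.

from mincepie import launcher, mapreducer

# Assumed interface: map() receives a key and a value, and yields (key, value) pairs.
class MyAwesomeMapper(mapreducer.BasicMapper):
    def map(self, key, value):
        for word in value.split():
            yield word, 1
mapreducer.REGISTER_DEFAULT_MAPPER(MyAwesomeMapper)

# Assumed interface: reduce() receives a key and all its values, and returns the reduced value.
class MyAwesomeReducer(mapreducer.BasicReducer):
    def reduce(self, key, values):
        return sum(values)
mapreducer.REGISTER_DEFAULT_REDUCER(MyAwesomeReducer)

if __name__ == "__main__":
    launcher.launch()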

Input and Output

You can specify your input using the --input flag. Its value can be any string that your reader understands; for example, if the reader is BasicReader, you should pass a file pattern that the Python glob module understands.

Similarly, you can specify your output using the --output flag.
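For example, to process every text file under a data directory and write the results somewhere, a command might look like the following (the glob pattern and output name are only illustrative; what --output accepts depends on the writer you use):

python wordcount.py --input="data/*.txt" --output=wordcounts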

Launch your Mapreduce locally

If you just want to run Mapreduce on one machine, you can use the --launch=local flag. In this case, Python runs the Mapreduce server and spawns a set of subprocesses as workers to carry out the map and reduce operations. The number of workers is specified by the --num_clients flag. For example, to run wordcount locally with 2 clients and print the results on the screen, run:

python wordcount.py --num_clients=2 --input=zen.txt

Launch your Mapreduce with MPI

If you have OpenMPI and mpi4py installed, you can use mpirun -n num_mpi_nodes to start your script with the command-line argument --launch=mpi. In this case, the MPI root node (rank 0) serves as the Mapreduce server, and the other nodes act as Mapreduce workers. For example, to run wordcount with 1 server and 2 clients under MPI, run:

mpirun -n 3 python wordcount.py --input=zen.txt --launch=mpi

Launch your Mapreduce with slurm

If you have slurm on your cluster, you can use --launch=slurm to start the server and submit a set of jobs to the slurm queue as workers. I am not sure whether this will work universally for all cluster settings (I made it mainly for the slurm setup at ICSI), but you can check the docstring of launcher.py for the possible slurm-related arguments.
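As a rough sketch, using only the flags described above (and assuming --num_clients also controls how many worker jobs are submitted to the slurm queue), a run might look like:

python wordcount.py --launch=slurm --num_clients=2 --input=zen.txt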

Less Frequently Used Flags

To be written.