Launch Your Mapreducer
The mincepie.launcher module provides multiple ways to start a mapreduce process. To run a script, import or implement everything in your Python script, and then call launcher.launch(), preferably at the end of the file:
if __name__ == "__main__":
    mincepie.launcher.launch()
mincepie is configured through command-line arguments, which are explained below.
The mapper, reducer, reader, and writer are specified by the --mapper, --reducer, --reader, and --writer flags. For example, use --mapper=MyAwesomeMapper to use your own awesome mapper (you'll have to register it in your code; see the mapper and reducer documentation). Alternatively, you can use REGISTER_DEFAULT_MAPPER (and its REDUCER counterpart, and so on) to specify the default mapper to use, so you do not need to pass it explicitly on the command line.
You can specify your input using the --input flag. Its value is any string that your reader understands; for example, if the reader is BasicReader, you should pass a file pattern that the Python glob module understands. Similarly, you can specify your output using the --output flag.
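For reference, a glob-style file pattern expands like this (plain Python, mimicking what a glob-based reader would do with the --input value; the temporary directory is just for illustration):

```python
# Expand a glob pattern the way a reader accepting --input="dir/*.txt" would.
import glob
import os
import tempfile

tmpdir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "notes.md"):
    open(os.path.join(tmpdir, name), "w").close()

# Equivalent of passing --input="<tmpdir>/*.txt": only the .txt files match.
matched = sorted(glob.glob(os.path.join(tmpdir, "*.txt")))
basenames = [os.path.basename(p) for p in matched]  # ['a.txt', 'b.txt']
```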
If you just want to run Mapreduce on one machine, you can use the --launch=local flag. In this case, Python runs the Mapreduce server and spawns a set of subprocesses as workers to carry out the map and reduce operations. The number of workers is specified by the --num_clients flag. For example, to run wordcount locally with 2 clients and print the results on the screen, run:
python wordcount.py --num_clients=2 --input=zen.txt
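To see what the wordcount example computes, here is its core map/reduce logic written as plain Python functions, with the shuffle phase simulated in-process. The function names and the class-free style are illustrative, not mincepie's actual API.

```python
# Core wordcount logic in plain Python (illustrative, not mincepie's API).
from collections import defaultdict

def wc_map(line):
    # map phase: emit (word, 1) for every word in the line
    for word in line.split():
        yield word, 1

def wc_reduce(word, counts):
    # reduce phase: sum all counts emitted for one word
    return word, sum(counts)

# Simulate the shuffle phase over a tiny input.
lines = ["the quick fox", "the lazy dog"]
shuffled = defaultdict(list)
for line in lines:
    for word, count in wc_map(line):
        shuffled[word].append(count)

result = dict(wc_reduce(w, c) for w, c in shuffled.items())
# result == {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```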
If you have OpenMPI and mpi4py installed, you can use mpirun -n num_mpi_nodes to start your script with the command-line argument --launch=mpi. In this case, the MPI root node (rank 0) serves as the Mapreduce server and the other nodes as Mapreduce workers. For example, to run wordcount with 1 server and 2 clients under MPI, run:
mpirun -n 3 python wordcount.py --input=zen.txt --launch=mpi
If you have slurm on your cluster, you can use --launch=slurm to start the server and submit a set of worker jobs to the slurm queue. This may not work universally across all cluster configurations (it was written mainly for the slurm setup at ICSI); check the docstring of launch.py for the slurm-related arguments.
To be written.