Skip to content

Example programs with Prince

goossaert edited this page Sep 14, 2010 · 2 revisions

This section presents the example programs coded with Prince. Along with the classic word count, you can find a total count, a merge sort, and even single source shortest path. You are more than welcome to add your own example programs: drop me an email and I’ll put them here.

Word count | sources

Count the number of words in the given input file(s).

Total count | sources

Count the total number of words in the given input file(s), while balancing the work load amongst slave processes. The one iteration solution that consists in reducing all items to the same key requires a heavy work load on only one reducer. This solution balances the work load amongst possibly multiple reducers, however it requires two iterations.

Import count | sources

Use the word count example to show how to import Python files in a program that uses Prince.

Merge sort | sources and data sets

This is an implementation of a distributed merge sort on integer numbers. It is quite inefficient because of Python, because Prince handles keys and values as strings, and because reducers have to handle increasing number of values as the sort goes on. However, it is interesting as it shows how to code such a sort algorithm using the MapReduce paradigm. O(n lg n) iterations are necessary as for any merge sort. Two additional iterations are used, one to detect termination, and another one to write the result file. Along with the source code, data sets of three different sizes are provided.

Dijkstra’s single-source shortest path | sources and data sets

Explore a graph considering all weights are one, and find all shortest distances from a given source node. Each iteration requires three MapReduce tasks, one to compute the new frontier, another one to compute the changes in distances, and a last one to check if the search is over.

PageRank algorithm | sources and data sets

Compute the PageRank values for the nodes of a graph. The algorithm uses a uniform probability distribution to set the initial values: each value is equal to 1 / N, where N is the number of nodes in the whole graph. Dandlings nodes (ie: nodes that do not have any out-links, and therefore have an empty adjacency list) are connected to the whole graph in order to take advantage of their PageRank values. The computation of the PageRank values is stopped when the error in quadratic norm converges under a chosen threshold.