Normalisation scheme in simplified PageRank algorithm #76
Hi Joel,
from my point of view your implementation of the simplified PageRank algorithm does not follow the protocol outlined in the book. I only have the first edition at hand, where it says:
1. There is a total of 1.0 PageRank in the network.
   This should still hold at the end of the calculation, but your implementation violates it: the total PageRank at the end of your script is 1.0425.
2. Initially this PageRank is equally distributed among nodes.
3. At each step, a large fraction of each node's PageRank is distributed evenly among its outgoing links.
4. At each step, the remainder of each node's PageRank is distributed evenly among all nodes.
   This point is missing from your implementation.
I am proposing a version which implements point 4 in a straightforward way. There might be more elegant ways, but this one is easy to understand.
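For readers without the diff at hand, the four rules can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the function name `simplified_pagerank`, the `(source, target)` edge format, and the tiny example graph are all hypothetical, and treating a node without outgoing links as spreading its whole rank over all nodes is an assumption made here so that the total stays at 1.0.

```python
def simplified_pagerank(nodes, edges, damping=0.85, num_iters=100):
    """Sketch of the four rules; names and graph format are hypothetical."""
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    # Rules 1 and 2: a total of 1.0, split equally at the start.
    pr = {node: 1.0 / n for node in nodes}

    for _ in range(num_iters):
        next_pr = {node: 0.0 for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                # Rule 3: a large fraction goes evenly to outgoing links.
                share = damping * pr[node] / len(targets)
                for target in targets:
                    next_pr[target] += share
                remainder = (1 - damping) * pr[node]
            else:
                # Assumption: a node with no outgoing links gives away
                # its entire rank through rule 4.
                remainder = pr[node]
            # Rule 4: the remainder goes evenly to all nodes.
            for other in nodes:
                next_pr[other] += remainder / n
        pr = next_pr
    return pr


# Hypothetical four-node graph; node 3 has no outgoing links.
ranks = simplified_pagerank([0, 1, 2, 3], [(0, 1), (1, 2), (2, 0), (2, 3)])
total = sum(ranks.values())
print(ranks, total)
```

Because every node hands out exactly the rank it holds in each step, the total is conserved up to floating-point rounding.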
The results get numerically very close to the implementation in networkX. See numbers below.
Original results
user id: PageRank
0: 0.1,
1: 0.1,
2: 0.1,
3: 0.1,
4: 0.14250000000000002,
5: 0.1,
6: 0.1,
7: 0.1,
8: 0.1,
9: 0.1
NetworkX results
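import networkx as nx
# `users` and `endorsements` are the objects defined in the book's script.
G = nx.DiGraph()
G.add_nodes_from([user.id for user in users])
G.add_edges_from(endorsements)
# The second argument of nx.pagerank is the damping factor alpha.
pr_nx = nx.pagerank(G, alpha=0.85)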
user id: PageRank
0: 0.09499151348469306,
1: 0.10547758964858775,
2: 0.10547758964858775,
3: 0.09499151348469306,
4: 0.1593177423515437,
5: 0.10200959185661473,
6: 0.07857495588955458,
7: 0.07857495588955458,
8: 0.10200959185661472,
9: 0.07857495588955458
New results
0: 0.0949906958425375,
1: 0.10547659652084887,
2: 0.10547659652084887,
3: 0.0949906958425375,
4: 0.1593168333463994,
5: 0.10201123958329422,
6: 0.07857536758674652,
7: 0.07857536758674652,
8: 0.10201123958329422,
9: 0.07857536758674652
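
The totals and the agreement with NetworkX can be checked directly from the numbers listed above. The snippet below only copies those values; the variable names are chosen here for illustration.

```python
# Values copied from the three result listings above.
original = {i: 0.1 for i in range(10)}
original[4] = 0.14250000000000002

networkx_pr = {
    0: 0.09499151348469306, 1: 0.10547758964858775,
    2: 0.10547758964858775, 3: 0.09499151348469306,
    4: 0.1593177423515437, 5: 0.10200959185661473,
    6: 0.07857495588955458, 7: 0.07857495588955458,
    8: 0.10200959185661472, 9: 0.07857495588955458,
}

new_pr = {
    0: 0.0949906958425375, 1: 0.10547659652084887,
    2: 0.10547659652084887, 3: 0.0949906958425375,
    4: 0.1593168333463994, 5: 0.10201123958329422,
    6: 0.07857536758674652, 7: 0.07857536758674652,
    8: 0.10201123958329422, 9: 0.07857536758674652,
}

# Rule 1 says each total should be 1.0.
totals = {
    "original": sum(original.values()),
    "networkx": sum(networkx_pr.values()),
    "new": sum(new_pr.values()),
}
for name, value in totals.items():
    print(f"{name:9s} total = {value:.6f}")

# Largest per-node gap between NetworkX and the new implementation.
max_diff = max(abs(networkx_pr[i] - new_pr[i]) for i in range(10))
print(f"largest |networkx - new| = {max_diff:.2e}")
```

The original results sum to 1.0425, while the NetworkX and new results both sum to 1.0 within rounding, and the per-node differences between those two are on the order of 1e-6.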