This application is designed to run on an HDFS cluster and output a list of web pages ranked by their computed PageRank scores. The crawl data comes from the Common Crawl repository on AWS. The project uses Apache Spark's GraphX API. The sample input files are taken from the hyperlink graph provided by Web Data Commons at http://webdatacommons.org/hyperlinkgraph/
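
To make the ranking step concrete, here is a minimal single-machine sketch of PageRank by power iteration. It is written in plain Python rather than GraphX and is not the project's code. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the input shape (a list of source and target node ID pairs) are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given as (source, target) pairs.

    Hypothetical helper for illustration only; the damping factor and
    iteration count are assumed defaults, not values from the project.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)

    # Adjacency list: every node gets an entry, even if it has no out-links.
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}

        # Each node passes an equal share of its rank to every page it links to.
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * ranks[src] / len(targets)
            for dst in targets:
                new_ranks[dst] += share

        ranks = new_ranks

    # Highest-ranked node first.
    return sorted(ranks.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample_edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
    for node, score in pagerank(sample_edges):
        print(node, round(score, 4))
```

In the sample graph, node 0 receives links from two pages and node 3 receives none, so they land at the top and bottom of the ranking respectively. A cluster implementation would distribute the same per-edge contribution step across workers instead of looping in one process.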
gohilankit/PageRankCluster