Skip to content

Latest commit

 

History

History
25 lines (21 loc) · 987 Bytes

README.md

File metadata and controls

25 lines (21 loc) · 987 Bytes

goalexa

A concurrent go crawler.

Dependencies

Languages

Packages

Crawler

The crawler reads the alexa-1m website list and crawls the data and saves them in boltdb database. Using the cronjob we run the crawler daily and collect the data. Steps to run the crawler are given below.

  1. Load the sites to crawl from the .csv.gz file:
    cd crawler/go/src/goalexa/
    go run main.go load.go goalexa.go cache
    
  2. Start crawling, by defualt the crawler uses 100 parallel jobs but you can specify using -j JOBS (upto 256):
    go run main.go load.go goalexa.go start -j 100