posaninagendra/goalexa

goalexa

A concurrent Go crawler.

Dependencies

Languages

Packages

Crawler

The crawler reads the Alexa top-1M (alexa-1m) website list, crawls each site, and saves the collected data in a BoltDB database. A cron job runs the crawler daily to collect the data. The steps to run the crawler are given below.

  1. Load the sites to crawl from the .csv.gz file:
    cd crawler/go/src/goalexa/
    go run main.go load.go goalexa.go cache
    
  2. Start crawling. By default the crawler uses 100 parallel jobs, but you can set a different number with -j JOBS (up to 256):
    go run main.go load.go goalexa.go start -j 100
    
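
The parallel jobs in step 2 can be pictured as a fixed-size worker pool. The following is a minimal sketch under stated assumptions, not the project's implementation: `clampJobs`, `crawlAll`, and the stand-in `fetch` function are hypothetical names, and clamping values above 256 is a guess at how the limit might be enforced. The sketch also parses `-j` directly, without the `start` subcommand the real tool uses.

```go
package main

import (
	"flag"
	"fmt"
	"sync"
)

const (
	defaultJobs = 100 // default job count stated in the README
	maxJobs     = 256 // upper bound stated in the README
)

// clampJobs keeps the requested job count within [1, maxJobs].
// Clamping (rather than rejecting) out-of-range values is an assumption.
func clampJobs(n int) int {
	if n < 1 {
		return 1
	}
	if n > maxJobs {
		return maxJobs
	}
	return n
}

// crawlAll runs fetch on every site using at most `jobs` goroutines
// and returns the results keyed by site.
func crawlAll(sites []string, jobs int, fetch func(string) string) map[string]string {
	work := make(chan string)
	results := make(map[string]string, len(sites))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < clampJobs(jobs); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for site := range work {
				body := fetch(site)
				mu.Lock() // the map is shared, so writes are serialized
				results[site] = body
				mu.Unlock()
			}
		}()
	}

	for _, s := range sites {
		work <- s
	}
	close(work) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	jobs := flag.Int("j", defaultJobs, "number of parallel jobs (up to 256)")
	flag.Parse()

	// Stand-in for a real HTTP fetch so the sketch needs no network access.
	fetch := func(site string) string { return "fetched " + site }

	out := crawlAll([]string{"example.com", "example.org"}, *jobs, fetch)
	fmt.Println(len(out), "sites crawled with", clampJobs(*jobs), "workers")
}
```

An unbuffered channel feeding a fixed number of goroutines caps the number of in-flight requests at the job count, which is the usual reason to bound a crawler's parallelism.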
