Synchronous web crawler written in Ruby. Possible improvements:
- Threading or a work queue, so crawls run faster (see the worker-pool sketch after this list).
- A limit on crawl depth, to avoid endless spider traps (sketched below).
- Better duplicate avoidance, e.g. treating www and non-www hosts as the same site (sketched below).
- Offline tests: save a wget mirror of the site and replay it with FakeWeb, so the test suite never hits the live site (sketched below).
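
A minimal worker-pool sketch using the stdlib's thread-safe Queue. The seed URL, the four-thread pool size, and the regex link extraction are placeholders for whatever the crawler already does, not the project's real code:

    require "net/http"
    require "set"
    require "uri"

    queue = Queue.new   # thread-safe FIFO from the stdlib
    seen  = Set.new
    mutex = Mutex.new   # Set is not thread-safe, so guard it

    # Fetch the seed page up front so the queue starts with parallel work.
    seed = "https://example.com/"   # hypothetical seed URL
    seen << seed
    Net::HTTP.get(URI(seed)).scan(%r{href="(https?://[^"]+)"}).flatten.each { |l| queue << l }

    workers = 4.times.map do
      Thread.new do
        # Naive shutdown: a worker exits when it finds the queue empty.
        # A production pool would track in-flight requests instead.
        while (url = queue.pop(true) rescue nil)
          next unless mutex.synchronize { seen.add?(url) }  # skip already-seen URLs
          html = Net::HTTP.get(URI(url)) rescue next        # skip fetch failures
          html.scan(%r{href="(https?://[^"]+)"}).flatten.each { |l| queue << l }
        end
      end
    end

    workers.each(&:join)
    puts "Crawled #{seen.size} pages"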
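
Depth limiting falls out naturally if the frontier stores (url, depth) pairs instead of bare URLs. A breadth-first sketch, again with a hypothetical seed and regex extraction:

    require "net/http"
    require "set"
    require "uri"

    MAX_DEPTH = 3
    frontier  = [["https://example.com/", 0]]  # (url, depth) pairs; hypothetical seed
    seen      = Set.new

    until frontier.empty?
      url, depth = frontier.shift              # breadth-first, so shallow pages come first
      next unless depth <= MAX_DEPTH && seen.add?(url)
      html = Net::HTTP.get(URI(url)) rescue next
      html.scan(%r{href="(https?://[^"]+)"}).flatten.each do |link|
        frontier << [link, depth + 1]          # links found here sit one level deeper
      end
    end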
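
For the www-vs-not case, one option is to canonicalize every URL before checking it against the seen set. A sketch that also downcases the scheme and host and strips fragments, which may or may not be wanted:

    require "uri"

    # Canonicalize a URL so that variants of the same page collapse to one key.
    def canonical(url)
      uri = URI(url)
      uri.scheme   = uri.scheme.downcase if uri.scheme
      uri.host     = uri.host.downcase.sub(/\Awww\./, "") if uri.host
      uri.fragment = nil                      # #anchors never change the page
      uri.path     = "/" if uri.path.empty?   # bare host == host with trailing slash
      uri.to_s
    end

    canonical("HTTP://WWW.Example.com")    # => "http://example.com/"
    canonical("http://example.com/#top")   # => "http://example.com/"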
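
For the offline tests, FakeWeb can replay a saved wget mirror: register each saved file under the URL it was fetched from, and forbid live connections so any stray request fails fast. The fixture path and the Crawler API below are assumptions, not the project's real names:

    require "fakeweb"
    require "minitest/autorun"

    class CrawlerTest < Minitest::Test
      def setup
        FakeWeb.allow_net_connect = false   # any live request now raises
        Dir.glob("test/fixtures/site/**/*.html") do |file|
          # Map each saved file back to the URL wget fetched it from.
          path = file.sub("test/fixtures/site", "")
          FakeWeb.register_uri(:get, "http://example.com#{path}",
                               body: File.read(file),
                               content_type: "text/html")
        end
      end

      def test_crawl_visits_linked_pages
        pages = Crawler.new("http://example.com/index.html").crawl  # hypothetical API
        assert_includes pages, "http://example.com/about.html"
      end
    end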