dshaw edited this page Sep 14, 2010 · 2 revisions

So, why write a spider in Node.js?

Well, really there might not be a good reason to do this. I woke up one Saturday morning and decided it might be fun to explore what was possible. Node.js might lend itself to some interesting new emergent behavior in web crawling …and then again it might not.

Caveat Emptor

I write web applications and do not have a background in data mining. I did not start this adventure by researching web crawler implementations. I simply defined what I thought might be useful and went about implementing that functionality.

Phase 1

Node Spider should accept a URL and identify all of the links on that page.

Known Deficiencies

  1. Spider does not send a User-Agent string.
  2. Spider does not respect robots.txt.