A spider for crawling the images on a website.
# clone git repository
git clone https://github.com/luofei2011/image-spider.git
cd image-spider
# install nodejs packages
npm install
# add test.js
touch test.js
vim test.js
# insert
var Spider = require('./spider');
// crawl up to 3 levels deep with at most 4 sockets, downloading images along the way
var spider = new Spider('http://poised-flw.com', {
    level: 3,
    maxSockets: 4,
    downloadImage: true
});
spider.start();
# save & quit
# then execute this file
node test.js
useAgent
: the User-Agent string the spider sends.
maxSockets
: the maximum number of concurrent connections the spider uses.
level
: the crawl depth.
onlyHost
: whether the spider crawls only pages on the same domain as the start URL. Default: true.
downloadImage
: whether to download images while crawling. Default: false.
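As a rough illustration of how level and onlyHost could interact, here is a minimal sketch of a link filter. It is not taken from image-spider's source: the function name shouldFollow, its parameters, and the assumption that a link at depth N is followed while N <= level are all hypothetical.

```javascript
// Hypothetical helper, NOT part of image-spider: decides whether a crawler
// should follow a link, given options shaped like the ones listed above.
function shouldFollow(link, rootUrl, depth, options) {
  // Assumption: links deeper than `level` are not followed.
  if (depth > options.level) return false;

  var target;
  try {
    // Resolve relative links against the start URL.
    target = new URL(link, rootUrl);
  } catch (err) {
    return false; // unparseable link
  }

  // Only http(s) pages can be crawled.
  if (target.protocol !== 'http:' && target.protocol !== 'https:') return false;

  // With onlyHost enabled, stay on the same host as the start URL.
  if (options.onlyHost && target.host !== new URL(rootUrl).host) return false;

  return true;
}

var options = { level: 3, onlyHost: true };
console.log(shouldFollow('/about', 'http://poised-flw.com', 1, options));              // true
console.log(shouldFollow('http://example.com/', 'http://poised-flw.com', 1, options)); // false
console.log(shouldFollow('/deep', 'http://poised-flw.com', 4, options));               // false
```

Comparing hosts rather than raw URL strings keeps relative links and differing paths on the same site from being rejected.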
- The image URLs are written to $(pwd)/log/images_log. You can download them with download.sh, or set downloadImage: true to download them during the crawl.
- You can extend this tool to handle other files such as js, css, and html.
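If you would rather post-process the image log from Node.js than from the shell, something like the sketch below may be a starting point. It assumes the log holds one image URL per line, which this README does not confirm; parseImageLog is a hypothetical helper, not part of this repository.

```javascript
// Hypothetical helper, NOT part of image-spider.
// Assumption: the log text contains one image URL per line.
function parseImageLog(text) {
  var seen = new Set();
  text.split(/\r?\n/).forEach(function (line) {
    var url = line.trim();
    if (url === '') return; // skip blank lines
    try {
      new URL(url); // throws on lines that are not absolute URLs
    } catch (err) {
      return;
    }
    seen.add(url); // the Set drops duplicates and keeps first-seen order
  });
  return Array.from(seen);
}

var sample = 'http://a.com/1.png\n\nhttp://a.com/1.png\nnot a url\nhttp://a.com/2.jpg\n';
console.log(parseImageLog(sample));
// [ 'http://a.com/1.png', 'http://a.com/2.jpg' ]
```

To run it against a real crawl, read the file first, for example with fs.readFileSync('log/images_log', 'utf8'), and pass the result to parseImageLog.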
If you run into any problems, please let me know. Thanks~
Please use this tool only for learning Node.js.