This repository has been archived by the owner on Jun 10, 2024. It is now read-only.
v0.3.2
Scheduler
- The task queue size is now more accurate, so you can use it to determine whether the scheduler has finished all of its tasks.
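One way to use a more accurate queue size as an "all done" signal is to poll it until it stays at zero. The sketch below is a minimal illustration under stated assumptions: get_queue_size is a hypothetical callable standing in for however your deployment reads the scheduler's task queue size, and requiring several consecutive zero readings is a design choice meant to avoid stopping while a task that is still being processed may enqueue follow-up work.

```python
import time
from typing import Callable


def wait_until_done(get_queue_size: Callable[[], int],
                    interval: float = 1.0,
                    stable_checks: int = 3,
                    timeout: float = 60.0) -> bool:
    """Return True once the queue size reads 0 for `stable_checks` polls in a row.

    `get_queue_size` is a hypothetical hook, not a pyspider API.
    Returns False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    zero_streak = 0
    while time.monotonic() < deadline:
        if get_queue_size() == 0:
            zero_streak += 1
            if zero_streak >= stable_checks:
                return True
        else:
            # Any non-empty reading resets the streak.
            zero_streak = 0
        time.sleep(interval)
    return False
```

For example, wait_until_done(lambda: next(sizes), interval=0) with sizes = iter([3, 2, 1, 0, 0, 0]) returns True on the sixth poll.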
Fetcher
- Fixed Tornado losing cookies during 30x redirects.
- You can now pass cookies and a Cookie header at the same time.
- Fixed a bug where the proxy did not work.
- Proxy is now enabled by default.
- Proxy now supports username and password authentication. @soloradish
- ETag and Last-Modified headers are disabled when the last crawl failed.
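Two of the fetcher changes above, proxy credentials and skipping ETag/Last-Modified after a failed crawl, can be illustrated with a small sketch of request-header construction. It assumes a proxy string of the form username:password@hostname:port and HTTP Basic proxy authentication, and it maps a stored ETag to If-None-Match and a stored Last-Modified value to If-Modified-Since. parse_proxy and build_headers are hypothetical helpers written for illustration, not pyspider's internal functions.

```python
import base64
from typing import Dict, Optional, Union
from urllib.parse import unquote, urlsplit


def parse_proxy(proxy: str) -> Dict[str, Union[str, int, None]]:
    """Split 'username:password@hostname:port' into its parts (assumed format)."""
    # urlsplit only parses the netloc after '//', so add a scheme if it is missing.
    if "://" not in proxy:
        proxy = "http://" + proxy
    parts = urlsplit(proxy)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
    }


def build_headers(proxy: Optional[str], last_crawl_ok: bool,
                  etag: Optional[str] = None,
                  last_modified: Optional[str] = None) -> Dict[str, str]:
    """Build extra request headers for one fetch (hypothetical helper)."""
    headers: Dict[str, str] = {}
    if proxy:
        p = parse_proxy(proxy)
        if p["username"] is not None:
            # HTTP Basic credentials: base64("username:password").
            raw = f"{p['username']}:{p['password'] or ''}".encode("utf-8")
            headers["Proxy-Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    # Send cache validators only if the previous crawl succeeded; otherwise a
    # 304 Not Modified reply could mask a page that was never fetched properly.
    if last_crawl_ok:
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified
    return headers
```

With last_crawl_ok=False, build_headers(None, False, etag='"abc"') returns an empty dict, so the next request is a plain, unconditional fetch.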
Databases
- The default MySQL engine is now InnoDB. @laapsaap
- The MySQL result column is larger: it is now MEDIUMBLOB (up to 16 MB). @laapsaap
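For a sense of scale: a MySQL BLOB holds at most 2^16 − 1 = 65,535 bytes, while a MEDIUMBLOB holds up to 2^24 − 1 = 16,777,215 bytes. The sketch below checks whether a result would fit in each type. It assumes, hypothetically, that a result is stored as UTF-8 encoded JSON; the actual serialization used by the result database may differ, so treat the byte counts as approximate.

```python
import json

# Maximum byte lengths of MySQL BLOB types.
BLOB_MAX = 2**16 - 1        # 65,535 bytes
MEDIUMBLOB_MAX = 2**24 - 1  # 16,777,215 bytes (about 16 MB)


def serialized_size(result: dict) -> int:
    """Byte length of the result, assuming JSON + UTF-8 storage."""
    return len(json.dumps(result).encode("utf-8"))


def fits_in_blob(result: dict) -> bool:
    return serialized_size(result) <= BLOB_MAX


def fits_in_mediumblob(result: dict) -> bool:
    return serialized_size(result) <= MEDIUMBLOB_MAX
```

A result of roughly 70 KB, for instance, exceeds a plain BLOB but fits easily in a MEDIUMBLOB.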
WebUI
- WebUI now uses the same arguments as the fetcher, which fixes the proxy not working in WebUI.
- Results are now sorted by updatetime.
One Mode
- Script exception logs are now printed to the screen.
New Command send_message
You can use the command pyspider send_message [project] [message]
to send a message to a project from the command line.
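If you want to trigger send_message from a Python script, one option is to call the CLI through subprocess. This is a minimal sketch assuming the pyspider executable is on PATH and uses only the two positional arguments shown above; send_message_command and send_message are hypothetical helper names, and how the project handles the message is outside its scope.

```python
import subprocess
from typing import List


def send_message_command(project: str, message: str) -> List[str]:
    """Build argv for `pyspider send_message [project] [message]`."""
    return ["pyspider", "send_message", project, message]


def send_message(project: str, message: str) -> int:
    """Run the CLI and return its exit code (assumes pyspider is on PATH)."""
    # Passing a list instead of a shell string keeps spaces and quotes
    # in the message intact as a single argument.
    completed = subprocess.run(send_message_command(project, message), check=False)
    return completed.returncode
```

Building argv separately from running it makes the command easy to log or test without invoking pyspider.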
Other
- Tests now use locally hosted test web pages.
- Removed the lxml version pin, so you can install any version of lxml (for example, via apt-get).