This repository has been archived by the owner on Jun 10, 2024. It is now read-only.
Releases: binux/pyspider
First PyPI Release
- Many bugs fixed.
- Made pyspider a single top-level package. (thanks to zbb, iamtew and fmueller from HN)
- Python 3 support!
- Use click to create a better command line interface.
- PostgreSQL supported via SQLAlchemy (through SQLAlchemy, pyspider also supports Oracle, SQL Server, etc).
- Benchmark test.
- Documentation & tutorial: http://docs.pyspider.org/
- Flake8 cleanup (thanks to @jtwaleson)
Base
- Use MessagePack instead of pickle in the message queue.
- JSON data is encoded as a base64 string when the content is binary.
- RabbitMQ lazy limit for better performance.
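The binary-content rule above can be sketched with the standard library alone. This is a minimal illustration, not pyspider's actual code: the function names `encode_content` and `decode_content` and the `"base64"` flag key are hypothetical.

```python
import base64
import json


def encode_content(content):
    """Wrap content for JSON; binary (bytes) content becomes a base64 string."""
    if isinstance(content, bytes):
        # JSON cannot carry raw bytes, so encode them as ASCII base64 text.
        return {"base64": True, "data": base64.b64encode(content).decode("ascii")}
    return {"base64": False, "data": content}


def decode_content(wrapped):
    """Reverse encode_content, restoring bytes when the flag is set."""
    if wrapped["base64"]:
        return base64.b64decode(wrapped["data"])
    return wrapped["data"]


# Round trip: binary content survives serialization to a JSON string.
payload = json.dumps(encode_content(b"\x89PNG\r\n"))
restored = decode_content(json.loads(payload))
```

The flag lets the receiver distinguish a base64 payload from ordinary text that merely looks like base64.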
Scheduler
- Never re-crawl a task with a negative age.
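One way to read the negative-age rule is as a guard in front of the usual expiry check. The sketch below is an assumption-laden illustration, not the scheduler's real logic: `should_recrawl` and its parameters are hypothetical names, and it assumes `age` is measured in seconds.

```python
import time


def should_recrawl(last_crawl_time, age, now=None):
    """Return True if a previously crawled task is due to be crawled again."""
    if age < 0:
        # Negative age: the stored result never expires, so never re-crawl.
        return False
    now = time.time() if now is None else now
    # Otherwise re-crawl only once the result is older than its allowed age.
    return now - last_crawl_time > age
```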
Fetcher
- proxy parameter supports ip:port format.
- Increase default fetcher poolsize to 100.
- PhantomJS will return the JS script result in Response.js_script_result.
Processor
- Put multiple new tasks in one package, improving performance with RabbitMQ.
- Do not store all of the headers on success.
Script
- Add an interface, get_taskid, to generate a taskid from a task object.
- Tasks are de-duplicated by project and taskid.
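De-duplication by project and taskid might look like the sketch below. It is self-contained and does not import pyspider; the hashing scheme (MD5 of the URL plus request data), the `seen` set, and `is_duplicate` are assumptions made for illustration, and only the name `get_taskid` comes from the notes above.

```python
import hashlib
import json


def get_taskid(task):
    """Hypothetical taskid: MD5 of the URL plus any request data."""
    data = task.get("fetch", {}).get("data", "")
    key = task["url"] + json.dumps(data, sort_keys=True)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Pairs of (project, taskid) that have already been scheduled.
seen = set()


def is_duplicate(project, task):
    """Return True if (project, taskid) was already seen; record it otherwise."""
    key = (project, get_taskid(task))
    if key in seen:
        return True
    seen.add(key)
    return False
```

Including the request data in the taskid means two requests to the same URL with different form data are treated as distinct tasks, while the same task in two projects never collides.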
Webui
- Project list sortable.
- Return a 404 page when dumping a project that does not exist.
- Web preview supports images.
First Working Release
Base
- MySQL and MongoDB backend support; you can use a database URI to set them up.
- rabbitmq as Queue for distributed deployment
- docker supported
- support for Windows
- support for python2.6
- a resultdb, result_worker and WEBUI are added.
Scheduler
- cronjob task supported
- delete project supported
Fetcher
- a phantomjs fetcher is added. Now you can fetch pages that rely on JavaScript/AJAX!
Processor
- send_message API to send messages to other projects
- now you can import another project as a module via from projects import xxxx
- @config helper for setting configs for a callback
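To illustrate the idea behind a per-callback config helper, here is a standalone decorator sketch. It is not pyspider's implementation: this `config` function, the `_config` attribute, and the `age` and `priority` settings shown are assumptions chosen for the example.

```python
def config(**settings):
    """Attach per-callback settings to the decorated function."""
    def decorator(func):
        # Store a copy so later changes to `settings` do not leak in.
        func._config = dict(settings)
        return func
    return decorator


@config(age=10 * 24 * 60 * 60, priority=2)
def index_page(response):
    # A framework could read index_page._config before scheduling this callback.
    return response
```

Storing the settings on the function keeps each callback's configuration next to its definition without changing how the callback is called.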
WEBUI
- a css selector helper is added to debugger.
- an option to switch JS/CSS CDN.
- a page of task history/config
- a page of recent active tasks
- pages of results
- a demo mode is added for http://demo.pyspider.org/
Others
- bug fixes
- more tests, coverage is used.
First Runnable Release
finish a basic runnable system with:
- sqlite3 task & project database
- runnable scheduler & fetcher & processor
- basic dashboard and debugger