Use CrawlerProcess to run multiple scrapy crawlers simultaneously
NathanWorkman committed Apr 3, 2018
1 parent a44e48b commit a227db1
Showing 2 changed files with 22 additions and 6 deletions.
README.md (10 additions, 6 deletions)
@@ -67,20 +67,24 @@ Navigate to the django admin to view your results.
- [ ] Celery Beat - run spiders on a schedule.

#### Spiders
Want a spider not listed here? Feel free to open a pull request to add it to the list, or implement the spider yourself; a minimal skeleton is sketched after the list below.

- [x] [Stack Overflow](https://www.stackoverflow.com/jobs)
- [x] [Indeed](https://www.indeed.com)
- [ ] [Dice](http://dice.com)
- [ ] [Angel.co](https://angel.co/)
- [ ] [RemotePython](https://www.remotepython.com)
- [ ] [DjangoJobs](https://djangojobs.net/jobs/)
- [ ] [DjangoGigs](https://djangogigs.com)
- [ ] [Jobspresso](http://jobspresso.co)
- [ ] [Authentic Jobs](http://authenticjobs.com/)
- [ ] [We Work Remotely](https://weworkremotely.com/)
- [ ] [Remotive](https://remotive.io)
- [ ] [Python.org](https://www.python.org/jobs/)
- [ ] [Working Nomads](https://www.workingnomads.co/jobs)
- [ ] [Remote Work Hub](https://remoteworkhub.com)
- [ ] [Telecommunity](http://remotejobs.telecommunity.net/#s=1)
- [ ] [Remote Base](https://remotebase.io/)
- [ ] [WFH](https://www.wfh.io)
- [ ] [Remote Ok](https://remoteok.io)
- [ ] [Remotely Awesome Job](https://www.remotelyawesomejobs.com/remote-django-jobs)
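
For contributors: a minimal sketch of what a new spider might look like. The class name, spider name, start URL, and CSS selectors below are illustrative placeholders, not part of this project, and it yields plain dicts for brevity.

```python
import scrapy


class ExampleJobsSpider(scrapy.Spider):
    """Hypothetical skeleton for a new job-board spider."""

    name = "example_jobs"                      # placeholder spider name
    start_urls = ["https://example.com/jobs"]  # placeholder job-board URL

    def parse(self, response):
        # Selectors are placeholders; adapt them to the target site's markup.
        for posting in response.css("div.job-listing"):
            yield {
                "title": posting.css("h2::text").extract_first(),
                "url": posting.css("a::attr(href)").extract_first(),
            }
```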



seeker/crawl.py (new file, 12 additions)
@@ -0,0 +1,12 @@
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess

# https://doc.scrapy.org/en/latest/topics/api.html#scrapy.crawler.CrawlerProcess
settings = get_project_settings()
process = CrawlerProcess(settings)

# Schedule every spider registered in the project, then start the Twisted
# reactor; all spiders run concurrently in a single process.
for spider_name in process.spider_loader.list():
    print("Running spider %s" % spider_name)
    process.crawl(spider_name)

process.start()  # blocks until all crawls are finished
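
For illustration, a variant of the script above that runs only a chosen subset of spiders; the names in `WANTED` are hypothetical and would need to match the `name` attributes of real spiders in the project. Either script is typically run from the project root (next to scrapy.cfg) so that `get_project_settings()` can locate the Scrapy settings.

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

WANTED = {"stackoverflow", "indeed"}  # hypothetical spider names

process = CrawlerProcess(get_project_settings())
for spider_name in process.spider_loader.list():
    if spider_name in WANTED:
        process.crawl(spider_name)  # schedule only the selected spiders

process.start()
```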
