Use CrawlerProcess to run multiple scrapy crawlers simultaneously
NathanWorkman committed Apr 3, 2018
1 parent a44e48b commit a227db1
Showing 2 changed files with 22 additions and 6 deletions.
README.md (10 additions, 6 deletions)
@@ -67,20 +67,24 @@ Navigate to the django admin to view your results.
- [ ] Celery Beat - run spiders on a schedule.

#### Spiders
Want a spider not listed here? Feel free to open a pull request to add it to the list, or implement the spider yourself; a minimal skeleton is sketched after the list below.

- [x] [Stack Overflow](https://www.stackoverflow.com/jobs)
- [x] [Indeed](https://www.indeed.com)
- [ ] [Dice](http://dice.com)
- [ ] [Angel.co](https://angel.co/)
- [ ] [RemotePython](https://www.remotepython.com)
- [ ] [DjangoJobs](https://djangojobs.net/jobs/)
- [ ] [DjangoGigs](https://djangogigs.com)
- [ ] [Jobspresso](http://jobspresso.co)
- [ ] [Authentic Jobs](http://authenticjobs.com/)
- [ ] [We Work Remotely](https://weworkremotely.com/)
- [ ] [Remotive](https://remotive.io)
- [ ] [Python.org](https://www.python.org/jobs/)
- [ ] [Working Nomads](https://www.workingnomads.co/jobs)
- [ ] [Remote Work Hub](https://remoteworkhub.com)
- [ ] [Telecommunity](http://remotejobs.telecommunity.net/#s=1)
- [ ] [Remote Base](https://remotebase.io/)
- [ ] [WFH](https://www.wfh.io)
- [ ] [Remote Ok](https://remoteok.io)
- [ ] [Remotely Awesome Job](https://www.remotelyawesomejobs.com/remote-django-jobs)
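
For contributors: a minimal sketch of what a new spider might look like. The class name, spider name, start URL, and CSS selectors below are illustrative placeholders, not part of this project, and it yields plain dicts for brevity.

```python
import scrapy


class ExampleJobsSpider(scrapy.Spider):
    """Hypothetical skeleton for a new job-board spider."""

    name = "example_jobs"                      # placeholder spider name
    start_urls = ["https://example.com/jobs"]  # placeholder job-board URL

    def parse(self, response):
        # Selectors are placeholders; adapt them to the target site's markup.
        for posting in response.css("div.job-listing"):
            yield {
                "title": posting.css("h2::text").extract_first(),
                "url": posting.css("a::attr(href)").extract_first(),
            }
```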



seeker/crawl.py (new file, 12 additions)
@@ -0,0 +1,12 @@
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess

# https://doc.scrapy.org/en/latest/topics/api.html#scrapy.crawler.CrawlerProcess
settings = get_project_settings()
process = CrawlerProcess(settings)

# Schedule every spider registered in the project, then start the Twisted
# reactor; all spiders run concurrently in a single process.
for spider_name in process.spider_loader.list():
    print("Running spider %s" % spider_name)
    process.crawl(spider_name)

process.start()  # blocks until all crawls are finished
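
For illustration, a variant of the script above that runs only a chosen subset of spiders; the names in `WANTED` are hypothetical and would need to match the `name` attributes of real spiders in the project. Either script is typically run from the project root (next to scrapy.cfg) so that `get_project_settings()` can locate the Scrapy settings.

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

WANTED = {"stackoverflow", "indeed"}  # hypothetical spider names

process = CrawlerProcess(get_project_settings())
for spider_name in process.spider_loader.list():
    if spider_name in WANTED:
        process.crawl(spider_name)  # schedule only the selected spiders

process.start()
```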
