
sorny/galaxusScrapper


GalaxusScrapper

Version 0.1

GalaxusScrapper is a web-scraping spider based on Scrapy. It is dockerized and offers a REST API based on ScrapyRT for convenience :)

Main Features:

  • Get the latest 250 discounted products on galaxus.ch, with their current price and other metadata. Example response:
{
	"status": "ok",
	"items": [
		{
			"name": <name of the product>,
			"product_type": <product category or type>,
			"price": <current price>,
			"orig_price": <original price>,
			"discount": <discount in %>,
			"image_src": <image src>,
			"link": <link to article>
		},
		...
	],
	"items_dropped": [
	],
	"stats": {
	},
	"spider_name": "galaxus"
}
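To work with this response in Python, the items can be turned into typed objects and ranked by discount. The sketch below assumes that the numeric fields (`price`, `orig_price`, `discount`) arrive either as numbers or as plain numeric strings, optionally with a `%` sign; the README does not specify the exact format. `DiscountedProduct`, `parse_crawl_response` and `top_deals` are hypothetical names used for illustration only.

```python
import json
from dataclasses import dataclass
from typing import Any, List


@dataclass
class DiscountedProduct:
    """One entry of the "items" list in the crawl response."""
    name: str
    product_type: str
    price: float
    orig_price: float
    discount: float
    image_src: str
    link: str


def _to_number(value: Any) -> float:
    """Convert a numeric field to float (assumption: "19.90" or "25%" style strings)."""
    if isinstance(value, (int, float)):
        return float(value)
    return float(str(value).strip().rstrip("%"))


def parse_crawl_response(raw: str) -> List[DiscountedProduct]:
    """Parse the JSON body returned by crawl.json into product objects."""
    payload = json.loads(raw)
    if payload.get("status") != "ok":
        raise ValueError(f"crawl failed: {payload.get('status')!r}")
    products = []
    for item in payload.get("items", []):
        products.append(
            DiscountedProduct(
                name=item["name"],
                product_type=item.get("product_type", ""),
                price=_to_number(item["price"]),
                orig_price=_to_number(item["orig_price"]),
                # abs() so that "-20%" and "20" are treated as the same discount
                discount=abs(_to_number(item["discount"])),
                image_src=item.get("image_src", ""),
                link=item["link"],
            )
        )
    return products


def top_deals(products: List[DiscountedProduct], n: int = 10) -> List[DiscountedProduct]:
    """Return the n products with the highest discount."""
    return sorted(products, key=lambda p: p.discount, reverse=True)[:n]
```

For example, `top_deals(parse_crawl_response(body), 5)` would return the five biggest discounts from a response body `body`.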

Tech

GalaxusScrapper relies on the following open-source libraries and open data to work properly:

  • Scrapy - a fast, high-level web crawling and scraping framework for Python
  • ScrapyRT - Scrapy realtime, an HTTP server that exposes Scrapy spiders through a REST API
  • Unidecode - ASCII transliterations of Unicode text
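The README does not say where Unidecode is used; a plausible use is normalizing scraped product names to plain ASCII. The sketch below assumes that purpose. It uses Unidecode when installed and otherwise falls back to a rougher standard-library approximation (Unicode NFKD decomposition, then dropping non-ASCII marks), which is a swapped-in technique and not what the project itself does. `normalize_name` is a hypothetical helper.

```python
import unicodedata

try:
    # Unidecode (third-party) gives a proper transliteration, e.g. "ß" -> "ss".
    from unidecode import unidecode as to_ascii
except ImportError:
    def to_ascii(text: str) -> str:
        """Rough stdlib fallback: split accented letters into base + mark, drop the marks."""
        decomposed = unicodedata.normalize("NFKD", text)
        # Characters without an ASCII decomposition (like "ß") are simply dropped here.
        return decomposed.encode("ascii", "ignore").decode("ascii")


def normalize_name(name: str) -> str:
    """Hypothetical helper: ASCII-only product name with collapsed whitespace."""
    return " ".join(to_ascii(name).split())
```

For instance, `normalize_name("  Kühlschrank   Crème ")` yields `"Kuhlschrank Creme"` with either code path.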

Installation

  1. Build the Docker image
docker build -t galaxusscrapper .
  2. Run the container
docker run -d \
  --name galaxusscrapper \
  -p 9080:9080 \
  galaxusscrapper:latest
  3. Enjoy scraping and getting the latest discounted products :)
curl "http://localhost:9080/crawl.json?start_requests=true&spider_name=galaxus"
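The same request can be made from Python with only the standard library. This is a minimal sketch assuming the container runs locally on port 9080 as in the steps above; `build_crawl_url` and `fetch_discounts` are hypothetical helper names, and the 120-second timeout is an arbitrary choice to leave room for a full crawl.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the container started above is listening on localhost:9080.
BASE_URL = "http://localhost:9080"


def build_crawl_url(base_url: str = BASE_URL, spider_name: str = "galaxus") -> str:
    """Build the same crawl.json URL that the curl example uses."""
    query = urllib.parse.urlencode({"start_requests": "true", "spider_name": spider_name})
    return f"{base_url}/crawl.json?{query}"


def fetch_discounts(base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """Run the spider through ScrapyRT and return the decoded JSON response."""
    # ScrapyRT runs the crawl synchronously, so the request blocks until it finishes.
    with urllib.request.urlopen(build_crawl_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_discounts()` with the container running returns a dict shaped like the example response, so `data["items"]` holds the products.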

Have fun and use with care: don't flood Galaxus with price polls every second! Pretty please!
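One way to honor this request in a client is to cache the last crawl result and only trigger a new crawl after a minimum interval. The sketch below is a hypothetical `ThrottledFetcher` wrapper around any zero-argument fetch function; the one-hour default interval is an assumption, not a limit stated by the project or by Galaxus.

```python
import time
from typing import Any, Callable, Optional


class ThrottledFetcher:
    """Reuse the last crawl result until min_interval seconds have passed."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        min_interval: float = 3600.0,  # assumption: at most one crawl per hour
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._last_time: Optional[float] = None
        self._cached: Any = None

    def get(self) -> Any:
        """Return cached data, crawling again only when the interval has elapsed."""
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self._min_interval:
            self._cached = self._fetch()
            self._last_time = now
        return self._cached
```

The `clock` parameter exists so the throttling can be tested without waiting; in normal use only `fetch` (for example, a function that calls the crawl.json endpoint) needs to be passed.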

License

MIT

Free Software, Hell Yeah!

About

GalaxusScrapper for discounted products
