Python web scraper takes list of URLs and XPaths and outputs CSV of values

hamosapiens/scraper

 
 

Scraper

A Python scraper that takes a list of URLs and a set of XPaths and writes the values found at those XPaths to a CSV file. To use it, update the config.yml file with:

  • A source CSV with a column named 'URL' containing full (absolute) URLs
  • Fields listing the specific XPaths to search against
  • The name of the desired output CSV
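
The README does not show the config.yml layout or the extraction code, so the following is only a minimal sketch of the flow described above. The config key names (`source`, `fields`, `output`), the field names, the sample page, and the helper functions `extract_fields` and `rows_to_csv` are all hypothetical. The sketch uses a plain dict in place of a parsed YAML file and an inline HTML string in place of a fetched page, and it assumes each XPath selects text or attribute values.

```python
import csv
import io

from lxml import html

# Hypothetical config, shaped like what parsing config.yml might produce.
# The key names below are assumptions, not the project's actual schema.
CONFIG = {
    "source": "urls.csv",
    "fields": {
        "title": "//h1/text()",
        "price": "//span[@class='price']/text()",
    },
    "output": "results.csv",
}


def extract_fields(page_source, fields):
    """Evaluate each XPath against the page; keep the first match or ''."""
    tree = html.fromstring(page_source)
    row = {}
    for name, xpath in fields.items():
        matches = tree.xpath(xpath)
        # Assumes the XPath yields strings (text() or @attribute results).
        row[name] = str(matches[0]).strip() if matches else ""
    return row


def rows_to_csv(rows, fieldnames):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Inline stand-in for a downloaded page, so the sketch needs no network access.
SAMPLE_PAGE = (
    "<html><body><h1> Widget </h1>"
    "<span class='price'>9.99</span></body></html>"
)

row = {"URL": "https://example.com/widget"}
row.update(extract_fields(SAMPLE_PAGE, CONFIG["fields"]))
csv_text = rows_to_csv([row], ["URL"] + list(CONFIG["fields"]))
print(csv_text)
```

In a real run, the page source would come from an HTTP request for each URL in the source CSV, and the CSV text would be written to the configured output file rather than printed.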

Requirements

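Queue and urllib2 are Python 2 module names (renamed queue and urllib.request in Python 3), so the script presumably targets Python 2. csv, Queue, threading, urllib2, and re ship with the Python 2 standard library; pandas, lxml, requests, and PyYAML are third-party packages that must be installed separately.

  • csv
  • pandas
  • Queue
  • threading
  • urllib2
  • re
  • lxml
  • requests
  • PyYAML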
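
The presence of Queue and threading in the list suggests that URLs are processed by worker threads pulling from a shared queue, although the repository source is not shown here. The following is a sketch of that general pattern only, not the project's code. It uses the Python 3 module name `queue`, and `run_workers` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for a real download so the example runs without network access.

```python
import queue
import threading


def run_workers(urls, fetch, num_workers=4):
    """Apply fetch(url) to every URL using a small pool of worker threads."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:  # sentinel: no more work for this thread
                tasks.task_done()
                break
            try:
                value = fetch(url)
            except Exception as exc:  # store the error instead of killing the thread
                value = exc
            with lock:
                results[url] = value
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()
    return results


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP request; returns the URL length."""
    return len(url)


urls = ["https://example.com/a", "https://example.com/bb"]
results = run_workers(urls, fake_fetch, num_workers=2)
print(results)
```

Results are keyed by URL because worker threads finish in no fixed order; keying by URL lets the rows be written out in the order of the source CSV afterwards.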

Languages

  • Python 100.0%