There is an unprecedented amount of information on the internet that could usefully be harvested in order to build social science research datasets.
This half-day course will showcase suitable techniques for web scraping.
The value, logic and process of capturing data stored on websites will be described in detail, and practical examples and exercises will be demonstrated using the Python programming language.
It is most suited to empirical social science researchers but will be of value to researchers from a wide range of disciplines (e.g., digital humanities).
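To give a flavour of the process described above, here is a minimal sketch of the parsing step of a web scrape, using only Python's standard library. It is an illustration rather than an excerpt from the course notebooks: the sample HTML, the `LinkCollector` class and the `scrape_links` function are all hypothetical, and the course materials may well use different libraries. In a real scrape the HTML would first be downloaded from a website; here it is hard-coded so the example runs offline.

```python
from html.parser import HTMLParser

# Hypothetical page content, standing in for HTML downloaded from a website.
SAMPLE_HTML = """
<html><body>
  <h1>Registered charities</h1>
  <ul>
    <li><a href="/charity/1">Example Charity A</a></li>
    <li><a href="/charity/2">Example Charity B</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the URL and visible text of every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []            # one dict per link found
        self._current_href = None  # href of the link being read, if any
        self._current_text = []    # text fragments inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            self.links.append(
                {
                    "url": self._current_href,
                    "text": "".join(self._current_text).strip(),
                }
            )
            self._current_href = None


def scrape_links(html):
    """Return a list of {'url', 'text'} records extracted from an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for row in scrape_links(SAMPLE_HTML):
        print(row)
```

Each record returned by `scrape_links` is a plain dictionary, so the results can be written straight to a CSV file or loaded into a data frame to form one row of a research dataset.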
This repository houses the materials underpinning a half-day SGSSS course on web scraping run by Dr Diarmuid McDonnell, University of the West of Scotland. The course was first run on 2024-06-05.
The course programme can be viewed here.
The training materials can be found in the following folders:
- code - Jupyter Notebooks containing executable Python code for the web scraping lessons.
- installation - Guidance on installing Python and Jupyter Notebooks.
- presentations - PDF versions of the course lectures.
- reading - Lists of relevant online articles on web scraping.
I am grateful to the Scottish Graduate School of Social Sciences (SGSSS) for funding this course and for its continued commitment to high-quality methods training for social scientists.
Please do not hesitate to get in contact if you have queries, criticisms or ideas regarding these materials: Dr Diarmuid McDonnell