
Station to Station:
Linking and Enriching Historical British Railway Data


Introduction

This repository provides underlying code and materials for the paper 'Station to Station: Linking and Enriching Historical British Railway Data'.

The StopsGB dataset is available on the British Library research repository.

Installation

  • Create a new conda environment:
conda create -n py39station python=3.9
  • Activate the environment:
conda activate py39station
  • Clone the repository:
git clone https://github.com/Living-with-machines/station-to-station.git
  • Install the requirements:
cd /path/to/my/station-to-station
pip install -r requirements.txt
  • Install python-levenshtein separately with conda:
conda install -c conda-forge python-levenshtein
  • To allow the newly created py39station environment to show up in the notebooks, run:
python -m ipykernel install --user --name py39station --display-name "Python (py39station)"

Directory structure

Our code assumes the following directory structure:

station-to-station/
├── processed/
│   ├── deezymatch/
│   ├── quicks/
│   ├── ranklib/
│   ├── resolution/
│   └── wikidata/
├── resources/
│   ├── deezymatch/
│   ├── geonames/
│   ├── geoshapefiles/
│   ├── quicks/
│   ├── ranklib/
│   ├── wikidata/
│   ├── wikigaz/
│   └── wikipedia/
├── quicks/
├── wikidata/
├── deezymatch/
└── linking/
    └── tools/
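The `processed/` and `resources/` subfolders from the tree above must exist before running the pipeline. As a convenience, here is a small sketch (not part of the repository itself) that creates that skeleton with `pathlib`; the `base` argument and `create_layout` name are illustrative:

```python
from pathlib import Path

# Subfolders expected by the code, as listed in the directory tree above.
LAYOUT = {
    "processed": ["deezymatch", "quicks", "ranklib", "resolution", "wikidata"],
    "resources": ["deezymatch", "geonames", "geoshapefiles", "quicks",
                  "ranklib", "wikidata", "wikigaz", "wikipedia"],
}

def create_layout(base="station-to-station"):
    """Create the processed/ and resources/ skeleton under `base`."""
    for parent, children in LAYOUT.items():
        for child in children:
            Path(base, parent, child).mkdir(parents=True, exist_ok=True)

if __name__ == "__main__":
    create_layout()
```

Run it from the parent of the cloned repository (or pass the path to your clone as `base`).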

Content overview

This is a summary of the contents of each folder:

  • Resources, inputs and outputs:
    • resources/: folder where resources required to run the experiments are stored.
    • processed/: folder where processed data, resources, and results are stored.
  • Processing code:
    • quicks/: code for parsing and processing Quick's Chronology.
    • wikidata/: code for processing Wikidata, to be used in the linking experiments.
    • deezymatch/: code to create the DeezyMatch datasets and models used for linking.
  • Linking code:
    • linking/: code for reproducing the experiments and for linking StopsGB to Wikidata.

Option 1: Reproducing the experiments

To run the linking experiments, follow the instructions in this order:

  1. Prepare the resources → resources readme.
  2. Process Wikidata → Wikidata readme.
  3. Create DeezyMatch datasets and models → DeezyMatch readme.
  4. Reproduce the linking experiments → Readme: reproduce linking experiments.

Option 2: Creating StopsGB from scratch

⚠️ You will only be able to create StopsGB from scratch if you have a copy of the MS Word version of Railway Passenger Stations in Great Britain: a Chronology by Michael Quick.

To create the full StopsGB, follow the instructions in this order:

  1. Prepare the resources folder → resources readme.
  2. Process Wikidata → Wikidata readme.
  3. Create DeezyMatch datasets and models → DeezyMatch readme.
  4. Process Quick's Chronology into StopsGB → Quicks readme.
  5. Resolve and georeference StopsGB → Readme: create StopsGB.

Citation

Please acknowledge our work if you use the code or derived data by citing:

Mariona Coll Ardanuy, Kaspar Beelen, Jon Lawrence, Katherine McDonough, Federico Nanni, Joshua Rhodes, Giorgia Tolfo, and Daniel C.S. Wilson. "Station to Station: Linking and Enriching Historical British Railway Data." In Computational Humanities Research (CHR2021). 2021.
@inproceedings{lwm-station-to-station-2021,
    title = "Station to Station: Linking and Enriching Historical British Railway Data",
    author = "Coll Ardanuy, Mariona and
      Beelen, Kaspar and
      Lawrence, Jon and
      McDonough, Katherine and
      Nanni, Federico and
      Rhodes, Joshua and
      Tolfo, Giorgia and
      Wilson, Daniel CS",
    booktitle = "Computational Humanities Research",
    year = "2021",
}

Author contributions

  • Conceptualization: Katherine McDonough, Jon Lawrence and Daniel C. S. Wilson.
  • Methodology: Mariona Coll Ardanuy, Federico Nanni and Kaspar Beelen.
  • Implementation: Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen and Giorgia Tolfo.
  • Reproducibility: Federico Nanni and Mariona Coll Ardanuy.
  • Historical Analysis: Kaspar Beelen, Katherine McDonough, Jon Lawrence, Joshua Rhodes and Daniel C. S. Wilson.
  • Data Acquisition and Curation: Daniel C. S. Wilson, Mariona Coll Ardanuy, Giorgia Tolfo and Federico Nanni.
  • Annotation: Jon Lawrence and Katherine McDonough.
  • Project Management: Mariona Coll Ardanuy.
  • Writing and Editing: all authors.

Acknowledgements

Original data from Railway Passenger Stations in Great Britain: a Chronology by Michael Quick. Used with permission from The Railway and Canal Historical Society ©.

Work for this paper was produced as part of Living with Machines. This project, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.

License

The source code is licensed under the MIT License.

Copyright © 2021 The Alan Turing Institute, British Library Board, Queen Mary University of London, University of Exeter, University of East Anglia and University of Cambridge.